evalutingMPS

MPS has an open bug that we reported in the NVIDIA Developer forum with ID 3559606. The issue is that running multiple clients with MPS is slower than native CUDA, which uses time-sharing. The kernel_without_inout directory contains the microbenchmark, which is run multiple times to evaluate the performance improvement provided by MPS compared to native CUDA.

Compile

Use the Makefile in kernel_without_inout.

Adjust the execute script

In the kernel_without_inout directory, adjust jenna_conf/execute.sh. To find the optimal cores, use nvidia-smi topo --matrix, which lists the CPU affinity of each GPU.

Run

1st runConc.sh

It takes two parameters: whether to run with MPS or without MPS, and the concurrency (the number of concurrent instances).

2nd Manually

The kernel_without_inout directory contains scripts for starting and stopping MPS. Use them as needed, then run jenna_conf/execute.sh multiple times.
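The manual procedure can be sketched roughly as below. This is an illustration, not the repository's own scripts: start_mps, stop_mps and run_concurrent are hypothetical helper names, and the only assumption about MPS itself is that the standard nvidia-cuda-mps-control CLI is on the PATH. The repository's start/stop scripts and execute.sh may do more (for example core pinning).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not taken from the repository.

# Start the MPS control daemon (standard NVIDIA CLI, daemon mode).
start_mps() {
    nvidia-cuda-mps-control -d
}

# Ask the running MPS control daemon to shut down.
stop_mps() {
    echo quit | nvidia-cuda-mps-control
}

# run_concurrent N CMD [ARGS...]
# Launch N copies of CMD in the background and wait for all of them.
# Returns non-zero if any copy exited with a failure.
run_concurrent() {
    n=$1; shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &
        pids="$pids $!"
        i=$((i + 1))
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Example (not executed here):
#   start_mps
#   run_concurrent 4 ./jenna_conf/execute.sh
#   stop_mps
```

For the native case the same run_concurrent call would be used without starting the MPS daemon first.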

Results

Concurrent instances	MPS	NATIVE
1	1786.074	1784.629
2	1958.4375	2218.039
4	10991.53675	4412.19625
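
To make the comparison easier to read, the MPS/native ratio can be computed for each row. The sketch below copies the values from the table and assumes they are elapsed times where lower is better (the table does not state its units); mps_ratio is a hypothetical helper name.

```shell
#!/bin/sh
# Values copied from the Results table (units as reported there).
# Columns: concurrent instances, MPS, native.
results='1 1786.074 1784.629
2 1958.4375 2218.039
4 10991.53675 4412.19625'

# Print the MPS/native ratio per row.
# Under the lower-is-better assumption, a ratio above 1 means MPS was slower.
mps_ratio() {
    printf '%s\n' "$results" |
        awk '{ printf "%d %.2f\n", $1, $2 / $3 }'
}

mps_ratio
# prints:
# 1 1.00
# 2 0.88
# 4 2.49
```

Under that assumption, MPS is on par with native for one instance, about 12% faster for two, and roughly 2.5 times slower for four.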
