
Add benchmarking performance test and run script #75

Merged: rjwills28 merged 16 commits into main from add_performance_test_script on Jun 28, 2023

Conversation

rjwills28 (Collaborator)

I have added the Python performance script (i.e. a websocket client) that creates N subscriptions to Coniql and measures memory usage, CPU and the number of missed updates. I have also added a bash script that takes care of creating the EPICS db, starting the IOC, starting Coniql and then running M instances of the Python websocket client (to simulate multiple clients connecting to a single Coniql instance).

Results from the Python client script get appended to a file so that you can compare different runs.

The number of PVs to subscribe to, number of samples to collect and number of websocket clients are all configurable.
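
For context, a stripped-down version of what each websocket client does might look like the sketch below. It assumes Coniql is serving GraphQL subscriptions over the older graphql-ws subprotocol at ws://localhost:8080/ws and uses a subscribeChannel query; the endpoint, subprotocol and query shape are illustrative assumptions rather than details taken from the script in this PR.

```python
# Minimal sketch of a single subscription client (illustrative assumptions:
# endpoint ws://localhost:8080/ws, "graphql-ws" subprotocol, subscribeChannel query).
import asyncio
import json

import websockets


async def subscribe(pv: str, n_samples: int) -> int:
    """Subscribe to one PV via Coniql and count updates until n_samples arrive."""
    received = 0
    async with websockets.connect(
        "ws://localhost:8080/ws", subprotocols=["graphql-ws"]
    ) as ws:
        # graphql-ws handshake, then start one subscription for the given PV.
        await ws.send(json.dumps({"type": "connection_init", "payload": {}}))
        query = (
            'subscription { subscribeChannel(id: "ca://%s") { value { float } } }' % pv
        )
        await ws.send(
            json.dumps({"id": "1", "type": "start", "payload": {"query": query}})
        )
        # Count "data" messages; anything else (ack, keep-alive) is ignored.
        while received < n_samples:
            msg = json.loads(await ws.recv())
            if msg.get("type") == "data":
                received += 1
    return received


if __name__ == "__main__":
    asyncio.run(subscribe("TEST:REC0", 1000))
```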

Do we want this version of the performance tests merged in as part of Coniql? They might be useful for individual developers to run locally when making changes. Otherwise the Python client can be used for the automated performance tests and we just use the bash script as a template.

@AlexanderWells-diamond (Contributor) left a comment

Looks good and will definitely be helpful for locally checking for significant performance differences when changing Coniql. We should also be able to re-use large parts of this for a container-based performance test setup.

How long do these tests take for you to run? From my understanding the default is to create a bunch of records that add 1 every 0.1 seconds, and then the client waits for 1000 updates, which should take about 100 seconds. I'm seeing test runs take 5+ minutes, so I'm confused about where the time is being used.

There seems to be a crash when you try to increase the number of PVs in use above 1000.
I'm not sure what the root of the issue is, but the last thing in the console is:

Exception in thread Thread-175:
Traceback (most recent call last):
-> Starting subscription: TEST:REC1806
  File "/usr/lib64/python3.8/threading.py", line 932, in _bootstrap_inner
  File "/usr/lib64/python3.8/threading.py", line 870, in run
  File "benchmark/coniql_performance_test.py", line 219, in coniql_subscription
  File "/usr/lib64/python3.8/asyncio/runners.py", line 44, in run
  File "/usr/lib64/python3.8/asyncio/base_events.py", line 616, in run_until_complete
  File "benchmark/coniql_performance_test.py", line 69, in subscribe
  File "/home/eyh46967/dev/coniql/venv_test1/lib64/python3.8/site-packages/websockets/imports.py", line 77, in __getattr__
  File "/home/eyh46967/dev/coniql/venv_test1/lib64/python3.8/site-packages/websockets/imports.py", line 28, in import_name
AttributeError: module 'websockets.legacy.client' has no attribute 'connect'
libgcc_s.so.1 must be installed for pthread_cancel to work
bash: line 1: 561763 Aborted                 (core dumped) python benchmark/coniql_performance_test.py -n 2000 -s 1000 -p 2 -f benchmark/performance_test_results_NClients_1.txt

I've got a few feedback points. This is an unsorted list and some of these points may not be worth acting upon.

  • Who is the intended reader for the results file? It's fine for human readability; I'm just wondering if we want a slightly more computer-friendly format, e.g. CSV or JSON (see the sketch after this list).
  • Can there be a "total runtime" added to the end of the script?
  • Is there a way to estimate the remaining time? I notice that increasing the number of clients seems to exponentially increase the runtime, which makes it hard to know if it's still running correctly or whether something might have broken.
  • I note that the print statements in the Python file don't seem to appear anywhere - it looks like that printing is never passed back to the stdout of the original shell. This is another thing that would make debugging issues somewhat tricky, especially as I'm assuming stderr is also not being passed along.
    • A bit more investigation revealed that this is actually an issue with VSCode's built-in terminal - it seems that it doesn't actually display the new tabs; they're created in the background somewhere. They still run, we just don't see the terminals and so we don't see the output. Do you know if there's a way to detect this from inside the shell?
  • It's tricky to spot errors in the spawned tabs, as they close at the end of the run. This is easy to reproduce if you try running the script outside of the Coniql venv - there will be errors in the terminals, but if you wait too long they close and the information is lost.
  • Can you make it so that, even if the Python script exits unexpectedly, it will still try to write out whatever results it has collected? It's an irritation that we can just lose all data in some cases (see the sketch after this list).
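
One possible shape for the last point (and for the CSV/JSON question in the first bullet) is sketched below: collect results into a dict and always attempt to write them out, here as JSON, even if the run is interrupted. Function and file names are illustrative, not taken from the PR.

```python
# Sketch: always write out whatever results exist, even on an unexpected exit.
# collect_samples() is a stand-in for the real subscription/measurement loop.
import json
import random
import time


def collect_samples(results: dict, n_samples: int = 10) -> None:
    # Placeholder for the real work; simulates collecting a few samples.
    for _ in range(n_samples):
        time.sleep(0.01)
        results["samples_collected"] += 1
        results["cpu_percent"].append(random.uniform(0, 100))


def run_benchmark(results_file: str = "results.json") -> None:
    results = {"samples_collected": 0, "missed_updates": 0, "cpu_percent": []}
    try:
        collect_samples(results)
    finally:
        # Runs even if collect_samples raises, so partial data is not lost.
        with open(results_file, "w") as f:
            json.dump(results, f, indent=2)


if __name__ == "__main__":
    run_benchmark()
```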

Inline review comments (since resolved) on benchmark/run_performance_test.sh and benchmark/coniql_performance_test.py
@AlexanderWells-diamond (Contributor)

Another point to consider: we still have the old Python and JavaScript benchmarking scripts (along with some docs on how to run them). Do we want to get rid of them? Should we add this script to the docs?

@rjwills28 (Collaborator, Author)

Thanks for the detailed review. I'll look at addressing the comments and adding the extra features to the scripts.
I must admit I had not tried running with 1000 PVs. I saw my CPU hit 100% when running with 200 PVs, so that was as far as I went. I assumed that the performance tests will be run against a single instance of Coniql, which means it will have limits, so we might have to think about what a reasonable number is to stress the system and still get meaningful results.

Quick comment on the runtime of the performance test: it runs until it has captured N samples (e.g. 1000 in your example). When you increase the number of PVs, you increase the CPU usage. Once you hit 100% CPU, Coniql suffers and starts to miss PV updates. This means it takes longer to collect 1000 samples because more updates have been missed, if that makes sense.
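
As a rough back-of-envelope model of that effect (my numbers, not measurements from this PR): if records update every 0.1 s and a fraction of updates is missed, the time to collect a fixed number of samples scales as follows.

```python
# Illustrative estimate only: runtime grows as the fraction of missed updates rises.
def estimated_runtime_s(n_samples: int, update_period_s: float, miss_fraction: float) -> float:
    return n_samples * update_period_s / (1.0 - miss_fraction)


print(estimated_runtime_s(1000, 0.1, 0.0))   # 100 s with no missed updates
print(estimated_runtime_s(1000, 0.1, 0.67))  # ~303 s when two thirds of updates are missed
```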

In terms of the 'old' benchmarking scripts, I haven't looked at them myself, but if we don't think they will be used once we have these tests running then I think we should remove them. Instructions on how to run these new scripts should then be added to the documentation.

codecov bot commented Jun 22, 2023

Codecov Report

Merging #75 (df500bd) into main (8f9c7de) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main      #75   +/-   ##
=======================================
  Coverage   93.30%   93.30%           
=======================================
  Files          10       10           
  Lines         807      807           
=======================================
  Hits          753      753           
  Misses         54       54           


@rjwills28 (Collaborator, Author)

Addressing some of the general comments:

> Who is the intended reader for the results file? It's fine for human readability, I'm just wondering if we want a slightly more computer-friendly format e.g. csv, json.

I'm not sure about this one. I assume that we would want to look at the results and compare different runs ourselves. My intention was to write the results to a file so that they can be extracted and presented in some way, but I'm not sure how you would plan to do that. Perhaps in that case it would be better if they were just raw numbers that were then interpreted by some other tool?

> Can there be a "total runtime" added to the end of the script?

I have added the total runtime to the end of the script.

> Is there a way to estimate the remaining time? I notice that increasing the number of clients seems to exponentially increase the runtime which makes it hard to know if it's still running correctly or whether something might have broken

I have added some progress monitoring to the Python script and also estimate the remaining time based on how long it has already taken to collect x samples.
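
For illustration, a simple estimator along these lines (not necessarily what the script does) can extrapolate from the collection rate observed so far:

```python
# Illustrative remaining-time estimate based on the sample collection rate so far.
import time


class ProgressEstimator:
    def __init__(self, total_samples: int) -> None:
        self.total = total_samples
        self.start = time.monotonic()

    def remaining_s(self, collected: int) -> float:
        if collected == 0:
            return float("inf")  # no data yet, cannot estimate
        elapsed = time.monotonic() - self.start
        rate = collected / elapsed  # samples per second so far
        return (self.total - collected) / rate
```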

> I note that the print statements in the Python file don't seem to appear anywhere - it looks like that printing is never passed back to the stdout of the original shell? This is another thing that would make debugging issues somewhat tricky, especially as I'm assuming stderr is also not being passed along.

I have configured it so that the stdout & stderr of EPICS, Coniql and the performance test are written to log files so that we can always grab the output from the latest run. These will be saved under /benchmark/logs.

> It's tricky to spot errors in the spawned tabs, as they close at the end of the run. This is easy to recreate if you try running the script outside of the Coniql venv - there will be errors in the terminals but if you wait too long they close and lose the information

See above comment.

> Can you make it so that, even if the Python script exits unexpectedly, it will still try to write out whatever results it has collected? It's an irritation that we can just lose all data in some cases

Do you mean even if it doesn't complete? I have added a try-except block around the subscriptions, so even if they fail we should still reach the results write-out code.

@AlexanderWells-diamond (Contributor) left a comment

Looking better and better. Very nearly there. A couple of minor things, and an issue with the progress.txt file, to resolve.

As an aside, I'm wondering if it wouldn't be better/easier at this point to have done everything inside a Python script. Using the subprocess module would, I think, significantly ease some of the headaches of reading output from files, although a rewrite at this stage probably isn't worth it. I'll see what I want/need to do to get this running well in Kubernetes before suggesting anything more significant.
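
For what it's worth, a sketch of that idea, reusing the benchmark/coniql_performance_test.py command line shown in the console output above; the number of clients, flag values and log paths here are illustrative assumptions:

```python
# Sketch: spawn the benchmark clients with subprocess and capture their output
# in per-client log files instead of separate terminal tabs.
import subprocess
from pathlib import Path

LOG_DIR = Path("benchmark/logs")
LOG_DIR.mkdir(parents=True, exist_ok=True)

procs = []
for i in range(2):  # number of simulated websocket clients (illustrative)
    log = open(LOG_DIR / f"client_{i}.log", "w")
    procs.append(
        (
            subprocess.Popen(
                [
                    "python", "benchmark/coniql_performance_test.py",
                    "-n", "100", "-s", "1000",
                    "-f", f"benchmark/performance_test_results_NClients_{i}.txt",
                ],
                stdout=log,
                stderr=subprocess.STDOUT,  # stderr goes to the same log file
            ),
            log,
        )
    )

for proc, log in procs:
    proc.wait()
    log.close()
```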

Inline review comments (since resolved) on benchmark/coniql_performance_test.py and benchmark/run_performance_test.sh
@rjwills28 (Collaborator, Author)

> As an aside, I'm wondering if it wouldn't be better/easier at this point to have done everything inside a Python script.

At some point the idea of using Jenkins to run these tests was thrown around and so I went down the bash route. Just let me know if you think we do need to refactor.

@AlexanderWells-diamond (Contributor) left a comment

Looking good!

rjwills28 merged commit 1b38af5 into main on Jun 28, 2023
18 checks passed
@rjwills28 (Collaborator, Author)

Thanks @AlexanderWells-diamond for all the help and suggestions on this PR.

rjwills28 deleted the add_performance_test_script branch on June 28, 2023 15:23