Performance enhancements #94
Conversation
Includes adding the duration of the tests to the result, using ujson so we spend as little time as possible parsing data and can therefore receive messages as quickly as possible, using UUID identifiers for subscriptions, and making the output look better. The UUID identifiers remove a potential issue in Strawberry where multiple clients use the same ID for a subscription - previously the ID was simply the PV's name. Real clients, e.g. Apollo, use something similar to UUIDs.
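The UUID-per-subscription idea can be sketched as follows. This is an illustrative example, not Coniql's actual code: the names `subscriptions` and `register_subscription` are hypothetical, but they show why a fresh UUID avoids the collision that using the PV name as the ID would cause.

```python
import uuid
from typing import Dict

# Hypothetical registry mapping subscription ID -> PV name.
subscriptions: Dict[str, str] = {}

def register_subscription(pv_name: str) -> str:
    # A fresh UUID per subscription, unlike the PV name, is unique even
    # when several clients subscribe to the same PV.
    sub_id = str(uuid.uuid4())
    subscriptions[sub_id] = pv_name
    return sub_id

a = register_subscription("SIM:SINE")
b = register_subscription("SIM:SINE")
print(a != b, len(subscriptions))  # two subscriptions to one PV, distinct IDs
```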
Unbuffered output ensures all data is pushed to files ASAP, which is useful when you have to terminate the tests early (or they don't finish properly). The subscript path should always be found in SUB_DIR, which is not the same as ".". Also removes trailing whitespace.
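The benefit of unbuffered (here, line-buffered) output can be shown in a small sketch. The file name and contents are illustrative; the point is that `buffering=1` flushes each completed line to the OS immediately, so partial results survive if the test process is killed.

```python
import os
import tempfile

# Hypothetical results file for a performance-test run.
path = os.path.join(tempfile.mkdtemp(), "results.csv")

# buffering=1 means line-buffered in text mode: each newline-terminated
# write is flushed straight away rather than sitting in a block buffer.
f = open(path, "w", buffering=1)
f.write("pv,duration_ms\n")  # flushed at the newline

# Even before f.close(), the line is already visible on disk.
contents = open(path).read()
print(contents, end="")
f.close()
```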
The contents are just copied from the .gitignore.
This provides roughly a 1% CPU improvement, and there aren't any fields that require name conversion anyway.
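A hedged guess at what this change looks like (based on the later mention of "automatic camel casing" tests): disabling Strawberry's automatic snake_case to camelCase field-name conversion via `StrawberryConfig`. The `Query` type here is a minimal stand-in, not Coniql's schema.

```python
import strawberry
from strawberry.schema.config import StrawberryConfig

@strawberry.type
class Query:
    # Stand-in field; with auto_camel_case=False it is exposed as-is.
    hello_world: str = "hi"

# auto_camel_case=False skips the per-field name conversion step.
schema = strawberry.Schema(
    query=Query,
    config=StrawberryConfig(auto_camel_case=False),
)
```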
This is a minor performance enhancement: when processing metadata (and therefore the Channel's quality) on every iteration, the creation of this list took about 1% of total CPU time.
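The optimisation is hoisting a constant list out of a hot path. This sketch is illustrative (the function names and quality values are hypothetical, not Coniql's), but it shows the before/after shape of the change.

```python
# Before: the list is rebuilt on every call, i.e. on every metadata update.
def quality_before(severity: int) -> str:
    qualities = ["VALID", "WARNING", "ALARM", "INVALID", "UNDEFINED"]
    return qualities[severity]

# After: build the list once at import time; each call only indexes it.
QUALITIES = ["VALID", "WARNING", "ALARM", "INVALID", "UNDEFINED"]

def quality_after(severity: int) -> str:
    return QUALITIES[severity]

print(quality_before(1), quality_after(1))  # identical results, less work
```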
There is no reason for these to be async, and doing this gives about a 10% CPU improvement.
Now we get and store the channels during the initial Query/Mutation, which must be async functions anyway as they must call caget/caput. These retrieved channels are then passed to the Resolver step, and as they're there already we no longer need async to re-fetch them. This gives approximately 10% CPU gains.
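The shape of this change can be sketched as below. The `caget` here is a stand-in for the real Channel Access call (e.g. aioca's), and `Channel`/`get_channel` are illustrative names: the async Query fetches the channel once, and the resolvers on the returned object can then be plain synchronous methods.

```python
import asyncio

class Channel:
    def __init__(self, name: str, value: float):
        self.name = name
        self.value = value

    # Sync resolver: the channel data is already in hand, so no await
    # (and no event-loop scheduling overhead) is needed here.
    def get_value(self) -> float:
        return self.value

async def caget(name: str) -> float:
    # Stand-in for the real Channel Access get.
    await asyncio.sleep(0)
    return 42

async def get_channel(name: str) -> Channel:
    # The Query must be async because it performs the caget...
    value = await caget(name)
    # ...but the Channel it returns carries the data with it, so the
    # resolver step no longer needs to be async to re-fetch anything.
    return Channel(name, value)

channel = asyncio.run(get_channel("SIM:SINE"))
print(channel.get_value())
```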
This will mean Coniql only has a single camonitor for each PV, with the new CASubscriptionManager handling returning the updates to each subscription. With only one client this is ~1-2% CPU improvement. This saving increases with more clients, up to ~6-8% with 20 clients. Note that Channel Access itself does do some connection pooling - only one socket connection will be made to an IOC regardless of how many subscription requests are made, but the amount of data being sent over the wire will increase for each subsequent monitor. By pooling it ourselves we save some network bandwidth.
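The fan-out pattern described above can be sketched as follows. This is a minimal illustration, not Coniql's `CASubscriptionManager` itself: the monitor handle is a placeholder object, but the structure - one monitor per PV, one `asyncio.Queue` per subscriber, and a single callback broadcasting to all queues - matches the description.

```python
import asyncio
from collections import defaultdict

class CASubscriptionManager:
    """Sketch: one monitor per PV, fanning updates out to subscribers."""

    def __init__(self):
        self._queues = defaultdict(dict)  # pv_name -> {sub_id: Queue}
        self._monitors = {}               # pv_name -> monitor handle

    def subscribe(self, pv_name: str, sub_id: str) -> asyncio.Queue:
        queue = asyncio.Queue()
        self._queues[pv_name][sub_id] = queue
        if pv_name not in self._monitors:
            # Only the FIRST subscriber triggers a (stand-in) camonitor;
            # later subscribers reuse it, saving network bandwidth.
            self._monitors[pv_name] = object()
        return queue

    def on_update(self, pv_name: str, value: float) -> None:
        # The single monitor callback broadcasts to every subscriber.
        for queue in self._queues[pv_name].values():
            queue.put_nowait(value)

async def main():
    mgr = CASubscriptionManager()
    q1 = mgr.subscribe("SIM:SINE", "sub-1")
    q2 = mgr.subscribe("SIM:SINE", "sub-2")
    mgr.on_update("SIM:SINE", 3.14)
    return await q1.get(), await q2.get(), len(mgr._monitors)

v1, v2, monitors = asyncio.run(main())
print(v1, v2, monitors)  # both subscribers see the update, one monitor
```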
This is to support asyncio.Queue type hints.
As they're only being scraped once every 15 seconds by Prometheus, it is unnecessary to update them significantly more frequently than that.
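One way to rate-limit metric updates to the scrape interval is sketched below. The class and method names are illustrative (not Coniql's code); the idea is simply to skip the expensive update unless at least one scrape interval has elapsed, since Prometheus would never observe the intermediate values anyway.

```python
import time

SCRAPE_INTERVAL = 15.0  # Prometheus scrape period, in seconds

class ThrottledGauge:
    """Sketch: only apply updates at most once per scrape interval."""

    def __init__(self):
        self.value = 0.0
        self._last_update = 0.0

    def maybe_set(self, value: float, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        if now - self._last_update >= SCRAPE_INTERVAL:
            self.value = value        # the (relatively) costly update
            self._last_update = now
            return True
        return False                  # throttled: a scrape can't see it

g = ThrottledGauge()
r1 = g.maybe_set(1.0, now=100.0)  # first update goes through
r2 = g.maybe_set(2.0, now=105.0)  # only 5 s later: skipped
r3 = g.maybe_set(3.0, now=116.0)  # 16 s after the last update: applied
print(r1, r2, r3, g.value)
```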
See issue #95 for when to remove these
Codecov Report

@@            Coverage Diff             @@
##             main      #94      +/-   ##
==========================================
+ Coverage   93.30%   94.15%   +0.85%
==========================================
  Files          10       10
  Lines         807      856      +49
==========================================
+ Hits          753      806      +53
+ Misses         54       50       -4

... and 1 file with indirect coverage changes
I've tested on my local system and also see the performance improvements. I like the idea of only maintaining one camonitor per PV when there are multiple clients; however, I have come across a problem with this implementation. If I load a screen or send a subscription from the GraphiQL interface, I see the value of that PV update. However, if I then open a second instance of the screen or send the same subscription from another GraphiQL instance, I see that the value does not get displayed. The only way to get a value to display in this second instance is for that PV itself to get updated (i.e. caput xxx). Not a problem when the PV is updating at a given rate, but otherwise I think this could be a problem. Let me know if you need any more details on how to reproduce.
Thanks for investigating that. I'm a little surprised at that though - line 262 of
I retested as described and also tried displaying 2 instances of a screen in cs-web-proto subscribing to a PV that is not on a scan update. I got the same result: the second screen/GraphiQL instance did not update. Note that if both screens are open when Coniql starts up then all is OK. The key is starting Coniql and then opening one screen followed by the other. I added a debug statement before line 262 and saw that this was not printed when the second instance was loaded, so I don't think it's even reaching this bit of the code.
Thanks again, I've clearly missed something. I'll investigate further. |
Previously, if the PV never updated its value, additional subscriptions beyond the first would never receive an update because the subscription manager callback was never invoked. Now we ensure that the initial subscribe function always returns a completed channel, which is immediately consumed by `await value.get()`.
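The essence of the fix can be sketched as seeding each new subscriber's queue with the PV's current value, so the first `await value.get()` returns immediately instead of blocking until the next monitor callback (which may never fire for a static PV). The variable names here are illustrative.

```python
import asyncio

async def demo() -> float:
    queue = asyncio.Queue()
    # Cached value already held by the existing camonitor for this PV
    # (hypothetical stand-in for the real channel state).
    last_value = 5.0
    # Seed the new subscriber's queue with an immediately-consumable
    # initial update, rather than waiting for the PV to change.
    queue.put_nowait(last_value)
    # This returns right away, even if the PV never updates again.
    return await queue.get()

first = asyncio.run(demo())
print(first)
```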
I believe I've fixed this in 3013983. It worked in local tests with non-updating and slowly updating PVs.
Thanks, I can confirm that the issue has now been fixed with the latest commit. All looks good on this.
Thank you for the review. I plan to add tests to this PR before merging it.
These cover multiple subscribers to the same PV, unsubscribing, and automatic camel casing. Also remove unnecessary run_ioc function.
Originally added on the basis that we might care about tracking the number of events lost from the queue, but that isn't necessary.
Force-pushed from 4bce36b to 3a8c46e
This branch contains a number of performance enhancements for Coniql. Together they give a 20-25% decrease in CPU usage, which effectively increases Coniql's requests-per-second from ~1,000 to ~2,000, with larger performance gains for 5+ clients subscribing to the same set of PVs.
The largest performance gains have been from simply removing the `async` keyword wherever possible. Additional gains have been made by creating a connection pool in the CA Plugin, which means we only have 1 `camonitor` per PV, regardless of the number of client subscriptions for that PV. There is also a fair amount of work enhancing the Python performance tests in here, although those changes do not affect Coniql's performance.
There is a new document titled "Performance", which goes into some details about the work done and where time is spent. Please let me know if this should have more details added.
Also of note is dropping Python 3.7 support. It looks like Strawberry, as of 0.197.0, is Python 3.8+ only. I figure we want to keep up to date with Strawberry, so this change seems inevitable.
Closes #86 (by deleting that code)
NOTE: I haven't written any tests for this, but I wanted to put the PR up for the actual code to get some more eyes on before I write any.