
nginx-uvicorn-asyncio #34

Open · wants to merge 4 commits into main
Conversation

@vzip commented Jun 8, 2023

Hello everyone. This is the second concept: nginx distributes requests round-robin between two applications running in parallel under uvicorn, each on its own port. Each application hosts 5 models; each model takes tasks from its own queue, results land in a result_queue, and an asynchronous function pulls finished tasks and resolves their futures. On my local test bench, more than 9000 requests pass. I had tried many combinations with gunicorn and two workers, but that only runs as 2 workers on 2 CPUs; another variant was two async flows inside one app with 2 queues, but because Python is not truly concurrent it was slow.
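A minimal sketch of the queue-and-futures pattern described above (the names run_model and submit are hypothetical illustrations, not this PR's actual code): each model worker consumes its own queue, pushes results onto a shared result_queue, and a dispatcher coroutine resolves the futures that request handlers are awaiting. nginx's default upstream balancing is already round-robin, so the two uvicorn ports only need to be listed in one upstream block.

```python
import asyncio
import uuid

result_queue: asyncio.Queue = asyncio.Queue()
pending: dict[str, asyncio.Future] = {}

def run_model(model_name: str, text: str) -> dict:
    # Placeholder for the real (blocking) model inference.
    return {"model": model_name, "label": "LABEL_0", "score": 0.99}

async def model_worker(model_name: str, task_queue: asyncio.Queue):
    # One of these runs per model; each model has its own task queue.
    while True:
        task_id, text = await task_queue.get()
        # Run blocking inference in a thread so the event loop stays free.
        result = await asyncio.to_thread(run_model, model_name, text)
        await result_queue.put((task_id, result))

async def dispatcher():
    # Pulls finished tasks off result_queue and resolves their futures.
    while True:
        task_id, result = await result_queue.get()
        fut = pending.pop(task_id, None)
        if fut is not None and not fut.done():
            fut.set_result(result)

async def submit(task_queue: asyncio.Queue, text: str) -> dict:
    # Called from a request handler: enqueue a task and await its future.
    task_id = str(uuid.uuid4())
    fut = asyncio.get_running_loop().create_future()
    pending[task_id] = fut
    await task_queue.put((task_id, text))
    return await fut
```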

p.s. This is all without any model optimizations, since converting the models to ONNX is more of a model-level change than a runtime one. I plan to add it to the current concept and also to the first v1.0b solution with tornado and Redis; for that one I need to deploy Redis inside the Docker container with the application, because on the test bench here Redis is probably somewhere far away and the network adds an extra 100+ ms, I think.
p.p.s. I also saw some participants trying to use one tokenizer for all models. That is wrong: each model has its own tokenizer, which can tokenize spaces and special characters differently, so sharing one can give inaccurate results.
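To illustrate the point (a hedged sketch; the checkpoint names below are placeholders, not the contest's actual model IDs): with Hugging Face transformers, each checkpoint is loaded together with its own tokenizer rather than sharing one across models.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint names; the point is the per-model pairing.
CHECKPOINTS = ["org-a/model-a", "org-b/model-b"]

models = {}
for ckpt in CHECKPOINTS:
    tokenizer = AutoTokenizer.from_pretrained(ckpt)  # model-specific vocab and rules
    model = AutoModelForSequenceClassification.from_pretrained(ckpt)
    models[ckpt] = (tokenizer, model)
```

Two tokenizers can split the same string into different token sequences (different vocabularies, different handling of whitespace and special characters), so feeding one tokenizer's IDs into another model silently degrades its scores.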

@rsolovev (Collaborator) left a comment

@vzip thank you for this great iteration, the application starts with no issues, but the response format is a bit off and we can't autotest it --

{"worker4":{"EIStakovskii":{"score":0.9995502829551697,"label":"LABEL_0"}},"worker3":{"svalabs":{"score":0.9966347813606262,"label":"SPAM"}},"worker5":{"jy46604790":{"score":0.9940044283866882,"label":"LABEL_0"}},"worker1":{"cardiffnlp":{"score":0.4247715175151825,"label":"POSITIVE"}},"worker2":{"ivanlau":{"score":0.1369515061378479,"label":"Maltese"}}}

("worker" keys are redundant)

@vzip (Author) commented Jun 8, 2023

Oops.

@rsolovev I've changed it, please run the test again.

@vzip requested a review from @rsolovev on June 8, 2023 17:27
@rsolovev (Collaborator) left a comment

@vzip perfect, thank you -- here are the results

@vzip (Author) commented Jun 9, 2023

@rsolovev thank you. That is not perfect) I don't understand what goes differently in your test environment versus mine) but all the scores come out about twice lower. I will explore my solution further, add a lot of model optimizations, and try your NVIDIA driver version, 11.4 (mine is 12.0).

@vzip requested a review from @rsolovev on June 19, 2023 04:58
@vzip (Author) commented Jun 19, 2023

@rsolovev Hi. Please run the test.

p.s. I have been slightly out of the loop due to unforeseen circumstances (my girlfriend went missing while I was in Mexico, and it was quite an emergency, but everything turned out fine and she was found safe and sound!). Over the weekend I resumed my research and delved into T4 GPU memory and maximizing the task load in a single batch. I discovered that it's not always ideal to load too many tasks onto the GPU, because the driver settings impose certain limits: if the GPU is fully loaded, the clock frequency and computing throughput naturally decrease. For now I have found the optimal configuration to be 5 tasks in 1 batch, which keeps an average frequency of 900 MHz. However, I'm still experimenting with running two applications in parallel; there are some issues since I'm using Python and threading in this context. My plan is to either try multiprocessing or rewrite everything in JavaScript. I'm still striving for the best possible outcome without using ONNX.
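A rough sketch of the micro-batching idea described above (hypothetical names; run_batch stands in for the real batched GPU inference): drain up to 5 queued tasks, run them as one batch, and resolve each task's future.

```python
import asyncio

BATCH_SIZE = 5  # the "5 tasks in 1 batch" sweet spot mentioned above

async def batch_worker(task_queue: asyncio.Queue, run_batch):
    # Items on task_queue are (future, text) pairs.
    while True:
        batch = [await task_queue.get()]               # block for the first task
        while len(batch) < BATCH_SIZE:
            try:
                batch.append(task_queue.get_nowait())  # fill opportunistically
            except asyncio.QueueEmpty:
                break
        texts = [text for _, text in batch]
        # One GPU call per batch, run in a thread off the event loop.
        results = await asyncio.to_thread(run_batch, texts)
        for (fut, _), result in zip(batch, results):
            if not fut.done():
                fut.set_result(result)
```

Capping the batch at 5 keeps the GPU below the load point where, as described above, the driver throttles the clock.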

@rsolovev (Collaborator) left a comment

Hi @vzip, glad to hear that everything is ok, here is the dashboard for the latest commit
