This repository has been archived by the owner on Jun 25, 2023. It is now read-only.

Electriclizard solution #14

Open
wants to merge 7 commits into
base: main


electriclizard
Contributor

Hi 🤚!
Here is my simple solution for an NLP model inference server.
It works on both CPU and GPU, and I've tested it with the wrk benchmark tool.
Documentation is available at the /documentation/swagger-ui endpoint.
Some work is left for the future, such as a process for updating model weights.
I also think that serving the models with NVIDIA Triton Inference Server would be more efficient, but I'm not sure about deploying it with a Helm chart.
I've also tried threading to call the models in parallel, but I still get the same RPS on benchmarks. That was expected, but I tried anyway. I'm now working on increasing RPS.
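
For readers following along, the threading attempt mentioned above can be sketched roughly as below. This is only an illustration, not the code from this PR: the two model functions, the `MODELS` registry, and `predict_all` are hypothetical stand-ins, and only the standard library is used.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


# Hypothetical stand-ins for the real NLP models (assumption, not from the PR).
def fake_sentiment(text: str) -> dict:
    return {"label": "POSITIVE" if "good" in text.lower() else "NEGATIVE"}


def fake_language(text: str) -> dict:
    return {"label": "en" if text.isascii() else "unknown"}


# Hypothetical registry mapping a model name to a callable.
MODELS: Dict[str, Callable[[str], dict]] = {
    "sentiment": fake_sentiment,
    "language": fake_language,
}

# One shared pool, created once, so threads are reused across requests.
_POOL = ThreadPoolExecutor(max_workers=len(MODELS))


def predict_all(text: str) -> Dict[str, dict]:
    """Submit the same text to every model, then collect all results."""
    futures = {name: _POOL.submit(model, text) for name, model in MODELS.items()}
    # result() blocks until that model has finished (and re-raises its errors).
    return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    print(predict_all("Good service"))
```

Fanning out like this can shorten the latency of a single request when the model calls spend their time waiting on I/O or in native code that releases the GIL. It does not necessarily raise RPS: if the models share one CPU or GPU that is already saturated under load, the threads mostly take turns on the same hardware, which would be consistent with the unchanged benchmark numbers reported above.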

@rsolovev
Collaborator

Hey @electriclizard, thank you for the great solution. Here are our test results on the Grafana dashboard

If you would like to work on your solution further, you can continue optimizing/improving it and re-request our review once done. Any contribution during the challenge period will be taken into account while choosing a winner. Many thanks!


2 participants