-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP - Added spec tendrl_performance_enhacements.adoc #174
WIP - Added spec tendrl_performance_enhacements.adoc #174
Conversation
|
||
* Annotate the tendrl flows in different definitions files of tendrl modules to | ||
define the tagged queue name where these jobs would be written | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No impact on tendrl code due to changes to BaseObject.save and load ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't tried yet. I feel job creations which are done explicitly from code might get impacted.
I think you had some design in mind. You can add details as well.
* Document the details of load incurred on storage nodes due to tendrl | ||
components within justified limits (so that storage admin can plan the resource | ||
requirements accordingly) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add item:
- For gluster and ceph sds sync, convert the sync thread to a job and start the job at integration service startup and then triggered periodically.
- after the above change, each sds sync object (eg: volume) should be synced in a separate greenlet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Will add the required details
|
||
* REST layer to write the jobs in tagged queues based on definitions | ||
|
||
* Enhancements for tuning the response time for various GET endpoints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to BaseObject ("data" attr) will have an API impact, please document it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Will add a dependency on API layer.
bfbc96c
to
64b8f81
Compare
@anivargi kindly review the API dependencies and section specific to it. |
|
||
* Provide guidelines on setting up a clustered etcd for tendrl | ||
|
||
* Tuning of REST endpoints for better performance and predictable response time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this exactly involve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember while initial discussions we talked about tuning GetNodeList
kind of APIs which were taking lot of time to load details. You can comment more if such things to be done.
with defined tags would pick jobs from their specific tagged job queues for | ||
processing | ||
|
||
* Enhance REST layer to create job in tagged job queues based on flow annotation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are already adding tags for jobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are adding tags. We would need changes to submit the jobs to tagged queues instead of putting to /queue
always. That's the change being talked about here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tendrl-spec: Tendrl/specifications#174 Signed-off-by: Shubhendu <shtripat@redhat.com>
@r0h4n I have pushed the code changes PRs for commons/node-agent for this. |
tendrl-spec: Tendrl/specifications#174 Signed-off-by: Shubhendu <shtripat@redhat.com>
64b8f81
to
f595eca
Compare
@nnDarshan I have the added the alternate mechanism which you suggested for making different sync calls at different intervals. Plz have a look. @r0h4n plz check and comment. |
|
||
== Proposed change | ||
|
||
* Annotate flows in tendrl definition files with tagged queue names (to which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide the details here? What are the different tagged queues and the flows attached to it? Also provide sample definition file with this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the queue changes needs to be a different spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@r0h4n may be you can start another spec linked to this as you have already thought through a lot around this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, should be a different spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@r0h4n ^^^
* Annotate flows in tendrl definition files with tagged queue names (to which | ||
these flows would write the job to) | ||
|
||
* Introduce a tagged job queue mechanism in `tendrl-commons` module. Services |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provide specifics/psuedocode about the proposed change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@r0h4n ^^^
with defined tags would pick jobs from their specific tagged job queues for | ||
processing | ||
|
||
* Enhance REST layer to create job in tagged job queues based on flow annotation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Enhance writing/reading to/from etcd to consider whole object details as | ||
single JSON. While writing we need to get the json representation of the object | ||
and write as single field under etcd. While reading, it should be read as single | ||
value and whole object should be weaved back from JSON. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provide the psuedocode for this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this logic very trivial in nature? Anyway will add sample.
and write as single field under etcd. While reading, it should be read as single | ||
value and whole object should be weaved back from JSON. | ||
|
||
* Fine tune REST endpoints for better and faster response times |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have a list of APIs undergo change? what changes are required. I think this should be covered in a different spec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shtripat it all depends on the etcd structure to what extent the endpoints be optimized, if etcd can maintain a index of how the objects are updated then we can do some intelligent caching on the API level. Also having dynamic (Memory) and static(os version) attributes separated would also help in improving the response time.
We already got a improvement in nodes list by removing the monitoring data from the response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@r0h4n thoughts?
|
||
* Document the details of load incurred on storage nodes due to tendrl | ||
components within justified limits (so that storage admin can plan the resource | ||
requirements accordingly) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 60-66 - where this documention available? Are we spec - ing this out here or in a different spec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As part this spec only I have listed so we don't forget this. All we need is some wiki docs written for these.
If you want can create individual issues against the Documentation
repo
|
||
* In {gluster/ceph}_integration, change the sds sync as a job and start this job | ||
while startup of these integration services. Once started, these jobs should be | ||
triggered periodically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Psuedocode required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
triggered periodically. | ||
|
||
* Change any explicit raw reads in tendrl components to use load() and then the | ||
required field from object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Psuedocode required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for sure very trivial and I strongly feel there is no need for pseudo code.
All we would be doing is instead os NS._int.client.read
we would be using NS.{mod ns}.objects.{Class}.load()
.
Do you feel we need a pseudo code for these kind of trivial things in spec?
|
||
* Tuning of REST endpoints for better performance and predictable response time | ||
|
||
* Tuning of different components of tendrl for better memory utilizations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are changes proposed to address this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anivargi you are the best person to comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what this has to do with only the API. Wouldn't this need to be done for all the backend components as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. based on this discussion I would remove this line which is very specific to API layer. All backend changes anyway are listed in the spec here.
f595eca
to
92e6632
Compare
tendrl-bug-id: Tendrl#172 Signed-off-by: Shubhendu <shtripat@redhat.com>
92e6632
to
c850377
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Best to separate out the changes that impact the API into a separate spec issue and remove them from here. Correspondingly, I think this spec issue and the spec itself could be renamed to something more specific.
The new spec could be part of the Gluster Monitoring milestone @anivargi @nthomas-redhat @r0h4n
|
||
* Tuning of REST endpoints for better performance and predictable response time | ||
|
||
* Tuning of different components of tendrl for better memory utilizations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what this has to do with only the API. Wouldn't this need to be done for all the backend components as well?
|
||
== Proposed change | ||
|
||
* Annotate flows in tendrl definition files with tagged queue names (to which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, should be a different spec.
This is point under use cases and not specific to API only. |
|
||
== Testing: | ||
|
||
* Verify that load incurred on storage nodes due to tendrl components is within |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does mean defined limits in this context? I mean what is goal here? I think this should be defined here and if it cannot be achieved with specified HW configuration then result should be new limits with reasons why limits have been changed.
* Verify that load incurred on storage nodes due to tendrl components is within | ||
the defined limits | ||
|
||
* Verify the REST endpoints for their response time and it should be within the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar question than above.
|
||
* Provide guidelines on standard hardware requirements for tendrl server node | ||
|
||
* Provide guidelines on setting up a clustered etcd for tendrl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are Tendrl components ready for clustered Etcd? Because right now there is only one IP/FQDN for accessing Etcd in configuration.
tendrl-spec: Tendrl/specifications#174 tendrl-bug-id: Tendrl#657 Signed-off-by: Shubhendu <shtripat@redhat.com>
|
||
== Documentation impact: | ||
|
||
* Document for clustered setup of etcd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really consider clustered etcd setup for the milestone #3? I'm asking because such change will have a huge impact on setup and configuration in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussion with dev team, this was moved out into 3rd milestone.
|
||
* Document for hardware requirements for tendrl server | ||
|
||
* Document for load details on storage nodes due to tendrl components |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean exactly? Does it refer to expected performance impact of tendrl components on storage nodes?
tendrl-spec: Tendrl/specifications#174 tendrl-bug-id: Tendrl#657 Signed-off-by: Shubhendu <shtripat@redhat.com>
tendrl-spec: Tendrl/specifications#174 tendrl-bug-id: Tendrl#657 Signed-off-by: Shubhendu <shtripat@redhat.com>
tendrl-bug-id: #172
Signed-off-by: Shubhendu shtripat@redhat.com