Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP - Added spec tendrl_performance_enhacements.adoc #174

Closed

Conversation

shtripat
Copy link
Member

tendrl-bug-id: #172
Signed-off-by: Shubhendu shtripat@redhat.com


* Annotate the tendrl flows in different definitions files of tendrl modules to
define the tagged queue name where these jobs would be written

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No impact on tendrl code due to changes to BaseObject.save and load ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't tried yet. I feel job creations which are done explicitly from code might get impacted.
I think you had some design in mind. You can add details as well.

* Document the details of load incurred on storage nodes due to tendrl
components within justified limits (so that storage admin can plan the resource
requirements accordingly)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add item:

  • For gluster and ceph sds sync, convert the sync thread to a job and start the job at integration service startup and then triggered periodically.
  • after the above change, each sds sync object (eg: volume) should be synced in a separate greenlet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Will add the required details


* REST layer to write the jobs in tagged queues based on definitions

* Enhancements for tuning the response time for various GET endpoints
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to BaseObject ("data" attr) will have an API impact, please document it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Will add a dependency on API layer.

@shtripat shtripat force-pushed the spec/tendrl-performance-enhancements branch from bfbc96c to 64b8f81 Compare July 24, 2017 08:21
@shtripat
Copy link
Member Author

@anivargi kindly review the API dependencies and section specific to it.


* Provide guidelines on setting up a clustered etcd for tendrl

* Tuning of REST endpoints for better performance and predictable response time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this exactly involve?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember while initial discussions we talked about tuning GetNodeList kind of APIs which were taking lot of time to load details. You can comment more if such things to be done.

with defined tags would pick jobs from their specific tagged job queues for
processing

* Enhance REST layer to create job in tagged job queues based on flow annotation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are already adding tags for jobs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are adding tags. We would need changes to submit the jobs to tagged queues instead of putting to /queue always. That's the change being talked about here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anivargi , @shtripat , we need this details in the docs. what are the chnages required on API side. Import flows needs to taken care of immediately and what else? I guess queue(to submit the job) related information to be read from the definition files

shtripat pushed a commit to shtripat/node_agent that referenced this pull request Jul 26, 2017
tendrl-spec: Tendrl/specifications#174
Signed-off-by: Shubhendu <shtripat@redhat.com>
@shtripat
Copy link
Member Author

@r0h4n I have pushed the code changes PRs for commons/node-agent for this.
You would need to push your changes which you have done for performance enhancements under this spec (specifically for tagged job queues and others)

shtripat pushed a commit to shtripat/bridge_common that referenced this pull request Jul 26, 2017
tendrl-spec: Tendrl/specifications#174
Signed-off-by: Shubhendu <shtripat@redhat.com>
@shtripat shtripat force-pushed the spec/tendrl-performance-enhancements branch from 64b8f81 to f595eca Compare July 26, 2017 18:36
@shtripat
Copy link
Member Author

@nnDarshan I have the added the alternate mechanism which you suggested for making different sync calls at different intervals. Plz have a look.

@r0h4n plz check and comment.


== Proposed change

* Annotate flows in tendrl definition files with tagged queue names (to which
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide the details here? What are the different tagged queues and the flows attached to it? Also provide sample definition file with this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the queue changes needs to be a different spec.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@r0h4n may be you can start another spec linked to this as you have already thought through a lot around this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, should be a different spec.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@r0h4n ^^^

* Annotate flows in tendrl definition files with tagged queue names (to which
these flows would write the job to)

* Introduce a tagged job queue mechanism in `tendrl-commons` module. Services
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provide specifics/psuedocode about the proposed change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@r0h4n ^^^

with defined tags would pick jobs from their specific tagged job queues for
processing

* Enhance REST layer to create job in tagged job queues based on flow annotation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anivargi , @shtripat , we need this details in the docs. what are the chnages required on API side. Import flows needs to taken care of immediately and what else? I guess queue(to submit the job) related information to be read from the definition files

* Enhance writing/reading to/from etcd to consider whole object details as
single JSON. While writing we need to get the json representation of the object
and write as single field under etcd. While reading, it should be read as single
value and whole object should be weaved back from JSON.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provide the psuedocode for this change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this logic very trivial in nature? Anyway will add sample.

and write as single field under etcd. While reading, it should be read as single
value and whole object should be weaved back from JSON.

* Fine tune REST endpoints for better and faster response times
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a list of APIs undergo change? what changes are required. I think this should be covered in a different spec

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shtripat it all depends on the etcd structure to what extent the endpoints be optimized, if etcd can maintain a index of how the objects are updated then we can do some intelligent caching on the API level. Also having dynamic (Memory) and static(os version) attributes separated would also help in improving the response time.
We already got a improvement in nodes list by removing the monitoring data from the response.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@r0h4n thoughts?


* Document the details of load incurred on storage nodes due to tendrl
components within justified limits (so that storage admin can plan the resource
requirements accordingly)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 60-66 - where this documention available? Are we spec - ing this out here or in a different spec?

Copy link
Member Author

@shtripat shtripat Jul 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part this spec only I have listed so we don't forget this. All we need is some wiki docs written for these.
If you want can create individual issues against the Documentation repo


* In {gluster/ceph}_integration, change the sds sync as a job and start this job
while startup of these integration services. Once started, these jobs should be
triggered periodically.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Psuedocode required

Copy link
Member Author

@shtripat shtripat Jul 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was suggested recently. Also portions of this spec are to be implemented/already available from @r0h4n. I would request @r0h4n also to add flesh to this spec.

triggered periodically.

* Change any explicit raw reads in tendrl components to use load() and then the
required field from object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Psuedocode required

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for sure very trivial and I strongly feel there is no need for pseudo code.
All we would be doing is instead os NS._int.client.read we would be using NS.{mod ns}.objects.{Class}.load().

Do you feel we need a pseudo code for these kind of trivial things in spec?


* Tuning of REST endpoints for better performance and predictable response time

* Tuning of different components of tendrl for better memory utilizations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are changes proposed to address this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anivargi you are the best person to comment here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this has to do with only the API. Wouldn't this need to be done for all the backend components as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. based on this discussion I would remove this line which is very specific to API layer. All backend changes anyway are listed in the spec here.

@shtripat shtripat force-pushed the spec/tendrl-performance-enhancements branch from f595eca to 92e6632 Compare July 27, 2017 08:13
tendrl-bug-id: Tendrl#172
Signed-off-by: Shubhendu <shtripat@redhat.com>
@shtripat shtripat force-pushed the spec/tendrl-performance-enhancements branch from 92e6632 to c850377 Compare July 27, 2017 08:27
Copy link
Contributor

@brainfunked brainfunked left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best to separate out the changes that impact the API into a separate spec issue and remove them from here. Correspondingly, I think this spec issue and the spec itself could be renamed to something more specific.

The new spec could be part of the Gluster Monitoring milestone @anivargi @nthomas-redhat @r0h4n


* Tuning of REST endpoints for better performance and predictable response time

* Tuning of different components of tendrl for better memory utilizations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this has to do with only the API. Wouldn't this need to be done for all the backend components as well?


== Proposed change

* Annotate flows in tendrl definition files with tagged queue names (to which
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, should be a different spec.

@shtripat
Copy link
Member Author

shtripat commented Aug 3, 2017

Not sure what this has to do with only the API. Wouldn't this need to be done for all the backend
components as well?

This is point under use cases and not specific to API only.


== Testing:

* Verify that load incurred on storage nodes due to tendrl components is within
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does mean defined limits in this context? I mean what is goal here? I think this should be defined here and if it cannot be achieved with specified HW configuration then result should be new limits with reasons why limits have been changed.

* Verify that load incurred on storage nodes due to tendrl components is within
the defined limits

* Verify the REST endpoints for their response time and it should be within the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question than above.


* Provide guidelines on standard hardware requirements for tendrl server node

* Provide guidelines on setting up a clustered etcd for tendrl
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are Tendrl components ready for clustered Etcd? Because right now there is only one IP/FQDN for accessing Etcd in configuration.

shtripat pushed a commit to shtripat/bridge_common that referenced this pull request Aug 8, 2017
tendrl-spec: Tendrl/specifications#174
tendrl-bug-id: Tendrl#657
Signed-off-by: Shubhendu <shtripat@redhat.com>

== Documentation impact:

* Document for clustered setup of etcd
Copy link
Contributor

@mbukatov mbukatov Aug 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really consider clustered etcd setup for the milestone #3? I'm asking because such change will have a huge impact on setup and configuration in general.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion with dev team, this was moved out into 3rd milestone.


* Document for hardware requirements for tendrl server

* Document for load details on storage nodes due to tendrl components
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean exactly? Does it refer to expected performance impact of tendrl components on storage nodes?

shtripat pushed a commit to shtripat/bridge_common that referenced this pull request Aug 29, 2017
tendrl-spec: Tendrl/specifications#174
tendrl-bug-id: Tendrl#657
Signed-off-by: Shubhendu <shtripat@redhat.com>
shtripat pushed a commit to shtripat/bridge_common that referenced this pull request Aug 30, 2017
tendrl-spec: Tendrl/specifications#174
tendrl-bug-id: Tendrl#657
Signed-off-by: Shubhendu <shtripat@redhat.com>
@r0h4n r0h4n closed this Jan 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants