
Commit

Added spec specs/unmanage_cluster.adoc
tendrl-bug-id: Tendrl#252
Signed-off-by: Shubhendu <shtripat@redhat.com>
Shubhendu committed Feb 7, 2018
1 parent e1bf9bd commit da8da4b
213 changes: 213 additions & 0 deletions specs/unmanage_cluster.adoc
= Introduce an un-manage cluster mechanism in tendrl

The intent of this change is to introduce an un-manage cluster functionality in
tendrl. An un-managed cluster remains known to tendrl but is no longer managed
by it: monitoring, alerting and management of the cluster are no longer
possible from tendrl. At a later stage, if required, the admin can re-import
the cluster to start managing it again.

The un-manage functionality is helpful when the admin wants to bring down the
cluster for critical maintenance and does not want monitoring etc. to be
performed during that period.

== Problem description

There are situations where the admin needs to perform critical maintenance on
the cluster and does not want any monitoring etc. to take place during that
period. Also, if the admin decides to dismantle the cluster at some stage, there
should be a mechanism to mark the cluster as un-managed on the tendrl side.

Tendrl should also provide a way to re-import the cluster at a later stage if
the admin wants, and the process should be seamless, with little or no manual
intervention required.


== Use Cases

This addresses un-managing a cluster and re-importing an un-managed cluster at
a later stage. The un-manage functionality in tendrl needs to take care of the
following:

* Stop any services that were started as part of tendrl managing the storage
nodes, and disable them
* Set the cluster state so that the cluster is marked and listed as un-managed
in the UI dashboards. No operations should be allowed on an un-managed cluster,
and monitoring, alerting and entity management should no longer be supported
for it
* The user should have an option to re-import the cluster later if needed, and
this should work seamlessly as usual


== Proposed change

* On un-manage cluster, start a flow in the tendrl server node's node-agent which
creates child jobs on the storage nodes to stop tendrl specific services like
collectd and tendrl-gluster-integration

* Mark the cluster flag `is_managed` as `False` so that the cluster is listed
as un-managed in the UI dashboards and all possible actions on it are disabled

* Archive the graphite (monitoring) data for the cluster to an archive location
so the grafana dashboards don't list the cluster and its entities anymore

* Delete the grafana alert dashboards for the cluster and its dependent entities
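To make the effect of the `is_managed` flag concrete, here is a minimal sketch assuming a simplified, hypothetical `Cluster` stand-in (not the tendrl-commons object). Only `is_managed` comes from this spec; `managed_actions` and the action names are illustrative. Once the flag is `False`, every action except re-import is withheld, matching the testing expectation that the import option becomes enabled.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, simplified stand-in for the tendrl Cluster object. Only the
# `is_managed` flag comes from this spec; `managed_actions` is illustrative.
@dataclass
class Cluster:
    integration_id: str
    is_managed: bool = True
    managed_actions: List[str] = field(
        default_factory=lambda: ["expand", "unmanage"]  # hypothetical action names
    )


def mark_unmanaged(cluster: Cluster) -> Cluster:
    """Flip the flag so dashboards can list the cluster as un-managed."""
    cluster.is_managed = False
    return cluster


def available_actions(cluster: Cluster) -> List[str]:
    """An un-managed cluster offers only re-import; every other action is disabled."""
    if not cluster.is_managed:
        return ["import"]
    return list(cluster.managed_actions)
```

For example, `available_actions(mark_unmanaged(Cluster("c1")))` yields `["import"]`.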

The logic goes as follows:

** Start a flow in the node-agent on the tendrl server node for un-manage cluster

** The first atom of this flow invokes child jobs on the storage nodes'
node-agents to stop tendrl specific services and mark them disabled

** The main atom of the un-manage cluster flow removes any etcd details for
the cluster and then marks the cluster's `is_managed` flag as `False`

** Another atom of the un-manage cluster flow invokes a flow in
monitoring-integration to archive the graphite data for the cluster

** Finally, another atom invokes a flow in monitoring-integration to remove the
grafana alert dashboards for the cluster and its dependent entities
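The ordering of these steps can be sketched as a list of atoms run one after another, with the flow aborting on the first failure. This is a simplified illustration under stated assumptions, not the tendrl-commons job machinery: the context dict, the etcd key `clusters/<id>/details` and the `requested_flows` list are hypothetical stand-ins for etcd writes and monitoring-integration jobs.

```python
from typing import Callable, Dict, List

# Hypothetical sketch of the atom ordering described above. The context dict,
# the etcd key layout and the "requested_flows" list are simplified stand-ins,
# not the tendrl-commons job machinery.
Atom = Callable[[Dict], bool]


def stop_services_on_storage_nodes(ctx: Dict) -> bool:
    # One child job per storage node, each stopping and disabling the services.
    for node in ctx["storage_nodes"]:
        ctx["child_jobs"].append({
            "node": node,
            "action": "stop_and_disable",
            "services": ["collectd", "tendrl-gluster-integration"],
        })
    return True


def delete_cluster_details(ctx: Dict) -> bool:
    # Remove the cluster's etcd details (if any), then mark it un-managed.
    ctx["etcd"].pop(f"clusters/{ctx['integration_id']}/details", None)
    ctx["is_managed"] = False
    return True


def archive_graphite_data(ctx: Dict) -> bool:
    # Stand-in for invoking the monitoring-integration archive flow.
    ctx["requested_flows"].append("monitoring-integration: archive graphite data")
    return True


def delete_alert_dashboards(ctx: Dict) -> bool:
    # Stand-in for invoking the monitoring-integration dashboard removal flow.
    ctx["requested_flows"].append("monitoring-integration: delete alert dashboards")
    return True


UNMANAGE_ATOMS: List[Atom] = [
    stop_services_on_storage_nodes,
    delete_cluster_details,
    archive_graphite_data,
    delete_alert_dashboards,
]


def run_flow(atoms: List[Atom], ctx: Dict) -> List[str]:
    """Run atoms in order and abort on the first failure."""
    completed = []
    for atom in atoms:
        if not atom(ctx):
            raise RuntimeError(f"atom {atom.__name__} failed; aborting flow")
        completed.append(atom.__name__)
    return completed
```

Stopping services first means no new monitoring data arrives while the later atoms clean up.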

The structure of the un-manage cluster flow would therefore look roughly as follows:

```yaml
UnmanageCluster:
  tags:
    - "tendrl/monitor"
  atoms:
    - tendrl.objects.Cluster.atoms.StopMonitoringServices
    - tendrl.objects.Cluster.atoms.StopIntegrationServices
    - tendrl.objects.Cluster.atoms.DeleteClusterDetails
    - tendrl.objects.Cluster.atoms.DeleteMonitoringDetails
  help: "Unmanage a Gluster Cluster"
  enabled: true
  inputs:
    mandatory:
      - TendrlContext.integration_id
  run: tendrl.flows.UnmanageCluster
  type: Update
  uuid: 2f94a48a-05d7-408c-b400-e27827f4efed
  version: 1
```
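A flow definition like this lets the caller reject a request before any job is queued if a mandatory input such as `TendrlContext.integration_id` is missing. The sketch below mirrors the definition's relevant fields as a Python dict so no YAML parser is needed. The `build_job` helper and its payload shape are hypothetical, not tendrl's actual job format.

```python
from typing import Dict, List

# The flow definition above mirrored as a Python dict, so this sketch needs no
# YAML parser. The job payload shape built below is hypothetical.
UNMANAGE_CLUSTER: Dict = {
    "tags": ["tendrl/monitor"],
    "inputs": {"mandatory": ["TendrlContext.integration_id"]},
    "run": "tendrl.flows.UnmanageCluster",
    "type": "Update",
    "uuid": "2f94a48a-05d7-408c-b400-e27827f4efed",
}


def missing_inputs(definition: Dict, parameters: Dict) -> List[str]:
    """Return the mandatory inputs that are absent or empty."""
    return [key for key in definition["inputs"]["mandatory"] if not parameters.get(key)]


def build_job(definition: Dict, parameters: Dict) -> Dict:
    """Reject the request early if a mandatory input is missing."""
    missing = missing_inputs(definition, parameters)
    if missing:
        raise ValueError(f"missing mandatory inputs: {', '.join(missing)}")
    return {
        "run": definition["run"],
        "tags": list(definition["tags"]),
        "parameters": dict(parameters),
    }
```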

=== Alternatives

None

=== Data model impact

None

=== Impacted Modules:

==== Tendrl API impact:

* Introduce an API `clusters/{int-id}/unmanage` for triggering an un-manage
cluster flow
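A client call to this endpoint could be built with the standard library as sketched below. It uses the `clusters/{int-id}/unmanage` path form from the Testing section. The following are assumptions, not details this spec defines: the base URL, the POST method, the empty JSON body and the bearer-token `Authorization` header.

```python
import json
import urllib.request
from urllib.parse import quote


def build_unmanage_request(base_url: str, integration_id: str, token: str) -> urllib.request.Request:
    # Assumptions: POST method, empty JSON body, bearer-token auth header.
    url = f"{base_url.rstrip('/')}/clusters/{quote(integration_id, safe='')}/unmanage"
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

To send the request, pass it to `urllib.request.urlopen(req)`. The response format is not defined by this spec.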

==== Notifications/Monitoring impact:

* A flow to archive the cluster specific graphite data

* A flow to remove the grafana alerts dashboards for the cluster and its
dependent entities

* Raise an alert once the cluster is un-managed, with details such as where to
find the old graphite data
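Archiving the graphite data and pointing the admin to it could look like the sketch below. It assumes a hypothetical whisper layout of `<whisper_root>/tendrl/clusters/<integration_id>` and a timestamped directory under an archive root. Both are illustrative choices, not the layout monitoring-integration actually uses. Moving the directory out of the whisper tree is what stops grafana from listing the cluster. The returned path feeds the alert text.

```python
import shutil
import time
from pathlib import Path
from typing import Optional

# Hypothetical layout: graphite keeps a cluster's whisper files under
# <whisper_root>/tendrl/clusters/<integration_id>. Adjust to the real layout.


def archive_cluster_metrics(whisper_root: str, archive_root: str,
                            integration_id: str, now: Optional[float] = None) -> Optional[Path]:
    """Move the cluster's metric tree aside so grafana no longer lists it."""
    src = Path(whisper_root) / "tendrl" / "clusters" / integration_id
    if not src.is_dir():
        return None  # nothing to archive
    stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime(now))
    dest = Path(archive_root) / f"{integration_id}_{stamp}"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    return dest


def unmanaged_alert_message(integration_id: str, archive_path: Optional[Path]) -> str:
    """Alert text telling the admin where the old graphite data went."""
    if archive_path is None:
        return f"Cluster {integration_id} un-managed; no graphite data was archived"
    return f"Cluster {integration_id} un-managed; old graphite data archived at {archive_path}"
```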

==== Tendrl/common impact:

* An un-manage cluster flow to be targeted at the tendrl server node
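Targeting a flow at a particular node can be sketched as tag matching. The sketch assumes that a node-agent picks up a job when at least one of the job's tags appears among its own tags. Under that assumption, the `tendrl/monitor` tag from the flow definition routes the parent flow to the server node. The storage-node tags shown are hypothetical examples of how per-node child jobs might be addressed.

```python
from typing import Dict, Iterable, List

# Hypothetical sketch of tag-based job targeting. Assumption: a node-agent
# picks up a job when at least one of the job's tags is among its own tags.
# The tag values other than "tendrl/monitor" are illustrative.


def nodes_for_job(job_tags: Iterable[str], node_tags: Dict[str, Iterable[str]]) -> List[str]:
    """Return the (sorted) ids of nodes whose tags overlap the job's tags."""
    wanted = set(job_tags)
    return sorted(node for node, tags in node_tags.items() if wanted & set(tags))


NODES = {
    "server": ["tendrl/monitor", "tendrl/server"],
    "gl-1": ["tendrl/node_gl-1", "gluster/server"],
    "gl-2": ["tendrl/node_gl-2", "gluster/server"],
}
```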

==== Tendrl/node_agent impact:

None

==== Sds integration impact:

None

==== Tendrl Dashboard impact:

* UX requirements for invoking an un-manage cluster flow for an existing cluster
are captured at https://redhat.invisionapp.com/share/8QCOEVEY9

=== Security impact:

None

=== Other end user impact:

The user gets an option to un-manage an existing cluster and can re-import it
at a later stage.

=== Performance impact:

None

=== Other deployer impact:

The tendrl-ansible module needs to provide a mechanism to set up tendrl
components and dependencies on additional new nodes in the cluster.

<TBD> Details of the playbooks etc. to be added here.

=== Developer impact:

None


== Implementation:

* https://github.com/Tendrl/commons/issues/797


=== Assignee(s):

Primary assignee:
shtripat
mbukatov

=== Work Items:

* https://github.com/Tendrl/specifications/issues/252


== Dependencies:

None

== Testing:

* Check that the UI dashboard has an option to trigger the un-manage cluster flow

* Check that the flow completes successfully and verify that the grafana
dashboards no longer show any details for the selected cluster

* Verify that no grafana alert dashboards are available for the un-managed
cluster

* Verify that the clusters list reports the cluster as un-managed and that the
import option is now enabled

* Import the cluster back and verify that the import succeeds. All grafana
dashboards, grafana alert dashboards and the UI should reflect the cluster
details again

* Invoke the REST endpoint `clusters/{int-id}/unmanage` and verify that the
cluster is un-managed successfully


== Documentation impact:

* The new un-manage cluster feature should be documented, including what gets
disabled or removed when a cluster is un-managed

* The new API endpoint should be documented with sample input/output structures

== References:

* https://redhat.invisionapp.com/share/8QCOEVEY9

* https://github.com/Tendrl/commons/pull/798

* https://github.com/Tendrl/monitoring-integration/pull/317
