SLES4SAP-hana-scaleOut-PerfOpt-15.adoc SLES4SAP-hana-scaleout-multitarget-perfopt-15.adoc: susTkOver.py
lpinne committed Jul 28, 2023
1 parent 3733c2b commit eba1ef0
Showing 2 changed files with 115 additions and 20 deletions.
134 changes: 114 additions & 20 deletions adoc/SLES4SAP-hana-scaleOut-PerfOpt-15.adoc
@@ -254,7 +254,7 @@ the parameter sheet and than begin with the installation.
=== Scale-out scenario and resource agents

To automate the failover, the High Availability Extension built into
_{sles4sap}_ is used. Two resource agents have been created to handle the scenario.
{sles4sap} is used. Two resource agents have been created to handle the scenario.

The first is the *SAPHanaController* resource agent (RA), which checks and
manages the {HANA} database instances. This RA is configured as a
@@ -325,7 +325,7 @@ could be a custom python hook using the SAP provider *srServiceStateChanged()*
available since {HANA} 2.0 SPS01.

To achieve an automation of this resource handling process, use the
{HANA} resource agents included in the _SAPHanaSR-ScaleOut_ RPM package
{HANA} resource agents included in the SAPHanaSR-ScaleOut RPM package
delivered with {sles4sap}.

You can configure the level of automation by setting the parameter
@@ -337,7 +337,7 @@ Find configuration details in manual page ocf_suse_SAPHanaController(7).

Read the SAP Notes and papers first.

The _SAPHanaSR-ScaleOut_ resource agent software package
The SAPHanaSR-ScaleOut resource agent software package
supports scale-out (multiple-box to multiple-box) system replication with the
following configurations and parameters:
// TODO PRIO2: align prerequisites section with scale-up guide and manual pages
@@ -382,14 +382,28 @@ However, all nodes in one Linux cluster have to use the same style.
_sapinit_ auto-start.
* The replication mode should be either 'sync' or 'syncmem'; 'async' is not
supported.
* SAP HANA 2.0 SPS04 or later provides the HA/DR provider hook method srConnectionChanged()
with the parameters needed by SAPHanaSrMultiTarget.py.
* SAP HANA 2.0 SPS05 or later provides the HA/DR provider hook method srServiceStateChanged()
with the parameters needed by susChkSrv.py.
* SAP HANA 2.0 SPS06 or later provides the HA/DR provider hook method preTakeover() with
multi-target aware parameters and a separate return code for Linux HA clusters.
* No other HA/DR provider hook script should be configured for the above-mentioned methods.
Hook scripts for other methods, provided in SAPHanaSR-ScaleOut, can be used in parallel,
unless documented otherwise.
* The Linux cluster needs to be up and running to allow HA/DR provider events to be written
into CIB attributes. After Linux cluster maintenance, the current HANA SR status might differ
from the CIB srHook attribute.
* The user {refSIDadm} needs execution permission as user root for the command
_SAPHanaSR-hookHelper_.
* The Linux cluster can be either freshly installed as described in this guide,
or it can be upgraded as described in the respective documentation.
Mixing old and new cluster attributes or hook scripts within
one Linux cluster is not allowed.
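
The SPS prerequisites listed above can be summarized as a lookup table. The following Python sketch is illustrative only: it encodes nothing but the levels stated in this list, does not query a running {HANA} system, and the names `HOOK_MIN_SPS` and `usable_hooks` are hypothetical.

```python
# Minimum SAP HANA 2.0 SPS level at which each HA/DR provider hook method
# offers the parameters needed by the listed SAPHanaSR-ScaleOut hook script,
# as stated in the requirements list. Hypothetical helper, not part of the package.
HOOK_MIN_SPS = {
    "srConnectionChanged": (4, "SAPHanaSrMultiTarget.py"),
    "srServiceStateChanged": (5, "susChkSrv.py"),
    "preTakeover": (6, "susTkOver.py"),
}


def usable_hooks(hana2_sps: int) -> list:
    """Return the hook scripts whose provider method is usable at this SPS level."""
    return [
        script
        for min_sps, script in HOOK_MIN_SPS.values()
        if hana2_sps >= min_sps
    ]


if __name__ == "__main__":
    for sps in (4, 5, 6):
        print(f"SPS{sps:02d}: {', '.join(usable_hooks(sps))}")
```

With SPS05, for example, the sketch reports SAPHanaSrMultiTarget.py and susChkSrv.py, but not susTkOver.py.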

Find more details in the REQUIREMENTS section of manual pages
SAPHanaSR-ScaleOut(7), ocf_suse_SAPHanaController(7),
SAPHanaSrMultiTarget.py(7) and SAPHanaSR-manageAttr(8).
SAPHanaSrMultiTarget.py(7), susTkOver.py(7) and SAPHanaSR-manageAttr(8).

[IMPORTANT]
====
@@ -399,14 +413,15 @@ automated registration of a failed primary, therefore the
`AUTOMATED_REGISTER="false"` is the *default*.
In this case, you need to register a failed primary after a takeover manually.
Use SAP tools like {HANA} Cockpit or *hdbnsutil*.
Use SAP tools like {HANA} Cockpit or *hdbnsutil*. Always make sure to use
the exact site names already known to the cluster.
====

* For optimal automation, _AUTOMATED_REGISTER="true"_ is recommended.
* Automated start of {HANA} instances during system boot must be switched
*off*.
* You need at least SAPHanaSR-ScaleOut version 0.180, {sles4sap} {pn15} SP1 and
{HANA} 2.0 SPS 4 for all mentioned setups.
{HANA} 2.0 SPS 5 for all mentioned setups.
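
The minimum versions above can be compared numerically, segment by segment. The Python sketch below is a simplification under stated assumptions: it expects plain dotted numeric versions (for example, the version part of the `rpm -q SAPHanaSR-ScaleOut` output, without the release suffix), writes {sles4sap} 15 SP1 as `15.1`, and does not implement full RPM version comparison with release or non-numeric segments. The names `MINIMUMS` and `meets_minimum` are hypothetical.

```python
# Minimum versions stated in this guide for all described setups.
# Hypothetical helper; "15.1" stands for SLES for SAP 15 SP1.
MINIMUMS = {
    "SAPHanaSR-ScaleOut": "0.180",
    "SLES for SAP": "15.1",
    "SAP HANA 2.0 SPS": "5",
}


def version_tuple(text: str) -> tuple:
    """Turn a dotted numeric version such as '0.184.1' into (0, 184, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(component: str, installed: str) -> bool:
    """True if the installed version is at least the documented minimum."""
    return version_tuple(installed) >= version_tuple(MINIMUMS[component])


if __name__ == "__main__":
    print(meets_minimum("SAPHanaSR-ScaleOut", "0.184.1"))
    print(meets_minimum("SAP HANA 2.0 SPS", "4"))
```

Comparing integer tuples rather than strings matters here: as plain strings, "0.90" would sort after "0.180".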

IMPORTANT: You must implement a valid STONITH method. Without a valid STONITH
method, the complete cluster is unsupported and will not work properly.
@@ -1371,12 +1386,14 @@ when the system replication is back. This HA/DR provider method is
.Procedure

. Stop {saphana}
. Implement the HA/DR python hook {haDrMultiTargetPy}
. Implement {haDrMultiTargetPy} srConnectionChanged
. Implement susTkOver.py for preTakeover
. Configure system replication operation mode
. Allow {refsidadm} to access the cluster
. Start {HANA}
. Test the hook integration


=== Stopping {saphana}

The {saphana} needs to be stopped at both sites that will be part of the Linux
@@ -1389,14 +1406,15 @@ cluster. At each site do the following:
~> sapcontrol -nr {refInst} -function GetSystemInstanceList
----

=== Implementing the HA/DR provider hook {haDrMultiTargetPy}

=== Implementing SAPHanaSrMultiTarget.py for srConnectionChanged

// TODO PRIO3: explain new default SAPHanaSrMultiTarget.py, even for non-multi-target

This step must be done on both sites. {HANA} must be stopped to change the
_global.ini_ and allow {HANA} to integrate the HA/DR hook script during start.
global.ini and allow {HANA} to integrate the HA/DR hook script during start.

- Integrate the hook into _global.ini_ ({HANA} needs to be stopped for doing that offline)
- Integrate the hook into global.ini ({HANA} needs to be stopped for doing that offline)
- Check integration of the hook during {HANA} start-up

The ready-to-use HA/DR hook script is shipped with the SAPHanaSR-ScaleOut package in
@@ -1418,6 +1436,39 @@ ha_dr_saphanasrmultitarget = info
===================================


=== Implementing susTkOver.py for preTakeover

This step must be done on both sites that will be part of the cluster.
Use the {saphana} tools for changing global.ini and integrating the hook script.
In global.ini, the section `[ha_dr_provider_sustkover]` needs to be created.
The section `[trace]` might be adapted.
The ready-to-use HA/DR hook script is shipped with the SAPHanaSR-ScaleOut
package in directory /usr/share/SAPHanaSR-ScaleOut/.
The hook script must be available on all cluster nodes, including the majority
maker. Find more details in manual pages susTkOver.py(7) and
SAPHanaSR-manageProvider(8).

.Adding susTkOver.py via global.ini
===================================
----
[ha_dr_provider_sustkover]
provider = susTkOver
path = /usr/share/SAPHanaSR-ScaleOut/
execution_order = 2
sustkover_timeout = 30
[trace]
ha_dr_sustkover = info
----
===================================
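
Before restarting {saphana}, the entries can be double-checked offline. The following Python sketch parses global.ini content with the standard `configparser` module and compares it with the example values above. It is a hypothetical helper, not one of the SAP or SUSE tools; the expected values (in particular `sustkover_timeout`) are taken from the example and may legitimately differ in your setup, and locating the global.ini file is deliberately left out.

```python
import configparser

# Values from the susTkOver.py example above; adjust them to your setup.
EXPECTED = {
    ("ha_dr_provider_sustkover", "provider"): "susTkOver",
    ("ha_dr_provider_sustkover", "path"): "/usr/share/SAPHanaSR-ScaleOut/",
    ("ha_dr_provider_sustkover", "execution_order"): "2",
    ("ha_dr_provider_sustkover", "sustkover_timeout"): "30",
    ("trace", "ha_dr_sustkover"): "info",
}


def check_sustkover(ini_text: str) -> list:
    """Return a list of deviations from EXPECTED; an empty list means all entries match."""
    # interpolation=None keeps '%' characters in other sections from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    problems = []
    for (section, key), wanted in EXPECTED.items():
        # fallback=None also covers a missing section.
        actual = parser.get(section, key, fallback=None)
        if actual is None:
            problems.append(f"missing [{section}] {key}")
        elif actual != wanted:
            problems.append(f"[{section}] {key} = {actual!r}, expected {wanted!r}")
    return problems
```

Passing the text of a copy of global.ini to `check_sustkover()` returns an empty list when the section matches the example.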

Note again that the HA/DR hook script susTkOver.py is not available on the
installation ISO media. For {sles4sap} 15 SP4 and earlier, it is only available in
the update channels. Therefore, a correctly working setup requires full system
patching after registering the system to SCC, RMT or SUSE Manager.
From {sles4sap} 15 SP5 onwards, susTkOver.py is included in the ISO.


=== Configuring the system replication operation mode

When your system is connected as an {sapHanaSR} target, you can find an entry
@@ -1474,13 +1525,16 @@ Replace the {refsidadm} by the SAP system ID admin user.
# SAPHanaSR-ScaleOut needs for srHook
{refsidadm} ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_{refsidLC}_site_srHook_*
{refsidadm} ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_{refsidLC}_gsh *
{refsidadm} ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper --sid={refsidLC} *
----
Below is the result of replacing {refsidLC} with {mysidLc}:
// TODO PRIO2: use variables {sidadm} and {sidlc}
----
# SAPHanaSR-ScaleOut needs for srHook
ha1adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_ha1_site_srHook_*
ha1adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_ha1_gsh *
ha1adm ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper --sid=ha1 *
----
===================================
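
The substitution shown above, from {refsidLC} to the lower-case SID, is mechanical. The Python sketch below generates the same srHook entries for a given SAP system ID; the function name `srhook_sudoers` and its basic SID check (three alphanumeric characters starting with a letter) are assumptions for illustration. Review the output before adding it to the sudoers configuration, for example with `visudo`.

```python
import re


def srhook_sudoers(sid: str) -> list:
    """Build the srHook sudoers entries from this guide for one SAP system ID."""
    # Basic sanity check only: three alphanumeric characters, first one a letter.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9]{2}", sid):
        raise ValueError(f"unexpected SAP system ID: {sid!r}")
    sid_lc = sid.lower()
    sidadm = f"{sid_lc}adm"
    return [
        "# SAPHanaSR-ScaleOut needs for srHook",
        f"{sidadm} ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_{sid_lc}_site_srHook_*",
        f"{sidadm} ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_{sid_lc}_gsh *",
        f"{sidadm} ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper --sid={sid_lc} *",
    ]


if __name__ == "__main__":
    print("\n".join(srhook_sudoers("HA1")))
```

For `HA1`, this prints the same lines as the `ha1adm` listing above.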

@@ -1505,6 +1559,7 @@ Cmnd_Alias SFAIL_SITEA = /usr/sbin/crm_attribute -n hana_{refsidLC}_site_srHo
Cmnd_Alias SOK_SITEB = /usr/sbin/crm_attribute -n hana_{refsidLC}_site_srHook_Site2 -v SOK -t crm_config -s SAPHanaSR
Cmnd_Alias SFAIL_SITEB = /usr/sbin/crm_attribute -n hana_{refsidLC}_site_srHook_Site2 -v SFAIL -t crm_config -s SAPHanaSR
{refsidadm} ALL=(ALL) NOPASSWD: GSH_QUERY, GSH_UPDATE, SOK_GLOB, SFAIL_GLOB, SOK_GLOB_MTS, SFAIL_GLOB_MTS, SOK_SITEA, SFAIL_SITEA, SOK_SITEB, SFAIL_SITEB
{refsidadm} ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper --sid={refsidLC} --case=checkTakeover
----
Manual page SAPHanaSrMultiTarget.py(7) contains additional details.
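
To see which concrete commands a rule set permits, the wildcard semantics can be approximated in Python. In sudoers, a `*` in the command-line arguments matches any sequence of characters, including blanks; the standard `fnmatch.fnmatchcase` function behaves similarly for these patterns. This is only an approximation of sudo's own matching (it ignores sudo's special cases, such as a rule without arguments), the helper name `permitted` is hypothetical, and `ha1` is used as the lower-case SID, as in the `ha1adm` example.

```python
from fnmatch import fnmatchcase

# Two variants for SID "ha1": wildcard rules as in the first sudoers example,
# and narrower rules as in the restrictive Cmnd_Alias example.
WILDCARD_RULES = [
    "/usr/sbin/crm_attribute -n hana_ha1_site_srHook_*",
    "/usr/sbin/SAPHanaSR-hookHelper --sid=ha1 *",
]
RESTRICTIVE_RULES = [
    "/usr/sbin/crm_attribute -n hana_ha1_site_srHook_Site2 -v SOK -t crm_config -s SAPHanaSR",
    "/usr/sbin/SAPHanaSR-hookHelper --sid=ha1 --case=checkTakeover",
]


def permitted(command: str, rules: list) -> bool:
    """Approximate sudoers matching: True if any rule pattern matches the full command line."""
    return any(fnmatchcase(command, rule) for rule in rules)


if __name__ == "__main__":
    check = "/usr/sbin/SAPHanaSR-hookHelper --sid=ha1 --case=checkTakeover"
    # "--case=somethingElse" is a made-up value used only for comparison.
    other = "/usr/sbin/SAPHanaSR-hookHelper --sid=ha1 --case=somethingElse"
    print(permitted(check, WILDCARD_RULES), permitted(check, RESTRICTIVE_RULES))
    print(permitted(other, WILDCARD_RULES), permitted(other, RESTRICTIVE_RULES))
```

The second comparison shows the point of the restrictive variant: an arbitrary `--case` value is accepted by the wildcard rule but not by the exact rule.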

@@ -1539,23 +1594,62 @@ Check if {HANA} has finished starting.
----
==========

=== Testing the hook integration

When the {HANA} database has been restarted after the changes, check if the hook
script is called correctly.
A useful verification is to check the {HANA} trace files as {refsidadm}:
=== Testing the HA/DR provider hook script integration

[subs="specialchars,attributes"]
When the {saphana} database has been restarted after the changes, check if the
hook scripts have been loaded correctly.
A useful verification is to check the {saphana} trace files as {refsidadm}.
More complete checks will be done later, when the Linux cluster is up and running.

==== Checking for SAPHanaSrMultiTarget.py

Check if {saphana} has initialized the SAPHanaSrMultiTarget.py hook script for
the srConnectionChanged events. Check the HANA name server trace files and
the specific hook script trace file. Do this on the master name server of both sites.
See also manual page SAPHanaSrMultiTarget.py(7).
----
{mySite1FirstNode}:ha1adm> cdtrace
{mySite1FirstNode}:ha1adm> awk '/ha_dr_SAPHanaS.*crm_attribute/ \
{ printf "%s %s %s %s\n",$2,$3,$5,$16 }' nameserver_{mySite1FirstNode}.*
2021-05-04 12:34:04.476445 ha_dr_SAPHanaS...SFAIL
2021-05-04 12:53:06.316973 ha_dr_SAPHanaS...SOK
~> cdtrace
~> grep HADR.*load.*SAPHanaSrMultiTarget nameserver_*.trc | tail -3
~> grep SAPHanaSr.*init nameserver_*.trc | tail -3
~> grep -A5 "init.called" nameserver_saphanasr_multitarget_hook.trc
----
// TODO PRIO2: output example
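
Where a scripted check is preferred over interactive grep, the `grep PATTERN FILES | tail -3` pattern used above can be reproduced in Python. The sketch below is a hypothetical helper: it reads all files matching a glob in sorted order, keeps the last matching lines, and always prefixes them with the file name (grep does so only when several files match). It assumes it is started in the trace directory that `cdtrace` changes to. Python regular expressions treat simple patterns such as `HADR.*load.*SAPHanaSrMultiTarget` the same way grep does.

```python
import glob
import re
from collections import deque


def grep_tail(pattern: str, file_glob: str, count: int = 3) -> list:
    """Return the last `count` lines matching `pattern`, prefixed with their file name."""
    regex = re.compile(pattern)
    last = deque(maxlen=count)  # older matches are dropped automatically
    for path in sorted(glob.glob(file_glob)):
        # Trace files may contain non-UTF-8 bytes; replace them instead of failing.
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if regex.search(line):
                    last.append(f"{path}:{line.rstrip()}")
    return list(last)


if __name__ == "__main__":
    for hit in grep_tail(r"HADR.*load.*SAPHanaSrMultiTarget", "nameserver_*.trc"):
        print(hit)
```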

// TODO PRIO2: what makes sense right after first init? should we move this example to general testing?
////
After an srConnectionChanged event has been processed by the HA/DR provider
script, check for the correct behaviour. Do this on the primary site's master
nameserver. See also manual page SAPHanaSrMultiTarget.py(7).
[subs="specialchars,attributes"]
----
{mySite1FirstNode}:{mysidlc}adm~> cdtrace
{mySite1FirstNode}:{mysidlc}adm~> awk '/ha_dr_SAPHanaS.*crm_attribute/ \
{ printf "%s %s %s %s\n",$2,$3,$5,$16 }' nameserver_{mySite1FirstNode}.*.trc
2022-11-11 11:34:04.476445 ha_dr_SAPHanaS...SFAIL
2022-11-11 11:53:06.316973 ha_dr_SAPHanaS...SOK
----
// TODO PRIO2: some content here
----
~> cdtrace
~> grep SAPHanaSr.*srConnection.*CRM nameserver_*.trc
~> grep SAPHanaSr.*srConnection.*fallback nameserver_*.trc
----
////
// TODO PRIO3: align check with manual page SAPHanaSrMultiTarget.py(7)

==== Checking for susTkOver.py

Check if {saphana} has initialized the susTkOver.py hook script for
the preTakeover events. Check the HANA name server trace files. Do this on all nodes.
See also manual page susTkOver.py(7).
----
~> cdtrace
~> grep HADR.*load.*susTkOver nameserver_*.trc | tail -3
~> grep susTkOver.init nameserver_*.trc | tail -3
----


== Configuring the cluster and {HANA} resources
.<<Planning>> <<OsSetup>> <<SAPHanaInst>> <<SAPHanaHsr>> <<Integration>> Cluster <<Testing>>
image::SAPHanaSR-ScaleOut-Plan-Phase6.svg[scaledwidth="100%"]
1 change: 1 addition & 0 deletions adoc/SLES4SAP-hana-scaleout-multitarget-perfopt-15.adoc
@@ -1626,6 +1626,7 @@ More specific parameters option to meet a high security level.
Cmnd_Alias SOK = /usr/sbin/crm_attribute -n hana_{refsidLC}_glob_srHook -v SOK -t crm_config -s SAPHanaSR
Cmnd_Alias SFAIL = /usr/sbin/crm_attribute -n hana_{refsidLC}_glob_srHook -v SFAIL -t crm_config -s SAPHanaSR
Cmnd_Alias GSH = /usr/sbin/crm_attribute -n hana_{refsidLC}_glob_srHook -v * -l reboot -t crm_config -s SAPHanaSR
{refsidadm} ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper --sid={refsidLC} --case=checkTakeover
{refsidadm} ALL=(ALL) NOPASSWD: SOK, SFAIL, GSH
----
////