From eba1ef01357a7c392eddf4803f88954978d8d3ba Mon Sep 17 00:00:00 2001 From: Lars Pinne Date: Fri, 28 Jul 2023 11:48:54 +0200 Subject: [PATCH] SLES4SAP-hana-scaleOut-PerfOpt-15.adoc SLES4SAP-hana-scaleout-multitarget-perfopt-15.adoc: susTkOver.py --- adoc/SLES4SAP-hana-scaleOut-PerfOpt-15.adoc | 134 +++++++++++++++--- ...-hana-scaleout-multitarget-perfopt-15.adoc | 1 + 2 files changed, 115 insertions(+), 20 deletions(-) diff --git a/adoc/SLES4SAP-hana-scaleOut-PerfOpt-15.adoc b/adoc/SLES4SAP-hana-scaleOut-PerfOpt-15.adoc index dfd6a310..741895a9 100644 --- a/adoc/SLES4SAP-hana-scaleOut-PerfOpt-15.adoc +++ b/adoc/SLES4SAP-hana-scaleOut-PerfOpt-15.adoc @@ -254,7 +254,7 @@ the parameter sheet and than begin with the installation. === Scale-out scenario and resource agents To automate the failover, the High Availability Extension built into -_{sles4sap}_ is used. Two resource agents have been created to handle the scenario. +{sles4sap} is used. Two resource agents have been created to handle the scenario. The first is the *SAPHanaController* resource agent (RA), which checks and manages the {HANA} database instances. This RA is configured as a @@ -325,7 +325,7 @@ could be a custom python hook using the SAP provider *srServiceStateChanged()* available since {HANA} 2.0 SPS01. To achieve an automation of this resource handling process, use the -{HANA} resource agents included in the _SAPHanaSR-ScaleOut_ RPM package +{HANA} resource agents included in the SAPHanaSR-ScaleOut RPM package delivered with {sles4sap}. You can configure the level of automation by setting the parameter @@ -337,7 +337,7 @@ Find configuration details in manual page ocf_suse_SAPHanaController(7). Read the SAP Notes and papers first. 
-The _SAPHanaSR-ScaleOut_ resource agent software package +The SAPHanaSR-ScaleOut resource agent software package supports scale-out (multiple-box to multiple-box) system replication with the following configurations and parameters: // TODO PRIO2: align prerequisites section with scale-up guide and manual pages @@ -382,6 +382,20 @@ However, all nodes in one Linux cluster have to use the same style. _sapinit_ auto-start. * The replication mode should be either 'sync' or 'syncmem'. But 'async' is not supported. +* SAP HANA 2.0 SPS04 or later provides the HA/DR provider hook method srConnectionChanged() + with the parameters needed by SAPHanaSrMultiTarget.py. +* SAP HANA 2.0 SPS05 or later provides the HA/DR provider hook method srServiceStateChanged() + with the parameters needed by susChkSrv.py. +* SAP HANA 2.0 SPS06 or later provides the HA/DR provider hook method preTakeover() with + multi-target aware parameters and a separate return code for Linux HA clusters. +* No other HA/DR provider hook script should be configured for the above-mentioned methods. + Hook scripts for other methods, provided in SAPHanaSR-ScaleOut, can be used in parallel, + unless documented otherwise. +* The Linux cluster needs to be up and running so that HA/DR provider events can be written + into CIB attributes. The current HANA SR status might differ from the CIB srHook attribute + after Linux cluster maintenance. +* The user {refSIDadm} needs permission to execute the command + _SAPHanaSR-hookHelper_ as user root. * The Linux cluster can be either freshly installed as described in this guide, or it can be upgraded as described in respective documentation. Not allowed is mixing old and new cluster attributes or hook scripts within @@ -389,7 +403,7 @@ However, all nodes in one Linux cluster have to use the same style. Find more details in the REQUIREMENTS section of manual pages SAPHanaSR-ScaleOut(7), ocf_suse_SAPHanaController(7), -SAPHanaSrMultiTarget.py(7) and SAPHanaSR-manageAttr(8). 
+SAPHanaSrMultiTarget.py(7), susTkOver.py(7) and SAPHanaSR-manageAttr(8). [IMPORTANT] ==== @@ -399,14 +413,15 @@ automated registration of a failed primary, therefore the `AUTOMATED_REGISTER="false"` is the *default*. In this case, you need to register a failed primary after a takeover manually. -Use SAP tools like {HANA} Cockpit or *hdbnsutil*. +Use SAP tools like {HANA} Cockpit or *hdbnsutil*. Make sure to always +use the exact site names already known to the cluster. ==== * For optimal automation, _AUTOMATED_REGISTER="true"_ is recommended. * Automated start of {HANA} instances during system boot must be switched *off*. * You need at least SAPHanaSR-ScaleOut version 0.180, {sles4sap} {pn15} SP1 and - {HANA} 2.0 SPS 4 for all mentioned setups. + {HANA} 2.0 SPS 5 for all mentioned setups. IMPORTANT: You must implement a valid STONITH method. Without a valid STONITH method, the complete cluster is unsupported and will not work properly. @@ -1371,12 +1386,14 @@ when the system replication is back. This HA/DR provider method is .Procedure . Stop {saphana} -. Implement the HA/DR python hook {haDrMultiTargetPy} +. Implement {haDrMultiTargetPy} for srConnectionChanged +. Implement susTkOver.py for preTakeover . Configure system replication operation mode . Allow {refsidadm} to access the cluster . Start {HANA} . Test the hook integration + === Stopping {saphana} The {saphana} needs to be stopped at both sites that will be part of the Linux @@ -1389,14 +1406,15 @@ cluster. At each site do the following: ~> sapcontrol -nr {refInst} -function GetSystemInstanceList ---- -=== Implementing the HA/DR provider hook {haDrMultiTargetPy} + +=== Implementing SAPHanaSrMultiTarget.py for srConnectionChanged // TODO PRIO3: explain new default SAPHanaSrMultiTarget.py, even for non-multi-target This step must be done on both sites. {HANA} must be stopped to change the -_global.ini_ and allow {HANA} to integrate the HA/DR hook script during start. 
+global.ini and allow {HANA} to integrate the HA/DR hook script during start. -- Integrate the hook into _global.ini_ ({HANA} needs to be stopped for doing that offline) +- Integrate the hook into global.ini ({HANA} needs to be stopped to do that offline) - Check integration of the hook during {HANA} start-up The ready-to-use HA/DR hook script is shipped with the SAPHanaSR-ScaleOut package in @@ -1418,6 +1436,39 @@ ha_dr_saphanasrmultitarget = info =================================== +=== Implementing susTkOver.py for preTakeover + +This step must be done on both sites that will be part of the cluster. +Use the {saphana} tools for changing global.ini and integrating the hook script. +In global.ini, the section `[ha_dr_provider_sustkover]` needs to be created. +The section `[trace]` might be adapted. +The ready-to-use HA/DR hook script is shipped with the SAPHanaSR-ScaleOut +package in directory /usr/share/SAPHanaSR-ScaleOut/. +The hook script must be available on all cluster nodes, including the majority +maker. Find more details in manual pages susTkOver.py(7) and +SAPHanaSR-manageProvider(8). + +.Adding susTkOver.py via global.ini +=================================== +---- +[ha_dr_provider_sustkover] +provider = susTkOver +path = /usr/share/SAPHanaSR-ScaleOut/ +execution_order = 2 +sustkover_timeout = 30 + +[trace] +ha_dr_sustkover = info +---- +=================================== + +Note that the hook script "susTkOver.py" is not available on the installation +ISO media of {sles4sap} 15 SP4 or earlier. For these versions, it is only +available in the update channels. A correctly working setup therefore requires +full system patching after registering the system to SCC, RMT or SUSE Manager. +From {sles4sap} 15 SP5 onwards, "susTkOver.py" will be included in the ISO. 
+ + === Configuring the system replication operation mode When your system is connected as an {sapHanaSR} target, you can find an entry @@ -1474,13 +1525,16 @@ Replace the {refsidadm} by the SAP system ID admin user. # SAPHanaSR-ScaleOut needs for srHook {refsidadm} ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_{refsidLC}_site_srHook_* {refsidadm} ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_{refsidLC}_gsh * +{refsidadm} ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper --sid={refsidLC} * ---- Below is the result of replacing {refsidLC} with {mysidLc}: +// TODO PRIO2: use variables {sidadm} and {sidlc} ---- # SAPHanaSR-ScaleOut needs for srHook ha1adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_ha1_site_srHook_* ha1adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_ha1_gsh * +ha1adm ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper --sid=ha1 * ---- =================================== @@ -1505,6 +1559,7 @@ Cmnd_Alias SFAIL_SITEA = /usr/sbin/crm_attribute -n hana_{refsidLC}_site_srHo Cmnd_Alias SOK_SITEB = /usr/sbin/crm_attribute -n hana_{refsidLC}_site_srHook_Site2 -v SOK -t crm_config -s SAPHanaSR Cmnd_Alias SFAIL_SITEB = /usr/sbin/crm_attribute -n hana_{refsidLC}_site_srHook_Site2 -v SFAIL -t crm_config -s SAPHanaSR {refsidadm} ALL=(ALL) NOPASSWD: GSH_QUERY, GSH_UPDATE, SOK_GLOB, SFAIL_GLOB, SOK_GLOB_MTS, SFAIL_GLOB_MTS, SOK_SITEA, SFAIL_SITEA, SOK_SITEB, SFAIL_SITEB +{refsidadm} ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper --sid={refsidLC} --case=checkTakeover ---- Manual page SAPHanaSrMultiTarget.py(7) contains additional details. @@ -1539,23 +1594,62 @@ Check if {HANA} has finished starting. ---- ========== -=== Testing the hook integration -When the {HANA} database has been restarted after the changes, check if the hook -script is called correctly. 
-A useful verification is to check the {HANA} trace files as {refsidadm}: +=== Testing the HA/DR provider hook script integration -[subs="specialchars,attributes"] +When the {saphana} database has been restarted after the changes, check if the +hook scripts have been loaded correctly. +A useful verification is to check the {saphana} trace files as {refsidadm}. +More complete checks will be done later, when the Linux cluster is up and running. + +==== Checking for SAPHanaSrMultiTarget.py + +Check if {saphana} has initialized the SAPHanaSrMultiTarget.py hook script for +the srConnectionChanged events. Check the HANA name server trace files and +the specific hook script trace file. Do this on both sites' master name server. +See also manual page SAPHanaSrMultiTarget.py(7). ---- -{mySite1FirstNode}:ha1adm> cdtrace -{mySite1FirstNode}:ha1adm> awk '/ha_dr_SAPHanaS.*crm_attribute/ \ - { printf "%s %s %s %s\n",$2,$3,$5,$16 }' nameserver_{mySite1FirstNode}.* -2021-05-04 12:34:04.476445 ha_dr_SAPHanaS...SFAIL -2021-05-04 12:53:06.316973 ha_dr_SAPHanaS...SOK +~> cdtrace +~> grep HADR.*load.*SAPHanaSrMultiTarget nameserver_*.trc | tail -3 +~> grep SAPHanaSr.*init nameserver_*.trc | tail -3 +~> grep -A5 "init.called" nameserver_saphanasr_multitarget_hook.trc ---- +// TODO PRIO2: output example +// TODO PRIO2: what makes sense right after first init? should we move this example to general testing? +//// +After an srConnectionChanged event has been processed by the HA/DR provider +script, check for the correct behaviour. Do this on the primary site's master +nameserver. See also manual page SAPHanaSrMultiTarget.py(7). 
+[subs="specialchars,attributes"] +---- +{mySite1FirstNode}:{mysidlc}adm~> cdtrace +{mySite1FirstNode}:{mysidlc}adm~> awk '/ha_dr_SAPHanaS.*crm_attribute/ \ + { printf "%s %s %s %s\n",$2,$3,$5,$16 }' nameserver_{mySite1FirstNode}.*.trc +2022-11-11 11:34:04.476445 ha_dr_SAPHanaS...SFAIL +2022-11-11 11:53:06.316973 ha_dr_SAPHanaS...SOK +---- +// TODO PRIO2: some content here +---- +~> cdtrace +~> grep SAPHanaSr.*srConnection.*CRM nameserver_*.trc +~> grep SAPHanaSr.*srConnection.*fallback nameserver_*.trc +---- +//// // TODO PRIO3: align check with manual page SAPHanaSrMultiTarget.py(7) +==== Checking for susTkOver.py + +Check if {saphana} has initialized the susTkOver.py hook script for +the preTakeover events. Check the HANA name server trace. Do this on all nodes. +See also manual page susTkOver.py(7). +---- +~> cdtrace +~> grep HADR.*load.*susTkOver nameserver_*.trc | tail -3 +~> grep susTkOver.init nameserver_*.trc | tail -3 +---- + + == Configuring the cluster and {HANA} resources .<> <> <> <> <> Cluster <> image::SAPHanaSR-ScaleOut-Plan-Phase6.svg[scaledwidth="100%"] diff --git a/adoc/SLES4SAP-hana-scaleout-multitarget-perfopt-15.adoc b/adoc/SLES4SAP-hana-scaleout-multitarget-perfopt-15.adoc index ae298c80..33b44f02 100644 --- a/adoc/SLES4SAP-hana-scaleout-multitarget-perfopt-15.adoc +++ b/adoc/SLES4SAP-hana-scaleout-multitarget-perfopt-15.adoc @@ -1626,6 +1626,7 @@ More specific parameters option to meet a high security level. Cmnd_Alias SOK = /usr/sbin/crm_attribute -n hana_{refsidLC}_glob_srHook -v SOK -t crm_config -s SAPHanaSR Cmnd_Alias SFAIL = /usr/sbin/crm_attribute -n hana_{refsidLC}_glob_srHook -v SFAIL -t crm_config -s SAPHanaSR Cmnd_Alias GSH = /usr/sbin/crm_attribute -n hana_{refsidLC}_glob_srHook -v * -l reboot -t crm_config -s SAPHanaSR +{refsidadm} ALL=(ALL) NOPASSWD: /usr/sbin/SAPHanaSR-hookHelper --sid={refsidLC} --case=checkTakeover {refsidadm} ALL=(ALL) NOPASSWD: SOK, SFAIL, GSH ---- ////