From 094331847dafbb8ce1c13f4737d6baaea7093c1d Mon Sep 17 00:00:00 2001 From: Meike Chabowski Date: Tue, 15 Aug 2023 18:12:54 +0200 Subject: [PATCH] Implemented doc review enhancements Did semi-automated stylecheck, spellcheck, linkcheck. Implemented changes according to SUSE documentation Style Guide. Fixed typos, shortened sentences for better readability, adjusted format, changed wording to be more precise, fixed punctuation. --- xml/MAIN-SBP-GCC-12.xml | 114 ++++++++++++++++++++-------------------- 1 file changed, 57 insertions(+), 57 deletions(-) diff --git a/xml/MAIN-SBP-GCC-12.xml b/xml/MAIN-SBP-GCC-12.xml index 148158c7..cccbff3b 100644 --- a/xml/MAIN-SBP-GCC-12.xml +++ b/xml/MAIN-SBP-GCC-12.xml @@ -29,7 +29,7 @@ SUSE Best Practices - Performance + System Tuning and Performance SUSE Linux Enterprise Server 15 SP4 Development Tools Module @@ -140,8 +140,8 @@ took place in May 2022. Later that month, the entire openSUSE Tumbleweed Linux distribution was rebuilt with it and shipped to users. GCC 12.2, with fixes to over 71 bugs, was released in August of the same year. Subsequently, it has replaced the compiler in the SUSE Linux - Enterprise (SLE) Development Tools Module. GCC 12.3 followed in May 2023 and apart from further - bug fixes also introduced support for Zen 4 based CPUs. GCC 12 comes with many new features, such as + Enterprise (SLE) Development Tools Module. GCC 12.3 followed in May 2023. Apart from further + bug fixes, it also introduced support for Zen 4 based CPUs. GCC 12 comes with many new features, such as implementing parts of the most recent versions of specifications of various languages (especially C2X, C++20, C++23) and their extensions (OpenMP, OpenACC), supporting new capabilities of a wide range of computer architectures and @@ -149,11 +149,11 @@ This document gives an overview of GCC 12. It focuses on selecting appropriate optimization options for your application and stresses the benefits of advanced modes of compilation. 
First, - we describe the optimization levels the compiler offers and other important options developers + we describe the optimization levels the compiler offers, and other important options developers often use. We explain when and how you can benefit from using Link Time Optimization (LTO) and Profile Guided Optimization - (PGO) builds. We also detail their effects when building a set of well known CPU - intensive benchmarks, and we look at how these perform on AMD Zen 4 based EPYC 9004 Series + (PGO) builds. We also detail their effects when building a set of well-known CPU + intensive benchmarks. Finally, we look at how these perform on AMD Zen 4 based EPYC 9004 Series Processors. @@ -217,8 +217,8 @@ warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Code using <literal>C++17</literal> features Code using C++17 features should always be compiled with the compiler from the Development Tools Module. Linking two objects, such as an application and a shared - library, which both use C++17, one was built with g++ 8 - or earlier and the other with g++ 9 or later is particularly dangerous + library, which both use C++17, where one was built with g++ 8 + or earlier and the other with g++ 9 or later, is particularly dangerous. This is because C++ STL objects instantiated by the experimental code may provide implementation and even ABI that is different from what the mature implementation expects and vice versa. Issues caused by such a mismatch are difficult to predict and may include silent @@ -237,7 +237,7 @@ warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Proposal P0912R5 are also implemented but require that the source file is compiled with the -⁠fcoroutines switch. GCC 12 also experimentally implements many - C++23 features, if you are interested in the implementation + C++23 features. 
If you are interested in the implementation status of any particular C++ feature in the compiler or the standard library, consult the following pages: @@ -377,8 +377,8 @@ warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Installing GCC 12 from the Development Tools Module Similar to other modules and extensions for SUSE Linux Enterprise Server 15, you can - activate the Development Tools Module either using the command line tool - SUSEConnect or using the YaST setup and configuration + activate the Development Tools Module using either the command line tool + SUSEConnect or the YaST setup and configuration tool. To use the former, carry out the following steps: @@ -515,7 +515,7 @@ S | Name | Summary -⁠O3 imply anything about the precision and semantics of floating-point operations. Even at the optimization level -⁠O3 GCC implements math operations and functions so that they follow the respective IEEE and/or ISO - rulesWhen when the rounding mode is set to the default round-to-nearest (look up + rules When the rounding mode is set to the default round-to-nearest (look up -⁠frounding-⁠math in the manual). with the exception of allowing floating-point expression contraction, for example when fusing an addition and a multiplication into one operationSee @@ -531,13 +531,13 @@ S | Name | Summary imply -⁠ffast-math along with a few options that disregard strict standard compliance. In GCC 12 this level also means the optimizers may introduce data races when moving memory stores which may not be safe for multithreaded applications and disregards the - possibility of ELF symbol interposition happening at run-time. Additionally, the + possibility of ELF symbol interposition happening at runtime. Additionally, the Fortran compiler can take advantage of associativity of math operations even across parentheses and convert big memory allocations on the heap to allocations on stack. 
The last mentioned transformation may cause the code to violate the maximum stack size allowed by ulimit, which is then reported to the user as a segmentation fault. We often use level -⁠Ofast to build benchmarks. It is a shorthand for the - options on top of -⁠O3 which often make them run faster and most + options on top of -⁠O3 which often make them run faster. Most benchmarks are intentionally written in a way that they run correctly even when these rules are relaxed. @@ -581,7 +581,7 @@ S | Name | Summary therefore be a challenging task but usually is still somewhat possible. The complete list of optimization and other command line switches is available in the - compiler manual, provided in the info format in the package gcc12-info or + compiler manual. The manual is provided in the info format in the package gcc12-info or online at the GCC project Web site. @@ -600,8 +600,8 @@ S | Name | Summary aware that its release optimization level defaults to -⁠O3 which might not be what you want. To change it, you must modify the CMAKE_C_FLAGS_RELEASE, CMAKE_CXX_FLAGS_RELEASE - and/or CMAKE_Fortran_FLAGS_RELEASE variables, since they are appended at the - end of the compilation command lines, thus overwriting any level set in the variables + and/or CMAKE_Fortran_FLAGS_RELEASE variables. Since they are appended at the + end of the compilation command lines, they overwrite any level set in the variables CMAKE_C_FLAGS, CMAKE_CXX_FLAGS, and the like. @@ -621,7 +621,7 @@ S | Name | Summary instruction set extensions, you can specify it on the command line. Their complete list is available in the manual, but the most prominent one is -⁠march which lets you select a CPU model to generate code for.
For example, if you know that your program will - only be executed on AMD EPYC 9004 Series Processors which is based on AMD Zen 4 cores or + only be executed on AMD EPYC 9004 Series Processors based on AMD Zen 4 cores or processors that are compatible with it, you can instruct GCC to take advantage of all the instructions the CPU supports with option -⁠march=znver4. Note that on SUSE Linux Enterprise Server 15, the system compiler does not know this particular value of @@ -640,9 +640,9 @@ S | Name | Summary Running 32-bit code SUSE Linux Enterprise Server does not support compilation of 32-bit applications; it - only offers runtime support for 32-bit binaries. In order to do so, you will need 32-bit + only offers runtime support for 32-bit binaries. To run them, you will need the 32-bit libraries your binary depends on, which likely include at least glibc, which can be found in - package glibc-32bit. See glibc-32bit. See chapter 20 (32-bit and 64-bit applications in a 64-bit system environment) of the Administration Guide for more information. @@ -656,8 +656,8 @@ S | Name | Summary outlines the classic mode of operation of a compiler and a linker. Pieces of a program are compiled and optimized in chunks - defined by the user called compilation units to produce so-called object files which already - contain binary machine instructions and which are combined together by a linker. Because the + defined by the user called compilation units to produce so-called object files. These object files already + contain binary machine instructions and are combined together by a linker. Because the linker works at such a low level, it cannot perform much optimization and the division of the program into compilation units thus presents a profound barrier to optimization.
@@ -676,7 +676,7 @@ S | Name | Summary This limitation can be overcome by rearranging the process so that the linker does not receive as its input the almost finished object files containing machine instructions, but is invoked on files containing so-called intermediate language - (IL) which is a much richer representation of each original compilation unit (see figure ). The linker identifies the input as not yet entirely compiled and invokes a linker plugin which in turn runs the compiler again. But this time it has at its disposal the representation of the entire program or library that is @@ -774,7 +774,7 @@ S | Name | Summary the assembler snippets defining symbols must be placed into a separate assembler source file so that they only participate in the final linking step. Global register variables are not supported by LTO, and programs either must not use this feature or be built the - traditional way. It is also possible to exclude just some compilation units from LTO (simply by + traditional way. It is also possible to exclude some compilation units from LTO (simply by compiling them without -⁠flto or appending -⁠fno-⁠lto to the compilation command line), while the rest of the program can still benefit from using this feature. @@ -799,7 +799,7 @@ int foo_v1 (void) themselves otherwise. Violations of (strict) aliasing rules and the C++ one definition rule tend to cause misbehavior significantly - more often; the latter is fortunately reported by the -Wodr warning which is + more often. The latter is fortunately reported by the -Wodr warning which is on by default and should not be ignored. We have also seen cases where the use of the flatten function attribute led to an unsustainable amount of inlining with LTO.
Furthermore, LTO is not a good fit for code snippets compiled by configure @@ -832,7 +832,7 @@ int foo_v1 (void) then execute the resulting binary in one or multiple train runs during which it will save information about the behavior of the program to special files. Afterward, the project needs to be rebuilt again, this time with the - -⁠fprofile-use option which instructs the compiler to look for the + -⁠fprofile-use option. This instructs the compiler to look for the files with the measurements and use them when making optimization decisions, a process called Profile-Guided Optimization (PGO). @@ -870,8 +870,8 @@ int foo_v1 (void) non-profit corporation that publishes a variety of industry standard benchmarks to evaluate performance and other characteristics of computer systems. Its latest suite of CPU intensive workloads, SPEC CPU 2017, is often used to compare compilers and how well they optimize code with - different settings because the included benchmarks are well known and represent a wide variety of - computation-heavy programs. This section highlights selected results of a GCC 12 evaluation using + different settings. This is because the included benchmarks are well known and represent a wide variety of + computation-heavy programs. The following section highlights selected results of a GCC 12 evaluation using the suite. Note that when we use SPEC to perform compiler comparisons, we are lenient toward some @@ -953,7 +953,7 @@ int foo_v1 (void) faster than when compiled with GCC 11 and the same optimization level. Nevertheless, it still benefits from the more advanced modes of compilation a lot, together with several other benchmarks which are derived from programs that are typically compiled with - -⁠O2, as can be seen in -⁠O2. This is illustrated in .
@@ -975,7 +975,7 @@ int foo_v1 (void) (measured without debug info). Note that it does not depict that the size of benchmark 548.exchange2_r grew to 290% and 200% of the original size when built with PGO or both PGO and LTO respectively, which looks huge but the growth is from a particularly - small base. It is the only Fortran benchmark in the integer suite and, most importantly, the + small base. It is the only Fortran benchmark in the integer suite and, most importantly, the size penalty is offset by significant speed-up, making the trade-off reasonable. For completeness, we show this result in @@ -1062,7 +1062,7 @@ int foo_v1 (void)
Many of the SPEC 2017 floating-point benchmarks measure how well a given system can - optimize and execute a handful of number crunching loops and they often come from performance + optimize and execute a handful of number-crunching loops. They often come from performance-sensitive programs written with the traditional compilation method in mind. Consequently, there are fewer cross-module dependencies, identifying hot paths is less crucial and the overall effect of LTO and PGO suite only improves by 5% (see - Floating-point computations tend to particularly benefit from vectorization advancements - and so it should be no surprise that the FPrate benchmarks also improve substantially when + Floating-point computations tend to particularly benefit from vectorization advancements. + Thus it should be no surprise that the FPrate benchmarks also improve substantially when compiled with GCC 12.3, which also emits AVX512 instructions for a Zen 4 based CPU. The overall boost is shown in whereas @@ -1221,11 +1221,11 @@ int foo_v1 (void) We have built the benchmarking suite using optimization level -⁠O3, LTO (though without PGO) and - -⁠march=native to target the native ISA of our AMD EPYC 9654 Processor - and we compared its runtime score against the suite built with these options and + -⁠march=native to target the native ISA of our AMD EPYC 9654 Processor. + Then we compared its runtime score against the suite built with these options and -⁠ffast-math. As you can see in , the geometric - mean grew by over 13%, but a quick look at will tell you that there are four benchmarks with scores which improved by more than 20% and that of 510.parest_r grew by over 76%. @@ -1333,14 +1333,14 @@ int foo_v1 (void) shows relative rates of integer benchmarks written in C/C++ and the compilers perform fairly similarly there. GCC wins by a large margin on 500.perlbench_r but loses - significantly when compiling 525.x264_r. This is because the compiler + significantly when compiling 525.x264_r.
This is because the compiler chooses a vectorizing factor that is too large for the important loops in this video encoder. It is possible to mitigate the problem using compiler option -⁠mprefer-⁠vector-⁠width=128, with which it is again competitive, as you can see in . This problem is being actively worked on by the upstream - GCC community and we plan to use masked vectorized epilogues to minimize the fallout of - choosing a large vectorizing factor for the principal vector loop. Note that PGO can + GCC community. We plan to use masked vectorized epilogues to minimize the fallout of + choosing a large vectorizing factor for the principal vector loop. Note that PGO can substantially help in this case too. @@ -1361,14 +1361,14 @@ int foo_v1 (void) compiled with LLVM with LTO, we have excluded the benchmark in our comparison of geometric mean of SPEC FPrate 2017 suite depicted in . The floating point benchmark suite contains many more Fortran - benchmarks and it can be seen that GCC has advantage in having a mature optimization pipeline + benchmarks. It can be seen that GCC has advantage in having a mature optimization pipeline for this language as well, especially when compiling 503.bwaves_r, 510.parest_r, 549.fotonik3d_r, 554.roms_r (see ) and the already mentioned 527.cam4_r (see - ). The + ). The comparison also shows that the performance of 538.imagick_r when compiled - with GCC 12.3 is substantially smaller. This is caused by store-to-load + with GCC 12.3 is substantially smaller. This is caused by store-to-load forwarding stall issues, which can be mitigated by relaxing inlining limits, something that GCC 13 does automatically. @@ -1415,19 +1415,19 @@ int foo_v1 (void) Even though ICC is not intended as a compiler for AMD processors, it is known for its - high-level optimization capabilities, especially when it comes to vectorization. Therefore we + high-level optimization capabilities, especially when it comes to vectorization. 
Therefore we have traditionally included it in our comparisons of compilers. Recently, however, Intel has - decided to abandon this compiler and is directing its users towards ICX, a new one built on top - of LLVM. This year we have therefore included not just ICC 2021.9.0 (20230302) but also ICX - 2023.1.0 in our comparison. In order to keep the amount of presented data in the rest of this + decided to abandon this compiler and is directing its users toward ICX, a new one built on top + of LLVM. This year we have therefore included not just ICC 2021.9.0 (20230302) but also ICX + 2023.1.0 in our comparison. To keep the amount of presented data in the rest of this section reasonable, we only compare binaries built with -⁠Ofast and - LTO. We have simply passed -⁠march=native GCC and ICX. On the other + LTO. We have simply passed -⁠march=native to GCC and ICX. On the other hand, we have used the -⁠march=core-avx2 option to specify the target ISA for the old ICC because it is unclear which option is the most appropriate for AMD EPYC 9654 - Processor. This puts this compiler at a disadvantage because it can only emit AVX256 - instructions while the other two can, and GCC does, make use of AVX512. We believe that the + Processor. This puts this compiler at a disadvantage because it can only emit AVX256 + instructions while the other two can, and GCC does, make use of AVX512. We believe that the comparison is still useful as ICC serves mainly as a base and the focus now shifts to ICX but - please keep this in mind when looking at the results below. + keep this in mind when looking at the results below.
Overall performance (bigger is better) of SPEC INTrate 2017 built with ICC 2021.9.0, ICX @@ -1466,7 +1466,7 @@ int foo_v1 (void) </figure> <figure xml:id="fig-gcc12-specint-ofast-vsicc-x264_128"> - <title>Runtime performance (bigger is better) of of 525.x264_r benchmark built with ICC + <title>Runtime performance (bigger is better) of 525.x264_r benchmark built with ICC 2021.9.0, ICX 2023.1.0 and with GCC 12.3 using -mprefer-vector-width=128 @@ -1498,13 +1498,13 @@ int foo_v1 (void)
While GCC achieves the best geometric mean, it is important to look at individual results - too because the overall picture is mixed (see ), each of the three compilers managed to be the fastest in at - least one benchmark. We do not know the reason for rather poor performance of ICX on - 554.roms_r but we have seen a similar issue with the compiler on an Intel + too. The overall picture is mixed (see ), as each of the three compilers managed to be the fastest in at + least one benchmark. We do not know the reason for the rather poor performance of ICX on + 554.roms_r. But we have seen a similar issue with the compiler on an Intel Cascadelake server machine too, so it is not a consequence of using an Intel compiler on an AMD - platform. For completeness, 521.wrf_r results for ICC and ICX are provided - in . In + platform. For completeness, 521.wrf_r results for ICC and ICX are provided + in . In conclusion, GCC manages to perform consistently and competitively against these high-performance compilers.