Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TudatPy structure #159

Draft
wants to merge 48 commits into
base: develop
Choose a base branch
from
Draft

TudatPy structure #159

wants to merge 48 commits into from

Conversation

alfonsoSR
Copy link

@alfonsoSR alfonsoSR commented Aug 8, 2024

Issues addressed by this PR

Critical

  • Import time: Compiling tudatpy into a single library (kernel) results into long import times regardless of how small the part of tudatpy being loaded is.
  • Docstrings: Docstrings are currently defined in YAML files and added to functions with a package called multidoc. Separating the docstrings from the source code is not standard, makes it difficult to know if a function is documented or if its documentation is up to date and complicates the development process to the point that docstrings are not included in the current conda distribution (or are included but do not work).
  • Syntax higlighting: The inability to inspect the kernel currently prevents proper syntax highlighting and autocompletion and results into linters throwing warnings at correct code.
  • Autogeneration of __init__.py files: The autogeneration of __init__.py is problematic for modules including both C++ and Python functionality, adds complexity to the build process and makes it difficult for new developers to understand how to contribute code to the project.
  • Organization of source code: It would be nice to have a more intuitive organization of the source code used for the exposition (e.g. having expose_X.cpp under tudatpy/X rather than tudatpy/kernel/X as the former also contains the __init__.py file and, potentially, python-native functionality). The presence of _import_all_kernel_modules.py files and the fact that they are autogenerated also makes it difficult to understand where to contribute code or where to find the source code of the function you're using.
  • CMake: The CMakeLists.txt are very intimidating, mostly due to the use of custom tools like YACMA and the lack of maintenance, which often results into patches. Replacing custom tools with standard ones and trying to reduce the size of the CMake files would make the modification of the build more approachable.

Non-critical

  • Project layout: Tudatpy currently uses a flat layout (i.e. tudatpy/__init__.py). Migrating to a src layout (i.e. tudatpy/src/tudatpy/__init__.py) would allow for the creation of a stub-only package at no cost and significantly simplify a potential unification of tudat and tudatpy into a standalone repository (e.g. adding tudat, spice, sofa... to tudatpy/src). Both flat and src layouts are standard and accepted, but I consider the latter more suitable for this kind of project.

Solutions

Import time

I modified the build process to generate one shared library per submodule (i.e. per expose_X.cpp file). This results into import times being proportional to the size of the imported modules (e.g. import time will still be relatively high for a program loading numerical_simulation, but not for one loading math or astro, since those modules are much smaller), which could be reduced by splitting big submodules per functionality (e.g. numerical_simulation -> dynamics & estimation or estimation_setup.observation being splitted into submodules) or optimizing the compilation process.

Organization of source code

I moved each expose_X.cpp script to its submodule's directory (as suggested above) and modified the build process such that the associated kernel is saved in the same directory. This allows for the replacement of the autogenerated __init__.py and _import_all_kernel_modules.py files with a standard __init__.py scripts to be maintained by developers. Thus, a hybrid submodule (one including exposed and python-native functionality) X would have the following structure:

tudatpy/src/tudatpy/X
    |_ __init__.py
    |_ expose_X.cpp
    |_ expose_X.so  (after compilation)
    |_ foo.py 

and the __init__.py file would look as follows:

from .expose_X import *
from .foo import <public-functionality-from-foo.py>    # Manually specified by users

Under the assumption that all the functionality defined in expose_X.cpp should belong to the API, the star import from expose_X prevents developers from having to update the __init__.py files when they expose a new function, effectively replacing autogeneration, while the manual imports from the python scripts allow them to keep "private" functions and helpers out of tudatpy's API. The presence of the shared libraries in the source directories is just a personal preference, but moving them into a lib or kernel directory would not be a problem.

NOTE: A positive side effect of having split the kernel is that the header files associated with the exposition scripts are no longer needed.

Docstrings

For each exposed function, I read the content of the docstring from the YAML file, gave it an adequate format and pasted it together with the source code that exposes the function. This eliminates the need for multidoc and the results into docstrings being included by default in all tudatpy distributions. As an example, this is the code snippet used to expose spice.load_kernel():

m.def("load_kernel", &tudat::spice_interface::loadSpiceKernelInTudat,
          py::arg("kernel_file"),
          R"doc(Loads a Spice kernel into the pool.

          This function loads a Spice kernel into the kernel pool, from which it can be used
          by the various internal spice routines. Matters regarding the manner in which Spice
          handles different kernels containing the same information can be found in the spice
          required reading documentation, kernel section. Wrapper for the
          `furnsh_c <https://naif.jpl.nasa.gov/pub/naif/toolkit_docs/C/cspice/furnsh_c.html>`_ function.

          :param file_path: Path to the spice kernel to be loaded.
          )doc");

Syntax highlighting

Taking #154 as reference, I included automatic stub generation as part of the build process. Stubs are generated using pybind11-stubgen following the compilation of tudatpy, and stored in a stub-only package with the same structure as the source directory, as described in Packaging type information.

The package is automatically installed together with tudatpy and provides syntax highlighters with information about the available functionality per submodule, function and class signatures, docstrings... This allows for regular autocompletion and greatly simplifies the access to typing and usage information.

Some notes on stub generation

The reason why I opted for pybind11-stubgen instead of mypy's stubgen is that the latter is not compatible with the generation of stub-only packages, is bad at inferring the signature of exposed functions and includes undesired information in the docstrings.

While significantly better than stubgen, pybind11-stubgen includes undesired from __future__ imports and is not particularly good at generating stubs for __init__.py files. Thus, while the vast majority of the stub generation process relies on pybind11-stubgen, I wrote some post-processing functions and a parser to generate __init__.pyi stubs, remove undesired imports and ensure proper indentation within docstrings.

CMake

I replaced YACMA with standard alternatives based on pybind11, raised CMake's minimum required version to 3.15 (which seems like the equivalent of C++11 or Python 3.0) and tried to reorganize the code and add comments to make it less intimidating and easier to understand.

I also noticed that Tudatpy's cmake files relied on the existence of environment variables that are only defined in Tudat's CMakeLists.txt, so I modified the cmake setup to ensure that, as long as tudat is installed, it is straightforward to build tudatpy as a standalone library.

Missing

The only thing missing from the PR with respect to the current version of Tudatpy is the generation of the API website, which I haven't still adapted to the new placement of the docstrings. This prevents us from merging into main but should not be a problem if we first merge into develop as all the API documentation is already accessible via autocompletion.

I think it is a good idea to first merge this and get feedback and undetected issues from users while I work on the API generation, which can be then added as a feature before merging into main.

- Splitted tudatpy kernel into submodules
- Replaced YACMA with standard method to find python (CMake)
- Introduced standard src structure for future integration of stubs
@alfonsoSR alfonsoSR changed the title Restructure TudatPy TudatPy structure Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

1 participant