Merge pull request #363 from ACEsuit/develop
Add multi-GPU training, data parallelisation, pair repulsion, distance transform and other features
ilyes319 committed Apr 23, 2024
2 parents 88d49f9 + ecc6e41 commit e7d95f4
Showing 44 changed files with 3,186 additions and 528 deletions.
4 changes: 2 additions & 2 deletions .github/workflow/lint.yml
@@ -7,7 +7,7 @@ on: []
# branches: []
# pull_request:
# branches: []


jobs:
build-linux:
@@ -43,4 +43,4 @@ jobs:
# run: mypy --config-file=.mypy.ini .

- name: Test with pytest
run: python -m pytest tests
14 changes: 14 additions & 0 deletions .github/workflow/pre-commit.yaml
@@ -0,0 +1,14 @@
name: Pre-Commit Checks

on:
pull_request:
push:
branches: [main]

jobs:
pre-commit:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-python@v3
- uses: pre-commit/action@v3.0.0
29 changes: 29 additions & 0 deletions .github/workflow/unittest.yaml
@@ -0,0 +1,29 @@
name: unit tests
on:
pull_request:
push:
branches: [main]

jobs:
pytest-container:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v3
- uses: actions/setup-python@v4
with:
python-version: "3.10"
cache: "pip"

- name: Install requirements
run: |
pip install -U pip
pip install .[dev]
- name: Log installed environment
run: |
python3 -m pip freeze
- name: Run unit tests
run: |
pytest tests
9 changes: 9 additions & 0 deletions .gitignore
@@ -21,3 +21,12 @@ build/

# Distribution
dist/

# Jupyter Notebook
.ipynb_checkpoints

# DS_Store
.DS_Store
/wandb
/checkpoints
*.model
50 changes: 50 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,50 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: end-of-file-fixer
- id: trailing-whitespace

- repo: https://github.com/psf/black-pre-commit-mirror
rev: 23.12.1
hooks:
- id: black
name: Black Formatting

- repo: https://github.com/pycqa/isort
rev: 5.13.2
hooks:
- id: isort
name: Sort imports

# Failing - to be investigated separately
# - repo: local
# hooks:
# - id: pylint
# name: Pylint Checks
# entry: pylint
# language: system
# types: [python]
# args:
# [
# "--rcfile=pyproject.toml",
# "mace",
# "tests",
# "scripts"
# ]

# - repo: local
# hooks:
# - id: mypy
# name: mypy type checks
# entry: mypy
# language: system
# types: [python]
# args:
# [
# --config-file=.mypy.ini,
# mace,
# tests,
# scripts
# ]

159 changes: 137 additions & 22 deletions README.md
@@ -8,21 +8,28 @@

## Table of contents

- [About MACE](#about-mace)
- [Documentation](#documentation)
- [Installation](#installation)
- [Usage](#usage)
- [Training](#training)
- [Evaluation](#evaluation)
- [Tutorial](#tutorial)
- [Weights and Biases](#weights-and-biases-for-experiment-tracking)
- [Development](#development)
- [Pretrained foundation models](#pretrained-foundation-models)
- [MACE-MP: Materials Project Force Fields](#mace-mp-materials-project-force-fields)
- [MACE-OFF: Transferable Organic Force Fields](#mace-off-transferable-organic-force-fields)
- [References](#references)
- [Contact](#contact)
- [License](#license)
- [MACE](#mace)
- [Table of contents](#table-of-contents)
- [About MACE](#about-mace)
- [Documentation](#documentation)
- [Installation](#installation)
- [pip installation](#pip-installation)
- [conda installation](#conda-installation)
- [pip installation from source](#pip-installation-from-source)
- [Usage](#usage)
- [Training](#training)
- [Evaluation](#evaluation)
- [Tutorial](#tutorial)
- [Weights and Biases for experiment tracking](#weights-and-biases-for-experiment-tracking)
- [Pretrained Foundation Models](#pretrained-foundation-models)
- [MACE-MP: Materials Project Force Fields](#mace-mp-materials-project-force-fields)
- [Example usage in ASE](#example-usage-in-ase)
- [MACE-OFF: Transferable Organic Force Fields](#mace-off-transferable-organic-force-fields)
- [Example usage in ASE](#example-usage-in-ase-1)
- [Development](#development)
- [References](#references)
- [Contact](#contact)
- [License](#license)

## About MACE

@@ -130,7 +137,7 @@ mace_run_train \

To give a specific validation set, use the argument `--valid_file`. To set a larger batch size for evaluating the validation set, specify `--valid_batch_size`.
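
For instance (a sketch; the file names and batch size are placeholders):

```sh
mace_run_train \
    --name="MACE_model" \
    --train_file="train.xyz" \
    --valid_file="valid.xyz" \
    --valid_batch_size=32
```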

To control the model's size, you need to change `--hidden_irreps`. For most applications, the recommended default model size is `--hidden_irreps='256x0e'` (meaning 256 invariant messages) or `--hidden_irreps='128x0e + 128x1o'`. If the model is not accurate enough, you can include higher order features, e.g., `128x0e + 128x1o + 128x2e`, or increase the number of channels to `256`.
To control the model's size, you need to change `--hidden_irreps`. For most applications, the recommended default model size is `--hidden_irreps='256x0e'` (meaning 256 invariant messages) or `--hidden_irreps='128x0e + 128x1o'`. If the model is not accurate enough, you can include higher-order features, e.g., `128x0e + 128x1o + 128x2e`, or increase the number of channels to `256`. It is also possible to specify the model size using the `--num_channels=128` and `--max_L=1` keys.
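
As a hedged illustration, the following two size specifications are equivalent (the other flags shown are omitted-detail placeholders):

```sh
# Equivalent model-size specifications: 128 channels with l_max = 1
mace_run_train --name="MACE_model" --train_file="train.xyz" --hidden_irreps='128x0e + 128x1o'
mace_run_train --name="MACE_model" --train_file="train.xyz" --num_channels=128 --max_L=1
```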

It is usually preferred to add the isolated atoms to the training set, rather than reading in their energies through the command line like in the example above. To label them in the training set, set `config_type=IsolatedAtom` in their info fields. If you prefer not to use or do not know the energies of the isolated atoms, you can use the option `--E0s="average"` which estimates the atomic energies using least squares regression.

@@ -142,8 +149,37 @@ The keywords `--batch_size` and `--max_num_epochs` should be adapted based on th

The code can handle training set with heterogeneous labels, for example containing both bulk structures with stress and isolated molecules. In this example, to make the code ignore stress on molecules, append to your molecules configuration a `config_stress_weight = 0.0`.

#### Apple Silicon GPU acceleration

To use Apple Silicon GPU acceleration, make sure to install the latest PyTorch version and specify `--device=mps`.
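
A minimal sketch (file names are placeholders):

```sh
mace_run_train \
    --name="MACE_model" \
    --train_file="train.xyz" \
    --device=mps
```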

#### Multi-GPU training

For multi-GPU training, use the `--distributed` flag. This will use PyTorch's DistributedDataParallel module to train the model on multiple GPUs. Combine it with on-line data loading for large datasets (see below). An example SLURM script can be found in `mace/scripts/distributed_example.sbatch`.
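
A hedged sketch of a SLURM launch; `mace/scripts/distributed_example.sbatch` is the reference, and the resource flags below are illustrative:

```sh
# Illustrative only: one task per GPU on a single node with 4 GPUs
srun --nodes=1 --ntasks-per-node=4 --gpus-per-node=4 \
    mace_run_train \
    --name="MACE_model" \
    --train_file="train.xyz" \
    --distributed \
    --device=cuda
```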

#### YAML configuration

All or some arguments can also be supplied via a YAML file. For example, to train a model using the arguments above, create a YAML file `your_configs.yaml` with the following content:

```yaml
name: nacl
seed: 2024
train_file: train.xyz
swa: yes
start_swa: 1200
max_num_epochs: 1500
device: cpu
test_file: test.xyz
E0s:
41: -1029.2809654211628
38: -1484.1187695035828
8: -2042.0330099956639
config_type_weights:
Default: 1.0

```
Then append `--config="your_configs.yaml"` to the command line. Any argument specified on the command line will overwrite the one in the YAML file.
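
For example, a command-line flag takes precedence over the corresponding YAML value (a sketch using the file above):

```sh
# max_num_epochs from the command line (2000) overrides the YAML value (1500)
mace_run_train --config="your_configs.yaml" --max_num_epochs=2000
```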

### Evaluation

To evaluate your MACE model on an XYZ file, run the `mace_eval_configs` script:
@@ -157,10 +193,56 @@ mace_eval_configs \
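
The collapsed example takes roughly this form (a sketch; the file names are placeholders):

```sh
mace_eval_configs \
    --configs="your_configs.xyz" \
    --model="your_model.model" \
    --output="./your_output.xyz"
```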

## Tutorial

You can run our [Colab tutorial](https://colab.research.google.com/drive/1D6EtMUjQPey_GkuxUAbPgld6_9ibIa-V?authuser=1#scrollTo=Z10787RE1N8T) to quickly get started with MACE.

We also have a more detailed user and developer tutorial at https://github.com/ilyes319/mace-tutorials

## On-line data loading for large datasets

If you have a large dataset that might not fit into GPU memory, it is recommended to preprocess the data on a CPU and use on-line data loading for training the model. To preprocess your dataset, specified as an XYZ file, run the `preprocess_data.py` script. An example is given here:

```sh
mkdir processed_data
python ./mace/scripts/preprocess_data.py \
--train_file="/path/to/train_large.xyz" \
--valid_fraction=0.05 \
--test_file="/path/to/test_large.xyz" \
--atomic_numbers="[1, 6, 7, 8, 9, 15, 16, 17, 35, 53]" \
--r_max=4.5 \
--h5_prefix="processed_data/" \
--compute_statistics \
--E0s="average" \
--seed=123
```

To see all options, each with a short description, run `python ./mace/scripts/preprocess_data.py --help`. The script will create a number of HDF5 files in the `processed_data` folder which can be used for training. There will be one file for training, one for validation and a separate one for each `config_type` in the test set. To train the model, use the `run_train.py` script as follows:

```sh
python ./mace/scripts/run_train.py \
--name="MACE_on_big_data" \
--num_workers=16 \
--train_file="./processed_data/train.h5" \
--valid_file="./processed_data/valid.h5" \
--test_dir="./processed_data" \
--statistics_file="./processed_data/statistics.json" \
--model="ScaleShiftMACE" \
--num_interactions=2 \
--num_channels=128 \
--max_L=1 \
--correlation=3 \
--batch_size=32 \
--valid_batch_size=32 \
--max_num_epochs=100 \
--swa \
--start_swa=60 \
--ema \
--ema_decay=0.99 \
--amsgrad \
--error_table='PerAtomMAE' \
--device=cuda \
--seed=123
```

## Weights and Biases for experiment tracking

If you would like to use MACE with Weights and Biases to log your experiments, simply install with
@@ -180,6 +262,9 @@ We have collaborated with the Materials Project (MP) to train a universal MACE p
The models are released on GitHub at https://github.com/ACEsuit/mace-mp.
If you use them, please cite [our paper](https://arxiv.org/abs/2401.00096), which also contains a large range of example applications and benchmarks.

> [!CAUTION]
> The MACE-MP models are trained on MPTrj raw DFT energies from VASP outputs and are not directly comparable to the MP's DFT energies or CHGNet's energies, to which MP2020Compatibility corrections have been applied for some transition metal oxides and fluorides (GGA/GGA+U mixing corrections) and for 14 anion species (anion corrections). For more details, please refer to the [MP Documentation](https://docs.materialsproject.org/methodology/materials-methodology/thermodynamic-stability/thermodynamic-stability/anion-and-gga-gga+u-mixing) and [MP2020Compatibility.yaml](https://github.com/materialsproject/pymatgen/blob/master/pymatgen/entries/MP2020Compatibility.yaml).

#### Example usage in ASE
```py
from mace.calculators import mace_mp
from ase import build

atoms = build.molecule('H2O')
# The collapsed diff hides the remaining lines; this calculator setup is an
# illustrative reconstruction (mace_mp() defaults to the medium MACE-MP model).
calc = mace_mp(model="medium", dispersion=False, default_dtype="float32", device='cuda')
atoms.calc = calc
print(atoms.get_potential_energy())
```

### MACE-OFF: Transferable Organic Force Fields

There is a series of (small, medium, large) transferable organic force fields. These can be used for the simulation of organic molecules, crystals and molecular liquids, or as a starting point for fine-tuning on a new dataset. The models are released under the [ASL license](https://github.com/gabor1/ASL).
The models are released on GitHub at https://github.com/ACEsuit/mace-off.
If you use them, please cite [our paper](https://arxiv.org/abs/2312.15211), which also contains detailed benchmarks and example applications.

#### Example usage in ASE
```py
from mace.calculators import mace_off
from ase import build

atoms = build.molecule('H2O')
# Illustrative reconstruction of the collapsed lines; "medium" is one of the
# small/medium/large MACE-OFF models.
calc = mace_off(model="medium", device='cuda')
atoms.calc = calc
print(atoms.get_potential_energy())
```

## Development
### Finetuning foundation models

We use `black`, `isort`, `pylint`, and `mypy`.
Run the following to format and check your code:
To finetune one of the mace-mp-0 foundation models, you can use the `mace_run_train` script with the extra argument `--foundation_model=model_type`. For example, to finetune the small model on a new dataset, you can use:

```sh
bash ./scripts/run_checks.sh
mace_run_train \
--name="MACE" \
--foundation_model="small" \
--train_file="train.xyz" \
--valid_fraction=0.05 \
--test_file="test.xyz" \
--energy_weight=1.0 \
--forces_weight=1.0 \
--E0s="average" \
--lr=0.01 \
--scaling="rms_forces_scaling" \
--batch_size=2 \
--max_num_epochs=6 \
--ema \
--ema_decay=0.99 \
--amsgrad \
--default_dtype="float32" \
--device=cuda \
--seed=3
```
Other options are "medium" and "large", or the path to a foundation model.
If you want to finetune another model, it will be loaded from the path provided via `--foundation_model=$path_model`, but you will need to provide the full set of hyperparameters (hidden irreps, r_max, etc.) matching the model.

## Development

This project uses [pre-commit](https://pre-commit.com/) to execute code formatting and linting on commit.
We also use `black`, `isort`, `pylint`, and `mypy`.
We recommend setting up your development environment by installing the `dev` packages
into your python environment:
```bash
pip install -e ".[dev]"
pre-commit install
```
The second line will initialise `pre-commit` to automatically run code checks on commit.
We have CI set up to check this, but we _highly_ recommend that you run those commands
before you commit (and push) to avoid accidentally committing bad code.

2 changes: 1 addition & 1 deletion mace/__version__.py
@@ -1 +1 @@
__version__ = "0.3.4"
__version__ = "0.3.5"
