Commit

add 3bpa script
birdyLinch committed Sep 28, 2024
1 parent 06cfaa4 commit 3382ad7
Showing 10 changed files with 663 additions and 1 deletion.
2 changes: 1 addition & 1 deletion scripts/distributed_example.sbatch
@@ -31,4 +31,4 @@ srun python mace/scripts/run_train.py \
--default_dtype='float32' \
--device='cuda' \
--distributed \
- --seed=2222 \
+ --seed=2222 \
15 changes: 15 additions & 0 deletions scripts_3bpa/3bpa.sbatch
@@ -0,0 +1,15 @@
#!/bin/bash
#SBATCH --job-name=train-mace # job name
#SBATCH --partition=gpu_p6s
#SBATCH --account=gax@h100 # account
#SBATCH -C h100 # target H100 nodes
#SBATCH --nodes=1 # number of nodes
#SBATCH --ntasks-per-node=1 # number of MPI tasks per node (here = number of GPUs per node)
#SBATCH --gres=gpu:1 # number of GPUs per node (max 4 for H100 nodes)
#SBATCH --cpus-per-task=12 # number of CPUs per task (here 1/4 of the node)
#SBATCH --time=1:00:00 # maximum execution time requested (HH:MM:SS)
#SBATCH --output=mace-train-%J
#SBATCH --error=mace-train-%J


srun 3bpa.sh
42 changes: 42 additions & 0 deletions scripts_3bpa/3bpa.sh
@@ -0,0 +1,42 @@
#!/bin/bash
module load pytorch-gpu/py3/2.3.1
conda activate mace-tn
export PYTHONPATH=${SCRATCH}/.conda/envs/mace-tn/lib/python3.11/site-packages/
ROOT_DIR=/lustre/fsn1/projects/rech/gax/unh55hx/tensornetwork/mace-main
cd "$ROOT_DIR" || exit 1

echo "$PYTHONPATH"

python mace/cli/run_train.py \
--name="MACE_3bpa" \
--train_file="/lustre/fsn1/projects/rech/gax/unh55hx/tensornetwork/data/dataset_3BPA/train_300K.xyz" \
--valid_fraction=0.1 \
--test_file="/lustre/fsn1/projects/rech/gax/unh55hx/tensornetwork/data/dataset_3BPA/test_300K.xyz" \
--energy_weight=27.0 \
--forces_weight=1000.0 \
--config_type_weights='{"Default":1.0}' \
--E0s='{1: -13.587222780835477, 6: -1029.4889999855063, 7: -1484.9814568572233, 8: -2041.9816003861047}' \
--model="ScaleShiftMACE" \
--interaction_first="RealAgnosticResidualInteractionBlock" \
--interaction="RealAgnosticResidualInteractionBlock" \
--num_interactions=2 \
--max_ell=3 \
--hidden_irreps='256x0e + 256x1o + 256x2e' \
--num_cutoff_basis=5 \
--correlation=3 \
--r_max=5.0 \
--scaling='rms_forces_scaling' \
--batch_size=5 \
--max_num_epochs=2000 \
--patience=256 \
--weight_decay=5e-7 \
--ema \
--ema_decay=0.99 \
--amsgrad \
--default_dtype="float32" \
--clip_grad=None \
--device=cuda \
--seed=123


# --statistics_file='./h5_data/statistics.json' \
6 changes: 6 additions & 0 deletions scripts_3bpa/mace-train-58133
@@ -0,0 +1,6 @@
Loading pytorch-gpu/py3/2.3.1
Loading requirement: cuda/12.4.1 nccl/2.21.5-1-cuda cudnn/9.2.0.82-cuda
gcc/11.3.1 openmpi/4.1.5-cuda intel-oneapi-mkl/2024.1 magma/2.8.0-cuda
sox/14.4.2 hdf5/1.12.0-mpi-cuda libjpeg-turbo/2.1.3 ffmpeg/6.1.1
python: can't open file '/lustre/fsn1/projects/rech/gax/unh55hx/tensornetwork/mace-main/mace/scripts/run_train.py': [Errno 2] No such file or directory
srun: error: jzxh033: task 0: Exited with exit code 2
6 changes: 6 additions & 0 deletions scripts_3bpa/mace-train-58134
@@ -0,0 +1,6 @@
Loading pytorch-gpu/py3/2.3.1
Loading requirement: cuda/12.4.1 nccl/2.21.5-1-cuda cudnn/9.2.0.82-cuda
gcc/11.3.1 openmpi/4.1.5-cuda intel-oneapi-mkl/2024.1 magma/2.8.0-cuda
sox/14.4.2 hdf5/1.12.0-mpi-cuda libjpeg-turbo/2.1.3 ffmpeg/6.1.1
3bpa.sh: line 10: mace_run_train: command not found
srun: error: jzxh033: task 0: Exited with exit code 127
21 changes: 21 additions & 0 deletions scripts_3bpa/mace-train-58135
@@ -0,0 +1,21 @@
Loading pytorch-gpu/py3/2.3.1
Loading requirement: cuda/12.4.1 nccl/2.21.5-1-cuda cudnn/9.2.0.82-cuda
gcc/11.3.1 openmpi/4.1.5-cuda intel-oneapi-mkl/2024.1 magma/2.8.0-cuda
sox/14.4.2 hdf5/1.12.0-mpi-cuda libjpeg-turbo/2.1.3 ffmpeg/6.1.1
/lustre/fsn1/projects/rech/gax/unh55hx/.conda/envs/mace-tn/lib/python3.11/site-packages/e3nn/o3/_wigner.py:10: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
_Jd, _W3j_flat, _W3j_indices = torch.load(os.path.join(os.path.dirname(__file__), 'constants.pt'))
Traceback (most recent call last):
File "/lustre/fsn1/projects/rech/gax/unh55hx/tensornetwork/mace-main/mace/cli/run_train.py", line 26, in <module>
from mace.calculators.foundations_models import mace_mp, mace_off
File "/lustre/fsn1/projects/rech/gax/unh55hx/mace_multi_head_interface/mace/calculators/__init__.py", line 1, in <module>
from .foundations_models import mace_anicc, mace_mp, mace_off
File "/lustre/fsn1/projects/rech/gax/unh55hx/mace_multi_head_interface/mace/calculators/foundations_models.py", line 10, in <module>
from .mace import MACECalculator
File "/lustre/fsn1/projects/rech/gax/unh55hx/mace_multi_head_interface/mace/calculators/mace.py", line 19, in <module>
from mace.modules.utils import extract_invariant
File "/lustre/fsn1/projects/rech/gax/unh55hx/mace_multi_head_interface/mace/modules/__init__.py", line 5, in <module>
from .blocks import (
File "/lustre/fsn1/projects/rech/gax/unh55hx/mace_multi_head_interface/mace/modules/blocks.py", line 35, in <module>
import hydra
ModuleNotFoundError: No module named 'hydra'
srun: error: jzxh033: task 0: Exited with exit code 1
22 changes: 22 additions & 0 deletions scripts_3bpa/mace-train-58136
@@ -0,0 +1,22 @@
Loading pytorch-gpu/py3/2.3.1
Loading requirement: cuda/12.4.1 nccl/2.21.5-1-cuda cudnn/9.2.0.82-cuda
gcc/11.3.1 openmpi/4.1.5-cuda intel-oneapi-mkl/2024.1 magma/2.8.0-cuda
sox/14.4.2 hdf5/1.12.0-mpi-cuda libjpeg-turbo/2.1.3 ffmpeg/6.1.1
/lustre/fsn1/projects/rech/gax/unh55hx/.conda/envs/mace-tn/lib/python3.11/site-packages/
/lustre/fsn1/projects/rech/gax/unh55hx/.conda/envs/mace-tn/lib/python3.11/site-packages/e3nn/o3/_wigner.py:10: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
_Jd, _W3j_flat, _W3j_indices = torch.load(os.path.join(os.path.dirname(__file__), 'constants.pt'))
Traceback (most recent call last):
File "/lustre/fsn1/projects/rech/gax/unh55hx/tensornetwork/mace-main/mace/cli/run_train.py", line 26, in <module>
from mace.calculators.foundations_models import mace_mp, mace_off
File "/lustre/fsn1/projects/rech/gax/unh55hx/mace_multi_head_interface/mace/calculators/__init__.py", line 1, in <module>
from .foundations_models import mace_anicc, mace_mp, mace_off
File "/lustre/fsn1/projects/rech/gax/unh55hx/mace_multi_head_interface/mace/calculators/foundations_models.py", line 10, in <module>
from .mace import MACECalculator
File "/lustre/fsn1/projects/rech/gax/unh55hx/mace_multi_head_interface/mace/calculators/mace.py", line 19, in <module>
from mace.modules.utils import extract_invariant
File "/lustre/fsn1/projects/rech/gax/unh55hx/mace_multi_head_interface/mace/modules/__init__.py", line 5, in <module>
from .blocks import (
File "/lustre/fsn1/projects/rech/gax/unh55hx/mace_multi_head_interface/mace/modules/blocks.py", line 35, in <module>
import hydra
ModuleNotFoundError: No module named 'hydra'
srun: error: jzxh033: task 0: Exited with exit code 1
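
Review note: the four committed job logs all fail before training starts, for three different reasons: a missing entry-point path (58133), a missing `mace_run_train` command (58134), and a missing `hydra` module that is imported from a second checkout, `mace_multi_head_interface`, rather than from `mace-main` (58135, 58136). Below is a minimal sketch of pre-flight checks that could run near the top of `3bpa.sh` to surface these problems in one short job. The helper names `require_file`, `module_origin`, and `is_under` are hypothetical and not part of this commit, and `module_origin` assumes the `python` on `PATH` is the interpreter used for training.

```shell
#!/bin/bash
# Pre-flight helpers (hypothetical names, not part of the commit).

# Fail early if a required file, e.g. the training entry point, is missing
# (compare job 58133).
require_file() {
    if [ ! -f "$1" ]; then
        echo "missing file: $1" >&2
        return 1
    fi
}

# Print the file Python would import a top-level module from, or MISSING
# if no such module (or no file origin) is found (compare the hydra error
# in jobs 58135 and 58136). The '' (cwd) entry that -c prepends to sys.path
# is dropped, since running a script file does not put the cwd on sys.path.
module_origin() {
    python -c 'import importlib.util, sys
sys.path[:] = [p for p in sys.path if p]
spec = importlib.util.find_spec(sys.argv[1])
print(spec.origin if spec and spec.origin else "MISSING")' "$1"
}

# Succeed only if path $1 lies under directory $2.
is_under() {
    case "$1" in
        "$2"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use inside 3bpa.sh, after ROOT_DIR is set:
#   require_file "$ROOT_DIR/mace/cli/run_train.py" || exit 1
#   [ "$(module_origin hydra)" != "MISSING" ] || { echo "hydra not installed" >&2; exit 1; }
#   is_under "$(module_origin mace)" "$ROOT_DIR" || echo "warning: mace is not imported from $ROOT_DIR" >&2
```

The last check only warns, because importing `mace` from another checkout may be intentional; the tracebacks suggest it is worth making that choice explicit.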