Merge remote-tracking branch 'origin/develop' into dynamic-graphs
mishooax committed Sep 17, 2024
2 parents a157104 + dbee83b commit d998e51
Showing 17 changed files with 665 additions and 66 deletions.
34 changes: 34 additions & 0 deletions .github/workflows/changelog-release-update.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
# .github/workflows/update-changelog.yaml
name: "Update Changelog"

on:
  release:
    types: [released]

permissions:
  pull-requests: write
  contents: write

jobs:
  update:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.release.target_commitish }}

      - name: Update Changelog
        uses: stefanzweifel/changelog-updater-action@v1
        with:
          latest-version: ${{ github.event.release.tag_name }}
          heading-text: ${{ github.event.release.name }}

      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v6
        with:
          branch: docs/changelog-update-${{ github.event.release.tag_name }}
          title: '[Changelog] Update to ${{ github.event.release.tag_name }}'
          add-paths: |
            CHANGELOG.md
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -121,6 +121,7 @@ celerybeat.pid

# Environments
.env
.envrc
.venv
env/
venv/
Expand Down
6 changes: 3 additions & 3 deletions .pre-commit-config.yaml
Expand Up @@ -21,7 +21,7 @@ repos:
- id: check-added-large-files # Check for large files added to git
- id: check-merge-conflict # Check for files that contain merge conflict
- repo: https://github.com/psf/black-pre-commit-mirror
rev: 24.4.2
rev: 24.8.0
hooks:
- id: black
args: [--line-length=120]
Expand All @@ -34,7 +34,7 @@ repos:
- --force-single-line-imports
- --profile black
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.4.6
rev: v0.6.3
hooks:
- id: ruff
# Next line is for documentation code snippets
Expand Down Expand Up @@ -65,6 +65,6 @@ repos:
- id: optional-dependencies-all
args: ["--inplace", "--exclude-keys=dev,docs,tests", "--group=dev=all,docs,tests"]
- repo: https://github.com/tox-dev/pyproject-fmt
rev: "2.1.3"
rev: "2.2.1"
hooks:
- id: pyproject-fmt
24 changes: 13 additions & 11 deletions CHANGELOG.md
Expand Up @@ -8,19 +8,24 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
Please add your functional changes to the appropriate section in the PR.
Keep it human-readable, your future self will thank you!

## [Unreleased]
## [Unreleased](https://github.com/ecmwf/anemoi-models/compare/0.3.0...HEAD)

## [0.3.0](https://github.com/ecmwf/anemoi-models/compare/0.2.1...0.3.0) - Remapping of (meteorological) Variables

### Added

- CI workflow to update the changelog on release
- Remapper: Preprocessor for remapping one variable to multiple ones. Includes changes to the data indices since the remapper changes the number of variables. With optional config keywords.

### Changed

- Update CI to inherit from common infrastructure reusable workflows
- run downstream-ci only when src and tests folders have changed
- New error messages for wrong graphs.
- Update CI to inherit from common infrastructure reusable workflows
- run downstream-ci only when src and tests folders have changed
- New error messages for wrong graphs.

### Removed

## [0.2.1] - Dependency update
## [0.2.1](https://github.com/ecmwf/anemoi-models/compare/0.2.0...0.2.1) - Dependency update

### Added

Expand All @@ -31,7 +36,7 @@ Keep it human-readable, your future self will thank you!

- anemoi-datasets dependency

## [0.2.0] - Support Heterodata
## [0.2.0](https://github.com/ecmwf/anemoi-models/compare/0.1.0...0.2.0) - Support Heterodata

### Added

Expand All @@ -41,15 +46,12 @@ Keep it human-readable, your future self will thank you!

- Updated to support new PyTorch Geometric HeteroData structure (defined by `anemoi-graphs` package).

## [0.1.0] - Initial Release
## [0.1.0](https://github.com/ecmwf/anemoi-models/releases/tag/0.1.0) - Initial Release

### Added

- Documentation
- Initial code release with models, layers, distributed, preprocessing, and data_indices
- Added Changelog

<!-- Add Git Diffs for Links above -->
[unreleased]: https://github.com/ecmwf/anemoi-models/compare/0.2.1...HEAD
[0.2.1]: https://github.com/ecmwf/anemoi-models/compare/0.2.0...0.2.1
[0.2.0]: https://github.com/ecmwf/anemoi-models/compare/0.1.0...0.2.0
[0.1.0]: https://github.com/ecmwf/anemoi-models/releases/tag/0.1.0
25 changes: 23 additions & 2 deletions docs/modules/data_indices.rst
Expand Up @@ -45,12 +45,33 @@ config entry:
:alt: Schematic of IndexCollection with Data Indexing on Data and Model levels.
:align: center

The are two Index-levels:
Additionally, prognostic and forcing variables can be remapped and
converted to multiple variables. The conversion is then done by the
remapper-preprocessor.

.. code:: yaml

   data:
     remapped:
       d:
         - "d_1"
         - "d_2"

There are two main Index-levels:

- Data: The data at "Zarr"-level provided by Anemoi-Datasets
- Model: The "squeezed" tensors with irrelevant parts missing.

These are both split into two versions:
Additionally, there are two internal levels (after the preprocessors
and before the postprocessors), which are needed because a variable
can be remapped to multiple variables.

- Internal Data: Variables from Data-level that are used internally in
the model, but not exposed to the user.
- Internal Model: Variables from Model-level that are used internally
in the model, but not exposed to the user.

All indices at the different levels are split into two versions:

- Input: The data going into training / model
- Output: The data produced by training / model
Expand Down
13 changes: 13 additions & 0 deletions docs/modules/preprocessing.rst
Expand Up @@ -33,3 +33,16 @@ following classes:
:members:
:no-undoc-members:
:show-inheritance:

**********
Remapper
**********

The remapper module is used to remap one variable to multiple other
variables that have been listed in ``data.remapped``. The module
contains the following classes:

.. automodule:: anemoi.models.preprocessing.remapper
:members:
:no-undoc-members:
:show-inheritance:
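The idea of remapping one variable to several can be sketched in plain Python. This is a minimal illustration, not the code in `anemoi.models.preprocessing.remapper` (which is not shown in this diff): the helper `remap_row`, the `functions` argument, and the choice of a cosine/sine split for `d` are all assumptions made for the example. It mirrors the bookkeeping described above: the remapped variable is dropped and its replacements are appended at the end.

```python
import math


def remap_row(row, name_to_index, remapped, functions):
    """Hypothetical helper: replace each remapped variable by its target variables.

    Kept variables retain their relative order; the new variables are
    appended at the end, as in the internal index levels.
    """
    kept = [name for name in sorted(name_to_index, key=name_to_index.get) if name not in remapped]
    new_row = [row[name_to_index[name]] for name in kept]
    new_names = list(kept)
    for source, targets in remapped.items():
        value = row[name_to_index[source]]
        for target, fn in zip(targets, functions[source]):
            new_row.append(fn(value))
            new_names.append(target)
    return new_row, {name: i for i, name in enumerate(new_names)}


name_to_index = {"t": 0, "d": 1, "q": 2}
remapped = {"d": ["d_1", "d_2"]}  # same shape as the data.remapped config entry
# Assumption for illustration only: "d" is an angle in degrees split into cos and sin.
functions = {"d": [lambda deg: math.cos(math.radians(deg)), lambda deg: math.sin(math.radians(deg))]}

row, names = remap_row([280.0, 90.0, 0.01], name_to_index, remapped, functions)
print(names)
```

After the call, `d` no longer appears in `names`, while `d_1` and `d_2` occupy the last two positions.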
63 changes: 60 additions & 3 deletions src/anemoi/models/data_indices/collection.py
Expand Up @@ -25,26 +25,76 @@ class IndexCollection:

    def __init__(self, config, name_to_index) -> None:
        self.config = OmegaConf.to_container(config, resolve=True)

        self.name_to_index = dict(sorted(name_to_index.items(), key=operator.itemgetter(1)))
        self.forcing = [] if config.data.forcing is None else OmegaConf.to_container(config.data.forcing, resolve=True)
        self.diagnostic = (
            [] if config.data.diagnostic is None else OmegaConf.to_container(config.data.diagnostic, resolve=True)
        )
        # config.data.remapped is an optional dictionary with every remapper as one entry
        self.remapped = (
            dict()
            if config.data.get("remapped") is None
            else OmegaConf.to_container(config.data.remapped, resolve=True)
        )
        self.forcing_remapped = self.forcing.copy()

        assert set(self.diagnostic).isdisjoint(self.forcing), (
            f"Diagnostic and forcing variables overlap: {set(self.diagnostic).intersection(self.forcing)}. ",
            "Please drop them at a dataset-level to exclude them from the training data.",
        )
        self.name_to_index = dict(sorted(name_to_index.items(), key=operator.itemgetter(1)))
        assert set(self.remapped).isdisjoint(self.diagnostic), (
            "Remapped variable overlap with diagnostic variables. Not implemented.",
        )
        assert set(self.remapped).issubset(self.name_to_index), (
            "Remapping a variable that does not exist in the dataset. Check for typos: ",
            f"{set(self.remapped).difference(self.name_to_index)}",
        )
        name_to_index_model_input = {
            name: i for i, name in enumerate(key for key in self.name_to_index if key not in self.diagnostic)
        }
        name_to_index_model_output = {
            name: i for i, name in enumerate(key for key in self.name_to_index if key not in self.forcing)
        }
        # remove remapped variables from internal data and model indices
        name_to_index_internal_data_input = {
            name: i for i, name in enumerate(key for key in self.name_to_index if key not in self.remapped)
        }
        name_to_index_internal_model_input = {
            name: i for i, name in enumerate(key for key in name_to_index_model_input if key not in self.remapped)
        }
        name_to_index_internal_model_output = {
            name: i for i, name in enumerate(key for key in name_to_index_model_output if key not in self.remapped)
        }
        # for all variables to be remapped we add the resulting remapped variables to the end of the tensors
        # keep track of that in the index collections
        for key in self.remapped:
            for mapped in self.remapped[key]:
                # add index of remapped variables to dictionary
                name_to_index_internal_model_input[mapped] = len(name_to_index_internal_model_input)
                name_to_index_internal_data_input[mapped] = len(name_to_index_internal_data_input)
                if key not in self.forcing:
                    # do not include forcing variables in the remapped model output
                    name_to_index_internal_model_output[mapped] = len(name_to_index_internal_model_output)
                else:
                    # add remapped forcing variables to forcing_remapped
                    self.forcing_remapped += [mapped]
            if key in self.forcing:
                # if key is in forcing we need to remove it from forcing_remapped after remapped variables have been added
                self.forcing_remapped.remove(key)

        self.data = DataIndex(self.diagnostic, self.forcing, self.name_to_index)
        self.internal_data = DataIndex(
            self.diagnostic,
            self.forcing_remapped,
            name_to_index_internal_data_input,
        )  # internal after the remapping applied to data (training)
        self.model = ModelIndex(self.diagnostic, self.forcing, name_to_index_model_input, name_to_index_model_output)
        self.internal_model = ModelIndex(
            self.diagnostic,
            self.forcing_remapped,
            name_to_index_internal_model_input,
            name_to_index_internal_model_output,
        )  # internal after the remapping applied to model (inference)

def __repr__(self) -> str:
return f"IndexCollection(config={self.config}, name_to_index={self.name_to_index})"
Expand All @@ -54,7 +104,12 @@ def __eq__(self, other):
# don't attempt to compare against unrelated types
return NotImplemented

return self.model == other.model and self.data == other.data
return (
self.model == other.model
and self.data == other.data
and self.internal_model == other.internal_model
and self.internal_data == other.internal_data
)

def __getitem__(self, key):
return getattr(self, key)
Expand All @@ -63,6 +118,8 @@ def todict(self):
return {
"data": self.data.todict(),
"model": self.model.todict(),
"internal_model": self.internal_model.todict(),
"internal_data": self.internal_data.todict(),
}

@staticmethod
Expand Down
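To make the index bookkeeping in `IndexCollection.__init__` easier to follow, here is a standalone sketch of the same steps using plain dictionaries. The function `build_internal_indices` and the variable names (`t`, `d`, `cos_lat`, `tp`) are hypothetical; `DataIndex`, `ModelIndex` and OmegaConf are deliberately left out, so this only approximates what the class does.

```python
def build_internal_indices(name_to_index, forcing, diagnostic, remapped):
    """Hypothetical standalone version of the internal index construction."""
    ordered = dict(sorted(name_to_index.items(), key=lambda item: item[1]))
    model_input = [name for name in ordered if name not in diagnostic]
    model_output = [name for name in ordered if name not in forcing]

    def reindex(names):
        # Drop the remapped source variables and renumber what is left.
        return {name: i for i, name in enumerate(n for n in names if n not in remapped)}

    internal_data = reindex(ordered)
    internal_model_input = reindex(model_input)
    internal_model_output = reindex(model_output)
    forcing_remapped = list(forcing)

    for key, targets in remapped.items():
        for mapped in targets:
            # New variables are appended at the end of each index.
            internal_model_input[mapped] = len(internal_model_input)
            internal_data[mapped] = len(internal_data)
            if key not in forcing:
                internal_model_output[mapped] = len(internal_model_output)
            else:
                # Targets of a remapped forcing are forcings themselves.
                forcing_remapped.append(mapped)
        if key in forcing:
            forcing_remapped.remove(key)

    return {
        "internal_data": internal_data,
        "internal_model_input": internal_model_input,
        "internal_model_output": internal_model_output,
        "forcing_remapped": forcing_remapped,
    }


indices = build_internal_indices(
    name_to_index={"t": 0, "d": 1, "cos_lat": 2, "tp": 3},
    forcing=["cos_lat"],
    diagnostic=["tp"],
    remapped={"d": ["d_1", "d_2"]},
)
print(indices["internal_model_input"])
```

In this example the prognostic variable `d` disappears from all three internal indices and `d_1`, `d_2` are appended to each. If a forcing variable were remapped instead, its targets would be added to the inputs and to `forcing_remapped`, but not to the model output.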
2 changes: 1 addition & 1 deletion src/anemoi/models/interface/__init__.py
Expand Up @@ -65,7 +65,7 @@ def _build_model(self) -> None:
"""Builds the model and pre- and post-processors."""
# Instantiate processors
processors = [
[name, instantiate(processor, statistics=self.statistics, data_indices=self.data_indices)]
[name, instantiate(processor, data_indices=self.data_indices, statistics=self.statistics)]
for name, processor in self.config.data.processors.items()
]

Expand Down
19 changes: 10 additions & 9 deletions src/anemoi/models/models/encoder_processor_decoder.py
Expand Up @@ -104,22 +104,23 @@ def __init__(
)

def _calculate_shapes_and_indices(self, data_indices: dict) -> None:
self.num_input_channels = len(data_indices.model.input)
self.num_output_channels = len(data_indices.model.output)
self._internal_input_idx = data_indices.model.input.prognostic
self._internal_output_idx = data_indices.model.output.prognostic
self.num_input_channels = len(data_indices.internal_model.input)
self.num_output_channels = len(data_indices.internal_model.output)
self._internal_input_idx = data_indices.internal_model.input.prognostic
self._internal_output_idx = data_indices.internal_model.output.prognostic

def _assert_matching_indices(self, data_indices: dict) -> None:

assert len(self._internal_output_idx) == len(data_indices.model.output.full) - len(
data_indices.model.output.diagnostic
assert len(self._internal_output_idx) == len(data_indices.internal_model.output.full) - len(
data_indices.internal_model.output.diagnostic
), (
f"Mismatch between the internal data indices ({len(self._internal_output_idx)}) and the output indices excluding "
f"diagnostic variables ({len(data_indices.model.output.full) - len(data_indices.model.output.diagnostic)})",
f"Mismatch between the internal data indices ({len(self._internal_output_idx)}) and "
f"the internal output indices excluding diagnostic variables "
f"({len(data_indices.internal_model.output.full) - len(data_indices.internal_model.output.diagnostic)})",
)
assert len(self._internal_input_idx) == len(
self._internal_output_idx,
), f"Model indices must match {self._internal_input_idx} != {self._internal_output_idx}"
), f"Internal model indices must match {self._internal_input_idx} != {self._internal_output_idx}"

def _define_tensor_sizes(self, config: DotDict) -> None:
self._data_grid_size = self._graph_data[self._graph_name_data].num_nodes
Expand Down
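The switch from `data_indices.model` to `data_indices.internal_model` means the channel counts and consistency checks now see the variables as they exist after remapping. The sketch below reproduces the two checks on toy data. The `SimpleNamespace` layout and the index lists are assumptions made for illustration, not the real `ModelIndex` objects, and the function `check_internal_indices` is hypothetical.

```python
from types import SimpleNamespace

# Toy internal model indices: 4 input channels (one forcing at position 1)
# and 4 output channels (one diagnostic at position 1).
internal_model = SimpleNamespace(
    input=SimpleNamespace(full=[0, 1, 2, 3], prognostic=[0, 2, 3], forcing=[1]),
    output=SimpleNamespace(full=[0, 1, 2, 3], prognostic=[0, 2, 3], diagnostic=[1]),
)


def check_internal_indices(internal_model):
    """Hypothetical check mirroring the two assertions in the hunk above."""
    num_input_channels = len(internal_model.input.full)
    num_output_channels = len(internal_model.output.full)
    input_idx = internal_model.input.prognostic
    output_idx = internal_model.output.prognostic

    expected = num_output_channels - len(internal_model.output.diagnostic)
    assert len(output_idx) == expected, (
        f"Mismatch between the internal data indices ({len(output_idx)}) and "
        f"the internal output indices excluding diagnostic variables ({expected})"
    )
    assert len(input_idx) == len(output_idx), f"Internal model indices must match {input_idx} != {output_idx}"
    return num_input_channels, num_output_channels


channels = check_internal_indices(internal_model)
print(channels)
```

Both checks pass here because the three prognostic variables appear on the input and the output side.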
10 changes: 6 additions & 4 deletions src/anemoi/models/preprocessing/__init__.py
Expand Up @@ -14,6 +14,8 @@
from torch import Tensor
from torch import nn

from anemoi.models.data_indices.collection import IndexCollection

LOGGER = logging.getLogger(__name__)


Expand All @@ -23,19 +25,19 @@ class BasePreprocessor(nn.Module):
def __init__(
self,
config=None,
data_indices: Optional[IndexCollection] = None,
statistics: Optional[dict] = None,
data_indices: Optional[dict] = None,
) -> None:
"""Initialize the preprocessor.
Parameters
----------
config : DotDict
configuration object
configuration object of the processor
data_indices : IndexCollection
Data indices for input and output variables
statistics : dict
Data statistics dictionary
data_indices : dict
Data indices for input and output variables
"""
super().__init__()

Expand Down
16 changes: 9 additions & 7 deletions src/anemoi/models/preprocessing/imputer.py
Expand Up @@ -33,16 +33,15 @@ def __init__(
Parameters
----------
config : DotDict
configuration object
configuration object of the processor
data_indices : IndexCollection
Data indices for input and output variables
statistics : dict
Data statistics dictionary
data_indices : dict
Data indices for input and output variables
"""
super().__init__(config, statistics, data_indices)
super().__init__(config, data_indices, statistics)

self.nan_locations = None
self.data_indices = data_indices

def _validate_indices(self):
assert len(self.index_training_input) == len(self.index_inference_input) <= len(self.replacement), (
Expand Down Expand Up @@ -174,8 +173,8 @@ class InputImputer(BaseImputer):
def __init__(
self,
config=None,
data_indices: Optional[IndexCollection] = None,
statistics: Optional[dict] = None,
data_indices: Optional[dict] = None,
) -> None:
super().__init__(config, data_indices, statistics)

Expand All @@ -201,7 +200,10 @@ class ConstantImputer(BaseImputer):
"""

def __init__(
self, config=None, statistics: Optional[dict] = None, data_indices: Optional[IndexCollection] = None
self,
config=None,
data_indices: Optional[IndexCollection] = None,
statistics: Optional[dict] = None,
) -> None:
super().__init__(config, data_indices, statistics)

Expand Down
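The preprocessor hunks above move `data_indices` ahead of `statistics` in the constructor signatures and adjust the positional `super().__init__` calls accordingly. The following sketch shows why a subclass that still forwards its arguments positionally in the old order would silently swap the two values, and how forwarding by keyword avoids that. The classes `Base`, `PositionalChild` and `KeywordChild` are hypothetical stand-ins, not classes from `anemoi.models`.

```python
class Base:
    """Stand-in for a base preprocessor with the NEW argument order."""

    def __init__(self, config=None, data_indices=None, statistics=None):
        self.config = config
        self.data_indices = data_indices
        self.statistics = statistics


class PositionalChild(Base):
    """Still forwards positionally in the OLD order (statistics first)."""

    def __init__(self, config=None, statistics=None, data_indices=None):
        super().__init__(config, statistics, data_indices)


class KeywordChild(Base):
    """Forwards by keyword, so the parameter order of Base does not matter."""

    def __init__(self, config=None, data_indices=None, statistics=None):
        super().__init__(config=config, data_indices=data_indices, statistics=statistics)


stats = {"mean": [0.0]}
indices = {"t": 0}

swapped = PositionalChild(statistics=stats, data_indices=indices)
correct = KeywordChild(statistics=stats, data_indices=indices)

print(swapped.data_indices is stats)    # the statistics ended up in data_indices
print(correct.data_indices is indices)  # keyword forwarding keeps them apart
```

Callers that pass both arguments by keyword, as the `instantiate(...)` call in `interface/__init__.py` does, are not affected by the reordering.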