v0.8.0

πŸš€ Composer v0.8.0

Composer v0.8.0 is released! Install via pip:

pip install --upgrade mosaicml==0.8.0

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.8.0

New Features

  1. πŸ€— HuggingFace ComposerModel

    Train your HuggingFace models with Composer! We introduced a HuggingFaceModel that converts your existing πŸ€— Transformers models into a ComposerModel.

    For example:

    import transformers
    from composer import Trainer
    from composer.models import HuggingFaceModel
    
    # Define the model
    hf_model = transformers.AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
    
    # Convert it into a ComposerModel
    model = HuggingFaceModel(hf_model)
    
    # Construct the trainer
    trainer = Trainer(
        ...,
        model=model,
    )
    
    # Train!
    trainer.fit()

    For more information, see the example on fine-tuning a pretrained BERT with Composer.

  2. πŸ«• Fused Layer Norm

    Fused LayerNorm replaces implementations of torch.nn.LayerNorm with apex.normalization.fused_layer_norm. The fused kernel provides increased GPU utilization.

    For example:

    from composer.trainer import Trainer
    from composer.algorithms import FusedLayerNorm
    
    # Initialize the algorithm
    alg = FusedLayerNorm()
    
    # Construct the trainer
    trainer = Trainer(
        ...,
        algorithms=alg,
    )
    
    # Train!
    trainer.fit()

    See the method card for more information.

  3. πŸ’Ύ Ignore Checkpoint Parameters

    If you have a checkpoint and don't want to restore some elements of the checkpoint to the state, we added a load_ignore_keys parameter. Any specified (nested) keys will be ignored, and glob syntax is supported (see the second example below).

    For example, to restore a checkpoint without the seed:

    from composer import Trainer
    
    trainer = Trainer(
        ...,
        load_path="path/to/my/checkpoint.pt",
        load_ignore_keys=["state/rank_zero_seed", "rng"],
    )
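
    Glob patterns can match multiple keys at once. As a hedged sketch, assuming your checkpoint nests optimizer state under state/optimizers and your model's classification head parameters contain "classifier" in their names:

    from composer import Trainer

    trainer = Trainer(
        ...,
        load_path="path/to/my/checkpoint.pt",
        # Skip all optimizer state and any parameter whose key matches *classifier*
        # (key names are illustrative -- adjust them to your checkpoint's layout)
        load_ignore_keys=["state/optimizers/*", "state/model/*classifier*"],
    )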

    See the Trainer API Reference for more information.

  4. πŸͺ£ Object Stores

    Composer v0.8.0 introduces an abstract Object Store API to support multiple object store drivers, such as boto3 (for Amazon S3) and Paramiko (for SFTP), in addition to the existing libcloud implementation.

    For example, if you are training on AWS where credentials are available in the environment, here's how to save checkpoints to an S3 object store via boto3.

    from composer import Trainer
    from composer.loggers import ObjectStoreLogger
    from composer.utils.object_store import S3ObjectStore
    
    logger = ObjectStoreLogger(
        object_store_cls=S3ObjectStore,
        object_store_kwargs={
            # These arguments will be passed into the S3ObjectStore -- e.g.:
            # object_store = S3ObjectStore(**object_store_kwargs)
            # Refer to the S3ObjectStore class for documentation
            'bucket': 'my-bucket',
        },
    )
    
    trainer = Trainer(
        ...,
        loggers=logger,
    )
    
    # Train!
    trainer.fit()

    See the Object Store API Reference for more information.

  5. πŸͺ¨ Artifact Metadata

    Composer automatically logs the epoch, batch, sample, and token counts as metadata when storing artifacts in Weights & Biases. See the API Reference for more information.
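
    For example, a minimal sketch, assuming checkpoints are saved by the Trainer and artifact logging is enabled through the WandBLogger's log_artifacts flag:

    from composer import Trainer
    from composer.loggers import WandBLogger

    # Upload saved checkpoints as W&B artifacts; Composer attaches the current
    # epoch, batch, sample, and token counts as artifact metadata.
    wandb_logger = WandBLogger(log_artifacts=True)

    trainer = Trainer(
        ...,
        loggers=wandb_logger,
        save_folder='checkpoints',
    )

    # Train!
    trainer.fit()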

API Changes

  1. βœ‚οΈ Gradient Clipping is now an Algorithm

    To clean up the Trainer, we moved gradient clipping into an Algorithm. The grad_clip_norm argument in the Trainer is deprecated and will be removed in a future version of Composer. Instead, use the GradientClipping algorithm.

    For example:

    from composer.algorithms import GradientClipping
    from composer.trainer import Trainer
    
    # Configure gradient clipping (here, clip gradients to a maximum L2 norm of 1.0)
    gradient_clipping = GradientClipping(clipping_type='norm', clipping_threshold=1.0)
    
    # Configure the trainer
    trainer = Trainer(
        ...,
        algorithms=gradient_clipping,
    )
    
    # Train!
    trainer.fit()

    See the method card for more information.

  2. πŸ•’οΈ Removed batch_num_samples and batch_num_tokens from the state.

    State properties batch_num_samples and batch_num_tokens have been removed.
    Instead, use State.timestamp for token and sample tracking.
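
    As a minimal sketch of the new pattern, assuming a custom callback reads the running counts from state.timestamp (each count is a Time object with a value attribute):

    from composer.core import Callback, State
    from composer.loggers import Logger

    class SampleCounter(Callback):
        def batch_end(self, state: State, logger: Logger) -> None:
            # state.timestamp replaces the removed batch_num_samples / batch_num_tokens
            print(f"samples seen: {state.timestamp.sample.value}")
            print(f"tokens seen:  {state.timestamp.token.value}")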

  3. πŸ§‘β€πŸ€β€πŸ§‘ DDP Sync Strategy

    We changed the default DDP Sync Strategy to MULTI_AUTO_SYNC, as FORCED_SYNC doesn't work with all algorithms.
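
    If a particular run still needs the previous behavior, here is a hedged sketch of overriding the default (assuming the Trainer's ddp_sync_strategy argument and the forced_sync value are available in your version):

    from composer import Trainer

    trainer = Trainer(
        ...,
        # Assumed argument name and value -- check the Trainer API Reference
        ddp_sync_strategy='forced_sync',
    )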

  4. πŸƒ Moved the run_name into the State

    The run_name has been added to the State object, so it is persisted with checkpoints. It has been removed from the Logger.
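
    For example, a minimal sketch, assuming the run name is passed to the Trainer via the run_name argument:

    from composer import Trainer

    trainer = Trainer(
        ...,
        run_name='my-experiment-1',
    )

    # The run name now lives on the State and is persisted with checkpoints
    print(trainer.state.run_name)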

Bug Fixes

  • In the ObjectStoreLogger, added retries for credential validation, and credentials are now validated only on global rank zero. (#1144)
  • Fixed a bug in the speed monitor where it returned negative wall clock times. (#1123)
  • Fixed a bug where block-wise Stochastic Depth could freeze the trainer. (#1087)
  • Fixed a bug in the MLPerfCallback where sample counts were incorrect for sharded datasets. (#1156)

Changelog

v0.7.1...v0.8.0