Release version v0.5.0
We are excited to share Composer v0.5, a library of speed-up methods for efficient neural network training. This release features:
- Revamped checkpointing API based on community feedback
- New baselines: ResNet34-SSD, GPT-3, and Vision Transformers
- Additional improvements to our documentation
- Support for `bfloat16`
- Streaming dataset support
- Unified functional API for our algorithms
Highlights
Checkpointing API
Model checkpointing is now implemented as a Callback, so users can easily write and add their own callbacks. The checkpointing callback is automatically appended if a `save_folder` is provided to the Trainer.
```python
trainer = Trainer(
    model=model,
    algorithms=algorithms,
    save_folder="checkpoints",
    save_interval="1ep"
)
```
Alternatively, `CheckpointSaver` can be added directly as a callback:
```python
trainer = Trainer(..., callbacks=[
    CheckpointSaver(
        save_folder='checkpoints',
        name_format="ep{epoch}-ba{batch}/rank_{rank}",
        save_latest_format="latest/rank_{rank}",
        save_interval="1ep",
        weights_only=False,
    )
])
```
Subclass from `CheckpointSaver` to add your own logic, for example saving only the best model or saving at specific intervals. Thanks to @mansheej, @siriuslee, and other users for their feedback.
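As a rough illustration of custom checkpoint logic, the sketch below uses a plain Callback with `torch.save` rather than the `CheckpointSaver` internals; the class name, folder, and interval logic are assumptions for illustration, not part of Composer's API, and the exact import path for `Callback` may differ between versions.

```python
# Illustrative sketch: a custom callback that saves model weights every N
# epochs with torch.save. This is NOT the CheckpointSaver implementation;
# the names and paths here are hypothetical.
import os
import torch
from composer.core import Callback  # import path may vary by Composer version


class EveryNEpochsSaver(Callback):
    def __init__(self, folder: str = "custom_checkpoints", every_n_epochs: int = 5):
        self.folder = folder
        self.every_n_epochs = every_n_epochs
        self._epochs_seen = 0

    def epoch_end(self, state, logger):
        # Callbacks can override event-named methods such as epoch_end.
        self._epochs_seen += 1
        if self._epochs_seen % self.every_n_epochs == 0:
            os.makedirs(self.folder, exist_ok=True)
            path = os.path.join(self.folder, f"ep{self._epochs_seen}.pt")
            torch.save(state.model.state_dict(), path)


# trainer = Trainer(..., callbacks=[EveryNEpochsSaver(every_n_epochs=5)])
```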
bfloat16
We've added experimental support for `bfloat16`, which can be enabled via the `precision` argument to the Trainer:
```python
trainer = Trainer(
    ...,
    precision="bfloat16"
)
```
Streaming datasets
We've added support for fast streaming datasets. For NLP datasets such as C4, we use the HuggingFace datasets backend and add dataset-specific shuffling, tokenization, and grouping on the fly. To support data-parallel training, we added dataset-specific sharding logic for efficiency. See `C4Datasets` for more details.
Vision streaming datasets are supported via a patched version of the `webdataset` package, with added support for sharding data across workers for fast augmentations. See `composer.datasets.webdataset` for more details.
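To illustrate the on-the-fly pattern (this is not the `C4Datasets` implementation itself), the HuggingFace `datasets` library can stream C4 and tokenize samples lazily; the GPT-2 tokenizer and sequence length below are arbitrary assumptions.

```python
# Illustrative sketch of streaming + on-the-fly tokenization (not the actual
# C4Datasets code). Assumes the HuggingFace `datasets` and `transformers`
# libraries; the tokenizer choice and max_length are arbitrary.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# streaming=True iterates samples lazily instead of downloading the full corpus
c4 = load_dataset("c4", "en", split="train", streaming=True)

# Shuffle with a bounded buffer (a full shuffle is impossible while streaming)
c4 = c4.shuffle(seed=17, buffer_size=10_000)

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=1024)

# map() is applied on the fly as samples are consumed
tokenized = c4.map(tokenize)

for i, sample in enumerate(tokenized):
    print(len(sample["input_ids"]))
    if i >= 1:
        break
```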
Baseline GPT-3, ResNet34-SSD, and Vision Transformer benchmarks
Configurations for GPT-3-like models ranging from 125M to 760M parameters are now released; they use DeepSpeed ZeRO Stage 0 for memory-efficient training.
We've also added the Single Shot Detection (SSD) model (Liu et al., 2016) with a ResNet34 backbone, based on the MLPerf reference implementation.
Our first Vision Transformer benchmark is the ViT-S/16 model from Touvron et al., 2021, and is based on the `vit-pytorch` package.
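For reference, a ViT-S/16 can be instantiated directly from `vit-pytorch` with the standard Small/16 hyperparameters (384-dim embeddings, 12 layers, 6 heads); this is a minimal sketch, not necessarily the exact configuration used in our benchmark YAMLs.

```python
# Illustrative ViT-S/16 built with the vit-pytorch package (hyperparameters
# follow the usual ViT-S/16 definition, not necessarily Composer's exact YAML).
import torch
from vit_pytorch import ViT

vit_s16 = ViT(
    image_size=224,     # input resolution
    patch_size=16,      # "/16" patches
    num_classes=1000,   # ImageNet-1k classes
    dim=384,            # ViT-Small embedding width
    depth=12,           # transformer blocks
    heads=6,            # attention heads
    mlp_dim=1536,       # 4x the embedding width
)

logits = vit_s16(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 1000])
```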
See below for the full details:
What's Changed
- Export Transforms in `composer.algorithms` by @ajaysaini725 in #603
- Make batchnorm default for UNet by @dskhudia in #535
- Fix no_op_model algorithm by @dskhudia in #614
- Pin pre-1.0 packages by @bandish-shah in #595
- Updated dark mode composer logo, and graph by @nqn in #617
- Jenkins + Docker Improvements by @ravi-mosaicml in #621
- update README links by @hanlint in #628
- Remove all old timing calls by @ravi-mosaicml in #594
- Remove state shorthand by @mvpatel2000 in #629
- add bfloat16 support by @nikhilsardana in #433
- v0.4.0 Hotfix: Docker documentation updates by @bandish-shah in #631
- Fix wrong icons in the method cards by @hanlint in #636
- fix autocast for pytorch < 1.10 by @nikhilsardana in #639
- Add tutorial notebooks to the README by @moinnadeem in #630
- Converted Stateless Schedulers to Classes by @ravi-mosaicml in #632
- Jenkinsfile Fixes Part 2 by @ravi-mosaicml in #627
- Add C4 Streaming dataset by @abhi-mosaic in #489
- CONTRIBUTING.md additions by @kobindra in #648
- Hide showing `object` as a base class; fix skipping documentation of `forward`; fixed docutils dependency by @ravi-mosaicml in #643
- Matthew/functional docstrings update by @growlix in #622
- docstrings improvements for core modules by @dskhudia in #598
- ssd-resnet34 on COCO map 0.23 by @florescl in #646
- Fix broken "best practices" link by @growlix in #649
- Update progressive resizing to work for semantic segmentation by @coryMosaicML in #604
- Let C4 Dataset overwrite `num_workers` if set incorrectly by @abhi-mosaic in #655
- Lazy imports for `pycocotools` by @abhi-mosaic in #656
- W&B excludes final eval metrics when plotted as a fxn of epoch or trainer/global_step by @growlix in #633
- Update GPT3-yamls for default 8xA100-40GB by @abhi-mosaic in #663
- Set WandB default to log rank zero only by @abhi-mosaic in #461
- Update schedulers guide by @hanlint in #661
- [XS] Fix a TQDM deserialization bug by @jbloxham in #665
- Add defaults to the docstrings for algorithms by @hanlint in #662
- Fix ZeRO config by @jbloxham in #667
- [XS] fix formatting for colout by @hanlint in #666
- Composer.core docstring touch-up by @ravi-mosaicml in #657
- Add Uniform bounding box sampling option for CutOut and CutMix by @coryMosaicML in #634
- Update README.md by @ravi-mosaicml in #678
- Fix bug in trainer test by @hanlint in #651
- InMemoryLogger has get_timeseries() method by @growlix in #644
- Batchwise resolution for SWA by @growlix in #654
- Fixed the conda build script so it runs on jenkins by @ravi-mosaicml in #676
- Yahp version update to 0.1.0 by @Averylamp in #674
- Streaming vision datasets by @knighton in #284
- Fix DeepSpeed checkpointing by @jbloxham in #686
- Vit by @A-Jacobson in #243
- [S] cleanup tldr; standardize `__all__` by @hanlint in #688
- Unify algorithms part 2: mixup, cutmix, label smoothing by @dblalock in #658
- `composer.optim` docstrings by @jbloxham in #653
- Fix DatasetHparams, WebDatasetHparams docstring by @growlix in #697
- Models docstrings by @A-Jacobson in #469
- docstrings improvements for composer.datasets by @dskhudia in #694
- Updated contributing.md and the style guide by @ravi-mosaicml in #670
- Ability to retry ADE20k crop transform by @Landanjs in #702
- Add mmsegmentation DeepLabv3(+) by @Landanjs in #684
- Unify functional API part 3 by @dblalock in #715
- Update example notebooks by @coryMosaicML in #707
- [Checkpointing - PR1] Store the `rank_zero_seed` on state by @ravi-mosaicml in #680
- [Checkpointing - PR2] Added in new Checkpointing Events by @ravi-mosaicml in #690
- [Checkpointing - PR3] Clean up RNG and State serialization by @ravi-mosaicml in #692
- [Checkpointing - PR4] Refactored the `CheckpointLoader` into a `load_checkpoint` function by @ravi-mosaicml in #693
- Update {blurpool,factorize,ghostbn} method cards by @dblalock in #711
- [Checkpointing - PR 5] Move the `CheckpointSaver` to a callback by @ravi-mosaicml in #687
- Update datasets docstrings by @growlix in #709
- add notebooks and functional api by @hanlint in #714
- Migrating from PTL notebook by @florescl in #436
- Docs 0.4.1: Profiler section and tutorials by @bandish-shah in #696
- Improve datasets docstrings by @knighton in #695
- Update `C4Dataset` to repeat, handle `max_samples` safely by @abhi-mosaic in #722
- Fix docs build by @ravi-mosaicml in #773
- v0.5 Release by @hanlint in #732
New Contributors
- @nikhilsardana made their first contribution in #433
- @knighton made their first contribution in #284
Full Changelog: v0.4.0...v0.5.0