
v0.12.0

@bandish-shah bandish-shah released this 23 Dec 00:13
· 1093 commits to dev since this release

πŸš€ Composer v0.12.0

Composer v0.12.0 is released! Install via pip:

pip install mosaicml==0.12.0

New Features

  1. πŸͺ΅ Logging and ObjectStore Enhancements

    There are multiple improvements to our logging and object store support in this release.

    • Image visualization using our CometMLLogger (#1710)

      We've added support for using our ImageVisualizer callback with CometML to log images and segmentation masks to CometML.

      from composer.trainer import Trainer
      
      trainer = Trainer(...,
          callbacks=[ImageVisualizer()],
          loggers=[CometMLLogger()]
      )
    • Added direct support for Oracle Cloud Infrastructure (OCI) as an ObjectStore (#1774) and support for Google Cloud Storage (GCS) via URI (#1833)

      To use, simply set your save_folder or load_path to a URI beginning with oci:// or gs:// to save and load with OCI or GCS, respectively.

      from composer.trainer import Trainer
      
      # Checkpoint saving to Google Cloud Storage.
      trainer = Trainer(
          model=model,
          save_folder="gs://my-bucket/{run_name}/checkpoints",
          run_name='my-run',
          save_interval="1ep",
          save_filename="ep{epoch}.pt",
          save_num_checkpoints_to_keep=0,  # delete all checkpoints locally
          ...
      )
      
      trainer.fit()
    • Added basic support for logging with MLFlow (#1795)

      We've added basic support for using MLFlow to log experiment metrics.

      from composer.loggers import MLFlowLogger
      from composer.trainer import Trainer
      
      mlflow_logger = MLFlowLogger(experiment_name=mlflow_exp_name,
                                   run_name=mlflow_run_name,
                                   tracking_uri=mlflow_uri)
      trainer = Trainer(..., loggers=[mlflow_logger])
    • Simplified console and progress bar logging (#1694)

      To turn off the progress bar, set progress_bar=False. To turn on logging directly to the console, set log_to_console=True. To control the frequency of logging to console, set console_log_interval (e.g. to 1ep or 1ba).
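      Putting those options together, a minimal sketch (with the remaining Trainer arguments elided, as in the other examples):

```python
from composer.trainer import Trainer

trainer = Trainer(
    ...,
    progress_bar=False,          # turn off the progress bar
    log_to_console=True,         # log metrics directly to the console
    console_log_interval='1ba',  # log to the console every batch
)
```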

    • get_file supports URIs (#1750)

      Our get_file utility now supports URIs directly (s3://, oci://, and gs://) for downloading files.
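      For example (the bucket and object path below are hypothetical):

```python
from composer.utils import get_file

# Download a checkpoint from an S3 URI to a local path.
get_file('s3://my-bucket/checkpoints/ep1.pt', 'ep1.pt')
```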

  2. πŸƒβ€β™€οΈ Support for Mid-Epoch Resumption with the latest release of Streaming

    We've added support in Composer for the latest release of our Streaming library. This includes awesome new features like instant mid-epoch resumption and deterministic shuffling, regardless of the number of nodes. See the Streaming release notes for more!
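    A minimal sketch of streaming a dataset, assuming the mosaicml/streaming API at the time of this release (the remote and local paths are hypothetical):

```python
from torch.utils.data import DataLoader
from streaming import StreamingDataset

# Stream shards from a remote bucket, caching them locally.
dataset = StreamingDataset(remote='s3://my-bucket/my-dataset',
                           local='/tmp/my-dataset-cache',
                           shuffle=True)
dataloader = DataLoader(dataset, batch_size=32)
```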

  3. 🚨 New algorithm - GyroDropout!

    Thanks to @jelite for adding a new algorithm, GyroDropout, to Composer! Please see the method card for more details.

  4. πŸ€— HuggingFace + Composer improvements

    We've added a new utility to load a πŸ€— HuggingFace model and tokenizer out of a Composer checkpoint (#1754), making the pretraining -> finetuning workflow even easier in Composer. Check out the docs for more details, and our example notebook for a full tutorial (#1775)!

  5. πŸŽ“ GradMonitor -> OptimizerMonitor

    Renames our GradMonitor callback to OptimizerMonitor, and adds the ability to track optimizer specific metrics. Check out the docs for more details, and add to your code just like any other callback!

    from composer.callbacks import OptimizerMonitor
    from composer.trainer import Trainer
    
    trainer = Trainer(
        ..., 
        callbacks=[OptimizerMonitor(log_optimizer_metrics=True)]
    )
  6. 🐳 New PyTorch and CUDA versions

    We've expanded our library of Docker images with support for PyTorch 1.13 + CUDA 11.7:

    • mosaicml/pytorch:1.13.0_cu117-python3.10-ubuntu20.04
    • mosaicml/pytorch:1.13.0_cpu-python3.10-ubuntu20.04

    The mosaicml/pytorch:latest, mosaicml/pytorch:cpu_latest and mosaicml/composer:0.12.0 tags are now built from PyTorch 1.13 based images. Please see our DockerHub repository for additional details.
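    For example, to pull the new CUDA image:

```shell
docker pull mosaicml/pytorch:1.13.0_cu117-python3.10-ubuntu20.04
```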

API changes

  1. Replace grad_accum with device_train_microbatch_size (#1749, #1776)

    We're deprecating the grad_accum Trainer argument in favor of the more intuitive device_train_microbatch_size. Instead of thinking about how to divide your specified minibatch into microbatches, simply specify the size of your microbatch. For example, let's say you want to split your minibatch of 2048 into two microbatches of 1024:

    from composer import Trainer
    
    trainer = Trainer(
        ...,
        device_train_microbatch_size=1024,
    )

    If you want Composer to tune the microbatch for you automatically, enable automatic microbatching as follows:

    from composer import Trainer
    
    trainer = Trainer(
        ...,
        device_train_microbatch_size='auto',
    )

    The grad_accum argument is still supported, but is deprecated and will be removed in a future Composer release.
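    To make the relationship concrete, here's a small plain-Python illustration (not a Composer API) of how many gradient-accumulation passes a given microbatch size implies:

```python
def num_microbatches(minibatch_size: int, microbatch_size: int) -> int:
    """Number of forward/backward passes per optimizer step."""
    # Gradients are accumulated over ceil(minibatch / microbatch) passes.
    return -(-minibatch_size // microbatch_size)  # ceiling division

print(num_microbatches(2048, 1024))  # 2 microbatches per optimizer step
```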

  2. Renamed precisions (#1761)

    We've renamed precision attributes for clarity. The following values have been removed: ['amp', 'fp16', 'bf16'].

    We have added the following values, prefixed with 'amp_' to clarify when an Automatic Mixed Precision type is being used: ['amp_fp16', 'amp_bf16'].

    The fp32 precision value remains unchanged.
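    For example, to train with bfloat16 mixed precision under the new naming (remaining Trainer arguments elided):

```python
from composer.trainer import Trainer

trainer = Trainer(
    ...,
    precision='amp_bf16',  # formerly 'bf16'
)
```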

Deprecations

  1. Removed support for YAHP (#1512)
  2. Removed COCO and SSD datasets (#1717)
  3. Fully removed Streaming v1 support; please see the mosaicml/streaming project for our next-gen streaming datasets (#1787)
  4. Deprecated FusedLayerNorm algorithm (#1789)
  5. Fully removed the grad_clip_norm training argument; please use the GradientClipping algorithm instead (#1768)
  6. Removed data_fit, data_epoch, and data_batch from Logger (#1826)
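For item 5, a minimal sketch of the GradientClipping replacement (the threshold value is illustrative; remaining Trainer arguments elided):

```python
from composer.algorithms import GradientClipping
from composer.trainer import Trainer

# Clip gradients by global L2 norm, replacing the removed grad_clip_norm argument.
clipper = GradientClipping(clipping_type='norm', clipping_threshold=1.0)
trainer = Trainer(..., algorithms=[clipper])
```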

Bug Fixes

  • Fix FSDP checkpoint strategy (#1734)
  • Fix gradient clipping with FSDP (#1740)
  • Add more supported FSDP config flags (sync_module_states, forward_prefetch, limit_all_gathers) (#1794)
  • Allow FULL precision with FSDP (#1796)
  • Fix eval_microbatch modification on EVAL_BEFORE_FORWARD event (#1739)
  • Fix algorithm API backwards compatibility in checkpoints (#1741)
  • Fix a bad None check that prevented setting device_id to 0 (#1767)
  • Unregister engine to make cleaning up memory easier (#1769)
  • Fix issue if metric_names is not a list (#1798)
  • Match implementation for list and tensor batch splitting (#1804)
  • Fix infinite eval issue (#1815)


Full Changelog: v0.11.1...v0.12.0