Releases: mosaicml/composer
v0.21.3
Bug Fixes
1. Increased Robustness to Checkpoint Loading
We've patched several edge cases in loading sharded checkpoints, especially with DTensors, which should decrease memory usage when loading checkpoints. We've also hardened the retry logic against transient cloud object store failures, improving robustness to network issues.
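Retry hardening for transient failures typically follows an exponential-backoff pattern. Below is a minimal, hypothetical sketch of that pattern (not Composer's actual implementation; the `download` callable, attempt count, and delays are illustrative):

```python
import time

def download_with_retry(download, num_attempts=3, base_delay=1.0):
    """Retry a flaky download callable with exponential backoff."""
    for attempt in range(num_attempts):
        try:
            return download()
        except ConnectionError:
            if attempt == num_attempts - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * 2 ** attempt)  # back off: 1s, 2s, 4s, ...
```

Each failed attempt doubles the wait, which gives transient network blips time to clear without hammering the object store.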
What's Changed
- Raise daily test timeout by @mvpatel2000 in #3172
- fix remote file naming by @cli99 in #3173
- [fix] DTensor + SHARD_GRAD_OP + use_orig_params by @bigning in #3175
- Bump db sdk by @dakinggg in #3176
- Build latest pytorch nightly images by @dakinggg in #3179
- Add FP8 TransformerEngine activation checkpointing by @cli99 in #3156
- Enabling the computation of validation loss and other metrics when using sequence parallelism by @ShashankMosaicML in #3183
- Update mosaic_fsdp_utils.py by @vchiley in #3185
- Fix the FSDP.optim_state_dict_to_load OOM by @bigning in #3184
- Revert "Update mosaic_fsdp_utils.py" by @vchiley in #3187
- Bump databricks-sdk from 0.24.0 to 0.25.1 by @dependabot in #3190
- Add version tag to local builds by @mvpatel2000 in #3188
- Update NeptuneLogger by @AleksanderWWW in #3165
- Filter neptune warning in doctests by @mvpatel2000 in #3195
- Removal of metrics deepcopy before computing the metrics by @gregjauvion in #3180
- Fix MLFlow Tag Name for Resumption by @KuuCi in #3194
- Fix mistral gating by @dakinggg in #3199
- Bump version to 0.21.3 by @mvpatel2000 in #3198
New Contributors
- @gregjauvion made their first contribution in #3180
Full Changelog: v0.21.2...v0.21.3
v0.21.2
Bug Fixes
1. Enable torch 2.2.2 (#3161)
Composer currently monkeypatches PyTorch for nightly versions in order to fix upstream bugs. With the release of torch 2.2.2, these monkeypatches were mistakenly applied to the stable release due to incorrect gating on imports. This release fixes the gating, enabling torch 2.2.2.
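Correct gating means patches apply only to nightly/dev builds and never to a stable release. A hypothetical sketch of such a version check (Composer's real gating is more involved; the function name is illustrative):

```python
def should_monkeypatch(torch_version: str) -> bool:
    """Apply patches only to dev/nightly or source builds, e.g.
    '2.3.0.dev20240110+cu121' or '2.2.0a0+git1234abc',
    never to stable releases like '2.2.2'."""
    return 'dev' in torch_version or 'git' in torch_version
```

The bug described above is the inverse mistake: a check that matched stable version strings too, so the patches leaked into torch 2.2.2.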
2. MPS Metric Computation on CPU (#3105)
Due to bugs in computing torchmetrics on Mac (MPS) devices, we now move metric computation onto the CPU. Previously, data was not properly moved to the CPU.
Thank you to @hyenal for this contribution!
3. Batch Sampler Support (#3105)
Composer now supports batch samplers, which previously resulted in an error if one was specified in the dataloader.
Thank you to @Ghelfi for this contribution!
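For context, a batch sampler yields whole batches of dataset indices rather than single indices, which changes how a dataloader iterates. A torch-free sketch of the concept (the class name is illustrative, mirroring the behavior of torch's BatchSampler):

```python
class SimpleBatchSampler:
    """Yields lists of indices, one list per batch."""

    def __init__(self, dataset_len: int, batch_size: int):
        self.dataset_len = dataset_len
        self.batch_size = batch_size

    def __iter__(self):
        batch = []
        for idx in range(self.dataset_len):
            batch.append(idx)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch:  # final partial batch
            yield batch

    def __len__(self):
        return -(-self.dataset_len // self.batch_size)  # ceiling division
```

With torch, such a sampler is passed as `DataLoader(dataset, batch_sampler=...)`, which is the dataloader configuration this fix enables.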
What's Changed
- Make codequality callable by @mvpatel2000 in #3133
- Explicitly print checkpoint downloading exception by @bigning in #3131
- Change release actions by @mvpatel2000 in #3136
- Passing rank and num_replicas to dist.get_sampler by @ShashankMosaicML in #3137
- Fix broadcast by @mvpatel2000 in #3138
- Compressor fixes by @mbway in #3142
- In case of MPS device also copy batch to CPU by @hyenal in #3105
- Composer object store download retry by @bigning in #3140
- Bump databricks-sdk from 0.22.0 to 0.23.0 by @dependabot in #3144
- Update transformers requirement from !=4.34.0,<4.39,>=4.11 to >=4.11,!=4.34.0,<4.40 by @dependabot in #3148
- Update protobuf requirement from <3.21 to <5.27 by @dependabot in #3147
- Bump traitlets from 5.14.1 to 5.14.2 by @dependabot in #3145
- Bump to 0.21 by @mvpatel2000 in #3150
- Fixing sequence parallel error conditions and adding type float for microbatch_size in typehints by @ShashankMosaicML in #3139
- Fix torch monkeypatch version check by @dakinggg in #3155
- Update torchmetrics requirement from <1.3.2,>=0.10.0 to >=0.10.0,<1.3.3 by @dependabot in #3157
- Bump gitpython from 3.1.42 to 3.1.43 by @dependabot in #3160
- Prevent crash if signal handler cannot be set by @mbway in #3152
- Pin pillow for code quality workflow by @dakinggg in #3162
- Fix torch version check by @dakinggg in #3161
- add more retry to checkpoint downloading by @bigning in #3164
- Append to gpu rank log files instead of throwing error by @jjanezhang in #3166
- Call set_epoch on Dataloader.batch_sampler if defined by @Ghelfi in #3124
- Bump version to 0.21.2 by @mvpatel2000 in #3168
Full Changelog: v0.21.1...v0.21.2
v0.21.1
Bug Fixes
1. Fix to HSDP checkpoint loading
The previous release broke checkpoint loading when using HSDP with multiple replicas. This patch release fixes checkpoint loading.
What's Changed
- Fix broadcast by @mvpatel2000 in #3138
Full Changelog: v0.21.0...v0.21.1
v0.21.0
What's New
1. Aggregate Memory Monitoring (#3042)
The Memory Monitor callback now supports aggregating memory statistics across nodes. Summary stats for a run's memory usage across the cluster can dramatically help debug straggler nodes or non-homogeneous workloads. The memory monitor can now aggregate and log combined values at a user-specified frequency.
Example:
```python
from composer import Trainer
from composer.callbacks import MemoryMonitor

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    optimizers=optimizer,
    max_duration="1ep",
    callbacks=[
        MemoryMonitor(
            dist_aggregate_batch_interval=10,  # aggregate every 10 batches
        )
    ],
)
```
2. Advanced Compression Options (#3118)
Large model checkpoints can be expensive to store and transfer. In this release, we've upgraded our compression support to accept several new formats, which offer better compression/speed tradeoffs via CLI tools. To use compression, suffix your checkpoint filename with a compression extension. We now support the following extensions:
- bz2
- gz
- lz4
- lzma
- lzo
- xz
- zst
Example:
```python
from composer import Trainer

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    optimizers=optimizer,
    max_duration="1ep",
    save_filename='ep{epoch}-ba{batch}-rank{rank}.pt.lz4',
)
```
Thank you to @mbway for adding this support!
What's Changed
- Rename composer_run_name tag to run_name when logging to MLflow by @jerrychen109 in #3040
- enable aggregate mem monitoring by @vchiley in #3042
- Bump junitparser from 3.1.1 to 3.1.2 by @dependabot in #3056
- Add SHARD_GRAD_OP to device mesh error check by @mvpatel2000 in #3058
- Add torch 2.2.1 support by @mvpatel2000 in #3059
- Use testing repo actions for linting by @b-chu in #3060
- Link autoresume docs back to watchdog by @aspfohl in #3052
- Deprecate get_state and remove deprecations by @b-chu in #3017
- Bump version to 0.20.1 by @mvpatel2000 in #3061
- Remove s3_bucket pytest cli flag by @b-chu in #3064
- Remove s3_bucket flag from gpu test by @b-chu in #3065
- Clean Up OOM Observer Remote Uploader Download path by @j316chuck in #3070
- Fix daily test for iteration by @b-chu in #3068
- Remove "generation_length" in favor of "generation_kwargs" by @maxisawesome in #3014
- Bump packaging by @mvpatel2000 in #3072
- Use ci-testing repo for CPU and GPU tests by @b-chu in #3062
- Add new torch monkeypatches to Composer by @mvpatel2000 in #3063
- Add initial support for neuron devices by @bfontain in #3049
- Stripping whitespaces as default for QATask ICL eval by @ksreenivasan in #3073
- Add ICL base class to all by @mvpatel2000 in #3079
- pass prelimiter into ALL ICL datasets by @eitanturok in #3069
- Bump sentencepiece from 0.1.99 to 0.2.0 by @dependabot in #3083
- Add Iteration related Events to callbacks by @b-chu in #3077
- Add Iteration related Events by @b-chu in #3076
- Bump CI/CD to v3 by @mvpatel2000 in #3086
- Add docstring to _iteration_length by @b-chu in #3088
- Check FSDP module has _device_mesh before getting it by @eracah in #3091
- Bump minor version in base image by @mvpatel2000 in #3092
- Enforce async logging flush in mlflow logger at post_close call by @chenmoneygithub in #3093
- Warning log to info log by @aspfohl in #3096
- Bump transformers by @dakinggg in #3095
- Change style for splitting on commas by @b-chu in #3078
- Remove slash by @b-chu in #3098
- Allowing for fractional number of samples per rank by @ShashankMosaicML in #3075
- Output eval logging (batch level) by @maxisawesome in #2977
- Replace errors with warnings for eval args by @mvpatel2000 in #3100
- Ability to load sharded checkpoints with remote symlink load_path by @eracah in #3097
- Improvements to NeptuneLogger by @AleksanderWWW in #3085
- Revert "Improvements to NeptuneLogger" by @mvpatel2000 in #3111
- Bump mlflow min pin by @dakinggg in #3110
- Fix rounding issue in interval calculation by @dakinggg in #3109
- Bump coverage[toml] from 7.4.1 to 7.4.3 by @dependabot in #3102
- Uses v0.0.4 of ci-testing by @b-chu in #3112
- Add versioned deprecation warning by @irenedea in #2984
- Update Flash Attention to 2.5.5 by @Skylion007 in #3113
- Setting the max duration to current timestamp in the same units as cu… by @ShashankMosaicML in #3090
- Making default_split_batch public by @ShashankMosaicML in #3116
- Adding log exception to Mosaic Logger by @jjanezhang in #3089
- Add checks to schedulers by @b-chu in #3115
- Removed default attrs from exception class in the attrs dict by @jjanezhang in #3126
- Bump coverage[toml] from 7.4.3 to 7.4.4 by @dependabot in #3121
- Refactor initialization by @Practicinginhell in #3127
- Bump databricks sdk version by @dakinggg in #3128
- Update packaging requirement from <23.3,>=21.3.0 to >=21.3.0,<24.1 by @dependabot in #3122
- Remove rng from save_weights_only ckpt by @eracah in #3129
- More compression options by @mbway in #3118
- Only broadcast distcp files by @mvpatel2000 in #3130
- Bump version to 0.21 by @mvpatel2000 in #3132
New Contributors
- @ksreenivasan made their first contribution in #3073
- @eitanturok made their first contribution in #3069
- @Practicinginhell made their first contribution in #3127
- @mbway made their first contribution in #3118
Full Changelog: v0.20.1...v0.21.0
v0.20.1
What's New
1. Torch 2.2.1 Support
Composer now supports torch 2.2.1! We've raised the pin to allow the latest torch, and we've upstreamed all torch monkeypatches so Composer can run out of the box with the latest and greatest torch features.
What's Changed
- Add torch 2.2.1 support by @mvpatel2000 in #3059
- Bump version to 0.20.1 by @mvpatel2000 in #3061
v0.20.0
What's New
1. New Neptune Logger
Composer now supports logging training data to neptune.ai using the NeptuneLogger. To get started:

```python
from composer.loggers import NeptuneLogger

neptune_project = 'test_project'
neptune_api_token = 'test_token'

neptune_logger = NeptuneLogger(
    project=neptune_project,
    api_token=neptune_api_token,
    rank_zero_only=False,
    mode='debug',
    upload_artifacts=True,
)
```
We also have an example project demonstrating all the awesome things you can do with this integration!
Additional information on the NeptuneLogger can be found in the docs.
2. OOM observer callback with memory visualizations
Composer now has an OOM observer callback. When a model runs out of memory, this callback helps produce a trace which identifies memory allocations, which can be critical to designing strategies to mitigate memory usage.
Example:
```python
from composer import Trainer
from composer.callbacks import OOMObserver

# construct the trainer with this callback
trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    eval_dataloader=eval_dataloader,
    optimizers=optimizer,
    max_duration="1ep",
    callbacks=[
        OOMObserver(
            folder="traces",
            overwrite=True,
            filename="rank{rank}_oom",
            remote_filename="oci://bucket_name/{run_name}/oom_traces/rank{rank}_oom",
        )
    ],
)
```
OOM Visualization:
3. Log all gpu rank stdout/err to MosaicML platform
Composer has expanded its integration with the MosaicML platform. Now, we can view all GPU rank stdout/stderr with MCLI logs to enable more comprehensive analysis of jobs.
Example:
```
mcli logs <run-name> --node x --gpu x
```

Note, this defaults to node rank 0 if --node is not provided.

Also, we can find the logs of any global gpu rank with the command:

```
mcli logs <run-name> --global-gpu-rank x
```
Bug Fixes
- Only save RNG on rank 0 by @mvpatel2000 in #2998
- [Auto-microbatch fix] FSDP reshard and cleanup after OOM to fix the cuda memory leak by @bigning in #3030
- Fix skip_first for profiler during resumption by @bigning in #2986
- Race condition fix in checkpoint loading util by @jessechancy in #3001
What's Changed
- Remove .ci folder and move FILE_HEADER and CODEOWNERS by @irenedea in #2957
- Modify UCObjectStore.list_objects to lists all files recursively by @irenedea in #2959
- Refactor MemorySnapshot by @cli99 in #2960
- Log all gpu rank stdout/err to MosaicML platform by @jjanezhang in #2839
- Add Torch 2.2 tests by @mvpatel2000 in #2970
- Memory snapshot dump pickle by @cli99 in #2968
- Neptune logger by @AleksanderWWW in #2447
- Fix torch pins in tests by @mvpatel2000 in #2973
- Add a register_model_with_run_id api to MLflowLogger by @dakinggg in #2967
- Remove bespoke codeowners by @mvpatel2000 in #2971
- Add a BEFORE_LOAD event by @snarayan21 in #2974
- More torch 2.2 fixes by @mvpatel2000 in #2975
- Adding the step argument to logger.log_table by @ShashankMosaicML in #2961
- Fix daily tests for torch 2.2 by @mvpatel2000 in #2980
- Format load_path with name by @mvpatel2000 in #2978
- Bump to 0.19.1 by @mvpatel2000 in #2979
- Fix UC object store bugfix by @nancyhung in #2982
- [Bugfix][UC] Add back the full object path by @nancyhung in #2988
- Minor cleanup of UC get_object_size by @dakinggg in #2989
- Pin UC to earlier version by @dakinggg in #2990
- Revert "fix skip_first for resumption" by @bigning in #2991
- Broadcast files for HSDP by @mvpatel2000 in #2914
- Bump ipykernel from 6.29.0 to 6.29.2 by @dependabot in #2994
- Bump yamllint from 1.33.0 to 1.34.0 by @dependabot in #2995
- Refactor update_metric by @maxisawesome in #2965
- Add azure integration test by @mvpatel2000 in #2996
- Fix Profiler schedule skip_first by @bigning in #2992
- Remove planner validation by @mvpatel2000 in #2985
- Fix load for non-HSDP device mesh by @mvpatel2000 in #2997
- Update NCCL arg since torch deprecated old one by @mvpatel2000 in #3000
- Add bias argument to LPLN by @mvpatel2000 in #2999
- Revert "Add bias argument to LPLN" by @mvpatel2000 in #3003
- Revert "Update NCCL arg since torch deprecated old one" by @mvpatel2000 in #3004
- Add torch 2.3 image for aws cluster by @j316chuck in #3002
- Patch torch 2.3 aws naming by @j316chuck in #3006
- Add debug log before training loop starts by @mvpatel2000 in #3005
- Deprecate ffcv code by @j316chuck in #3007
- Remove log for mosaicml logger by @mvpatel2000 in #3008
- [EASY] Always log 1st batch when resuming training by @bigning in #3009
- Use reusable actions for linting by @b-chu in #2948
- Make CodeEval respect device_eval_batch_size by @josejg in #2969
- Use Mosaic constant for GPU file prefix by @jjanezhang in #3018
- Fall back to normal logging when gpu prefix is not present by @jjanezhang in #3020
- Revert "Use reusable actions for linting" to fix CI/CD by @mvpatel2000 in #3023
- Change to pull_request_target by @b-chu in #3025
- Bump gitpython from 3.1.41 to 3.1.42 by @dependabot in #3031
- Bump yamllint from 1.34.0 to 1.35.1 by @dependabot in #3034
- Update torchmetrics requirement from <1.3.1,>=0.10.0 to >=0.10.0,<1.3.2 by @dependabot in #3035
- Bump pypandoc from 1.12 to 1.13 by @dependabot in #3033
- Add tensorboard images support by @Menduist in #3021
- Add sorted to logs for checkpoint broadcast by @mvpatel2000 in #3036
- Friendlier device mesh error by @mvpatel2000 in #3039
- Upgrade to python3.11 for torch nightly by @j316chuck in #3038
- Download symlink once by @mvpatel2000 in #3043
- Add min size to OCI download by @mvpatel2000 in #3044
- Lint fix by @mvpatel2000 in #3045
- Revert "Change to pull_request_target " by @mvpatel2000 in #3047
- Bump composer version 0.19.2 by @j316chuck in #3048
- Update XLA support by @bfontain in #2964
- Bump composer version 0.20.0 by @j316chuck in #3051
- Update ruff. Fix PLE & LOG lints by @Skylion007 in #3050
New Contributors
- @AleksanderWWW made their first contribution in #2447
- @ShashankMosaicML made their first contribution in #2961
- @nancyhung made their first contribution in #2982
- @bigning made their first contribution in #2986
- @jessechancy made their first contribution in #3001
- @josejg made their first contribution in #2969
- @Menduist made their first contribution in #3021
- @bfontain made their first contribution in #2964
**Full Chang...
v0.19.1
What's New
1. New Event: BEFORE_LOAD (#2974)
Composer now has the event Event.BEFORE_LOAD, which lets users modify state before a model is loaded. This is particularly useful for accessing attributes which may not yet exist at Event.INIT, such as the dataloader state.
2. Registering model in MLFlow with run id (#2967)
The MLFlow logger now has register_model_with_run_id, which allows users to register a model based on the run_id. This alternative way of registering the model preserves the link to the MLflow runs.
What's Changed
Full Changelog: v0.19.0...v0.19.1
v0.19.0
What's New
1. Improved DTensor Support
Composer now supports elastic saving and loading of DTensors at various mesh sizes.
2. Checkpoint Saving and Loading from Databricks MLFlow
Composer now supports saving and loading checkpoints to Databricks-managed MLFlow.
```python
from composer import Trainer
from composer.loggers import MLFlowLogger

composer_model = MyComposerModel(...)

trainer = Trainer(
    model=composer_model,
    save_folder='dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts',
    loggers=MLFlowLogger(...),
    load_path='dbfs:/databricks/mlflow-tracking/{mlflow_experiment_id}/{mlflow_run_id}/artifacts',
    ...
)
```
3. Better Communication Computation Overlap in FSDP
Composer now has improved communication/computation overlap in our FSDP code which should improve MFU across several architectures.
4. Python3.11 + Torch2.2 Support
Initial support for Python 3.11 and Torch 2.2 has been added to Composer.
5. PEFT LoRA
PEFT LoRA is now supported in the HuggingFaceModel class.
6. Refactored Evaluation
in_context_learning_evaluation.py has a new design with cleaner abstractions and easier interfaces to work with.
7. Azure Checkpointing
Composer now supports saving your model in Azure.
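For example, checkpoints can be directed to Azure through the save_folder URI. This is a hedged sketch: the azure:// scheme is how Composer's other object-store backends are addressed, and the container and path names here are illustrative assumptions, not values from this release.

```python
trainer = Trainer(
    model=model,
    save_folder='azure://my-container/checkpoints',  # hypothetical container/path
    ...
)
```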
8. MLFlow Checkpointing
Composer now supports saving your model in MLFlow.
Bug Fixes
- Fix MLFlowLogger test by @ngcgarcia in #2912
- Fix bug with CoT early stopping and LLama2 tokenizer by @bmosaicml in #2902
- Fix split_batch bug with empty generation_kwargs by @maxisawesome in #2913
- Only load RNG keys that exist by @mvpatel2000 in #2901
- Fix daily tests by @mvpatel2000 in #2891
- Fix seed for FSDP wrap by @mvpatel2000 in #2833
- Fix load_ignore_keys with rng by @mvpatel2000 in #2803
- Fix mosaicml logger on close by @mvpatel2000 in #2816
- Fix torch profiler error on close by @mvpatel2000 in #2818
- Fix import for daily test by @snarayan21 in #2826
- Fix how single value tensors are logged by @aspfohl in #2831
- Fix torch bump by @j316chuck in #2855
- Fix MPS with sequence loss by @JAEarly in #2834
What's Changed
- Bump transformers version by @dakinggg in #2781
- Bump sphinxext-opengraph from 0.9.0 to 0.9.1 by @dependabot in #2784
- Bump coverage[toml] from 7.3.0 to 7.3.3 by @dependabot in #2783
- Update torch requirement from <2.1.2,>=1.13.1 to >=1.13.1,<2.1.3 by @dependabot in #2785
- [UCVolumes] Rely on databricks-sdk auth for the right requirements by @panchalhp-db in #2789
- Enable system metrics in mosaic mlflow logger by @chenmoneygithub in #2775
- Update parse_uri by @irenedea in #2787
- default to no torch profiler memory timeline by @cli99 in #2790
- Add eot token to ICL generate kwargs by @bmosaicml in #2782
- Add nightly image for torch 2.2.0-12-20-23 by @j316chuck in #2791
- Add torch nightly 12-13 by @j316chuck in #2792
- Add process group as arg to FSDP by @mvpatel2000 in #2794
- Bump coverage[toml] from 7.3.3 to 7.3.4 by @dependabot in #2798
- Bump ipykernel from 6.26.0 to 6.28.0 by @dependabot in #2806
- Bump junitparser from 3.1.0 to 3.1.1 by @dependabot in #2805
- Bump pytest from 7.4.3 to 7.4.4 by @dependabot in #2807
- Avoid futures on close for MosaicML logger by @mvpatel2000 in #2804
- Require sync module states with HSDP by @mvpatel2000 in #2812
- Better communication computation overlap by @snarayan21 in #2811
- Improve error message for speed monitor by @mvpatel2000 in #2801
- Bump torch version -- DO NOT RELEASE by @mvpatel2000 in #2814
- Bump torchvision for nightly by @mvpatel2000 in #2815
- Correct multi-unshard stream patching for torch 2.2.0dev, and stream waiting correctness. by @snarayan21 in #2817
- Bump traitlets from 5.13.0 to 5.14.1 by @dependabot in #2822
- All unshard streams wait on computation every step by @snarayan21 in #2823
- Add encoding=utf-8 by @dakinggg in #2824
- [MLFlowObjectStore] [1/2] Base implementation for MLFlowObjectStore by @jerrychen109 in #2802
- Remove fused layernorm (already deprecated for 2 versions) by @mvpatel2000 in #2827
- checkpoint saver tracks all checkpoints/intervals in state by @aspfohl in #2819
- code-quality timeout update by @aspfohl in #2830
- Adds DTensor Support by @mvpatel2000 in #2821
- Remove duplicate checkpoint verifications by @eracah in #2828
- Remove fsdp patch for comm overlap by @mvpatel2000 in #2836
- Allow hsdp by @mvpatel2000 in #2838
- Bump torch 2.1.2 by @mvpatel2000 in #2840
- Upgrade pyright to 1.1.310 by @b-chu in #2841
- [MLFlowObjectStore] [2/2] Support checkpointing with MLFlow by @jerrychen109 in #2810
- update nightly to torch 2.3 by @j316chuck in #2842
- Pin sphinxcontrib applehelp by @mvpatel2000 in #2854
- Torch 2.3 patch by @dakinggg in #2849
- Update mosaicml-cli requirement from <0.6,>=0.5.25 to >=0.5.25,<0.7 by @dependabot in #2866
- Rewrite to use individual state functions by @mvpatel2000 in #2860
- Add custom stopping criteria to ICL generate tasks by @bmosaicml in #2800
- Add save_ignore_keys by @mvpatel2000 in #2868
- Remome log debug by @mvpatel2000 in #2871
- Update monkeypatch to put barrier in optim load by @mvpatel2000 in #2874
- Remove toml by @b-chu in #2872
- Update license by @b-chu in #2875
- Add ignore_metrics field to the MLflow logger by @ngcgarcia in #2869
- Convert print to log.info by @mvpatel2000 in #2876
- Bump version to 0.18.0 by @irenedea in #2877
- Removed commented-out unshard streams patching. by @snarayan21 in #2873
- Make code quality workflow reusable by @b-chu in #2878
- Bump gitpython from 3.1.40 to 3.1.41 by @dependabot in #2885
- Bump torchmetrics by @mvpatel2000 in #2890
- Bump transformers to 4.37 by @dakinggg in #2894
- Azure checkpointing support by @mvpatel2000 in #2893
- Pass PG into checkpoint load and load rng with state_dict by @mvpatel2000 in #2897
- Remove monkeypatch and new state dict APIs for torch 2.2 by @mvpatel2000 in #2899
- Bump version to 0.18.1 by @b-chu in #2905
- Refactor in_context_learning_evaluation.py by @maxisawesome in #2713
- Fix FP8 checkpoint resumption with onnx export flag by @j316chuck in #2907
- Add Python 3.11 + FA 2.5.0 + Torch 2.3.0 Image by @KuuCi in #2898
- Add yamllint to pre commit by @b-chu in #2909
- Add ignore_hyperparameters to MLFlowLogger by @ngcgarcia in #2908
- Bump coverage[toml] from 7.3.4 to 7.4.1 by @dependabot in #2915
- Add checkpoint test for 0.18.1 by @b-chu in #2906
- Integrate PEFT LoRA with HuggingFaceModel by @dakinggg in #2829
New Contributors
- @jerrychen109 made their first contribution in #2802
- @JAEarly made their first contribution in https://github.com/mosa...
v0.18.2
Bug Fixes
- Fix lp layernorm weight by @snarayan21 in #2954
What's Changed
- Fix lp layernorm weight by @snarayan21 in #2954
- Bump version to 0.18.2 by @b-chu
Full Changelog: v0.18.1...v0.18.2
v0.18.1
Bug Fixes
- Fix MPS with sequence loss by @JAEarly in #2834
- Fix daily tests by @mvpatel2000 in #2891
- Remove monkeypatch and new state dict APIs for torch 2.2 by @mvpatel2000 in #2899
- Only load RNG keys that exist by @mvpatel2000 in #2901
What's Changed
- Bump version to 0.18.0 by @irenedea in #2877
- Removed commented-out unshard streams patching. by @snarayan21 in #2873
- Make code quality workflow reusable by @b-chu in #2878
- Bump gitpython from 3.1.40 to 3.1.41 by @dependabot in #2885
- Fix MPS with sequence loss by @JAEarly in #2834
- Bump torchmetrics by @mvpatel2000 in #2890
- Fix daily tests by @mvpatel2000 in #2891
- Bump transformers to 4.37 by @dakinggg in #2894
- Azure checkpointing support by @mvpatel2000 in #2893
- Pass PG into checkpoint load and load rng with state_dict by @mvpatel2000 in #2897
- Remove monkeypatch and new state dict APIs for torch 2.2 by @mvpatel2000 in #2899
- Only load RNG keys that exist by @mvpatel2000 in #2901
- Bump version to 0.18.1 by @b-chu in #2905
New Contributors
Full Changelog: v0.18.0...v0.18.1