From a6ab3fe9fb218e5a0ff294d9efb0c4de8f8d8ef5 Mon Sep 17 00:00:00 2001
From: JackCaoG <59073027+JackCaoG@users.noreply.github.com>
Date: Tue, 16 Jul 2024 16:15:04 -0700
Subject: [PATCH] Minor update to the docs (#7691)

---
 docs/ddp.md    | 4 ++--
 docs/fsdp.md   | 2 +-
 docs/fsdpv2.md | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/ddp.md b/docs/ddp.md
index 09e1c12f9d5..1fe68fa6cd5 100644
--- a/docs/ddp.md
+++ b/docs/ddp.md
@@ -1,8 +1,8 @@
-# How to do `DistributedDataParallel`
+# How to do DistributedDataParallel(DDP)
 
 This document shows how to use torch.nn.parallel.DistributedDataParallel in
 xla, and further describes its difference against the native xla data parallel
-approach.
+approach. You can find a minimum runnable example [here](https://github.com/pytorch/xla/blob/master/examples/data_parallel/train_resnet_ddp.py).
 
 ## Background / Motivation
 
diff --git a/docs/fsdp.md b/docs/fsdp.md
index f9a49812e12..3c86e99cbc9 100644
--- a/docs/fsdp.md
+++ b/docs/fsdp.md
@@ -61,7 +61,7 @@ The implementation of this class is largely inspired by and mostly follows the s
 ---
 
 ### Example training scripts on MNIST and ImageNet
-
+* Minimum example : [`examples/fsdp/train_resnet_fsdp_auto_wrap.py`](https://github.com/pytorch/xla/blob/master/examples/fsdp/train_resnet_fsdp_auto_wrap.py)
 * MNIST: [`test/test_train_mp_mnist_fsdp_with_ckpt.py`](https://github.com/pytorch/xla/blob/master/test/test_train_mp_mnist_fsdp_with_ckpt.py)
   (it also tests checkpoint consolidation)
 * ImageNet: [`test/test_train_mp_imagenet_fsdp.py`](https://github.com/pytorch/xla/blob/master/test/test_train_mp_imagenet_fsdp.py)
diff --git a/docs/fsdpv2.md b/docs/fsdpv2.md
index fe9b782a082..6ad04dc1eab 100644
--- a/docs/fsdpv2.md
+++ b/docs/fsdpv2.md
@@ -1,10 +1,10 @@
-# Fully Sharded Data Parallel via SPMD
+# Fully Sharded Data Parallel(FSDP) via SPMD
 
 Fully Sharded Data Parallel via SPMD or FSDPv2 is an utility that re-expresses the
 famous FSDP algorithm in SPMD. [This](https://github.com/pytorch/xla/blob/master/torch_xla/experimental/spmd_fully_sharded_data_parallel.py) is
 an experimental feature that aiming to offer a familiar interface for users to enjoy all the benefits that SPMD brings into
 the table. The design doc is [here](https://github.com/pytorch/xla/issues/6379).
 
-Please review the [SPMD user guide](./spmd.md) before proceeding.
+Please review the [SPMD user guide](./spmd_basic.md) before proceeding. You can also find a minimum runnable example [here](https://github.com/pytorch/xla/blob/master/examples/fsdp/train_decoder_only_fsdp_v2.py).
 
 Example usage:
 
 ```python3