From 9773b39a49954c3742d7e96f96f94b2e3a237711 Mon Sep 17 00:00:00 2001
From: JackCaoG <59073027+JackCaoG@users.noreply.github.com>
Date: Thu, 18 Jul 2024 09:36:16 -0700
Subject: [PATCH 1/2] Update eager.md

---
 docs/eager.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/eager.md b/docs/eager.md
index a1da0b5c634..e1c8e85910c 100644
--- a/docs/eager.md
+++ b/docs/eager.md
@@ -59,7 +59,7 @@ The implementation of the `torch_xla.experimental.compile` is actually pretty st

 ```python
 torch_xla.experimental.eager_mode(True)
-compiled_model = torch_xla.compile(model, backend="openxla")
+compiled_model = torch.compile(model, backend="openxla")
 ```

 It is recommended to use `torch.compile` instead of `torch_xla.experimental.compile` for inference to reduce the tracing overhead.

From 026591574280b7caab78b97a373ca8e3138b3cb5 Mon Sep 17 00:00:00 2001
From: JackCaoG <59073027+JackCaoG@users.noreply.github.com>
Date: Thu, 18 Jul 2024 16:15:16 -0700
Subject: [PATCH 2/2] Update eager.md

---
 docs/eager.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/docs/eager.md b/docs/eager.md
index e1c8e85910c..6e5413d63d4 100644
--- a/docs/eager.md
+++ b/docs/eager.md
@@ -79,9 +79,7 @@ step_fn = torch_xla.experimental.compile(step_fn)
 ```

 For training we ask the user to refactor the `step_fn` out because it is usually better to compile the model's forward, backward, and optimizer step together. The long-term goal is to also use `torch.compile` for training, but right now we recommend using `torch_xla.experimental.compile` (for performance reasons).

-## Performance
-
-# Benchmark
+## Benchmark

 I ran a 2-layer decoder-only model training (it is pretty much just a llama2) with fake data on a single chip of a v4-8 for 300 steps. Below are the numbers I observed.
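
For context on the first hunk, here is a minimal, self-contained sketch of the inference path the patched doc describes: eager mode enabled, with `torch.compile` and the `openxla` backend tracing the model once up front. The toy `nn.Linear` model, the shapes, and the `xm.xla_device()` setup are illustrative assumptions, not part of the patch.

```python
# Hypothetical inference sketch; the model and shapes are placeholders.
import torch
import torch_xla
import torch_xla.core.xla_model as xm

torch_xla.experimental.eager_mode(True)  # eager mode, per the patched doc

device = xm.xla_device()
model = torch.nn.Linear(128, 10).to(device).eval()

# Dynamo traces the model once and reuses the compiled graph afterwards,
# which is why the doc recommends torch.compile for inference.
compiled_model = torch.compile(model, backend="openxla")

with torch.no_grad():
    output = compiled_model(torch.randn(4, 128, device=device))
```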
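
Similarly, a minimal training sketch of the `step_fn` pattern referenced in the second hunk's context, assuming a toy model, SGD optimizer, and fake data (none of which come from the patch): forward, backward, and the optimizer step live in one function so `torch_xla.experimental.compile` can compile them together.

```python
# Hypothetical training sketch; model, optimizer, and data are placeholders.
import torch
import torch_xla
import torch_xla.core.xla_model as xm

torch_xla.experimental.eager_mode(True)

device = xm.xla_device()
model = torch.nn.Linear(128, 128).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

def step_fn(inputs, targets):
    # Forward, backward, and the optimizer step are kept in one function
    # so they compile into a single graph, as the doc recommends.
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss

step_fn = torch_xla.experimental.compile(step_fn)

# Fake data, echoing the doc's benchmark setup.
inputs = torch.randn(8, 128, device=device)
targets = torch.randn(8, 128, device=device)
for _ in range(10):
    loss = step_fn(inputs, targets)
```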