Update eager.md #7710

Merged · 2 commits · Jul 19, 2024
6 changes: 2 additions & 4 deletions docs/eager.md
@@ -59,7 +59,7 @@ The implementation of the `torch_xla.experimental.compile` is actually pretty st
```python
torch_xla.experimental.eager_mode(True)

-compiled_model = torch_xla.compile(model, backend="openxla")
+compiled_model = torch.compile(model, backend="openxla")
```
It is recommended to use `torch.compile` instead of `torch_xla.experimental.compile` for inference to reduce the tracing overhead.
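
As a rough illustration of the inference path this hunk describes, here is a minimal sketch combining eager mode with the `openxla` backend. The linear model, batch size, and input shape are placeholders, not part of the doc:

```python
import torch
import torch.nn as nn
import torch_xla
import torch_xla.core.xla_model as xm

torch_xla.experimental.eager_mode(True)
device = xm.xla_device()

# A stand-in model; any nn.Module works the same way.
model = nn.Linear(128, 10).to(device)
compiled_model = torch.compile(model, backend="openxla")

# The compiled call is traced and lowered to XLA; everything outside
# it runs eagerly, op by op.
with torch.no_grad():
    out = compiled_model(torch.randn(8, 128, device=device))
```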

@@ -79,9 +79,7 @@ step_fn = torch_xla.experimental.compile(step_fn)
```
For training we ask users to refactor the `step_fn` out because it is usually better to compile the model's forward, backward, and optimizer step together. The long-term goal is to also use `torch.compile` for training, but right now we recommend users use `torch_xla.experimental.compile` (for performance reasons). A sketch of what that refactor might look like follows.
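
The sketch below fills in the context around the `step_fn = torch_xla.experimental.compile(step_fn)` line from the hunk above; the model, optimizer, loss, and fake batch are all assumed stand-ins:

```python
import torch
import torch.nn as nn
import torch_xla
import torch_xla.core.xla_model as xm

torch_xla.experimental.eager_mode(True)
device = xm.xla_device()

model = nn.Linear(128, 10).to(device)        # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def step_fn(data, target):
    # Forward, backward, and the optimizer step all live inside the
    # compiled function, so they are captured as a single XLA graph.
    optimizer.zero_grad()
    loss = loss_fn(model(data), target)
    loss.backward()
    optimizer.step()
    return loss

step_fn = torch_xla.experimental.compile(step_fn)

data = torch.randn(8, 128, device=device)    # fake batch
target = torch.randint(0, 10, (8,), device=device)
loss = step_fn(data, target)
```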

-## Performance
-
-# Benchmark
+## Benchmark

I ran a 2-layer decoder-only model training (it is pretty much just a llama2) with fake data on a single chip of a v4-8 for 300 steps. Below are the numbers I observed.
