diff --git a/docs/eager.md b/docs/eager.md
index e1c8e85910c..6e5413d63d4 100644
--- a/docs/eager.md
+++ b/docs/eager.md
@@ -79,9 +79,7 @@ step_fn = torch_xla.experimental.compile(step_fn)
 ```
 In training we ask users to refactor the `step_fn` out because it is usually better to compile the model's forward, backward, and optimizer together. The long-term goal is to also use `torch.compile` for training, but right now we recommend users use `torch_xla.experimental.compile` (for performance reasons).
-## Performance
-
-# Benchmark
+## Benchmark
 I ran a 2-layer decoder-only model training (it is pretty much just a llama2) with fake data on a single chip of v4-8 for 300 steps. Below are the numbers I observed.
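
For context on the usage this hunk refers to, a minimal sketch of compiling the whole training step with `torch_xla.experimental.compile` might look like the following. The model, loss function, optimizer, and fake data are placeholder assumptions for illustration; only `torch_xla.experimental.eager_mode` and `torch_xla.experimental.compile` come from the surrounding doc.

```python
import torch
import torch.nn as nn
import torch_xla

# Run ops eagerly outside the compiled region (from the surrounding doc).
torch_xla.experimental.eager_mode(True)

device = torch_xla.device()
model = nn.Linear(16, 2).to(device)      # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def step_fn(data, target):
    # Keep forward, backward, and the optimizer update in one function so
    # they are compiled together, as the doc recommends for training.
    optimizer.zero_grad()
    loss = loss_fn(model(data), target)
    loss.backward()
    optimizer.step()
    return loss

# Compile the whole training step as a single graph.
step_fn = torch_xla.experimental.compile(step_fn)

# Example invocation with fake data on the XLA device.
data = torch.randn(8, 16, device=device)
target = torch.randint(0, 2, (8,), device=device)
loss = step_fn(data, target)
```

Keeping forward, backward, and the optimizer update inside one compiled function means the step executes as one graph per iteration instead of several smaller ones, which is the rationale the doc gives for refactoring `step_fn` out.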