
Add support for dynamic shape in dynamo #7676

Merged: 12 commits into master on Jul 23, 2024

Conversation

wonjoolee95
Collaborator

Fixes #7614


TODO

  • Remove debugging code and add comments
  • Add unit tests
  • Handle error case when TorchDynamo passes us int types

@wonjoolee95
Collaborator Author

With the current changes, the following code generates correct results without recompiling the graph:

    ###
    # torch.compile dynamic shape ON
    torch._dynamo.config.automatic_dynamic_shapes = True
    compiled_fn = torch.compile(fn, backend='openxla', dynamic=True)
    a = torch.randn(3, 4, device=device)
    b = torch.ones(4, device=device)
    ret = compiled_fn(a, b)
    xm.mark_step()
    print(f'[Testing] {ret=}')
    print(f'--------------------')

    c = torch.randn(4, 5, device=device)
    d = torch.ones(5, device=device)
    ret2 = compiled_fn(c, d)
    xm.mark_step()
    print(f'[Testing] {ret2=}')
    print(f'--------------------')

As for next steps, I'll clean up some code and add some unit tests.
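
Continuing the snippet above, one way to check the "no recompile" claim is the DynamoExtractCompiledGraph counter discussed later in this thread. A minimal sketch (the expected count of 1 is my reading of the claim, not a measured result):

    import torch_xla.debug.metrics as met

    # After running compiled_fn with both input shapes, the Dynamo bridge
    # should have extracted a compiled graph only once.
    count = met.counter_value('DynamoExtractCompiledGraph')
    print(f'[Testing] DynamoExtractCompiledGraph={count}')  # expect 1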

@wonjoolee95 wonjoolee95 changed the title [WIP] Add support for dynamic shape in dynamo Add support for dynamic shape in dynamo Jul 15, 2024
@wonjoolee95 wonjoolee95 marked this pull request as ready for review July 15, 2024 21:23
@JackCaoG
Collaborator

Seems like a bunch of tests failed, and a lot of them are real failures. @wonjoolee95, let me know if you need help debugging them.

Comment on lines 415 to 416
# self.assertTrue(
# torch.allclose(output_cpu_new_shape, output_new_shape.cpu(), rtol=1e-05, atol=1e-05))
Collaborator Author

This part is odd. When I run these tests, the allclose fails because in some iterations of the data loader with this new_shape, the differences are as large as 0.2.

Collaborator Author

This is fixed with the explicit mark_step call within the else statement under torch._dynamo.config.assume_static_by_default.

for data, _ in loader_new_shape:
  output_new_shape = dynamo_resnet18(data)
  output_cpu_new_shape = resnet18(data.cpu())
  # # TPU has some precision issues, skipping allclose check
Collaborator

remove one #

Collaborator Author

Updated

output_new_shape.cpu(),
rtol=1e-05,
atol=1e-05))

Collaborator

Maybe also check the CompileTime and ExecuteTime metrics here.
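
A rough sketch of what such a check could look like inside the test (the expected counts are illustrative, not values from this PR):

    import torch_xla.debug.metrics as met

    # met.metric_data(name) returns a tuple whose first element is the
    # number of recorded samples for that metric.
    self.assertIsNotNone(met.metric_data('CompileTime'))
    self.assertIsNotNone(met.metric_data('ExecuteTime'))
    self.assertEqual(met.metric_data('CompileTime')[0], 1)  # illustrative
    self.assertGreaterEqual(met.metric_data('ExecuteTime')[0], 1)  # illustrative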

Collaborator

Also, can you make another test covering the case of

fn(shape_a)
fn(shape_b)
fn(shape_c)
fn(shape_a)

We want to make sure we don't forget the old shapes that are cached (see the sketch below).
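
A hedged sketch of such a test; fn, the shapes, and the metric check are illustrative and assume the openxla backend with dynamic=True as in the example earlier in this thread:

    import torch
    import torch_xla.core.xla_model as xm
    import torch_xla.debug.metrics as met

    def fn(a, b):
      return a + b

    device = xm.xla_device()
    compiled_fn = torch.compile(fn, backend='openxla', dynamic=True)

    met.clear_all()
    # shape_a, shape_b, shape_c
    for dim in (4, 5, 6):
      compiled_fn(torch.randn(3, dim, device=device),
                  torch.ones(dim, device=device))
      xm.mark_step()
    compiles_after_new_shapes = met.metric_data('CompileTime')[0]

    # shape_a again: the cached graph should be reused, so the XLA compile
    # count must not grow.
    compiled_fn(torch.randn(3, 4, device=device),
                torch.ones(4, device=device))
    xm.mark_step()
    assert met.metric_data('CompileTime')[0] == compiles_after_new_shapes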

# Values: tuple of (xla_args_sharding_spec, args_and_out, graph_hash,
# arg_index_to_need_update_index, none_remover, graph_input_matcher,
# dumb_return_handler, xla_args_need_update).
input_shape_mappings: dict[tuple[int, ...], tuple[object, ...]] = {}
Collaborator

Use typing.Dict and typing.Tuple, otherwise the Python 3.8 CI upstream will fail.
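
For reference, a sketch of the Python 3.8-compatible annotation:

    from typing import Dict, Tuple

    input_shape_mappings: Dict[Tuple[int, ...], Tuple[object, ...]] = {}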

Collaborator Author

Updated

Comment on lines +499 to +502
input_shape_mappings[arg_input_shapes] = (
    xla_args_sharding_spec, args_and_out, graph_hash,
    arg_index_to_need_update_index, none_remover, graph_input_matcher,
    dumb_return_handler, xla_args_need_update)
Collaborator

I think you don't need this here

Collaborator Author

IIUC, we actually need this here, and we actually don't need this same logic in extract_internal above (removed in the newest commit). The reason is that when dynamic=True, only optimized_mod is called; other functions (including extract_internal) are not called.

Collaborator

OK, then you will run into the same old problem, right? The first time:

extract_graph_helper -> optimized_mod

In this case you do the compile, but you do not cache into input_shape_mappings.

When optimized_mod is called the first time, you will need to call extract_graph_helper again, which is wasteful.

Collaborator

You should just do the caching (input_shape_mappings[arg_input_shapes] = ...) inside extract_graph_helper.
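
A simplified sketch of the suggested structure; extract_graph_helper, optimized_mod, input_shape_mappings, and arg_input_shapes come from the discussion above, while _trace_and_compile and _run_compiled are hypothetical stand-ins for the real compilation and execution paths:

    import torch

    def _trace_and_compile(xla_model, xla_args):  # hypothetical stand-in
      # The real helper returns graph_hash, input matchers, return handlers, etc.
      return xla_model

    def _run_compiled(compiled, xla_args):  # hypothetical stand-in
      return compiled(*xla_args)

    def extract_graph_helper(xla_model, xla_args, input_shape_mappings):
      arg_input_shapes = tuple(
          tuple(arg.shape) for arg in xla_args if isinstance(arg, torch.Tensor))
      compiled = _trace_and_compile(xla_model, xla_args)
      # Cache here, so the compile done on the very first call is reused the
      # next time this input shape shows up.
      input_shape_mappings[arg_input_shapes] = compiled
      return compiled

    def optimized_mod(xla_args, xla_model, input_shape_mappings):
      arg_input_shapes = tuple(
          tuple(arg.shape) for arg in xla_args if isinstance(arg, torch.Tensor))
      compiled = input_shape_mappings.get(arg_input_shapes)
      if compiled is None:
        # New input shape: extract_graph_helper both compiles and caches,
        # so no second compile of the same shape is needed.
        compiled = extract_graph_helper(xla_model, xla_args, input_shape_mappings)
      return _run_compiled(compiled, xla_args)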

Collaborator

Let me fix this too.

Comment on lines +587 to +588
dynamo_extract_graph_helper_metric_count = metrics.counter_value(
    'DynamoExtractCompiledGraph')
Collaborator

will run_node call extract_compiled_graph too?

Collaborator Author

It's hard to tell from the documentation. However, when I compare metrics before/after run_node, from what I can see it's not calling extract_compiled_graph.

Collaborator

OK, then I am confused about what this dynamo_extract_graph_helper_metric_count is doing here.

Collaborator Author

This code (run_node) is executed when we're fetching the fallback ops, and in the code below we clear our metric counters via metrics.clear_counters(). So we need a way to restore this counter so that we can verify extract_compiled_graph only gets called once in our unit tests.

Collaborator

Ah, I see. I can fix it later. I think the right thing to do is to define a region where the counter is not incremented.
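
A minimal sketch of that idea (not the actual fix in this PR): a helper a test could wrap around the fallback-op discovery to assert a given counter is untouched; only the public counter_value API from torch_xla.debug.metrics is assumed:

    import contextlib
    import torch_xla.debug.metrics as met

    @contextlib.contextmanager
    def counter_unchanged(name):
      # Snapshot the counter, run the region, then verify the region neither
      # incremented nor cleared it.
      before = met.counter_value(name) or 0
      yield
      after = met.counter_value(name) or 0
      assert after == before, (
          f'{name} changed from {before} to {after} inside the region')

    # Usage (illustrative):
    # with counter_unchanged('DynamoExtractCompiledGraph'):
    #   ...run the fallback-op discovery...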

@JackCaoG
Collaborator

@ysiraichi FYI

@wonjoolee95
Collaborator Author

The PR should be in a reasonable state; now I'm just seeing 2 failures in the GPU tests that require torch CUDA:

#1: DynamoInferenceBasicTest.test_dynamic_shape_resnet180 (True):
Input tensor is not an XLA tensor: CUDAFloatType

#2: DynamoInferenceBasicTest.test_resnet180 (True)
  File "/__w/xla/xla/pytorch/xla/test/dynamo/test_dynamo.py", line 370, in test_resnet18
    self.assertEqual(met.metric_data('CompileTime')[0], 1)
TypeError: 'NoneType' object is not subscriptable

For the first error, the stack trace points to:

pytree.tree_map_only(
    torch.Tensor,
    lambda xla_arg: torch_xla._XLAC._xla_get_tensor_id(xla_arg),
    xla_args))

It seems like we may want to do an additional isinstance(arg, torch.Tensor) check here.
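
A hedged sketch of one possible guard (illustrative, not necessarily the fix that landed): only ask _XLAC for a tensor id when the argument is a tensor that actually lives on an XLA device, so a stray CUDA tensor is skipped instead of raising:

    import torch
    import torch_xla
    from torch.utils import _pytree as pytree

    def _xla_tensor_id_or_none(arg):
      # CUDA (or CPU) tensors would trigger "Input tensor is not an XLA
      # tensor", so only query ids for tensors on an XLA device.
      if isinstance(arg, torch.Tensor) and arg.device.type == 'xla':
        return torch_xla._XLAC._xla_get_tensor_id(arg)
      return None

    arg_ids = pytree.tree_map_only(torch.Tensor, _xla_tensor_id_or_none,
                                   xla_args)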

@JackCaoG
Collaborator

I will pick this up and try to fix the errors today.

@JackCaoG JackCaoG added the tpuci label Jul 22, 2024
@JackCaoG
Collaborator

@alanwaketan There are a few places I want to fix, but maybe we should just merge this PR to unblock Woosuk now. I am also running some benchmarks.

Collaborator

@alanwaketan alanwaketan left a comment

Approved to unblock.

@JackCaoG JackCaoG merged commit 2b6b461 into master Jul 23, 2024
23 checks passed
@JackCaoG JackCaoG deleted the wonjoo/dynamo-dynamic-shape branch July 23, 2024 01:18
Labels
dynamism (Dynamic Shape Features), tpuci
Development

Successfully merging this pull request may close these issues.

Dynamo persistent cache real-time look-up
4 participants