[BugFix] Fix parsing integer batch size within export #1004

Open
wants to merge 2 commits into
base: gh/vmoens/18/base

vmoens (Contributor) commented Sep 20, 2024

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Sep 20, 2024
ghstack-source-id: 73e7dd429770e1c383b3b2a1c28dbbf661d65f07
Pull Request resolved: #1004
@facebook-github-bot added the "CLA Signed" label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Sep 20, 2024

github-actions bot commented Sep 20, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 222. Improved: $\large\color{#35bf28}18$. Worsened: $\large\color{#d91a1a}17$.

Expand to view detailed results
Columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change
test_plain_set_nested 38.5420μs 19.7824μs 50.5499 KOps/s 49.6138 KOps/s $\color{#35bf28}+1.89\%$
test_plain_set_stack_nested 68.7990μs 19.4874μs 51.3153 KOps/s 49.0805 KOps/s $\color{#35bf28}+4.55\%$
test_plain_set_nested_inplace 85.4400μs 21.0734μs 47.4532 KOps/s 45.0632 KOps/s $\textbf{\color{#35bf28}+5.30\%}$
test_plain_set_stack_nested_inplace 74.0590μs 21.1939μs 47.1834 KOps/s 45.4684 KOps/s $\color{#35bf28}+3.77\%$
test_items 26.1410μs 4.1291μs 242.1842 KOps/s 237.3341 KOps/s $\color{#35bf28}+2.04\%$
test_items_nested 0.5688ms 0.3730ms 2.6812 KOps/s 2.7413 KOps/s $\color{#d91a1a}-2.19\%$
test_items_nested_locked 0.5030ms 0.3726ms 2.6835 KOps/s 2.7477 KOps/s $\color{#d91a1a}-2.34\%$
test_items_nested_leaf 0.1286ms 68.6555μs 14.5655 KOps/s 14.3974 KOps/s $\color{#35bf28}+1.17\%$
test_items_stack_nested 0.5022ms 0.3745ms 2.6704 KOps/s 2.7293 KOps/s $\color{#d91a1a}-2.16\%$
test_items_stack_nested_leaf 0.1286ms 70.9499μs 14.0944 KOps/s 14.1513 KOps/s $\color{#d91a1a}-0.40\%$
test_items_stack_nested_locked 0.5940ms 0.3759ms 2.6605 KOps/s 2.7490 KOps/s $\color{#d91a1a}-3.22\%$
test_keys 31.0890μs 3.5671μs 280.3401 KOps/s 281.3400 KOps/s $\color{#d91a1a}-0.36\%$
test_keys_nested 0.1796ms 99.9570μs 10.0043 KOps/s 9.9581 KOps/s $\color{#35bf28}+0.46\%$
test_keys_nested_locked 1.8124ms 0.1049ms 9.5329 KOps/s 9.5227 KOps/s $\color{#35bf28}+0.11\%$
test_keys_nested_leaf 0.1537ms 84.6151μs 11.8182 KOps/s 11.9151 KOps/s $\color{#d91a1a}-0.81\%$
test_keys_stack_nested 0.1642ms 0.1006ms 9.9375 KOps/s 9.9904 KOps/s $\color{#d91a1a}-0.53\%$
test_keys_stack_nested_leaf 0.1472ms 82.8254μs 12.0736 KOps/s 11.9905 KOps/s $\color{#35bf28}+0.69\%$
test_keys_stack_nested_locked 0.1761ms 0.1053ms 9.4929 KOps/s 9.6317 KOps/s $\color{#d91a1a}-1.44\%$
test_values 10.1710μs 1.0480μs 954.2378 KOps/s 962.7476 KOps/s $\color{#d91a1a}-0.88\%$
test_values_nested 0.1317ms 75.0840μs 13.3184 KOps/s 13.6575 KOps/s $\color{#d91a1a}-2.48\%$
test_values_nested_locked 0.1391ms 74.0021μs 13.5131 KOps/s 13.6190 KOps/s $\color{#d91a1a}-0.78\%$
test_values_nested_leaf 0.1125ms 61.2305μs 16.3317 KOps/s 16.4462 KOps/s $\color{#d91a1a}-0.70\%$
test_values_stack_nested 0.1499ms 75.7689μs 13.1980 KOps/s 13.5130 KOps/s $\color{#d91a1a}-2.33\%$
test_values_stack_nested_leaf 0.1239ms 60.0065μs 16.6649 KOps/s 16.5073 KOps/s $\color{#35bf28}+0.95\%$
test_values_stack_nested_locked 0.1327ms 75.6589μs 13.2172 KOps/s 13.4496 KOps/s $\color{#d91a1a}-1.73\%$
test_membership 2.0428μs 0.7150μs 1.3986 MOps/s 1.4000 MOps/s $\color{#d91a1a}-0.10\%$
test_membership_nested 18.7350μs 2.8040μs 356.6350 KOps/s 367.3173 KOps/s $\color{#d91a1a}-2.91\%$
test_membership_nested_leaf 31.2570μs 2.8205μs 354.5507 KOps/s 367.9096 KOps/s $\color{#d91a1a}-3.63\%$
test_membership_stacked_nested 27.0910μs 2.8082μs 356.0976 KOps/s 369.3306 KOps/s $\color{#d91a1a}-3.58\%$
test_membership_stacked_nested_leaf 40.0350μs 2.8203μs 354.5684 KOps/s 371.5639 KOps/s $\color{#d91a1a}-4.57\%$
test_membership_nested_last 21.5700μs 4.1540μs 240.7310 KOps/s 254.7662 KOps/s $\textbf{\color{#d91a1a}-5.51\%}$
test_membership_nested_leaf_last 27.0500μs 4.0975μs 244.0494 KOps/s 255.7240 KOps/s $\color{#d91a1a}-4.57\%$
test_membership_stacked_nested_last 34.8150μs 10.7535μs 92.9930 KOps/s 253.9894 KOps/s $\textbf{\color{#d91a1a}-63.39\%}$
test_membership_stacked_nested_leaf_last 57.1480μs 10.5687μs 94.6186 KOps/s 255.1744 KOps/s $\textbf{\color{#d91a1a}-62.92\%}$
test_nested_getleaf 55.0130μs 10.7740μs 92.8163 KOps/s 95.2586 KOps/s $\color{#d91a1a}-2.56\%$
test_nested_get 35.6270μs 10.1341μs 98.6767 KOps/s 97.8167 KOps/s $\color{#35bf28}+0.88\%$
test_stacked_getleaf 51.9970μs 10.8244μs 92.3840 KOps/s 92.2174 KOps/s $\color{#35bf28}+0.18\%$
test_stacked_get 53.2500μs 10.2194μs 97.8528 KOps/s 96.0961 KOps/s $\color{#35bf28}+1.83\%$
test_nested_getitemleaf 52.5580μs 11.1762μs 89.4759 KOps/s 88.5681 KOps/s $\color{#35bf28}+1.02\%$
test_nested_getitem 54.4420μs 10.3866μs 96.2782 KOps/s 91.9748 KOps/s $\color{#35bf28}+4.68\%$
test_stacked_getitemleaf 50.5840μs 11.2344μs 89.0120 KOps/s 89.3670 KOps/s $\color{#d91a1a}-0.40\%$
test_stacked_getitem 34.1540μs 10.3995μs 96.1580 KOps/s 96.8501 KOps/s $\color{#d91a1a}-0.71\%$
test_lock_nested 83.1001ms 0.5714ms 1.7501 KOps/s 2.0184 KOps/s $\textbf{\color{#d91a1a}-13.30\%}$
test_lock_stack_nested 0.5482ms 0.4423ms 2.2611 KOps/s 2.1270 KOps/s $\textbf{\color{#35bf28}+6.30\%}$
test_unlock_nested 83.8251ms 0.4898ms 2.0416 KOps/s 2.3680 KOps/s $\textbf{\color{#d91a1a}-13.78\%}$
test_unlock_stack_nested 0.5848ms 0.3579ms 2.7941 KOps/s 2.5649 KOps/s $\textbf{\color{#35bf28}+8.93\%}$
test_flatten_speed 0.3175ms 90.1992μs 11.0866 KOps/s 11.3896 KOps/s $\color{#d91a1a}-2.66\%$
test_unflatten_speed 0.6707ms 0.4686ms 2.1339 KOps/s 2.1408 KOps/s $\color{#d91a1a}-0.32\%$
test_common_ops 6.3078ms 1.0765ms 928.9386 Ops/s 886.7283 Ops/s $\color{#35bf28}+4.76\%$
test_creation 23.1840μs 2.0909μs 478.2715 KOps/s 478.8113 KOps/s $\color{#d91a1a}-0.11\%$
test_creation_empty 46.3570μs 15.2282μs 65.6675 KOps/s 56.7594 KOps/s $\textbf{\color{#35bf28}+15.69\%}$
test_creation_nested_1 0.2100ms 18.4522μs 54.1942 KOps/s 47.7274 KOps/s $\textbf{\color{#35bf28}+13.55\%}$
test_creation_nested_2 0.1620ms 22.9222μs 43.6258 KOps/s 39.5244 KOps/s $\textbf{\color{#35bf28}+10.38\%}$
test_clone 65.8640μs 17.0727μs 58.5732 KOps/s 57.3704 KOps/s $\color{#35bf28}+2.10\%$
test_getitem[int] 1.0355ms 16.5308μs 60.4932 KOps/s 57.7392 KOps/s $\color{#35bf28}+4.77\%$
test_getitem[slice_int] 0.1383ms 31.5632μs 31.6825 KOps/s 31.2822 KOps/s $\color{#35bf28}+1.28\%$
test_getitem[range] 0.2126ms 57.4919μs 17.3938 KOps/s 16.6669 KOps/s $\color{#35bf28}+4.36\%$
test_getitem[tuple] 0.1407ms 25.5269μs 39.1744 KOps/s 38.0632 KOps/s $\color{#35bf28}+2.92\%$
test_getitem[list] 0.1786ms 53.0200μs 18.8608 KOps/s 18.0920 KOps/s $\color{#35bf28}+4.25\%$
test_setitem_dim[int] 74.6900μs 32.8488μs 30.4425 KOps/s 29.2600 KOps/s $\color{#35bf28}+4.04\%$
test_setitem_dim[slice_int] 0.1146ms 62.2202μs 16.0719 KOps/s 15.7617 KOps/s $\color{#35bf28}+1.97\%$
test_setitem_dim[range] 0.1343ms 83.2617μs 12.0103 KOps/s 11.4831 KOps/s $\color{#35bf28}+4.59\%$
test_setitem_dim[tuple] 79.0590μs 49.6216μs 20.1525 KOps/s 20.1118 KOps/s $\color{#35bf28}+0.20\%$
test_setitem 0.1204ms 28.2045μs 35.4553 KOps/s 34.2134 KOps/s $\color{#35bf28}+3.63\%$
test_set 0.1083ms 27.4115μs 36.4810 KOps/s 34.9232 KOps/s $\color{#35bf28}+4.46\%$
test_set_shared 1.3110ms 0.2118ms 4.7212 KOps/s 4.6673 KOps/s $\color{#35bf28}+1.15\%$
test_update 0.1428ms 33.4392μs 29.9051 KOps/s 28.7465 KOps/s $\color{#35bf28}+4.03\%$
test_update_nested 0.1310ms 44.0220μs 22.7159 KOps/s 22.1692 KOps/s $\color{#35bf28}+2.47\%$
test_update__nested 0.1067ms 35.3321μs 28.3029 KOps/s 28.5885 KOps/s $\color{#d91a1a}-1.00\%$
test_set_nested 0.1021ms 30.0506μs 33.2772 KOps/s 31.4557 KOps/s $\textbf{\color{#35bf28}+5.79\%}$
test_set_nested_new 0.1009ms 35.5562μs 28.1245 KOps/s 27.5339 KOps/s $\color{#35bf28}+2.14\%$
test_select 0.1292ms 53.3357μs 18.7492 KOps/s 17.7269 KOps/s $\textbf{\color{#35bf28}+5.77\%}$
test_select_nested 0.1457ms 60.4124μs 16.5529 KOps/s 15.5329 KOps/s $\textbf{\color{#35bf28}+6.57\%}$
test_exclude_nested 0.1670ms 77.0049μs 12.9862 KOps/s 12.7989 KOps/s $\color{#35bf28}+1.46\%$
test_empty[True] 0.5134ms 0.3190ms 3.1352 KOps/s 3.1098 KOps/s $\color{#35bf28}+0.82\%$
test_empty[False] 7.7320μs 1.2727μs 785.7050 KOps/s 827.5798 KOps/s $\textbf{\color{#d91a1a}-5.06\%}$
test_unbind_speed 0.5270ms 0.3015ms 3.3164 KOps/s 3.2257 KOps/s $\color{#35bf28}+2.81\%$
test_unbind_speed_stack0 0.4310ms 0.2879ms 3.4731 KOps/s 3.3317 KOps/s $\color{#35bf28}+4.25\%$
test_unbind_speed_stack1 97.2526ms 0.7900ms 1.2658 KOps/s 1.3295 KOps/s $\color{#d91a1a}-4.79\%$
test_split 89.0874ms 2.1889ms 456.8563 Ops/s 449.6076 Ops/s $\color{#35bf28}+1.61\%$
test_chunk 3.1147ms 2.0317ms 492.2060 Ops/s 449.6004 Ops/s $\textbf{\color{#35bf28}+9.48\%}$
test_creation[device0] 0.2828ms 0.1187ms 8.4253 KOps/s 8.6325 KOps/s $\color{#d91a1a}-2.40\%$
test_creation_from_tensor 3.6866ms 0.1192ms 8.3874 KOps/s 8.5766 KOps/s $\color{#d91a1a}-2.21\%$
test_add_one[memmap_tensor0] 0.1938ms 7.5083μs 133.1868 KOps/s 137.8180 KOps/s $\color{#d91a1a}-3.36\%$
test_contiguous[memmap_tensor0] 16.8710μs 1.9104μs 523.4640 KOps/s 537.8001 KOps/s $\color{#d91a1a}-2.67\%$
test_stack[memmap_tensor0] 37.3700μs 5.6493μs 177.0139 KOps/s 177.5228 KOps/s $\color{#d91a1a}-0.29\%$
test_memmaptd_index 1.1470ms 0.4072ms 2.4559 KOps/s 2.5098 KOps/s $\color{#d91a1a}-2.15\%$
test_memmaptd_index_astensor 1.2238ms 0.4890ms 2.0449 KOps/s 2.0958 KOps/s $\color{#d91a1a}-2.43\%$
test_memmaptd_index_op 1.4619ms 0.9758ms 1.0248 KOps/s 985.0547 Ops/s $\color{#35bf28}+4.03\%$
test_serialize_model 0.1270s 0.1214s 8.2362 Ops/s 8.3265 Ops/s $\color{#d91a1a}-1.08\%$
test_serialize_model_pickle 0.4603s 0.3927s 2.5467 Ops/s 2.5104 Ops/s $\color{#35bf28}+1.45\%$
test_serialize_weights 0.1263s 0.1141s 8.7634 Ops/s 7.5141 Ops/s $\textbf{\color{#35bf28}+16.63\%}$
test_serialize_weights_returnearly 0.1761s 0.1608s 6.2180 Ops/s 6.2331 Ops/s $\color{#d91a1a}-0.24\%$
test_serialize_weights_pickle 0.6051s 0.4463s 2.2408 Ops/s 2.5372 Ops/s $\textbf{\color{#d91a1a}-11.68\%}$
test_serialize_weights_filesystem 0.1490s 0.1429s 6.9974 Ops/s 6.9807 Ops/s $\color{#35bf28}+0.24\%$
test_serialize_model_filesystem 0.1620s 0.1495s 6.6902 Ops/s 6.1022 Ops/s $\textbf{\color{#35bf28}+9.64\%}$
test_reshape_pytree 0.1273ms 38.6397μs 25.8801 KOps/s 25.4779 KOps/s $\color{#35bf28}+1.58\%$
test_reshape_td 0.1123ms 45.9621μs 21.7571 KOps/s 20.8746 KOps/s $\color{#35bf28}+4.23\%$
test_view_pytree 80.6310μs 38.8247μs 25.7568 KOps/s 25.5150 KOps/s $\color{#35bf28}+0.95\%$
test_view_td 0.1401ms 52.9638μs 18.8808 KOps/s 18.7321 KOps/s $\color{#35bf28}+0.79\%$
test_unbind_pytree 97.0720μs 36.2219μs 27.6076 KOps/s 28.0252 KOps/s $\color{#d91a1a}-1.49\%$
test_unbind_td 0.3619ms 45.1836μs 22.1319 KOps/s 22.0007 KOps/s $\color{#35bf28}+0.60\%$
test_split_pytree 96.9110μs 38.9820μs 25.6529 KOps/s 26.7646 KOps/s $\color{#d91a1a}-4.15\%$
test_split_td 0.4471ms 57.7893μs 17.3042 KOps/s 16.5645 KOps/s $\color{#35bf28}+4.47\%$
test_add_pytree 0.1267ms 45.9991μs 21.7395 KOps/s 22.0554 KOps/s $\color{#d91a1a}-1.43\%$
test_add_td 0.2135ms 77.4837μs 12.9059 KOps/s 12.2225 KOps/s $\textbf{\color{#35bf28}+5.59\%}$
test_compile_add_one_nested[tensordict-compile] 0.1088ms 58.7833μs 17.0116 KOps/s 16.8283 KOps/s $\color{#35bf28}+1.09\%$
test_compile_add_one_nested[tensordict-eager] 0.2975ms 0.1801ms 5.5514 KOps/s 5.6220 KOps/s $\color{#d91a1a}-1.26\%$
test_compile_add_one_nested[pytree-compile] 0.1286ms 58.2946μs 17.1543 KOps/s 16.9319 KOps/s $\color{#35bf28}+1.31\%$
test_compile_add_one_nested[pytree-eager] 0.2826ms 0.1442ms 6.9339 KOps/s 7.0179 KOps/s $\color{#d91a1a}-1.20\%$
test_compile_copy_nested[tensordict-compile] 56.6960μs 22.2979μs 44.8473 KOps/s 44.7865 KOps/s $\color{#35bf28}+0.14\%$
test_compile_copy_nested[tensordict-eager] 0.1352ms 67.3926μs 14.8384 KOps/s 14.5036 KOps/s $\color{#35bf28}+2.31\%$
test_compile_copy_nested[pytree-compile] 0.1474ms 75.2525μs 13.2886 KOps/s 13.1963 KOps/s $\color{#35bf28}+0.70\%$
test_compile_copy_nested[pytree-eager] 0.1409ms 67.9936μs 14.7073 KOps/s 14.5719 KOps/s $\color{#35bf28}+0.93\%$
test_compile_add_one_flat[tensordict-compile] 0.2948ms 0.1751ms 5.7104 KOps/s 5.7402 KOps/s $\color{#d91a1a}-0.52\%$
test_compile_add_one_flat[tensordict-eager] 0.3586ms 0.1904ms 5.2521 KOps/s 5.2515 KOps/s $\color{#35bf28}+0.01\%$
test_compile_add_one_flat[tensorclass-compile] 0.1098ms 47.5585μs 21.0268 KOps/s 20.4597 KOps/s $\color{#35bf28}+2.77\%$
test_compile_add_one_flat[tensorclass-eager] 0.1409ms 69.3090μs 14.4281 KOps/s 13.9754 KOps/s $\color{#35bf28}+3.24\%$
test_compile_add_one_flat[pytree-compile] 0.2856ms 0.1754ms 5.7027 KOps/s 5.7140 KOps/s $\color{#d91a1a}-0.20\%$
test_compile_add_one_flat[pytree-eager] 0.4494ms 0.2925ms 3.4186 KOps/s 3.5461 KOps/s $\color{#d91a1a}-3.60\%$
test_compile_add_self_flat[tensordict-eager] 0.3641ms 0.2025ms 4.9373 KOps/s 4.8849 KOps/s $\color{#35bf28}+1.07\%$
test_compile_add_self_flat[tensordict-compile] 0.2975ms 0.1742ms 5.7409 KOps/s 5.6865 KOps/s $\color{#35bf28}+0.96\%$
test_compile_add_self_flat[tensorclass-eager] 0.1367ms 61.6051μs 16.2324 KOps/s 15.9178 KOps/s $\color{#35bf28}+1.98\%$
test_compile_add_self_flat[tensorclass-compile] 0.3221ms 47.1525μs 21.2078 KOps/s 20.8523 KOps/s $\color{#35bf28}+1.70\%$
test_compile_add_self_flat[pytree-eager] 0.5096ms 0.2311ms 4.3278 KOps/s 4.3636 KOps/s $\color{#d91a1a}-0.82\%$
test_compile_add_self_flat[pytree-compile] 0.3147ms 0.1784ms 5.6043 KOps/s 5.5455 KOps/s $\color{#35bf28}+1.06\%$
test_compile_copy_flat[tensordict-compile] 0.1935ms 0.1025ms 9.7553 KOps/s 9.6987 KOps/s $\color{#35bf28}+0.58\%$
test_compile_copy_flat[tensordict-eager] 0.1326ms 57.6592μs 17.3433 KOps/s 17.6862 KOps/s $\color{#d91a1a}-1.94\%$
test_compile_copy_flat[pytree-compile] 0.1487ms 75.3510μs 13.2712 KOps/s 12.8677 KOps/s $\color{#35bf28}+3.14\%$
test_compile_copy_flat[pytree-eager] 0.1607ms 67.2171μs 14.8772 KOps/s 14.5479 KOps/s $\color{#35bf28}+2.26\%$
test_compile_assign_and_add[tensordict-compile] 0.2981ms 0.1973ms 5.0693 KOps/s 5.0475 KOps/s $\color{#35bf28}+0.43\%$
test_compile_assign_and_add[tensordict-eager] 1.8216ms 1.6661ms 600.2194 Ops/s 613.7995 Ops/s $\color{#d91a1a}-2.21\%$
test_compile_assign_and_add[pytree-compile] 0.2961ms 0.1931ms 5.1784 KOps/s 5.2348 KOps/s $\color{#d91a1a}-1.08\%$
test_compile_assign_and_add[pytree-eager] 1.3466ms 1.1084ms 902.2325 Ops/s 938.4428 Ops/s $\color{#d91a1a}-3.86\%$
test_compile_assign_and_add_stack[compile] 0.6988ms 0.4197ms 2.3829 KOps/s 2.4213 KOps/s $\color{#d91a1a}-1.59\%$
test_compile_assign_and_add_stack[eager] 5.1835ms 3.6639ms 272.9300 Ops/s 271.9522 Ops/s $\color{#35bf28}+0.36\%$
test_compile_indexing[tensor-tensordict-compile] 0.1127ms 34.1144μs 29.3132 KOps/s 27.8742 KOps/s $\textbf{\color{#35bf28}+5.16\%}$
test_compile_indexing[tensor-tensordict-eager] 0.5745ms 49.3461μs 20.2650 KOps/s 20.5062 KOps/s $\color{#d91a1a}-1.18\%$
test_compile_indexing[tensor-tensorclass-compile] 93.2440μs 30.3686μs 32.9287 KOps/s 32.5524 KOps/s $\color{#35bf28}+1.16\%$
test_compile_indexing[tensor-tensorclass-eager] 91.3410μs 29.4037μs 34.0093 KOps/s 35.2423 KOps/s $\color{#d91a1a}-3.50\%$
test_compile_indexing[tensor-pytree-compile] 0.1092ms 29.9655μs 33.3717 KOps/s 33.1807 KOps/s $\color{#35bf28}+0.58\%$
test_compile_indexing[tensor-pytree-eager] 82.0840μs 28.8389μs 34.6753 KOps/s 35.0715 KOps/s $\color{#d91a1a}-1.13\%$
test_compile_indexing[slice-tensordict-compile] 0.1388ms 73.7836μs 13.5531 KOps/s 13.1991 KOps/s $\color{#35bf28}+2.68\%$
test_compile_indexing[slice-tensordict-eager] 0.5843ms 28.4332μs 35.1702 KOps/s 35.1585 KOps/s $\color{#35bf28}+0.03\%$
test_compile_indexing[slice-tensorclass-compile] 0.1397ms 69.4012μs 14.4090 KOps/s 14.6128 KOps/s $\color{#d91a1a}-1.40\%$
test_compile_indexing[slice-tensorclass-eager] 88.6260μs 23.4339μs 42.6732 KOps/s 43.5208 KOps/s $\color{#d91a1a}-1.95\%$
test_compile_indexing[slice-pytree-compile] 0.1512ms 68.0684μs 14.6911 KOps/s 14.4374 KOps/s $\color{#35bf28}+1.76\%$
test_compile_indexing[slice-pytree-eager] 95.3490μs 23.3439μs 42.8378 KOps/s 43.8131 KOps/s $\color{#d91a1a}-2.23\%$
test_compile_indexing[int-tensordict-compile] 0.1522ms 72.4697μs 13.7989 KOps/s 13.4824 KOps/s $\color{#35bf28}+2.35\%$
test_compile_indexing[int-tensordict-eager] 0.8124ms 27.9231μs 35.8126 KOps/s 35.7625 KOps/s $\color{#35bf28}+0.14\%$
test_compile_indexing[int-tensorclass-compile] 0.1211ms 67.8713μs 14.7338 KOps/s 14.7870 KOps/s $\color{#d91a1a}-0.36\%$
test_compile_indexing[int-tensorclass-eager] 0.1830ms 23.5389μs 42.4829 KOps/s 43.7517 KOps/s $\color{#d91a1a}-2.90\%$
test_compile_indexing[int-pytree-compile] 0.1357ms 67.7184μs 14.7670 KOps/s 14.6891 KOps/s $\color{#35bf28}+0.53\%$
test_compile_indexing[int-pytree-eager] 89.0070μs 23.0255μs 43.4302 KOps/s 43.3968 KOps/s $\color{#35bf28}+0.08\%$
test_mod_add[eager] 88.2950μs 23.3155μs 42.8900 KOps/s 39.5736 KOps/s $\textbf{\color{#35bf28}+8.38\%}$
test_mod_add[compile] 93.6960μs 39.1258μs 25.5586 KOps/s 25.3350 KOps/s $\color{#35bf28}+0.88\%$
test_mod_add[compile-overhead] 0.1005ms 39.0315μs 25.6203 KOps/s 25.3702 KOps/s $\color{#35bf28}+0.99\%$
test_mod_wrap[eager] 0.3978ms 0.2104ms 4.7531 KOps/s 4.7347 KOps/s $\color{#35bf28}+0.39\%$
test_mod_wrap[compile] 0.3439ms 0.2381ms 4.1998 KOps/s 4.3303 KOps/s $\color{#d91a1a}-3.01\%$
test_mod_wrap[compile-overhead] 0.3783ms 0.2351ms 4.2529 KOps/s 4.4118 KOps/s $\color{#d91a1a}-3.60\%$
test_mod_wrap_and_backward[eager] 15.0106ms 11.8807ms 84.1699 Ops/s 89.7438 Ops/s $\textbf{\color{#d91a1a}-6.21\%}$
test_mod_wrap_and_backward[compile] 19.6489ms 12.7381ms 78.5044 Ops/s 90.2389 Ops/s $\textbf{\color{#d91a1a}-13.00\%}$
test_mod_wrap_and_backward[compile-overhead] 18.9460ms 12.6595ms 78.9918 Ops/s 90.9444 Ops/s $\textbf{\color{#d91a1a}-13.14\%}$
test_seq_add[eager] 0.1801ms 88.1814μs 11.3403 KOps/s 10.7672 KOps/s $\textbf{\color{#35bf28}+5.32\%}$
test_seq_add[compile] 0.1553ms 66.2891μs 15.0854 KOps/s 15.2111 KOps/s $\color{#d91a1a}-0.83\%$
test_seq_add[compile-overhead] 0.1329ms 65.7982μs 15.1980 KOps/s 15.3163 KOps/s $\color{#d91a1a}-0.77\%$
test_seq_wrap[eager] 0.6736ms 0.3771ms 2.6515 KOps/s 2.6006 KOps/s $\color{#35bf28}+1.96\%$
test_seq_wrap[compile] 1.2529ms 0.2717ms 3.6808 KOps/s 3.5045 KOps/s $\textbf{\color{#35bf28}+5.03\%}$
test_seq_wrap[compile-overhead] 1.2391ms 0.2726ms 3.6684 KOps/s 3.7482 KOps/s $\color{#d91a1a}-2.13\%$
test_func_call_runtime[False-eager] 0.7123ms 0.5377ms 1.8598 KOps/s 1.9127 KOps/s $\color{#d91a1a}-2.76\%$
test_func_call_runtime[False-compile] 0.6971ms 0.5043ms 1.9829 KOps/s 2.0120 KOps/s $\color{#d91a1a}-1.45\%$
test_func_call_runtime[False-compile-overhead] 0.6250ms 0.5055ms 1.9784 KOps/s 2.0410 KOps/s $\color{#d91a1a}-3.07\%$
test_func_call_runtime[True-eager] 1.0700ms 0.7549ms 1.3247 KOps/s 1.3410 KOps/s $\color{#d91a1a}-1.21\%$
test_func_call_runtime[True-compile] 0.8856ms 0.5204ms 1.9215 KOps/s 1.9800 KOps/s $\color{#d91a1a}-2.96\%$
test_func_call_runtime[True-compile-overhead] 0.6391ms 0.5169ms 1.9347 KOps/s 1.9722 KOps/s $\color{#d91a1a}-1.90\%$
test_func_call_cm_runtime[False-eager] 0.8098ms 0.5335ms 1.8745 KOps/s 1.8728 KOps/s $\color{#35bf28}+0.09\%$
test_func_call_cm_runtime[False-compile] 0.7087ms 0.5056ms 1.9780 KOps/s 2.0164 KOps/s $\color{#d91a1a}-1.90\%$
test_func_call_cm_runtime[False-compile-overhead] 0.6721ms 0.5071ms 1.9722 KOps/s 2.0351 KOps/s $\color{#d91a1a}-3.09\%$
test_func_call_cm_runtime[True-eager] 1.4571ms 0.9034ms 1.1069 KOps/s 1.1395 KOps/s $\color{#d91a1a}-2.86\%$
test_func_call_cm_runtime[True-compile] 0.9247ms 0.7586ms 1.3183 KOps/s 1.3709 KOps/s $\color{#d91a1a}-3.84\%$
test_func_call_cm_runtime[True-compile-overhead] 1.2547ms 0.7603ms 1.3153 KOps/s 1.3582 KOps/s $\color{#d91a1a}-3.16\%$
test_vmap_func_call_cm_runtime[eager] 2.6181ms 1.9168ms 521.7104 Ops/s 542.2596 Ops/s $\color{#d91a1a}-3.79\%$
test_vmap_func_call_cm_runtime[compile] 2.9771ms 1.9675ms 508.2506 Ops/s 525.3640 Ops/s $\color{#d91a1a}-3.26\%$
test_vmap_func_call_cm_runtime[compile-overhead] 3.1061ms 1.9930ms 501.7450 Ops/s 529.3039 Ops/s $\textbf{\color{#d91a1a}-5.21\%}$
test_distributed 0.3023ms 0.1242ms 8.0537 KOps/s 7.7870 KOps/s $\color{#35bf28}+3.43\%$
test_tdmodule 29.4040μs 17.0262μs 58.7332 KOps/s 58.9595 KOps/s $\color{#d91a1a}-0.38\%$
test_tdmodule_dispatch 68.0480μs 33.6856μs 29.6863 KOps/s 27.7089 KOps/s $\textbf{\color{#35bf28}+7.14\%}$
test_tdseq 38.8720μs 19.6929μs 50.7798 KOps/s 49.7676 KOps/s $\color{#35bf28}+2.03\%$
test_tdseq_dispatch 71.7240μs 40.0761μs 24.9525 KOps/s 24.7712 KOps/s $\color{#35bf28}+0.73\%$
test_instantiation_functorch 1.7291ms 1.5919ms 628.1860 Ops/s 637.6976 Ops/s $\color{#d91a1a}-1.49\%$
test_instantiation_td 2.1782ms 1.2030ms 831.2598 Ops/s 861.5557 Ops/s $\color{#d91a1a}-3.52\%$
test_exec_functorch 0.4276ms 0.1888ms 5.2955 KOps/s 5.3236 KOps/s $\color{#d91a1a}-0.53\%$
test_exec_functional_call 0.3077ms 0.1756ms 5.6963 KOps/s 5.6768 KOps/s $\color{#35bf28}+0.34\%$
test_exec_td 0.2721ms 0.1696ms 5.8974 KOps/s 5.7662 KOps/s $\color{#35bf28}+2.28\%$
test_exec_td_decorator 0.4890ms 0.2238ms 4.4683 KOps/s 4.3538 KOps/s $\color{#35bf28}+2.63\%$
test_vmap_mlp_speed[True-True] 1.1306ms 0.6615ms 1.5117 KOps/s 1.5717 KOps/s $\color{#d91a1a}-3.82\%$
test_vmap_mlp_speed[True-False] 0.9338ms 0.6569ms 1.5223 KOps/s 1.5713 KOps/s $\color{#d91a1a}-3.12\%$
test_vmap_mlp_speed[False-True] 0.7715ms 0.5167ms 1.9355 KOps/s 2.0499 KOps/s $\textbf{\color{#d91a1a}-5.58\%}$
test_vmap_mlp_speed[False-False] 0.7855ms 0.5185ms 1.9287 KOps/s 2.0441 KOps/s $\textbf{\color{#d91a1a}-5.65\%}$
test_vmap_mlp_speed_decorator[True-True] 1.3561ms 0.6392ms 1.5644 KOps/s 1.6177 KOps/s $\color{#d91a1a}-3.29\%$
test_vmap_mlp_speed_decorator[True-False] 0.9912ms 0.6422ms 1.5570 KOps/s 1.6313 KOps/s $\color{#d91a1a}-4.55\%$
test_vmap_mlp_speed_decorator[False-True] 0.9990ms 0.5348ms 1.8698 KOps/s 1.9710 KOps/s $\textbf{\color{#d91a1a}-5.14\%}$
test_vmap_mlp_speed_decorator[False-False] 0.7133ms 0.5296ms 1.8884 KOps/s 1.9805 KOps/s $\color{#d91a1a}-4.65\%$
test_to_module_speed[True] 1.6254ms 1.2831ms 779.3677 Ops/s 770.3320 Ops/s $\color{#35bf28}+1.17\%$
test_to_module_speed[False] 2.0428ms 1.2569ms 795.6280 Ops/s 794.0504 Ops/s $\color{#35bf28}+0.20\%$
test_tc_init 96.0100μs 43.1498μs 23.1751 KOps/s 23.2218 KOps/s $\color{#d91a1a}-0.20\%$
test_tc_init_nested 0.1802ms 86.3518μs 11.5805 KOps/s 11.4341 KOps/s $\color{#35bf28}+1.28\%$
test_tc_first_layer_tensor 30.9880μs 1.6386μs 610.2749 KOps/s 662.9586 KOps/s $\textbf{\color{#d91a1a}-7.95\%}$
test_tc_first_layer_nontensor 28.6430μs 4.8365μs 206.7592 KOps/s 208.0672 KOps/s $\color{#d91a1a}-0.63\%$
test_tc_second_layer_tensor 29.7160μs 2.9564μs 338.2515 KOps/s 361.6508 KOps/s $\textbf{\color{#d91a1a}-6.47\%}$
test_tc_second_layer_nontensor 29.9060μs 6.1351μs 162.9966 KOps/s 164.5984 KOps/s $\color{#d91a1a}-0.97\%$
test_unbind 0.4918s 13.6038ms 73.5090 Ops/s 76.5935 Ops/s $\color{#d91a1a}-4.03\%$
test_full_like 17.7971ms 8.1308ms 122.9894 Ops/s 132.7128 Ops/s $\textbf{\color{#d91a1a}-7.33\%}$
test_zeros_like 3.2583ms 2.8792ms 347.3206 Ops/s 354.2205 Ops/s $\color{#d91a1a}-1.95\%$
test_ones_like 3.7530ms 3.4125ms 293.0438 Ops/s 293.3948 Ops/s $\color{#d91a1a}-0.12\%$
test_clone 6.7312ms 5.1586ms 193.8520 Ops/s 198.3409 Ops/s $\color{#d91a1a}-2.26\%$
test_squeeze 65.8530μs 12.1194μs 82.5124 KOps/s 79.2930 KOps/s $\color{#35bf28}+4.06\%$
test_unsqueeze 0.3712ms 96.4651μs 10.3664 KOps/s 10.4398 KOps/s $\color{#d91a1a}-0.70\%$
test_split 0.3941ms 0.1994ms 5.0152 KOps/s 4.9451 KOps/s $\color{#35bf28}+1.42\%$
test_permute 0.4633ms 0.2284ms 4.3774 KOps/s 4.3784 KOps/s $\color{#d91a1a}-0.02\%$
test_stack 32.3185ms 25.1218ms 39.8060 Ops/s 40.8482 Ops/s $\color{#d91a1a}-2.55\%$
test_cat 28.8153ms 24.8583ms 40.2280 Ops/s 40.0390 Ops/s $\color{#35bf28}+0.47\%$
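
The Change column above is consistent with the relative difference between Ops (this PR) and Ops on Repo HEAD, and the bold entries match the Improved and Worsened counts in the summary line, which suggests a cutoff of roughly ±5%. The following is a minimal sketch under those assumptions; the function names and the 5% threshold are illustrative and inferred from the table, not taken from the benchmark workflow itself.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD.

    Both values must be in the same unit (e.g. both in KOps/s);
    a few rows in the table mix Ops/s and KOps/s and need converting first.
    """
    return (ops / ops_head - 1.0) * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption read off the table."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "neutral"


# (Ops, Ops on Repo HEAD) in KOps/s, copied from three rows of the CPU table.
rows = {
    "test_plain_set_nested": (50.5499, 49.6138),
    "test_membership_stacked_nested_last": (92.9930, 253.9894),
    "test_creation_empty": (65.6675, 56.7594),
}

for name, (ops, head) in rows.items():
    change = relative_change(ops, head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Read this way, most rows fall inside the noise band, and only a handful (such as the two `test_membership_stacked_nested*_last` rows) stand out as candidates for a closer look.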

github-actions bot commented Sep 20, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 228. Improved: $\large\color{#35bf28}22$. Worsened: $\large\color{#d91a1a}5$.

Expand to view detailed results
Columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change
test_plain_set_nested 0.1398ms 14.0510μs 71.1694 KOps/s 71.6220 KOps/s $\color{#d91a1a}-0.63\%$
test_plain_set_stack_nested 40.8610μs 14.0714μs 71.0659 KOps/s 70.5660 KOps/s $\color{#35bf28}+0.71\%$
test_plain_set_nested_inplace 44.4510μs 14.9467μs 66.9044 KOps/s 66.6646 KOps/s $\color{#35bf28}+0.36\%$
test_plain_set_stack_nested_inplace 0.1871ms 14.9939μs 66.6940 KOps/s 66.6082 KOps/s $\color{#35bf28}+0.13\%$
test_items 29.3310μs 2.8550μs 350.2578 KOps/s 347.6205 KOps/s $\color{#35bf28}+0.76\%$
test_items_nested 0.3789ms 0.3287ms 3.0421 KOps/s 3.0848 KOps/s $\color{#d91a1a}-1.38\%$
test_items_nested_locked 0.3891ms 0.3311ms 3.0205 KOps/s 3.0425 KOps/s $\color{#d91a1a}-0.72\%$
test_items_nested_leaf 77.6720μs 55.6077μs 17.9831 KOps/s 17.8809 KOps/s $\color{#35bf28}+0.57\%$
test_items_stack_nested 0.3918ms 0.3327ms 3.0060 KOps/s 3.0146 KOps/s $\color{#d91a1a}-0.28\%$
test_items_stack_nested_leaf 86.0220μs 56.5979μs 17.6685 KOps/s 17.4597 KOps/s $\color{#35bf28}+1.20\%$
test_items_stack_nested_locked 0.3873ms 0.3337ms 2.9963 KOps/s 3.0367 KOps/s $\color{#d91a1a}-1.33\%$
test_keys 37.5410μs 3.4019μs 293.9538 KOps/s 274.8335 KOps/s $\textbf{\color{#35bf28}+6.96\%}$
test_keys_nested 96.9030μs 55.8942μs 17.8910 KOps/s 17.6889 KOps/s $\color{#35bf28}+1.14\%$
test_keys_nested_locked 2.5347ms 62.1416μs 16.0923 KOps/s 16.1348 KOps/s $\color{#d91a1a}-0.26\%$
test_keys_nested_leaf 74.1420μs 46.9139μs 21.3157 KOps/s 21.3000 KOps/s $\color{#35bf28}+0.07\%$
test_keys_stack_nested 84.8720μs 56.7365μs 17.6254 KOps/s 17.6302 KOps/s $\color{#d91a1a}-0.03\%$
test_keys_stack_nested_leaf 74.2420μs 46.9471μs 21.3006 KOps/s 20.5544 KOps/s $\color{#35bf28}+3.63\%$
test_keys_stack_nested_locked 0.1166ms 61.4624μs 16.2701 KOps/s 16.1109 KOps/s $\color{#35bf28}+0.99\%$
test_values 5.4752μs 0.8714μs 1.1476 MOps/s 1.1753 MOps/s $\color{#d91a1a}-2.35\%$
test_values_nested 72.4920μs 40.4113μs 24.7456 KOps/s 24.2922 KOps/s $\color{#35bf28}+1.87\%$
test_values_nested_locked 70.5510μs 42.2952μs 23.6434 KOps/s 23.3334 KOps/s $\color{#35bf28}+1.33\%$
test_values_nested_leaf 67.9020μs 34.9982μs 28.5729 KOps/s 28.0646 KOps/s $\color{#35bf28}+1.81\%$
test_values_stack_nested 78.5910μs 41.3561μs 24.1802 KOps/s 23.8290 KOps/s $\color{#35bf28}+1.47\%$
test_values_stack_nested_leaf 71.7220μs 35.8991μs 27.8559 KOps/s 27.6089 KOps/s $\color{#35bf28}+0.89\%$
test_values_stack_nested_locked 85.0520μs 43.0954μs 23.2044 KOps/s 22.8338 KOps/s $\color{#35bf28}+1.62\%$
test_membership 1.5476μs 0.5040μs 1.9842 MOps/s 1.9828 MOps/s $\color{#35bf28}+0.07\%$
test_membership_nested 19.1605μs 1.9089μs 523.8555 KOps/s 530.4487 KOps/s $\color{#d91a1a}-1.24\%$
test_membership_nested_leaf 13.4055μs 1.8915μs 528.6671 KOps/s 531.3800 KOps/s $\color{#d91a1a}-0.51\%$
test_membership_stacked_nested 29.7810μs 1.9695μs 507.7383 KOps/s 522.3695 KOps/s $\color{#d91a1a}-2.80\%$
test_membership_stacked_nested_leaf 32.6010μs 1.9825μs 504.4088 KOps/s 516.6656 KOps/s $\color{#d91a1a}-2.37\%$
test_membership_nested_last 38.4010μs 2.8505μs 350.8172 KOps/s 351.9531 KOps/s $\color{#d91a1a}-0.32\%$
test_membership_nested_leaf_last 26.1300μs 2.8229μs 354.2396 KOps/s 355.6954 KOps/s $\color{#d91a1a}-0.41\%$
test_membership_stacked_nested_last 29.0310μs 3.1879μs 313.6906 KOps/s 234.5753 KOps/s $\textbf{\color{#35bf28}+33.73\%}$
test_membership_stacked_nested_leaf_last 29.9410μs 3.2153μs 311.0106 KOps/s 237.3361 KOps/s $\textbf{\color{#35bf28}+31.04\%}$
test_nested_getleaf 35.0010μs 6.1846μs 161.6922 KOps/s 161.9744 KOps/s $\color{#d91a1a}-0.17\%$
test_nested_get 27.3600μs 5.7342μs 174.3936 KOps/s 172.6007 KOps/s $\color{#35bf28}+1.04\%$
test_stacked_getleaf 35.0400μs 6.0353μs 165.6916 KOps/s 164.5749 KOps/s $\color{#35bf28}+0.68\%$
test_stacked_get 33.0910μs 5.6195μs 177.9531 KOps/s 174.2211 KOps/s $\color{#35bf28}+2.14\%$
test_nested_getitemleaf 33.8610μs 6.1483μs 162.6457 KOps/s 161.1655 KOps/s $\color{#35bf28}+0.92\%$
test_nested_getitem 33.0800μs 5.7548μs 173.7666 KOps/s 172.0980 KOps/s $\color{#35bf28}+0.97\%$
test_stacked_getitemleaf 37.6710μs 6.0543μs 165.1723 KOps/s 163.3039 KOps/s $\color{#35bf28}+1.14\%$
test_stacked_getitem 33.9910μs 5.7794μs 173.0291 KOps/s 173.8919 KOps/s $\color{#d91a1a}-0.50\%$
test_lock_nested 5.0900ms 0.4207ms 2.3771 KOps/s 2.3530 KOps/s $\color{#35bf28}+1.02\%$
test_lock_stack_nested 0.4354ms 0.3843ms 2.6023 KOps/s 2.6160 KOps/s $\color{#d91a1a}-0.52\%$
test_unlock_nested 0.7607ms 0.3583ms 2.7913 KOps/s 2.7656 KOps/s $\color{#35bf28}+0.93\%$
test_unlock_stack_nested 0.3725ms 0.3240ms 3.0863 KOps/s 3.1061 KOps/s $\color{#d91a1a}-0.64\%$
test_flatten_speed 0.1495ms 69.6921μs 14.3488 KOps/s 14.2443 KOps/s $\color{#35bf28}+0.73\%$
test_unflatten_speed 0.3385ms 0.2808ms 3.5614 KOps/s 3.4071 KOps/s $\color{#35bf28}+4.53\%$
test_common_ops 1.5521ms 1.2773ms 782.9121 Ops/s 731.5360 Ops/s $\textbf{\color{#35bf28}+7.02\%}$
test_creation 33.5810μs 1.4832μs 674.1959 KOps/s 667.4252 KOps/s $\color{#35bf28}+1.01\%$
test_creation_empty 45.7410μs 15.5172μs 64.4445 KOps/s 65.0973 KOps/s $\color{#d91a1a}-1.00\%$
test_creation_nested_1 46.3510μs 17.3372μs 57.6794 KOps/s 57.3238 KOps/s $\color{#35bf28}+0.62\%$
test_creation_nested_2 65.6110μs 19.8274μs 50.4352 KOps/s 49.8204 KOps/s $\color{#35bf28}+1.23\%$
test_clone 59.8920μs 29.6163μs 33.7652 KOps/s 34.1279 KOps/s $\color{#d91a1a}-1.06\%$
test_getitem[int] 1.3547ms 16.2531μs 61.5269 KOps/s 56.8697 KOps/s $\textbf{\color{#35bf28}+8.19\%}$
test_getitem[slice_int] 0.1198ms 27.6207μs 36.2047 KOps/s 32.4668 KOps/s $\textbf{\color{#35bf28}+11.51\%}$
test_getitem[range] 0.2343ms 0.1131ms 8.8418 KOps/s 8.8031 KOps/s $\color{#35bf28}+0.44\%$
test_getitem[tuple] 0.1205ms 23.6230μs 42.3316 KOps/s 40.7698 KOps/s $\color{#35bf28}+3.83\%$
test_getitem[list] 0.2026ms 0.1022ms 9.7876 KOps/s 9.2848 KOps/s $\textbf{\color{#35bf28}+5.42\%}$
test_setitem_dim[int] 70.6020μs 46.3912μs 21.5558 KOps/s 19.4015 KOps/s $\textbf{\color{#35bf28}+11.10\%}$
test_setitem_dim[slice_int] 97.1420μs 69.5090μs 14.3866 KOps/s 14.2352 KOps/s $\color{#35bf28}+1.06\%$
test_setitem_dim[range] 0.1595ms 0.1307ms 7.6490 KOps/s 7.5977 KOps/s $\color{#35bf28}+0.68\%$
test_setitem_dim[tuple] 0.1034ms 63.2289μs 15.8156 KOps/s 15.7832 KOps/s $\color{#35bf28}+0.21\%$
test_setitem 84.7130μs 42.5314μs 23.5120 KOps/s 23.7585 KOps/s $\color{#d91a1a}-1.04\%$
test_set 0.1153ms 41.6547μs 24.0069 KOps/s 24.1831 KOps/s $\color{#d91a1a}-0.73\%$
test_set_shared 0.3733ms 52.0617μs 19.2080 KOps/s 19.3457 KOps/s $\color{#d91a1a}-0.71\%$
test_update 0.3018ms 50.0804μs 19.9679 KOps/s 19.8293 KOps/s $\color{#35bf28}+0.70\%$
test_update_nested 0.1190ms 57.2637μs 17.4631 KOps/s 17.5418 KOps/s $\color{#d91a1a}-0.45\%$
test_update__nested 0.1036ms 60.4565μs 16.5408 KOps/s 16.6118 KOps/s $\color{#d91a1a}-0.43\%$
test_set_nested 0.1019ms 44.0056μs 22.7244 KOps/s 22.6947 KOps/s $\color{#35bf28}+0.13\%$
test_set_nested_new 0.1119ms 47.5709μs 21.0212 KOps/s 21.3297 KOps/s $\color{#d91a1a}-1.45\%$
test_select 0.1103ms 61.1058μs 16.3651 KOps/s 16.2440 KOps/s $\color{#35bf28}+0.75\%$
test_select_nested 82.4820μs 42.0248μs 23.7955 KOps/s 23.5172 KOps/s $\color{#35bf28}+1.18\%$
test_exclude_nested 0.1022ms 58.8460μs 16.9935 KOps/s 16.8691 KOps/s $\color{#35bf28}+0.74\%$
test_empty[True] 0.2960ms 0.2412ms 4.1465 KOps/s 4.0987 KOps/s $\color{#35bf28}+1.17\%$
test_empty[False] 4.1951μs 0.7357μs 1.3593 MOps/s 1.3490 MOps/s $\color{#35bf28}+0.76\%$
test_to 71.3820μs 24.8164μs 40.2960 KOps/s 38.6131 KOps/s $\color{#35bf28}+4.36\%$
test_to_nonblocking 61.9120μs 24.1171μs 41.4643 KOps/s 39.3731 KOps/s $\textbf{\color{#35bf28}+5.31\%}$
test_unbind_speed 0.3135ms 0.2823ms 3.5428 KOps/s 3.5121 KOps/s $\color{#35bf28}+0.87\%$
test_unbind_speed_stack0 0.3618ms 0.2816ms 3.5516 KOps/s 3.5456 KOps/s $\color{#35bf28}+0.17\%$
test_unbind_speed_stack1 93.3092ms 0.7086ms 1.4113 KOps/s 1.5315 KOps/s $\textbf{\color{#d91a1a}-7.85\%}$
test_split 95.4190ms 2.1751ms 459.7553 Ops/s 436.5296 Ops/s $\textbf{\color{#35bf28}+5.32\%}$
test_chunk 95.2672ms 2.1528ms 464.5174 Ops/s 428.9869 Ops/s $\textbf{\color{#35bf28}+8.28\%}$
test_creation[device0] 0.2907ms 0.1267ms 7.8901 KOps/s 7.5570 KOps/s $\color{#35bf28}+4.41\%$
test_creation_from_tensor 0.3594ms 0.1303ms 7.6749 KOps/s 7.4000 KOps/s $\color{#35bf28}+3.71\%$
test_add_one[memmap_tensor0] 0.2198ms 8.9249μs 112.0467 KOps/s 106.3867 KOps/s $\textbf{\color{#35bf28}+5.32\%}$
test_contiguous[memmap_tensor0] 33.0310μs 2.2021μs 454.1068 KOps/s 447.1720 KOps/s $\color{#35bf28}+1.55\%$
test_stack[memmap_tensor0] 51.4410μs 6.8241μs 146.5391 KOps/s 142.9476 KOps/s $\color{#35bf28}+2.51\%$
test_memmaptd_index 1.1631ms 0.4293ms 2.3293 KOps/s 2.2891 KOps/s $\color{#35bf28}+1.76\%$
test_memmaptd_index_astensor 0.7244ms 0.4791ms 2.0874 KOps/s 2.0080 KOps/s $\color{#35bf28}+3.96\%$
test_memmaptd_index_op 1.4160ms 1.0275ms 973.2404 Ops/s 912.6920 Ops/s $\textbf{\color{#35bf28}+6.63\%}$
test_serialize_model 0.1316s 0.1299s 7.6977 Ops/s 7.6824 Ops/s $\color{#35bf28}+0.20\%$
test_serialize_model_pickle 1.3515s 1.2121s 0.8250 Ops/s 0.8228 Ops/s $\color{#35bf28}+0.26\%$
test_serialize_weights 0.2253s 0.1426s 7.0132 Ops/s 7.0324 Ops/s $\color{#d91a1a}-0.27\%$
test_serialize_weights_returnearly 0.2336s 56.9592ms 17.5564 Ops/s 17.6422 Ops/s $\color{#d91a1a}-0.49\%$
test_serialize_weights_pickle 1.3718s 1.2164s 0.8221 Ops/s 0.8217 Ops/s $\color{#35bf28}+0.06\%$
test_reshape_pytree 63.6120μs 35.8947μs 27.8593 KOps/s 27.4861 KOps/s $\color{#35bf28}+1.36\%$
test_reshape_td 74.9420μs 42.1493μs 23.7252 KOps/s 23.3973 KOps/s $\color{#35bf28}+1.40\%$
test_view_pytree 66.3510μs 35.4150μs 28.2367 KOps/s 27.5089 KOps/s $\color{#35bf28}+2.65\%$
test_view_td 85.0620μs 46.0091μs 21.7348 KOps/s 20.8806 KOps/s $\color{#35bf28}+4.09\%$
test_unbind_pytree 63.9920μs 35.0768μs 28.5089 KOps/s 27.9981 KOps/s $\color{#35bf28}+1.82\%$
test_unbind_td 0.5109ms 43.7185μs 22.8736 KOps/s 22.9630 KOps/s $\color{#d91a1a}-0.39\%$
test_split_pytree 0.5287ms 47.0563μs 21.2511 KOps/s 21.3861 KOps/s $\color{#d91a1a}-0.63\%$
test_split_td 0.1476ms 55.9662μs 17.8679 KOps/s 17.5668 KOps/s $\color{#35bf28}+1.71\%$
test_add_pytree 0.1001ms 57.7554μs 17.3144 KOps/s 17.5550 KOps/s $\color{#d91a1a}-1.37\%$
test_add_td 0.1640ms 96.5171μs 10.3609 KOps/s 11.0197 KOps/s $\textbf{\color{#d91a1a}-5.98\%}$
test_compile_add_one_nested[tensordict-compile] 0.4282ms 0.2127ms 4.7012 KOps/s 4.6114 KOps/s $\color{#35bf28}+1.95\%$
test_compile_add_one_nested[tensordict-eager] 0.1979ms 0.1514ms 6.6038 KOps/s 6.6746 KOps/s $\color{#d91a1a}-1.06\%$
test_compile_add_one_nested[pytree-compile] 0.1830ms 0.1453ms 6.8841 KOps/s 6.8742 KOps/s $\color{#35bf28}+0.14\%$
test_compile_add_one_nested[pytree-eager] 0.2527ms 0.1855ms 5.3896 KOps/s 5.4415 KOps/s $\color{#d91a1a}-0.95\%$
test_compile_copy_nested[tensordict-compile] 50.8910μs 21.9777μs 45.5008 KOps/s 43.5984 KOps/s $\color{#35bf28}+4.36\%$
test_compile_copy_nested[tensordict-eager] 90.6420μs 44.2572μs 22.5952 KOps/s 22.5008 KOps/s $\color{#35bf28}+0.42\%$
test_compile_copy_nested[pytree-compile] 0.2377ms 63.1173μs 15.8435 KOps/s 15.6026 KOps/s $\color{#35bf28}+1.54\%$
test_compile_copy_nested[pytree-eager] 86.7320μs 49.0089μs 20.4045 KOps/s 20.4098 KOps/s $\color{#d91a1a}-0.03\%$
test_compile_add_one_flat[tensordict-compile] 0.3861ms 0.3217ms 3.1088 KOps/s 3.1230 KOps/s $\color{#d91a1a}-0.45\%$
test_compile_add_one_flat[tensordict-eager] 0.2824ms 0.2099ms 4.7632 KOps/s 4.7392 KOps/s $\color{#35bf28}+0.51\%$
test_compile_add_one_flat[tensorclass-compile] 0.1843ms 0.1287ms 7.7682 KOps/s 7.6369 KOps/s $\color{#35bf28}+1.72\%$
test_compile_add_one_flat[tensorclass-eager] 0.1101ms 59.8019μs 16.7219 KOps/s 15.7429 KOps/s $\textbf{\color{#35bf28}+6.22\%}$
test_compile_add_one_flat[pytree-compile] 0.3951ms 0.3221ms 3.1045 KOps/s 3.1058 KOps/s $\color{#d91a1a}-0.04\%$
test_compile_add_one_flat[pytree-eager] 0.6945ms 0.6423ms 1.5570 KOps/s 1.6058 KOps/s $\color{#d91a1a}-3.04\%$
test_compile_add_self_flat[tensordict-eager] 0.2947ms 0.2476ms 4.0386 KOps/s 4.0024 KOps/s $\color{#35bf28}+0.90\%$
test_compile_add_self_flat[tensordict-compile] 0.3825ms 0.3248ms 3.0785 KOps/s 3.0800 KOps/s $\color{#d91a1a}-0.05\%$
test_compile_add_self_flat[tensorclass-eager] 0.1159ms 69.4929μs 14.3900 KOps/s 13.7655 KOps/s $\color{#35bf28}+4.54\%$
test_compile_add_self_flat[tensorclass-compile] 0.1734ms 0.1308ms 7.6478 KOps/s 7.4615 KOps/s $\color{#35bf28}+2.50\%$
test_compile_add_self_flat[pytree-eager] 0.6038ms 0.5336ms 1.8741 KOps/s 1.8880 KOps/s $\color{#d91a1a}-0.74\%$
test_compile_add_self_flat[pytree-compile] 0.3989ms 0.3222ms 3.1035 KOps/s 3.1115 KOps/s $\color{#d91a1a}-0.26\%$
test_compile_copy_flat[tensordict-compile] 67.5010μs 18.5081μs 54.0304 KOps/s 55.1974 KOps/s $\color{#d91a1a}-2.11\%$
test_compile_copy_flat[tensordict-eager] 64.4020μs 26.7970μs 37.3176 KOps/s 37.1119 KOps/s $\color{#35bf28}+0.55\%$
test_compile_copy_flat[pytree-compile] 0.1107ms 69.4912μs 14.3903 KOps/s 14.5702 KOps/s $\color{#d91a1a}-1.23\%$
test_compile_copy_flat[pytree-eager] 79.6920μs 51.6724μs 19.3527 KOps/s 19.5388 KOps/s $\color{#d91a1a}-0.95\%$
test_compile_assign_and_add[tensordict-compile] 2.3169ms 0.8121ms 1.2314 KOps/s 1.1100 KOps/s $\textbf{\color{#35bf28}+10.94\%}$
test_compile_assign_and_add[tensordict-eager] 3.4347ms 3.2951ms 303.4788 Ops/s 300.5918 Ops/s $\color{#35bf28}+0.96\%$
test_compile_assign_and_add[pytree-compile] 2.3125ms 0.8151ms 1.2269 KOps/s 1.1244 KOps/s $\textbf{\color{#35bf28}+9.12\%}$
test_compile_assign_and_add[pytree-eager] 3.5630ms 3.3262ms 300.6429 Ops/s 304.4343 Ops/s $\color{#d91a1a}-1.25\%$
test_compile_indexing[tensor-tensordict-compile] 0.1528ms 0.1093ms 9.1467 KOps/s 8.8319 KOps/s $\color{#35bf28}+3.56\%$
test_compile_indexing[tensor-tensordict-eager] 0.1952ms 65.8807μs 15.1790 KOps/s 15.0117 KOps/s $\color{#35bf28}+1.11\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1496ms 0.1034ms 9.6672 KOps/s 9.5485 KOps/s $\color{#35bf28}+1.24\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1467ms 44.2961μs 22.5754 KOps/s 22.2333 KOps/s $\color{#35bf28}+1.54\%$
test_compile_indexing[tensor-pytree-compile] 0.1588ms 0.1086ms 9.2080 KOps/s 9.3104 KOps/s $\color{#d91a1a}-1.10\%$
test_compile_indexing[tensor-pytree-eager] 92.9220μs 44.2527μs 22.5975 KOps/s 22.4984 KOps/s $\color{#35bf28}+0.44\%$
test_compile_indexing[slice-tensordict-compile] 0.1989ms 0.1379ms 7.2541 KOps/s 7.1707 KOps/s $\color{#35bf28}+1.16\%$
test_compile_indexing[slice-tensordict-eager] 0.1634ms 25.4665μs 39.2673 KOps/s 38.1116 KOps/s $\color{#35bf28}+3.03\%$
test_compile_indexing[slice-tensorclass-compile] 0.1672ms 0.1318ms 7.5883 KOps/s 7.3971 KOps/s $\color{#35bf28}+2.59\%$
test_compile_indexing[slice-tensorclass-eager] 56.6620μs 20.3289μs 49.1910 KOps/s 46.8778 KOps/s $\color{#35bf28}+4.93\%$
test_compile_indexing[slice-pytree-compile] 0.1838ms 0.1331ms 7.5104 KOps/s 7.2176 KOps/s $\color{#35bf28}+4.06\%$
test_compile_indexing[slice-pytree-eager] 56.7810μs 20.4175μs 48.9777 KOps/s 47.1847 KOps/s $\color{#35bf28}+3.80\%$
test_compile_indexing[int-tensordict-compile] 0.1812ms 0.1394ms 7.1743 KOps/s 7.1279 KOps/s $\color{#35bf28}+0.65\%$
test_compile_indexing[int-tensordict-eager] 0.4911ms 24.5580μs 40.7199 KOps/s 38.7132 KOps/s $\textbf{\color{#35bf28}+5.18\%}$
test_compile_indexing[int-tensorclass-compile] 0.1966ms 0.1340ms 7.4601 KOps/s 7.3090 KOps/s $\color{#35bf28}+2.07\%$
test_compile_indexing[int-tensorclass-eager] 0.1541ms 22.5983μs 44.2511 KOps/s 46.9272 KOps/s $\textbf{\color{#d91a1a}-5.70\%}$
test_compile_indexing[int-pytree-compile] 0.1854ms 0.1338ms 7.4711 KOps/s 7.4841 KOps/s $\color{#d91a1a}-0.17\%$
test_compile_indexing[int-pytree-eager] 65.9520μs 20.5969μs 48.5509 KOps/s 47.6069 KOps/s $\color{#35bf28}+1.98\%$
test_mod_add[eager] 81.6420μs 32.0422μs 31.2088 KOps/s 30.4546 KOps/s $\color{#35bf28}+2.48\%$
test_mod_add[compile] 0.3827ms 69.8231μs 14.3219 KOps/s 13.9641 KOps/s $\color{#35bf28}+2.56\%$
test_mod_add[compile-overhead] 0.2627ms 0.1364ms 7.3301 KOps/s 7.0108 KOps/s $\color{#35bf28}+4.55\%$
test_mod_wrap[eager] 0.3235ms 0.2443ms 4.0935 KOps/s 4.0007 KOps/s $\color{#35bf28}+2.32\%$
test_mod_wrap[compile] 1.4681ms 0.2998ms 3.3359 KOps/s 3.1661 KOps/s $\textbf{\color{#35bf28}+5.36\%}$
test_mod_wrap[compile-overhead] 7.6595ms 4.0040ms 249.7505 Ops/s 248.9984 Ops/s $\color{#35bf28}+0.30\%$
test_mod_wrap_and_backward[eager] 1.4577ms 1.3667ms 731.7052 Ops/s 687.6753 Ops/s $\textbf{\color{#35bf28}+6.40\%}$
test_mod_wrap_and_backward[compile] 1.5795ms 1.3348ms 749.1638 Ops/s 686.2619 Ops/s $\textbf{\color{#35bf28}+9.17\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3432ms 0.9067ms 1.1029 KOps/s 971.2357 Ops/s $\textbf{\color{#35bf28}+13.56\%}$
test_seq_add[eager] 0.1498ms 97.6527μs 10.2404 KOps/s 10.1878 KOps/s $\color{#35bf28}+0.52\%$
test_seq_add[compile] 0.1477ms 81.0903μs 12.3319 KOps/s 12.1919 KOps/s $\color{#35bf28}+1.15\%$
test_seq_add[compile-overhead] 0.1535ms 0.1148ms 8.7102 KOps/s 8.5528 KOps/s $\color{#35bf28}+1.84\%$
test_seq_wrap[eager] 0.4456ms 0.3875ms 2.5808 KOps/s 2.5402 KOps/s $\color{#35bf28}+1.60\%$
test_seq_wrap[compile] 0.3812ms 0.3176ms 3.1487 KOps/s 3.1004 KOps/s $\color{#35bf28}+1.56\%$
test_seq_wrap[compile-overhead] 0.3023ms 0.2229ms 4.4871 KOps/s 4.4311 KOps/s $\color{#35bf28}+1.26\%$
test_func_call_runtime[False-eager] 0.8167ms 0.7386ms 1.3540 KOps/s 1.3303 KOps/s $\color{#35bf28}+1.78\%$
test_func_call_runtime[False-compile] 0.8794ms 0.7999ms 1.2502 KOps/s 1.2299 KOps/s $\color{#35bf28}+1.65\%$
test_func_call_runtime[False-compile-overhead] 0.4139ms 0.3626ms 2.7579 KOps/s 2.7281 KOps/s $\color{#35bf28}+1.09\%$
test_func_call_runtime[True-eager] 0.9725ms 0.9013ms 1.1095 KOps/s 1.0722 KOps/s $\color{#35bf28}+3.48\%$
test_func_call_runtime[True-compile] 0.9312ms 0.8344ms 1.1985 KOps/s 1.1780 KOps/s $\color{#35bf28}+1.74\%$
test_func_call_runtime[True-compile-overhead] 0.4542ms 0.3984ms 2.5100 KOps/s 2.4984 KOps/s $\color{#35bf28}+0.46\%$
test_func_call_cm_runtime[False-eager] 0.8102ms 0.7407ms 1.3501 KOps/s 1.2517 KOps/s $\textbf{\color{#35bf28}+7.86\%}$
test_func_call_cm_runtime[False-compile] 0.9490ms 0.8051ms 1.2421 KOps/s 1.2227 KOps/s $\color{#35bf28}+1.59\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4387ms 0.3664ms 2.7295 KOps/s 2.7347 KOps/s $\color{#d91a1a}-0.19\%$
test_func_call_cm_runtime[True-eager] 1.1212ms 1.0030ms 996.9759 Ops/s 983.8462 Ops/s $\color{#35bf28}+1.33\%$
test_func_call_cm_runtime[True-compile] 0.9491ms 0.8624ms 1.1595 KOps/s 1.1391 KOps/s $\color{#35bf28}+1.79\%$
test_func_call_cm_runtime[True-compile-overhead] 0.4832ms 0.4234ms 2.3617 KOps/s 2.3428 KOps/s $\color{#35bf28}+0.80\%$
test_vmap_func_call_cm_runtime[eager] 2.5686ms 2.0924ms 477.9122 Ops/s 475.5572 Ops/s $\color{#35bf28}+0.50\%$
test_vmap_func_call_cm_runtime[compile] 0.9772ms 0.8818ms 1.1341 KOps/s 1.1198 KOps/s $\color{#35bf28}+1.28\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4791ms 0.4309ms 2.3205 KOps/s 2.3269 KOps/s $\color{#d91a1a}-0.28\%$
test_distributed 2.2133ms 0.2002ms 4.9944 KOps/s 8.9291 KOps/s $\textbf{\color{#d91a1a}-44.07\%}$
test_tdmodule 80.4520μs 15.0300μs 66.5335 KOps/s 63.5575 KOps/s $\color{#35bf28}+4.68\%$
test_tdmodule_dispatch 57.8110μs 28.7745μs 34.7530 KOps/s 34.6011 KOps/s $\color{#35bf28}+0.44\%$
test_tdseq 42.6210μs 16.0971μs 62.1231 KOps/s 63.1077 KOps/s $\color{#d91a1a}-1.56\%$
test_tdseq_dispatch 56.8020μs 32.5273μs 30.7434 KOps/s 31.2041 KOps/s $\color{#d91a1a}-1.48\%$
test_instantiation_functorch 2.4227ms 1.8886ms 529.5004 Ops/s 522.7627 Ops/s $\color{#35bf28}+1.29\%$
test_instantiation_td 1.7868ms 1.2015ms 832.2859 Ops/s 826.2625 Ops/s $\color{#35bf28}+0.73\%$
test_exec_functorch 0.2819ms 0.2080ms 4.8078 KOps/s 4.6742 KOps/s $\color{#35bf28}+2.86\%$
test_exec_functional_call 0.2703ms 0.2120ms 4.7172 KOps/s 4.6576 KOps/s $\color{#35bf28}+1.28\%$
test_exec_td 0.2799ms 0.2180ms 4.5862 KOps/s 4.5472 KOps/s $\color{#35bf28}+0.86\%$
test_exec_td_decorator 0.6798ms 0.2584ms 3.8697 KOps/s 3.7960 KOps/s $\color{#35bf28}+1.94\%$
test_vmap_mlp_speed[True-True] 0.7645ms 0.6906ms 1.4479 KOps/s 1.4324 KOps/s $\color{#35bf28}+1.09\%$
test_vmap_mlp_speed[True-False] 0.7468ms 0.6868ms 1.4561 KOps/s 1.4434 KOps/s $\color{#35bf28}+0.88\%$
test_vmap_mlp_speed[False-True] 0.7086ms 0.5804ms 1.7230 KOps/s 1.6704 KOps/s $\color{#35bf28}+3.15\%$
test_vmap_mlp_speed[False-False] 0.6687ms 0.6078ms 1.6451 KOps/s 1.7065 KOps/s $\color{#d91a1a}-3.60\%$
test_vmap_mlp_speed_decorator[True-True] 1.4322ms 0.6822ms 1.4659 KOps/s 1.4666 KOps/s $\color{#d91a1a}-0.05\%$
test_vmap_mlp_speed_decorator[True-False] 0.8429ms 0.6807ms 1.4691 KOps/s 1.4720 KOps/s $\color{#d91a1a}-0.19\%$
test_vmap_mlp_speed_decorator[False-True] 0.7100ms 0.6085ms 1.6434 KOps/s 1.6749 KOps/s $\color{#d91a1a}-1.88\%$
test_vmap_mlp_speed_decorator[False-False] 0.7492ms 0.6256ms 1.5985 KOps/s 1.6477 KOps/s $\color{#d91a1a}-2.99\%$
test_vmap_transformer_speed[True-True] 8.8495ms 8.4518ms 118.3179 Ops/s 117.7615 Ops/s $\color{#35bf28}+0.47\%$
test_vmap_transformer_speed[True-False] 8.9342ms 8.4537ms 118.2908 Ops/s 117.7776 Ops/s $\color{#35bf28}+0.44\%$
test_vmap_transformer_speed[False-True] 8.4434ms 8.1908ms 122.0881 Ops/s 120.7464 Ops/s $\color{#35bf28}+1.11\%$
test_vmap_transformer_speed[False-False] 8.3043ms 8.1979ms 121.9827 Ops/s 119.8967 Ops/s $\color{#35bf28}+1.74\%$
test_vmap_transformer_speed_decorator[True-True] 19.8267ms 19.7100ms 50.7356 Ops/s 50.6794 Ops/s $\color{#35bf28}+0.11\%$
test_vmap_transformer_speed_decorator[True-False] 20.7671ms 19.8264ms 50.4379 Ops/s 50.1700 Ops/s $\color{#35bf28}+0.53\%$
test_vmap_transformer_speed_decorator[False-True] 20.7505ms 19.6091ms 50.9968 Ops/s 51.3185 Ops/s $\color{#d91a1a}-0.63\%$
test_vmap_transformer_speed_decorator[False-False] 19.6557ms 19.5184ms 51.2338 Ops/s 51.1055 Ops/s $\color{#35bf28}+0.25\%$
test_to_module_speed[True] 1.2098ms 0.9383ms 1.0657 KOps/s 1.0593 KOps/s $\color{#35bf28}+0.61\%$
test_to_module_speed[False] 1.3441ms 0.9228ms 1.0837 KOps/s 1.0953 KOps/s $\color{#d91a1a}-1.06\%$
test_tc_init 62.3120μs 32.5688μs 30.7042 KOps/s 30.8415 KOps/s $\color{#d91a1a}-0.44\%$
test_tc_init_nested 0.1038ms 66.6339μs 15.0074 KOps/s 15.5366 KOps/s $\color{#d91a1a}-3.41\%$
test_tc_first_layer_tensor 5.3887μs 0.6797μs 1.4713 MOps/s 1.4640 MOps/s $\color{#35bf28}+0.50\%$
test_tc_first_layer_nontensor 33.0610μs 2.2435μs 445.7403 KOps/s 441.3346 KOps/s $\color{#35bf28}+1.00\%$
test_tc_second_layer_tensor 47.2713μs 1.3843μs 722.3918 KOps/s 730.4920 KOps/s $\color{#d91a1a}-1.11\%$
test_tc_second_layer_nontensor 31.7110μs 2.9376μs 340.4139 KOps/s 341.8278 KOps/s $\color{#d91a1a}-0.41\%$
test_unbind 0.1956s 12.2958ms 81.3286 Ops/s 90.4173 Ops/s $\textbf{\color{#d91a1a}-10.05\%}$
test_full_like 0.6570ms 0.5756ms 1.7373 KOps/s 1.7427 KOps/s $\color{#d91a1a}-0.31\%$
test_zeros_like 0.2836ms 0.1980ms 5.0506 KOps/s 5.0494 KOps/s $\color{#35bf28}+0.03\%$
test_ones_like 0.2333ms 0.1979ms 5.0529 KOps/s 5.0547 KOps/s $\color{#d91a1a}-0.03\%$
test_clone 0.4779ms 0.4149ms 2.4102 KOps/s 2.4117 KOps/s $\color{#d91a1a}-0.06\%$
test_squeeze 38.1210μs 9.8297μs 101.7323 KOps/s 99.6491 KOps/s $\color{#35bf28}+2.09\%$
test_unsqueeze 0.2800ms 75.0819μs 13.3188 KOps/s 13.1423 KOps/s $\color{#35bf28}+1.34\%$
test_split 0.2596ms 0.1534ms 6.5206 KOps/s 6.3078 KOps/s $\color{#35bf28}+3.37\%$
test_permute 0.2385ms 0.1743ms 5.7369 KOps/s 5.5181 KOps/s $\color{#35bf28}+3.97\%$
test_stack 1.2546ms 0.8439ms 1.1850 KOps/s 1.1658 KOps/s $\color{#35bf28}+1.65\%$
test_cat 1.2476ms 1.2314ms 812.0726 Ops/s 811.7995 Ops/s $\color{#35bf28}+0.03\%$
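
The Change column appears consistent with the relative difference between the two Ops columns once both are converted to the same unit. The sketch below is only an illustration of that arithmetic, not the benchmark bot's actual code; `UNIT_SCALE`, `to_ops` and `relative_change` are hypothetical helper names.

```python
# Hypothetical helpers illustrating how the Change column can be reproduced.
# The unit names are the ones that appear in the table above.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def to_ops(value: float, unit: str) -> float:
    """Convert a throughput reading to plain operations per second."""
    return value * UNIT_SCALE[unit]


def relative_change(pr: tuple, head: tuple) -> float:
    """Percentage change of the PR throughput relative to repo HEAD."""
    pr_ops, head_ops = to_ops(*pr), to_ops(*head)
    return (pr_ops - head_ops) / head_ops * 100.0


# test_mod_wrap_and_backward[compile-overhead]: the two columns use different units.
mixed_units = relative_change((1.1029, "KOps/s"), (971.2357, "Ops/s"))
# test_distributed: the largest regression in this part of the table.
distributed = relative_change((4.9944, "KOps/s"), (8.9291, "KOps/s"))

print(f"{mixed_units:+.2f}%  {distributed:+.2f}%")
```

Normalizing the units first matters for rows such as `test_mod_wrap_and_backward[compile-overhead]`, where one column is reported in KOps/s and the other in Ops/s.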

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Sep 20, 2024
ghstack-source-id: 18a5798c5377d3e5b65e7b6c87d59917c474fd64
Pull Request resolved: #1004
@vmoens vmoens changed the title [BugFix] Fix parsing integer batch size in AOT [BugFix] Fix parsing integer batch size within export Sep 20, 2024
x_new, y_new = torch.zeros(5, 100), torch.zeros(5, 100)
export_test = export_mod(x_new, y_new)
eager_test = test(x_new, y_new)
assert eager_test.batch_size == export_test.batch_size
Contributor Author

@vmoens vmoens Sep 20, 2024

@ezyang this test fails when using dynamic shapes: the eager batch size is [5] but the exported one is [].
This happens with both strict=False and strict=True.

The batch size [s0] becomes [] when using dynamic shapes and when the 2nd output shape mismatches the 1st.

We do get a warning, though:

W0920 10:19:28.564000 20340 torch/fx/experimental/symbolic_shapes.py:5136] Ignored guard Eq(s0, 5) == False, this could result in accuracy problems

Copy link

So, there's something a bit nontrivial going on here. In torch.compile eager, if we produce a fresh TensorDict and that TensorDict holds a list of dynamic ints, then in the residual bytecode we have to construct the TensorDict and also put in the freshly computed dynamic shapes from the FX graph (that has some int outputs now). So actually building a TensorDict isn't just a matter of putting in the right tensors, you also have to put some ints in too. Does this work?

Assuming this does work, export also has to be set up to do the same thing. It wouldn't be surprising if it didn't. In particular, if all export is doing is a pytree unflatten on Tensor leaves, the batch size won't be modified at all. To address this, we need to fix the export bug. But I also saw the comment about TensorDict not being pytree-able, so I am uncertain about the status there.
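
To make the point about unflattening concrete, here is a minimal pure-Python sketch of a flatten/unflatten pair in which the batch size travels in the static context rather than with the leaves. Everything in it (`FakeTensor`, `ToyTensorDict`, `flatten`, `unflatten`) is a toy stand-in written for illustration; it is not the `torch.utils._pytree` or TensorDict implementation. It only shows one way metadata can fall out of sync with the leaves, and the symptom reported in this thread (an empty batch size) is different from the stale value shown here.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Toy leaf: only carries a shape."""
    shape: tuple


@dataclass
class ToyTensorDict:
    """Toy container: leaves plus batch-size metadata."""
    data: dict
    batch_size: tuple


def flatten(td: ToyTensorDict):
    """Split the container into leaves and a static context."""
    keys = sorted(td.data)
    leaves = [td.data[k] for k in keys]
    context = (keys, td.batch_size)  # the batch size is frozen here
    return leaves, context


def unflatten(leaves, context) -> ToyTensorDict:
    """Rebuild the container from new leaves and the recorded context."""
    keys, batch_size = context
    return ToyTensorDict(dict(zip(keys, leaves)), batch_size)


# Context recorded with a batch of 5, as at trace time.
traced = ToyTensorDict(
    {"x": FakeTensor((5, 100)), "y": FakeTensor((5, 100))}, batch_size=(5,)
)
_, context = flatten(traced)

# Later call with a batch of 2: only the leaves change.
new_leaves = [FakeTensor((2, 100)), FakeTensor((2, 100))]
rebuilt = unflatten(new_leaves, context)
stale = rebuilt.batch_size != new_leaves[0].shape[:1]
print(rebuilt.batch_size, stale)
```

In this toy the rebuilt container still reports the batch size recorded at flatten time, because nothing in `unflatten` looks at the new leaves.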

If you want to work around this, perhaps batch size can store the rank instead of the size and lazily compute the size from the tensors if it's not set? Better to fix things though. Just not sure what you expect to work and not work.
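
The workaround idea can be sketched in pure Python as follows. This is a hypothetical illustration of "store the rank, derive the size lazily", not how TensorDict works today; `FakeTensor`, `LazyBatchDict` and `batch_dims` are names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Toy leaf: only carries a shape."""
    shape: tuple


@dataclass
class LazyBatchDict:
    """Toy container that stores the batch rank, not the batch sizes."""
    data: dict
    batch_dims: int

    @property
    def batch_size(self) -> tuple:
        """Derive the batch size from the current leaves on every access."""
        if self.batch_dims == 0:
            return ()
        if not self.data:
            raise ValueError("cannot infer a batch size without any leaf")
        sizes = {t.shape[: self.batch_dims] for t in self.data.values()}
        if len(sizes) != 1:
            raise ValueError(f"leaves disagree on leading dims: {sizes}")
        return sizes.pop()


td = LazyBatchDict(
    {"x": FakeTensor((5, 100)), "y": FakeTensor((5, 100))}, batch_dims=1
)
first = td.batch_size

# Swapping in leaves with another batch dimension needs no metadata update.
td.data = {"x": FakeTensor((2, 100)), "y": FakeTensor((2, 100))}
second = td.batch_size
print(first, second)
```

The trade-off in this sketch is that the size is recomputed and re-validated on each access, and a container with no leaves cannot report a non-empty batch size.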

Contributor Author

> Assuming this does work, export also has to be set up to do the same thing. It wouldn't be surprising if it didn't. In particular, if all export is doing is a pytree unflatten on Tensor leaves, the batch size won't be modified at all. To address this, we need to fix the export bug. But I also saw the comment about TensorDict not being pytree-able, so I am uncertain about the status there.

TensorDict is pytree-able, but this can be deactivated; that is what the comment is about (don't deactivate it, or the test will fail).

Contributor Author

@vmoens vmoens Sep 20, 2024

Here's what works and what doesn't:

    import torch
    from tensordict import TensorDict

    class Test(torch.nn.Module):
        def forward(self, x: torch.Tensor, y: torch.Tensor):
            return TensorDict(
                {
                    "x": x,
                    "y": y,
                },
                batch_size=x.shape[0],
            )

    test = Test()
    x, y = torch.zeros(5, 100), torch.zeros(5, 100)
    # Export with both dimensions of each input marked as dynamic.
    result = torch.export.export(test, args=(x, y), strict=False, dynamic_shapes={
        "x": {0: torch.export.Dim("batch"), 1: torch.export.Dim("time")},
        "y": {0: torch.export.Dim("batch"), 1: torch.export.Dim("time")},
    })
    export_mod = result.module()

    # Same shape as the example inputs used for export.
    x_new, y_new = torch.zeros(5, 100), torch.zeros(5, 100)
    export_test = export_mod(x_new, y_new)
    eager_test = test(x_new, y_new)
    assert torch.Size([5]) == eager_test.batch_size == export_test.batch_size # Works because x and x_new have the same shape

    # Different batch dimension from the example inputs.
    x_new, y_new = torch.zeros(2, 100), torch.zeros(2, 100)
    export_test = export_mod(x_new, y_new)
    eager_test = test(x_new, y_new)
    assert torch.Size([2]) == eager_test.batch_size == export_test.batch_size # Fails! now export_test.batch_size is torch.Size([])

So it's weird behaviour: the SymInt just vanishes into thin air in the second case.

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Development

Successfully merging this pull request may close these issues.

[BUG] TensorDict with dynamic, input-dependent batch_size is not torch.export.exportable
3 participants