[BugFix] Fix vmap monkey patching #1009

Merged: 2 commits, Sep 25, 2024

vmoens (Contributor) commented Sep 25, 2024

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Sep 25, 2024
ghstack-source-id: 55194bcc1564a29121ea514fdb595c97d860d5ee
Pull Request resolved: #1009
facebook-github-bot added the CLA Signed label Sep 25, 2024
vmoens (Contributor, Author) commented Sep 25, 2024

The goal is to close pytorch/pytorch#134004 whilst waiting for pytorch/pytorch#135471 to be merged

vmoens added the bug and Refactor labels Sep 25, 2024
github-actions bot commented Sep 25, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 222. Improved: $\large\color{#35bf28}8$. Worsened: $\large\color{#d91a1a}27$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_plain_set_nested 52.2980μs 21.7896μs 45.8935 KOps/s 48.7749 KOps/s $\textbf{\color{#d91a1a}-5.91\%}$
test_plain_set_stack_nested 58.1790μs 21.5553μs 46.3923 KOps/s 48.4428 KOps/s $\color{#d91a1a}-4.23\%$
test_plain_set_nested_inplace 68.4580μs 23.3214μs 42.8791 KOps/s 45.2207 KOps/s $\textbf{\color{#d91a1a}-5.18\%}$
test_plain_set_stack_nested_inplace 68.9800μs 23.4359μs 42.6696 KOps/s 44.7519 KOps/s $\color{#d91a1a}-4.65\%$
test_items 25.6480μs 4.1323μs 241.9935 KOps/s 245.3002 KOps/s $\color{#d91a1a}-1.35\%$
test_items_nested 0.4805ms 0.3652ms 2.7381 KOps/s 2.7794 KOps/s $\color{#d91a1a}-1.49\%$
test_items_nested_locked 0.9930ms 0.3783ms 2.6437 KOps/s 2.7720 KOps/s $\color{#d91a1a}-4.63\%$
test_items_nested_leaf 0.1286ms 67.8675μs 14.7346 KOps/s 14.7810 KOps/s $\color{#d91a1a}-0.31\%$
test_items_stack_nested 0.6316ms 0.3696ms 2.7060 KOps/s 2.7292 KOps/s $\color{#d91a1a}-0.85\%$
test_items_stack_nested_leaf 0.1399ms 71.5571μs 13.9749 KOps/s 14.3144 KOps/s $\color{#d91a1a}-2.37\%$
test_items_stack_nested_locked 0.7094ms 0.3671ms 2.7242 KOps/s 2.7707 KOps/s $\color{#d91a1a}-1.68\%$
test_keys 25.7280μs 3.4869μs 286.7856 KOps/s 282.2439 KOps/s $\color{#35bf28}+1.61\%$
test_keys_nested 0.1925ms 99.6706μs 10.0330 KOps/s 9.8383 KOps/s $\color{#35bf28}+1.98\%$
test_keys_nested_locked 1.5537ms 0.1042ms 9.5927 KOps/s 9.4033 KOps/s $\color{#35bf28}+2.01\%$
test_keys_nested_leaf 0.3674ms 83.6682μs 11.9520 KOps/s 11.8708 KOps/s $\color{#35bf28}+0.68\%$
test_keys_stack_nested 0.3011ms 0.1015ms 9.8537 KOps/s 10.1046 KOps/s $\color{#d91a1a}-2.48\%$
test_keys_stack_nested_leaf 0.4868ms 84.0192μs 11.9020 KOps/s 12.3312 KOps/s $\color{#d91a1a}-3.48\%$
test_keys_stack_nested_locked 0.1862ms 0.1051ms 9.5128 KOps/s 9.7035 KOps/s $\color{#d91a1a}-1.97\%$
test_values 15.7796μs 1.0379μs 963.4605 KOps/s 834.2970 KOps/s $\textbf{\color{#35bf28}+15.48\%}$
test_values_nested 0.1355ms 75.4153μs 13.2599 KOps/s 13.7965 KOps/s $\color{#d91a1a}-3.89\%$
test_values_nested_locked 0.4227ms 78.5624μs 12.7287 KOps/s 13.8952 KOps/s $\textbf{\color{#d91a1a}-8.40\%}$
test_values_nested_leaf 0.1115ms 62.5477μs 15.9878 KOps/s 16.2343 KOps/s $\color{#d91a1a}-1.52\%$
test_values_stack_nested 0.1366ms 76.6555μs 13.0454 KOps/s 13.6255 KOps/s $\color{#d91a1a}-4.26\%$
test_values_stack_nested_leaf 0.1074ms 62.0800μs 16.1082 KOps/s 16.8759 KOps/s $\color{#d91a1a}-4.55\%$
test_values_stack_nested_locked 0.4366ms 76.6285μs 13.0500 KOps/s 13.7424 KOps/s $\textbf{\color{#d91a1a}-5.04\%}$
test_membership 4.7833μs 0.7518μs 1.3301 MOps/s 1.1254 MOps/s $\textbf{\color{#35bf28}+18.19\%}$
test_membership_nested 21.7100μs 2.8130μs 355.4885 KOps/s 362.4918 KOps/s $\color{#d91a1a}-1.93\%$
test_membership_nested_leaf 40.4070μs 2.7899μs 358.4337 KOps/s 358.7731 KOps/s $\color{#d91a1a}-0.09\%$
test_membership_stacked_nested 28.3930μs 2.8634μs 349.2363 KOps/s 364.2244 KOps/s $\color{#d91a1a}-4.12\%$
test_membership_stacked_nested_leaf 79.1280μs 2.8196μs 354.6604 KOps/s 363.1711 KOps/s $\color{#d91a1a}-2.34\%$
test_membership_nested_last 30.5370μs 4.1660μs 240.0402 KOps/s 247.6243 KOps/s $\color{#d91a1a}-3.06\%$
test_membership_nested_leaf_last 38.0210μs 4.0904μs 244.4750 KOps/s 245.6277 KOps/s $\color{#d91a1a}-0.47\%$
test_membership_stacked_nested_last 26.0590μs 4.7040μs 212.5837 KOps/s 77.8645 KOps/s $\textbf{\color{#35bf28}+173.02\%}$
test_membership_stacked_nested_leaf_last 32.8120μs 4.6964μs 212.9303 KOps/s 78.0929 KOps/s $\textbf{\color{#35bf28}+172.66\%}$
test_nested_getleaf 56.0740μs 10.4346μs 95.8349 KOps/s 94.3466 KOps/s $\color{#35bf28}+1.58\%$
test_nested_get 39.2930μs 10.1036μs 98.9743 KOps/s 101.3059 KOps/s $\color{#d91a1a}-2.30\%$
test_stacked_getleaf 32.8910μs 10.6130μs 94.2239 KOps/s 95.6416 KOps/s $\color{#d91a1a}-1.48\%$
test_stacked_get 34.3240μs 10.1152μs 98.8608 KOps/s 100.0685 KOps/s $\color{#d91a1a}-1.21\%$
test_nested_getitemleaf 41.1970μs 11.2685μs 88.7426 KOps/s 89.8133 KOps/s $\color{#d91a1a}-1.19\%$
test_nested_getitem 29.5350μs 10.5025μs 95.2157 KOps/s 97.1449 KOps/s $\color{#d91a1a}-1.99\%$
test_stacked_getitemleaf 31.9600μs 11.1691μs 89.5330 KOps/s 91.2733 KOps/s $\color{#d91a1a}-1.91\%$
test_stacked_getitem 42.3490μs 10.3184μs 96.9141 KOps/s 97.3932 KOps/s $\color{#d91a1a}-0.49\%$
test_lock_nested 83.7402ms 0.5838ms 1.7130 KOps/s 2.0717 KOps/s $\textbf{\color{#d91a1a}-17.32\%}$
test_lock_stack_nested 0.7301ms 0.4594ms 2.1768 KOps/s 2.2783 KOps/s $\color{#d91a1a}-4.45\%$
test_unlock_nested 85.0735ms 0.5025ms 1.9899 KOps/s 2.4766 KOps/s $\textbf{\color{#d91a1a}-19.65\%}$
test_unlock_stack_nested 0.4752ms 0.3744ms 2.6712 KOps/s 2.7862 KOps/s $\color{#d91a1a}-4.13\%$
test_flatten_speed 0.1724ms 87.9505μs 11.3700 KOps/s 11.5543 KOps/s $\color{#d91a1a}-1.59\%$
test_unflatten_speed 0.5987ms 0.4667ms 2.1425 KOps/s 2.2013 KOps/s $\color{#d91a1a}-2.67\%$
test_common_ops 4.3499ms 1.1850ms 843.8790 Ops/s 870.5472 Ops/s $\color{#d91a1a}-3.06\%$
test_creation 19.7470μs 2.0488μs 488.0804 KOps/s 474.6944 KOps/s $\color{#35bf28}+2.82\%$
test_creation_empty 47.6900μs 20.5012μs 48.7775 KOps/s 53.8904 KOps/s $\textbf{\color{#d91a1a}-9.49\%}$
test_creation_nested_1 76.3540μs 23.9944μs 41.6763 KOps/s 45.9136 KOps/s $\textbf{\color{#d91a1a}-9.23\%}$
test_creation_nested_2 63.0990μs 28.9047μs 34.5964 KOps/s 38.1514 KOps/s $\textbf{\color{#d91a1a}-9.32\%}$
test_clone 64.0600μs 17.3105μs 57.7685 KOps/s 59.3395 KOps/s $\color{#d91a1a}-2.65\%$
test_getitem[int] 1.3449ms 16.9403μs 59.0308 KOps/s 60.1002 KOps/s $\color{#d91a1a}-1.78\%$
test_getitem[slice_int] 0.1391ms 30.2961μs 33.0076 KOps/s 32.7277 KOps/s $\color{#35bf28}+0.86\%$
test_getitem[range] 0.1685ms 56.9686μs 17.5535 KOps/s 17.1206 KOps/s $\color{#35bf28}+2.53\%$
test_getitem[tuple] 0.1570ms 25.5103μs 39.1998 KOps/s 40.0823 KOps/s $\color{#d91a1a}-2.20\%$
test_getitem[list] 0.1846ms 52.7222μs 18.9674 KOps/s 18.7168 KOps/s $\color{#35bf28}+1.34\%$
test_setitem_dim[int] 54.9430μs 33.3224μs 30.0098 KOps/s 31.3605 KOps/s $\color{#d91a1a}-4.31\%$
test_setitem_dim[slice_int] 0.1259ms 63.0067μs 15.8713 KOps/s 16.3971 KOps/s $\color{#d91a1a}-3.21\%$
test_setitem_dim[range] 0.1470ms 83.8605μs 11.9246 KOps/s 11.8479 KOps/s $\color{#35bf28}+0.65\%$
test_setitem_dim[tuple] 78.8870μs 50.6479μs 19.7442 KOps/s 20.8567 KOps/s $\textbf{\color{#d91a1a}-5.33\%}$
test_setitem 78.1670μs 31.0947μs 32.1598 KOps/s 33.5439 KOps/s $\color{#d91a1a}-4.13\%$
test_set 97.6740μs 30.7556μs 32.5144 KOps/s 34.6424 KOps/s $\textbf{\color{#d91a1a}-6.14\%}$
test_set_shared 2.0354ms 0.2173ms 4.6021 KOps/s 4.7130 KOps/s $\color{#d91a1a}-2.35\%$
test_update 0.1404ms 39.7323μs 25.1684 KOps/s 27.0372 KOps/s $\textbf{\color{#d91a1a}-6.91\%}$
test_update_nested 0.1139ms 49.4738μs 20.2127 KOps/s 21.0086 KOps/s $\color{#d91a1a}-3.79\%$
test_update__nested 82.2840μs 35.1012μs 28.4890 KOps/s 29.1290 KOps/s $\color{#d91a1a}-2.20\%$
test_set_nested 86.7930μs 33.2378μs 30.0863 KOps/s 31.9093 KOps/s $\textbf{\color{#d91a1a}-5.71\%}$
test_set_nested_new 85.0800μs 38.6232μs 25.8912 KOps/s 27.2813 KOps/s $\textbf{\color{#d91a1a}-5.10\%}$
test_select 0.1127ms 55.7381μs 17.9410 KOps/s 18.3787 KOps/s $\color{#d91a1a}-2.38\%$
test_select_nested 0.9143ms 61.8742μs 16.1618 KOps/s 16.7865 KOps/s $\color{#d91a1a}-3.72\%$
test_exclude_nested 0.1424ms 76.1496μs 13.1320 KOps/s 13.6080 KOps/s $\color{#d91a1a}-3.50\%$
test_empty[True] 0.3700ms 0.3161ms 3.1635 KOps/s 3.2102 KOps/s $\color{#d91a1a}-1.45\%$
test_empty[False] 9.6580μs 1.2163μs 822.1964 KOps/s 796.9680 KOps/s $\color{#35bf28}+3.17\%$
test_unbind_speed 0.6470ms 0.3118ms 3.2074 KOps/s 3.3831 KOps/s $\textbf{\color{#d91a1a}-5.19\%}$
test_unbind_speed_stack0 0.4215ms 0.3000ms 3.3334 KOps/s 3.4831 KOps/s $\color{#d91a1a}-4.30\%$
test_unbind_speed_stack1 87.9470ms 0.8704ms 1.1489 KOps/s 1.4069 KOps/s $\textbf{\color{#d91a1a}-18.34\%}$
test_split 86.6370ms 2.1418ms 466.9025 Ops/s 469.7001 Ops/s $\color{#d91a1a}-0.60\%$
test_chunk 2.3751ms 1.9913ms 502.1962 Ops/s 464.3759 Ops/s $\textbf{\color{#35bf28}+8.14\%}$
test_creation[device0] 0.2365ms 0.1179ms 8.4822 KOps/s 8.6678 KOps/s $\color{#d91a1a}-2.14\%$
test_creation_from_tensor 3.3812ms 0.1188ms 8.4210 KOps/s 8.5504 KOps/s $\color{#d91a1a}-1.51\%$
test_add_one[memmap_tensor0] 0.2279ms 7.1088μs 140.6713 KOps/s 137.8141 KOps/s $\color{#35bf28}+2.07\%$
test_contiguous[memmap_tensor0] 16.5110μs 1.9241μs 519.7237 KOps/s 525.3771 KOps/s $\color{#d91a1a}-1.08\%$
test_stack[memmap_tensor0] 51.2760μs 5.7116μs 175.0833 KOps/s 180.0253 KOps/s $\color{#d91a1a}-2.75\%$
test_memmaptd_index 1.1251ms 0.3963ms 2.5233 KOps/s 2.5611 KOps/s $\color{#d91a1a}-1.47\%$
test_memmaptd_index_astensor 0.9703ms 0.4748ms 2.1060 KOps/s 2.1257 KOps/s $\color{#d91a1a}-0.93\%$
test_memmaptd_index_op 86.3334ms 1.1314ms 883.8788 Ops/s 981.0463 Ops/s $\textbf{\color{#d91a1a}-9.90\%}$
test_serialize_model 0.1258s 0.1170s 8.5469 Ops/s 8.4573 Ops/s $\color{#35bf28}+1.06\%$
test_serialize_model_pickle 0.4476s 0.3960s 2.5250 Ops/s 2.4970 Ops/s $\color{#35bf28}+1.12\%$
test_serialize_weights 0.1248s 0.1140s 8.7681 Ops/s 7.7194 Ops/s $\textbf{\color{#35bf28}+13.59\%}$
test_serialize_weights_returnearly 0.2491s 0.1737s 5.7562 Ops/s 6.5200 Ops/s $\textbf{\color{#d91a1a}-11.72\%}$
test_serialize_weights_pickle 0.5254s 0.4135s 2.4184 Ops/s 2.5572 Ops/s $\textbf{\color{#d91a1a}-5.43\%}$
test_serialize_weights_filesystem 0.1466s 0.1387s 7.2115 Ops/s 7.1054 Ops/s $\color{#35bf28}+1.49\%$
test_serialize_model_filesystem 0.1592s 0.1495s 6.6901 Ops/s 5.9280 Ops/s $\textbf{\color{#35bf28}+12.86\%}$
test_reshape_pytree 72.3160μs 39.6347μs 25.2304 KOps/s 25.2799 KOps/s $\color{#d91a1a}-0.20\%$
test_reshape_td 0.1061ms 47.7679μs 20.9346 KOps/s 21.3047 KOps/s $\color{#d91a1a}-1.74\%$
test_view_pytree 99.9540μs 38.7653μs 25.7963 KOps/s 25.9234 KOps/s $\color{#d91a1a}-0.49\%$
test_view_td 0.1134ms 53.0904μs 18.8358 KOps/s 19.1047 KOps/s $\color{#d91a1a}-1.41\%$
test_unbind_pytree 92.9570μs 37.0322μs 27.0035 KOps/s 27.9476 KOps/s $\color{#d91a1a}-3.38\%$
test_unbind_td 0.3003ms 46.0888μs 21.6972 KOps/s 22.4558 KOps/s $\color{#d91a1a}-3.38\%$
test_split_pytree 0.1048ms 38.6126μs 25.8983 KOps/s 26.6138 KOps/s $\color{#d91a1a}-2.69\%$
test_split_td 87.4255ms 67.1036μs 14.9023 KOps/s 17.6721 KOps/s $\textbf{\color{#d91a1a}-15.67\%}$
test_add_pytree 0.1105ms 45.3439μs 22.0537 KOps/s 21.8766 KOps/s $\color{#35bf28}+0.81\%$
test_add_td 0.1663ms 84.5650μs 11.8252 KOps/s 11.3427 KOps/s $\color{#35bf28}+4.25\%$
test_compile_add_one_nested[tensordict-compile] 0.1199ms 56.6133μs 17.6637 KOps/s 17.7455 KOps/s $\color{#d91a1a}-0.46\%$
test_compile_add_one_nested[tensordict-eager] 0.4225ms 0.1814ms 5.5118 KOps/s 5.7209 KOps/s $\color{#d91a1a}-3.66\%$
test_compile_add_one_nested[pytree-compile] 0.1196ms 56.0033μs 17.8561 KOps/s 17.6617 KOps/s $\color{#35bf28}+1.10\%$
test_compile_add_one_nested[pytree-eager] 0.5563ms 0.1399ms 7.1496 KOps/s 7.0578 KOps/s $\color{#35bf28}+1.30\%$
test_compile_copy_nested[tensordict-compile] 66.3040μs 20.7076μs 48.2914 KOps/s 45.7408 KOps/s $\textbf{\color{#35bf28}+5.58\%}$
test_compile_copy_nested[tensordict-eager] 0.1412ms 68.8566μs 14.5229 KOps/s 14.7009 KOps/s $\color{#d91a1a}-1.21\%$
test_compile_copy_nested[pytree-compile] 0.1457ms 74.0461μs 13.5051 KOps/s 13.5881 KOps/s $\color{#d91a1a}-0.61\%$
test_compile_copy_nested[pytree-eager] 0.1620ms 67.0779μs 14.9080 KOps/s 15.1320 KOps/s $\color{#d91a1a}-1.48\%$
test_compile_add_one_flat[tensordict-compile] 0.2699ms 0.1738ms 5.7521 KOps/s 5.7966 KOps/s $\color{#d91a1a}-0.77\%$
test_compile_add_one_flat[tensordict-eager] 0.3887ms 0.1923ms 5.2003 KOps/s 5.3294 KOps/s $\color{#d91a1a}-2.42\%$
test_compile_add_one_flat[tensorclass-compile] 0.1201ms 45.8752μs 21.7983 KOps/s 21.9534 KOps/s $\color{#d91a1a}-0.71\%$
test_compile_add_one_flat[tensorclass-eager] 0.1622ms 69.3193μs 14.4260 KOps/s 14.5270 KOps/s $\color{#d91a1a}-0.70\%$
test_compile_add_one_flat[pytree-compile] 0.2748ms 0.1748ms 5.7222 KOps/s 5.7472 KOps/s $\color{#d91a1a}-0.44\%$
test_compile_add_one_flat[pytree-eager] 0.5202ms 0.2851ms 3.5074 KOps/s 3.5386 KOps/s $\color{#d91a1a}-0.88\%$
test_compile_add_self_flat[tensordict-eager] 0.3999ms 0.2042ms 4.8964 KOps/s 4.8916 KOps/s $\color{#35bf28}+0.10\%$
test_compile_add_self_flat[tensordict-compile] 0.2814ms 0.1737ms 5.7576 KOps/s 5.7865 KOps/s $\color{#d91a1a}-0.50\%$
test_compile_add_self_flat[tensorclass-eager] 0.2237ms 63.8992μs 15.6496 KOps/s 16.0886 KOps/s $\color{#d91a1a}-2.73\%$
test_compile_add_self_flat[tensorclass-compile] 0.1635ms 46.3140μs 21.5917 KOps/s 21.6165 KOps/s $\color{#d91a1a}-0.11\%$
test_compile_add_self_flat[pytree-eager] 0.3947ms 0.2336ms 4.2799 KOps/s 4.3620 KOps/s $\color{#d91a1a}-1.88\%$
test_compile_add_self_flat[pytree-compile] 0.2958ms 0.1771ms 5.6464 KOps/s 5.6683 KOps/s $\color{#d91a1a}-0.39\%$
test_compile_copy_flat[tensordict-compile] 0.1956ms 0.1056ms 9.4683 KOps/s 9.6928 KOps/s $\color{#d91a1a}-2.32\%$
test_compile_copy_flat[tensordict-eager] 0.1612ms 59.8495μs 16.7086 KOps/s 17.5609 KOps/s $\color{#d91a1a}-4.85\%$
test_compile_copy_flat[pytree-compile] 0.1552ms 78.5939μs 12.7236 KOps/s 13.0467 KOps/s $\color{#d91a1a}-2.48\%$
test_compile_copy_flat[pytree-eager] 0.1306ms 70.2863μs 14.2275 KOps/s 14.6734 KOps/s $\color{#d91a1a}-3.04\%$
test_compile_assign_and_add[tensordict-compile] 0.3340ms 0.1969ms 5.0782 KOps/s 5.0713 KOps/s $\color{#35bf28}+0.14\%$
test_compile_assign_and_add[tensordict-eager] 2.1445ms 1.6526ms 605.1061 Ops/s 611.1594 Ops/s $\color{#d91a1a}-0.99\%$
test_compile_assign_and_add[pytree-compile] 0.3028ms 0.1931ms 5.1786 KOps/s 5.1902 KOps/s $\color{#d91a1a}-0.22\%$
test_compile_assign_and_add[pytree-eager] 1.8310ms 1.0836ms 922.8499 Ops/s 936.6354 Ops/s $\color{#d91a1a}-1.47\%$
test_compile_assign_and_add_stack[compile] 0.5223ms 0.4209ms 2.3756 KOps/s 2.3652 KOps/s $\color{#35bf28}+0.44\%$
test_compile_assign_and_add_stack[eager] 5.9997ms 3.9313ms 254.3658 Ops/s 267.1683 Ops/s $\color{#d91a1a}-4.79\%$
test_compile_indexing[tensor-tensordict-compile] 0.1009ms 35.1636μs 28.4385 KOps/s 29.1521 KOps/s $\color{#d91a1a}-2.45\%$
test_compile_indexing[tensor-tensordict-eager] 1.0655ms 47.3337μs 21.1266 KOps/s 21.4957 KOps/s $\color{#d91a1a}-1.72\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1090ms 29.5188μs 33.8767 KOps/s 33.5196 KOps/s $\color{#35bf28}+1.07\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1021ms 28.8401μs 34.6740 KOps/s 35.5792 KOps/s $\color{#d91a1a}-2.54\%$
test_compile_indexing[tensor-pytree-compile] 0.1063ms 29.4140μs 33.9974 KOps/s 32.9661 KOps/s $\color{#35bf28}+3.13\%$
test_compile_indexing[tensor-pytree-eager] 93.6360μs 29.4356μs 33.9724 KOps/s 35.2825 KOps/s $\color{#d91a1a}-3.71\%$
test_compile_indexing[slice-tensordict-compile] 0.1538ms 73.9409μs 13.5243 KOps/s 13.5873 KOps/s $\color{#d91a1a}-0.46\%$
test_compile_indexing[slice-tensordict-eager] 0.3735ms 28.0011μs 35.7129 KOps/s 36.9020 KOps/s $\color{#d91a1a}-3.22\%$
test_compile_indexing[slice-tensorclass-compile] 0.1534ms 68.7749μs 14.5402 KOps/s 14.8349 KOps/s $\color{#d91a1a}-1.99\%$
test_compile_indexing[slice-tensorclass-eager] 73.8580μs 23.2837μs 42.9485 KOps/s 43.8890 KOps/s $\color{#d91a1a}-2.14\%$
test_compile_indexing[slice-pytree-compile] 0.1856ms 67.8846μs 14.7309 KOps/s 14.9396 KOps/s $\color{#d91a1a}-1.40\%$
test_compile_indexing[slice-pytree-eager] 63.2180μs 23.4039μs 42.7279 KOps/s 44.0893 KOps/s $\color{#d91a1a}-3.09\%$
test_compile_indexing[int-tensordict-compile] 0.1805ms 74.2538μs 13.4673 KOps/s 13.7046 KOps/s $\color{#d91a1a}-1.73\%$
test_compile_indexing[int-tensordict-eager] 1.0261ms 27.9085μs 35.8314 KOps/s 37.0712 KOps/s $\color{#d91a1a}-3.34\%$
test_compile_indexing[int-tensorclass-compile] 0.3095ms 70.3797μs 14.2086 KOps/s 14.9343 KOps/s $\color{#d91a1a}-4.86\%$
test_compile_indexing[int-tensorclass-eager] 69.5100μs 23.1146μs 43.2628 KOps/s 44.1376 KOps/s $\color{#d91a1a}-1.98\%$
test_compile_indexing[int-pytree-compile] 0.1453ms 67.8278μs 14.7432 KOps/s 14.9297 KOps/s $\color{#d91a1a}-1.25\%$
test_compile_indexing[int-pytree-eager] 73.2270μs 23.0671μs 43.3518 KOps/s 44.1907 KOps/s $\color{#d91a1a}-1.90\%$
test_mod_add[eager] 94.7580μs 25.9495μs 38.5363 KOps/s 40.2539 KOps/s $\color{#d91a1a}-4.27\%$
test_mod_add[compile] 99.1760μs 39.1535μs 25.5405 KOps/s 25.7271 KOps/s $\color{#d91a1a}-0.73\%$
test_mod_add[compile-overhead] 88.8360μs 38.9173μs 25.6955 KOps/s 25.6494 KOps/s $\color{#35bf28}+0.18\%$
test_mod_wrap[eager] 0.3497ms 0.2119ms 4.7191 KOps/s 4.7350 KOps/s $\color{#d91a1a}-0.33\%$
test_mod_wrap[compile] 0.6170ms 0.2456ms 4.0719 KOps/s 4.2422 KOps/s $\color{#d91a1a}-4.01\%$
test_mod_wrap[compile-overhead] 0.7639ms 0.2344ms 4.2659 KOps/s 4.2862 KOps/s $\color{#d91a1a}-0.47\%$
test_mod_wrap_and_backward[eager] 13.4322ms 11.3196ms 88.3426 Ops/s 88.7243 Ops/s $\color{#d91a1a}-0.43\%$
test_mod_wrap_and_backward[compile] 16.1203ms 12.0888ms 82.7212 Ops/s 78.9625 Ops/s $\color{#35bf28}+4.76\%$
test_mod_wrap_and_backward[compile-overhead] 14.5347ms 11.7905ms 84.8142 Ops/s 87.0312 Ops/s $\color{#d91a1a}-2.55\%$
test_seq_add[eager] 0.1780ms 93.7766μs 10.6636 KOps/s 11.0667 KOps/s $\color{#d91a1a}-3.64\%$
test_seq_add[compile] 0.1636ms 65.7102μs 15.2183 KOps/s 15.5979 KOps/s $\color{#d91a1a}-2.43\%$
test_seq_add[compile-overhead] 0.1328ms 62.9611μs 15.8828 KOps/s 15.8953 KOps/s $\color{#d91a1a}-0.08\%$
test_seq_wrap[eager] 0.5603ms 0.3925ms 2.5479 KOps/s 2.5218 KOps/s $\color{#35bf28}+1.04\%$
test_seq_wrap[compile] 1.1874ms 0.2734ms 3.6579 KOps/s 3.6428 KOps/s $\color{#35bf28}+0.41\%$
test_seq_wrap[compile-overhead] 1.2524ms 0.2788ms 3.5870 KOps/s 3.6404 KOps/s $\color{#d91a1a}-1.47\%$
test_func_call_runtime[False-eager] 0.6870ms 0.5261ms 1.9009 KOps/s 1.8614 KOps/s $\color{#35bf28}+2.12\%$
test_func_call_runtime[False-compile] 0.6084ms 0.4989ms 2.0044 KOps/s 2.0005 KOps/s $\color{#35bf28}+0.20\%$
test_func_call_runtime[False-compile-overhead] 1.0587ms 0.5089ms 1.9651 KOps/s 1.9951 KOps/s $\color{#d91a1a}-1.50\%$
test_func_call_runtime[True-eager] 1.0917ms 0.7475ms 1.3379 KOps/s 1.3175 KOps/s $\color{#35bf28}+1.55\%$
test_func_call_runtime[True-compile] 0.9634ms 0.5155ms 1.9400 KOps/s 1.9110 KOps/s $\color{#35bf28}+1.52\%$
test_func_call_runtime[True-compile-overhead] 0.6151ms 0.5109ms 1.9574 KOps/s 1.9100 KOps/s $\color{#35bf28}+2.48\%$
test_func_call_cm_runtime[False-eager] 0.7801ms 0.5189ms 1.9270 KOps/s 1.8997 KOps/s $\color{#35bf28}+1.44\%$
test_func_call_cm_runtime[False-compile] 0.7687ms 0.4957ms 2.0172 KOps/s 1.9778 KOps/s $\color{#35bf28}+1.99\%$
test_func_call_cm_runtime[False-compile-overhead] 1.0526ms 0.4965ms 2.0142 KOps/s 1.9950 KOps/s $\color{#35bf28}+0.96\%$
test_func_call_cm_runtime[True-eager] 1.3950ms 0.8789ms 1.1377 KOps/s 1.1280 KOps/s $\color{#35bf28}+0.86\%$
test_func_call_cm_runtime[True-compile] 1.1498ms 0.7367ms 1.3575 KOps/s 1.3330 KOps/s $\color{#35bf28}+1.84\%$
test_func_call_cm_runtime[True-compile-overhead] 1.3415ms 0.7409ms 1.3498 KOps/s 1.3267 KOps/s $\color{#35bf28}+1.74\%$
test_vmap_func_call_cm_runtime[eager] 2.3638ms 1.8467ms 541.5097 Ops/s 537.2644 Ops/s $\color{#35bf28}+0.79\%$
test_vmap_func_call_cm_runtime[compile] 2.7057ms 1.8990ms 526.5870 Ops/s 503.5937 Ops/s $\color{#35bf28}+4.57\%$
test_vmap_func_call_cm_runtime[compile-overhead] 2.6303ms 1.9032ms 525.4310 Ops/s 523.4744 Ops/s $\color{#35bf28}+0.37\%$
test_distributed 0.2502ms 0.1247ms 8.0224 KOps/s 7.9604 KOps/s $\color{#35bf28}+0.78\%$
test_tdmodule 0.1674ms 20.0570μs 49.8578 KOps/s 54.4591 KOps/s $\textbf{\color{#d91a1a}-8.45\%}$
test_tdmodule_dispatch 71.1830μs 39.1946μs 25.5137 KOps/s 26.7488 KOps/s $\color{#d91a1a}-4.62\%$
test_tdseq 40.9370μs 21.9598μs 45.5379 KOps/s 46.5019 KOps/s $\color{#d91a1a}-2.07\%$
test_tdseq_dispatch 65.1020μs 44.0241μs 22.7148 KOps/s 22.6249 KOps/s $\color{#35bf28}+0.40\%$
test_instantiation_functorch 1.7041ms 1.5693ms 637.2401 Ops/s 618.5027 Ops/s $\color{#35bf28}+3.03\%$
test_instantiation_td 4.1929ms 1.1992ms 833.9171 Ops/s 862.6042 Ops/s $\color{#d91a1a}-3.33\%$
test_exec_functorch 0.2724ms 0.1844ms 5.4220 KOps/s 5.3810 KOps/s $\color{#35bf28}+0.76\%$
test_exec_functional_call 0.2842ms 0.1749ms 5.7165 KOps/s 5.7982 KOps/s $\color{#d91a1a}-1.41\%$
test_exec_td 0.2502ms 0.1752ms 5.7085 KOps/s 5.8878 KOps/s $\color{#d91a1a}-3.04\%$
test_exec_td_decorator 0.3170ms 0.2254ms 4.4369 KOps/s 4.4754 KOps/s $\color{#d91a1a}-0.86\%$
test_vmap_mlp_speed[True-True] 1.1029ms 0.6626ms 1.5092 KOps/s 1.5478 KOps/s $\color{#d91a1a}-2.49\%$
test_vmap_mlp_speed[True-False] 0.8669ms 0.6455ms 1.5493 KOps/s 1.5679 KOps/s $\color{#d91a1a}-1.19\%$
test_vmap_mlp_speed[False-True] 0.9780ms 0.4955ms 2.0180 KOps/s 2.0442 KOps/s $\color{#d91a1a}-1.28\%$
test_vmap_mlp_speed[False-False] 0.7807ms 0.4957ms 2.0173 KOps/s 2.0418 KOps/s $\color{#d91a1a}-1.20\%$
test_vmap_mlp_speed_decorator[True-True] 1.4455ms 0.6414ms 1.5590 KOps/s 1.6042 KOps/s $\color{#d91a1a}-2.82\%$
test_vmap_mlp_speed_decorator[True-False] 1.1250ms 0.6437ms 1.5535 KOps/s 1.6111 KOps/s $\color{#d91a1a}-3.58\%$
test_vmap_mlp_speed_decorator[False-True] 0.8266ms 0.5187ms 1.9279 KOps/s 1.9604 KOps/s $\color{#d91a1a}-1.66\%$
test_vmap_mlp_speed_decorator[False-False] 0.8025ms 0.5187ms 1.9278 KOps/s 1.9685 KOps/s $\color{#d91a1a}-2.07\%$
test_to_module_speed[True] 1.9223ms 1.3169ms 759.3339 Ops/s 777.0628 Ops/s $\color{#d91a1a}-2.28\%$
test_to_module_speed[False] 1.5056ms 1.2644ms 790.8737 Ops/s 799.3849 Ops/s $\color{#d91a1a}-1.06\%$
test_tc_init 0.1204ms 48.1498μs 20.7685 KOps/s 21.7712 KOps/s $\color{#d91a1a}-4.61\%$
test_tc_init_nested 0.1747ms 96.4526μs 10.3678 KOps/s 10.9840 KOps/s $\textbf{\color{#d91a1a}-5.61\%}$
test_tc_first_layer_tensor 17.9140μs 1.5321μs 652.6787 KOps/s 670.0815 KOps/s $\color{#d91a1a}-2.60\%$
test_tc_first_layer_nontensor 27.8220μs 4.6816μs 213.6019 KOps/s 216.1416 KOps/s $\color{#d91a1a}-1.17\%$
test_tc_second_layer_tensor 36.2540μs 2.8335μs 352.9169 KOps/s 362.1665 KOps/s $\color{#d91a1a}-2.55\%$
test_tc_second_layer_nontensor 27.8620μs 6.0408μs 165.5400 KOps/s 169.0408 KOps/s $\color{#d91a1a}-2.07\%$
test_unbind 0.4662s 13.0895ms 76.3973 Ops/s 75.8455 Ops/s $\color{#35bf28}+0.73\%$
test_full_like 8.6785ms 7.3364ms 136.3065 Ops/s 149.3339 Ops/s $\textbf{\color{#d91a1a}-8.72\%}$
test_zeros_like 3.4883ms 2.8714ms 348.2572 Ops/s 381.1541 Ops/s $\textbf{\color{#d91a1a}-8.63\%}$
test_ones_like 3.6852ms 3.3257ms 300.6894 Ops/s 328.5795 Ops/s $\textbf{\color{#d91a1a}-8.49\%}$
test_clone 5.8746ms 5.1104ms 195.6780 Ops/s 210.3674 Ops/s $\textbf{\color{#d91a1a}-6.98\%}$
test_squeeze 74.3490μs 13.0779μs 76.4649 KOps/s 73.4051 KOps/s $\color{#35bf28}+4.17\%$
test_unsqueeze 0.1861ms 93.0653μs 10.7451 KOps/s 10.8166 KOps/s $\color{#d91a1a}-0.66\%$
test_split 0.5568ms 0.1939ms 5.1570 KOps/s 5.1202 KOps/s $\color{#35bf28}+0.72\%$
test_permute 0.3089ms 0.2191ms 4.5651 KOps/s 4.6143 KOps/s $\color{#d91a1a}-1.07\%$
test_stack 30.7159ms 26.4943ms 37.7439 Ops/s 40.4269 Ops/s $\textbf{\color{#d91a1a}-6.64\%}$
test_cat 28.1366ms 25.7979ms 38.7629 Ops/s 40.7365 Ops/s $\color{#d91a1a}-4.84\%$
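
For readers checking the table by hand: the Change column appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The sketch below recomputes it for three rows copied from the CPU table. The helper names `parse_ops` and `relative_change` are hypothetical and are not part of the benchmark bot; the formula is inferred from the numbers, not taken from the bot's source.

```python
# Scale factors for the unit suffixes that appear in the Ops columns.
_UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a string such as '45.8935 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * _UNIT_SCALE[unit]


def relative_change(ops: str, ops_head: str) -> float:
    """Percent change of the PR throughput relative to repo HEAD (assumed formula)."""
    pr, head = parse_ops(ops), parse_ops(ops_head)
    return (pr - head) / head * 100.0


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table above.
rows = {
    "test_plain_set_nested": ("45.8935 KOps/s", "48.7749 KOps/s"),
    "test_values": ("963.4605 KOps/s", "834.2970 KOps/s"),
    "test_membership_stacked_nested_last": ("212.5837 KOps/s", "77.8645 KOps/s"),
}

for name, (ops, head) in rows.items():
    print(f"{name}: {relative_change(ops, head):+.2f}%")
```

Rounded to two decimals, these reproduce the reported -5.91%, +15.48% and +173.02%. The bold rows in the CPU table are exactly those with a magnitude above 5%, and counting them gives 8 improved and 27 worsened, matching the summary line; the 5% threshold is an inference from the table rather than a documented setting.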

github-actions bot commented Sep 25, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 228. Improved: $\large\color{#35bf28}9$. Worsened: $\large\color{#d91a1a}30$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_plain_set_nested 0.1473ms 14.6117μs 68.4385 KOps/s 76.5902 KOps/s $\textbf{\color{#d91a1a}-10.64\%}$
test_plain_set_stack_nested 48.0410μs 14.6277μs 68.3632 KOps/s 75.2726 KOps/s $\textbf{\color{#d91a1a}-9.18\%}$
test_plain_set_nested_inplace 45.6010μs 15.7354μs 63.5511 KOps/s 70.8747 KOps/s $\textbf{\color{#d91a1a}-10.33\%}$
test_plain_set_stack_nested_inplace 43.7610μs 15.7963μs 63.3060 KOps/s 71.1738 KOps/s $\textbf{\color{#d91a1a}-11.05\%}$
test_items 25.1500μs 2.9266μs 341.6961 KOps/s 341.4362 KOps/s $\color{#35bf28}+0.08\%$
test_items_nested 0.3655ms 0.3220ms 3.1058 KOps/s 3.0951 KOps/s $\color{#35bf28}+0.35\%$
test_items_nested_locked 0.3953ms 0.3271ms 3.0573 KOps/s 3.0645 KOps/s $\color{#d91a1a}-0.24\%$
test_items_nested_leaf 90.4410μs 55.6422μs 17.9720 KOps/s 17.9633 KOps/s $\color{#35bf28}+0.05\%$
test_items_stack_nested 0.3863ms 0.3267ms 3.0608 KOps/s 3.0584 KOps/s $\color{#35bf28}+0.08\%$
test_items_stack_nested_leaf 93.1810μs 56.7889μs 17.6091 KOps/s 17.5020 KOps/s $\color{#35bf28}+0.61\%$
test_items_stack_nested_locked 0.3858ms 0.3294ms 3.0361 KOps/s 3.0705 KOps/s $\color{#d91a1a}-1.12\%$
test_keys 28.1300μs 3.3957μs 294.4900 KOps/s 286.8991 KOps/s $\color{#35bf28}+2.65\%$
test_keys_nested 89.7610μs 55.5731μs 17.9943 KOps/s 18.2205 KOps/s $\color{#d91a1a}-1.24\%$
test_keys_nested_locked 2.2021ms 61.9875μs 16.1323 KOps/s 16.0659 KOps/s $\color{#35bf28}+0.41\%$
test_keys_nested_leaf 76.2110μs 47.3708μs 21.1101 KOps/s 21.0465 KOps/s $\color{#35bf28}+0.30\%$
test_keys_stack_nested 90.7010μs 55.8950μs 17.8907 KOps/s 17.6864 KOps/s $\color{#35bf28}+1.15\%$
test_keys_stack_nested_leaf 77.9810μs 47.7978μs 20.9215 KOps/s 20.7600 KOps/s $\color{#35bf28}+0.78\%$
test_keys_stack_nested_locked 96.1910μs 60.9235μs 16.4140 KOps/s 15.9931 KOps/s $\color{#35bf28}+2.63\%$
test_values 4.8000μs 0.8370μs 1.1948 MOps/s 1.1747 MOps/s $\color{#35bf28}+1.71\%$
test_values_nested 94.2010μs 40.5860μs 24.6390 KOps/s 24.4954 KOps/s $\color{#35bf28}+0.59\%$
test_values_nested_locked 75.8400μs 42.7900μs 23.3700 KOps/s 23.5153 KOps/s $\color{#d91a1a}-0.62\%$
test_values_nested_leaf 64.4310μs 35.1779μs 28.4270 KOps/s 28.4630 KOps/s $\color{#d91a1a}-0.13\%$
test_values_stack_nested 74.3710μs 41.5323μs 24.0777 KOps/s 23.9836 KOps/s $\color{#35bf28}+0.39\%$
test_values_stack_nested_leaf 66.9710μs 35.5549μs 28.1255 KOps/s 27.9393 KOps/s $\color{#35bf28}+0.67\%$
test_values_stack_nested_locked 83.7710μs 43.7535μs 22.8553 KOps/s 22.9430 KOps/s $\color{#d91a1a}-0.38\%$
test_membership 1.8405μs 0.5052μs 1.9795 MOps/s 1.9781 MOps/s $\color{#35bf28}+0.07\%$
test_membership_nested 18.6905μs 1.8052μs 553.9682 KOps/s 547.4736 KOps/s $\color{#35bf28}+1.19\%$
test_membership_nested_leaf 10.5367μs 1.7903μs 558.5689 KOps/s 552.7976 KOps/s $\color{#35bf28}+1.04\%$
test_membership_stacked_nested 46.1900μs 1.8732μs 533.8462 KOps/s 528.2587 KOps/s $\color{#35bf28}+1.06\%$
test_membership_stacked_nested_leaf 32.9000μs 1.8835μs 530.9283 KOps/s 533.5136 KOps/s $\color{#d91a1a}-0.48\%$
test_membership_nested_last 35.8600μs 2.7465μs 364.0940 KOps/s 364.9603 KOps/s $\color{#d91a1a}-0.24\%$
test_membership_nested_leaf_last 27.9310μs 2.7677μs 361.3100 KOps/s 363.7175 KOps/s $\color{#d91a1a}-0.66\%$
test_membership_stacked_nested_last 38.4700μs 7.8120μs 128.0077 KOps/s 316.5517 KOps/s $\textbf{\color{#d91a1a}-59.56\%}$
test_membership_stacked_nested_leaf_last 45.0000μs 7.8806μs 126.8941 KOps/s 318.4709 KOps/s $\textbf{\color{#d91a1a}-60.16\%}$
test_nested_getleaf 33.1800μs 6.0865μs 164.2975 KOps/s 163.0349 KOps/s $\color{#35bf28}+0.77\%$
test_nested_get 28.0810μs 5.7203μs 174.8171 KOps/s 172.6997 KOps/s $\color{#35bf28}+1.23\%$
test_stacked_getleaf 38.8000μs 6.0710μs 164.7179 KOps/s 164.5571 KOps/s $\color{#35bf28}+0.10\%$
test_stacked_get 34.6010μs 5.7501μs 173.9107 KOps/s 176.2136 KOps/s $\color{#d91a1a}-1.31\%$
test_nested_getitemleaf 35.8600μs 6.1162μs 163.4994 KOps/s 163.3630 KOps/s $\color{#35bf28}+0.08\%$
test_nested_getitem 28.5200μs 5.7728μs 173.2271 KOps/s 172.3702 KOps/s $\color{#35bf28}+0.50\%$
test_stacked_getitemleaf 36.1600μs 6.1178μs 163.4585 KOps/s 163.8933 KOps/s $\color{#d91a1a}-0.27\%$
test_stacked_getitem 25.9100μs 5.7649μs 173.4632 KOps/s 172.9474 KOps/s $\color{#35bf28}+0.30\%$
test_lock_nested 4.7092ms 0.4185ms 2.3896 KOps/s 2.4364 KOps/s $\color{#d91a1a}-1.92\%$
test_lock_stack_nested 0.4191ms 0.3681ms 2.7166 KOps/s 2.6785 KOps/s $\color{#35bf28}+1.43\%$
test_unlock_nested 0.7462ms 0.3528ms 2.8347 KOps/s 2.8608 KOps/s $\color{#d91a1a}-0.91\%$
test_unlock_stack_nested 0.3335ms 0.3071ms 3.2563 KOps/s 3.1993 KOps/s $\color{#35bf28}+1.78\%$
test_flatten_speed 99.4210μs 70.3966μs 14.2052 KOps/s 14.6724 KOps/s $\color{#d91a1a}-3.18\%$
test_unflatten_speed 0.3442ms 0.2825ms 3.5393 KOps/s 3.5291 KOps/s $\color{#35bf28}+0.29\%$
test_common_ops 1.5744ms 1.2633ms 791.5703 Ops/s 845.6460 Ops/s $\textbf{\color{#d91a1a}-6.39\%}$
test_creation 32.6700μs 1.4668μs 681.7752 KOps/s 698.2502 KOps/s $\color{#d91a1a}-2.36\%$
test_creation_empty 51.3010μs 16.9207μs 59.0994 KOps/s 73.4737 KOps/s $\textbf{\color{#d91a1a}-19.56\%}$
test_creation_nested_1 48.0400μs 18.8575μs 53.0292 KOps/s 65.2287 KOps/s $\textbf{\color{#d91a1a}-18.70\%}$
test_creation_nested_2 57.7700μs 21.3673μs 46.8005 KOps/s 55.4558 KOps/s $\textbf{\color{#d91a1a}-15.61\%}$
test_clone 70.7210μs 28.4263μs 35.1786 KOps/s 35.7000 KOps/s $\color{#d91a1a}-1.46\%$
test_getitem[int] 91.0624ms 22.7164μs 44.0210 KOps/s 64.5042 KOps/s $\textbf{\color{#d91a1a}-31.75\%}$
test_getitem[slice_int] 0.1182ms 26.6866μs 37.4720 KOps/s 37.2882 KOps/s $\color{#35bf28}+0.49\%$
test_getitem[range] 0.2324ms 0.1085ms 9.2176 KOps/s 9.6174 KOps/s $\color{#d91a1a}-4.16\%$
test_getitem[tuple] 0.1216ms 23.2436μs 43.0226 KOps/s 43.5181 KOps/s $\color{#d91a1a}-1.14\%$
test_getitem[list] 0.1945ms 97.5885μs 10.2471 KOps/s 10.2296 KOps/s $\color{#35bf28}+0.17\%$
test_setitem_dim[int] 66.7300μs 44.5713μs 22.4359 KOps/s 21.8092 KOps/s $\color{#35bf28}+2.87\%$
test_setitem_dim[slice_int] 91.5910μs 67.5034μs 14.8141 KOps/s 15.1261 KOps/s $\color{#d91a1a}-2.06\%$
test_setitem_dim[range] 0.1605ms 0.1259ms 7.9453 KOps/s 8.1035 KOps/s $\color{#d91a1a}-1.95\%$
test_setitem_dim[tuple] 85.1310μs 60.3367μs 16.5737 KOps/s 16.9289 KOps/s $\color{#d91a1a}-2.10\%$
test_setitem 89.2210μs 41.5768μs 24.0519 KOps/s 25.6618 KOps/s $\textbf{\color{#d91a1a}-6.27\%}$
test_set 78.4110μs 40.8605μs 24.4735 KOps/s 26.4357 KOps/s $\textbf{\color{#d91a1a}-7.42\%}$
test_set_shared 0.3331ms 49.9682μs 20.0127 KOps/s 20.2516 KOps/s $\color{#d91a1a}-1.18\%$
test_update 87.1310μs 49.5877μs 20.1663 KOps/s 21.9836 KOps/s $\textbf{\color{#d91a1a}-8.27\%}$
test_update_nested 88.3200μs 57.2713μs 17.4608 KOps/s 19.0265 KOps/s $\textbf{\color{#d91a1a}-8.23\%}$
test_update__nested 0.1126ms 57.5725μs 17.3694 KOps/s 17.4709 KOps/s $\color{#d91a1a}-0.58\%$
test_set_nested 90.5810μs 42.8461μs 23.3394 KOps/s 25.0268 KOps/s $\textbf{\color{#d91a1a}-6.74\%}$
test_set_nested_new 82.4200μs 46.4655μs 21.5214 KOps/s 22.4436 KOps/s $\color{#d91a1a}-4.11\%$
test_select 97.5120μs 59.6215μs 16.7725 KOps/s 17.4946 KOps/s $\color{#d91a1a}-4.13\%$
test_select_nested 0.4913ms 42.7345μs 23.4003 KOps/s 23.9539 KOps/s $\color{#d91a1a}-2.31\%$
test_exclude_nested 92.3910μs 58.5374μs 17.0831 KOps/s 17.3942 KOps/s $\color{#d91a1a}-1.79\%$
test_empty[True] 0.2732ms 0.2425ms 4.1242 KOps/s 4.1013 KOps/s $\color{#35bf28}+0.56\%$
test_empty[False] 3.7940μs 0.7395μs 1.3523 MOps/s 1.3526 MOps/s $\color{#d91a1a}-0.02\%$
test_to 53.9010μs 24.5707μs 40.6988 KOps/s 39.8909 KOps/s $\color{#35bf28}+2.03\%$
test_to_nonblocking 62.8210μs 23.6418μs 42.2980 KOps/s 41.4903 KOps/s $\color{#35bf28}+1.95\%$
test_unbind_speed 0.3365ms 0.2753ms 3.6328 KOps/s 3.6998 KOps/s $\color{#d91a1a}-1.81\%$
test_unbind_speed_stack0 0.3165ms 0.2656ms 3.7651 KOps/s 3.7310 KOps/s $\color{#35bf28}+0.91\%$
test_unbind_speed_stack1 90.6388ms 0.6965ms 1.4358 KOps/s 1.4419 KOps/s $\color{#d91a1a}-0.42\%$
test_split 92.1653ms 2.1486ms 465.4150 Ops/s 470.8769 Ops/s $\color{#d91a1a}-1.16\%$
test_chunk 94.1542ms 2.1559ms 463.8447 Ops/s 472.2673 Ops/s $\color{#d91a1a}-1.78\%$
test_creation[device0] 0.3405ms 0.1229ms 8.1365 KOps/s 8.0612 KOps/s $\color{#35bf28}+0.93\%$
test_creation_from_tensor 0.4876ms 0.1257ms 7.9540 KOps/s 7.8767 KOps/s $\color{#35bf28}+0.98\%$
test_add_one[memmap_tensor0] 0.1735ms 8.5236μs 117.3213 KOps/s 115.4556 KOps/s $\color{#35bf28}+1.62\%$
test_contiguous[memmap_tensor0] 30.0000μs 2.1753μs 459.6978 KOps/s 445.3857 KOps/s $\color{#35bf28}+3.21\%$
test_stack[memmap_tensor0] 36.2300μs 6.4397μs 155.2879 KOps/s 151.8354 KOps/s $\color{#35bf28}+2.27\%$
test_memmaptd_index 1.0033ms 0.4120ms 2.4270 KOps/s 2.4633 KOps/s $\color{#d91a1a}-1.48\%$
test_memmaptd_index_astensor 0.7162ms 0.4621ms 2.1643 KOps/s 2.1640 KOps/s $+0.01\%$
test_memmaptd_index_op 1.4035ms 1.0038ms 996.2011 Ops/s 1.0370 KOps/s $\color{#d91a1a}-3.93\%$
test_serialize_model 0.1307s 0.1299s 7.6992 Ops/s 7.7409 Ops/s $\color{#d91a1a}-0.54\%$
test_serialize_model_pickle 1.3464s 1.2118s 0.8252 Ops/s 0.8207 Ops/s $\color{#35bf28}+0.56\%$
test_serialize_weights 0.1310s 0.1292s 7.7429 Ops/s 7.0179 Ops/s $\textbf{\color{#35bf28}+10.33\%}$
test_serialize_weights_returnearly 0.2377s 61.5034ms 16.2593 Ops/s 18.0950 Ops/s $\textbf{\color{#d91a1a}-10.14\%}$
test_serialize_weights_pickle 1.7665s 1.2603s 0.7935 Ops/s 0.6197 Ops/s $\textbf{\color{#35bf28}+28.05\%}$
test_reshape_pytree 64.2800μs 34.9538μs 28.6092 KOps/s 28.9190 KOps/s $\color{#d91a1a}-1.07\%$
test_reshape_td 77.1010μs 41.4799μs 24.1081 KOps/s 24.3113 KOps/s $\color{#d91a1a}-0.84\%$
test_view_pytree 73.9810μs 34.9471μs 28.6147 KOps/s 29.1777 KOps/s $\color{#d91a1a}-1.93\%$
test_view_td 82.1210μs 45.1941μs 22.1268 KOps/s 21.3256 KOps/s $\color{#35bf28}+3.76\%$
test_unbind_pytree 64.0900μs 34.0869μs 29.3368 KOps/s 29.9581 KOps/s $\color{#d91a1a}-2.07\%$
test_unbind_td 0.7108ms 42.5218μs 23.5174 KOps/s 23.8993 KOps/s $\color{#d91a1a}-1.60\%$
test_split_pytree 78.1910μs 47.0598μs 21.2496 KOps/s 22.6536 KOps/s $\textbf{\color{#d91a1a}-6.20\%}$
test_split_td 0.1839ms 55.1990μs 18.1163 KOps/s 15.8880 KOps/s $\textbf{\color{#35bf28}+14.02\%}$
test_add_pytree 0.1028ms 55.2818μs 18.0891 KOps/s 18.4068 KOps/s $\color{#d91a1a}-1.73\%$
test_add_td 0.1452ms 90.8862μs 11.0028 KOps/s 11.7744 KOps/s $\textbf{\color{#d91a1a}-6.55\%}$
test_compile_add_one_nested[tensordict-compile] 0.4054ms 0.2088ms 4.7886 KOps/s 4.8204 KOps/s $\color{#d91a1a}-0.66\%$
test_compile_add_one_nested[tensordict-eager] 0.1937ms 0.1490ms 6.7093 KOps/s 6.8061 KOps/s $\color{#d91a1a}-1.42\%$
test_compile_add_one_nested[pytree-compile] 0.2073ms 0.1469ms 6.8062 KOps/s 7.1389 KOps/s $\color{#d91a1a}-4.66\%$
test_compile_add_one_nested[pytree-eager] 0.2409ms 0.1777ms 5.6286 KOps/s 5.8290 KOps/s $\color{#d91a1a}-3.44\%$
test_compile_copy_nested[tensordict-compile] 55.6610μs 22.2324μs 44.9793 KOps/s 50.0286 KOps/s $\textbf{\color{#d91a1a}-10.09\%}$
test_compile_copy_nested[tensordict-eager] 87.4510μs 43.3828μs 23.0506 KOps/s 23.7127 KOps/s $\color{#d91a1a}-2.79\%$
test_compile_copy_nested[pytree-compile] 0.2420ms 63.0894μs 15.8505 KOps/s 16.0430 KOps/s $\color{#d91a1a}-1.20\%$
test_compile_copy_nested[pytree-eager] 0.1005ms 49.0015μs 20.4075 KOps/s 20.4780 KOps/s $\color{#d91a1a}-0.34\%$
test_compile_add_one_flat[tensordict-compile] 0.4081ms 0.3094ms 3.2321 KOps/s 3.2487 KOps/s $\color{#d91a1a}-0.51\%$
test_compile_add_one_flat[tensordict-eager] 0.2692ms 0.2079ms 4.8101 KOps/s 4.8680 KOps/s $\color{#d91a1a}-1.19\%$
test_compile_add_one_flat[tensorclass-compile] 0.1720ms 0.1269ms 7.8776 KOps/s 8.0869 KOps/s $\color{#d91a1a}-2.59\%$
test_compile_add_one_flat[tensorclass-eager] 0.1099ms 61.6414μs 16.2229 KOps/s 17.2910 KOps/s $\textbf{\color{#d91a1a}-6.18\%}$
test_compile_add_one_flat[pytree-compile] 0.3685ms 0.3107ms 3.2184 KOps/s 3.2147 KOps/s $\color{#35bf28}+0.12\%$
test_compile_add_one_flat[pytree-eager] 0.7185ms 0.6040ms 1.6556 KOps/s 1.7244 KOps/s $\color{#d91a1a}-3.99\%$
test_compile_add_self_flat[tensordict-eager] 0.2946ms 0.2464ms 4.0578 KOps/s 4.0648 KOps/s $\color{#d91a1a}-0.17\%$
test_compile_add_self_flat[tensordict-compile] 0.3660ms 0.3119ms 3.2059 KOps/s 3.2300 KOps/s $\color{#d91a1a}-0.74\%$
test_compile_add_self_flat[tensorclass-eager] 0.1065ms 69.1628μs 14.4586 KOps/s 14.7985 KOps/s $\color{#d91a1a}-2.30\%$
test_compile_add_self_flat[tensorclass-compile] 0.1793ms 0.1259ms 7.9401 KOps/s 8.1215 KOps/s $\color{#d91a1a}-2.23\%$
test_compile_add_self_flat[pytree-eager] 0.5967ms 0.5147ms 1.9429 KOps/s 1.9997 KOps/s $\color{#d91a1a}-2.84\%$
test_compile_add_self_flat[pytree-compile] 0.3557ms 0.3094ms 3.2319 KOps/s 3.2330 KOps/s $\color{#d91a1a}-0.03\%$
test_compile_copy_flat[tensordict-compile] 43.3700μs 17.8564μs 56.0022 KOps/s 60.4932 KOps/s $\textbf{\color{#d91a1a}-7.42\%}$
test_compile_copy_flat[tensordict-eager] 69.4100μs 27.4253μs 36.4627 KOps/s 37.2380 KOps/s $\color{#d91a1a}-2.08\%$
test_compile_copy_flat[pytree-compile] 0.1070ms 67.9420μs 14.7184 KOps/s 14.8054 KOps/s $\color{#d91a1a}-0.59\%$
test_compile_copy_flat[pytree-eager] 91.8900μs 51.8180μs 19.2983 KOps/s 19.6484 KOps/s $\color{#d91a1a}-1.78\%$
test_compile_assign_and_add[tensordict-compile] 2.3056ms 0.8075ms 1.2384 KOps/s 1.1388 KOps/s $\textbf{\color{#35bf28}+8.75\%}$
test_compile_assign_and_add[tensordict-eager] 3.2641ms 3.0763ms 325.0613 Ops/s 327.9306 Ops/s $\color{#d91a1a}-0.87\%$
test_compile_assign_and_add[pytree-compile] 2.2796ms 0.8018ms 1.2473 KOps/s 1.1353 KOps/s $\textbf{\color{#35bf28}+9.86\%}$
test_compile_assign_and_add[pytree-eager] 3.2487ms 3.1033ms 322.2415 Ops/s 325.7881 Ops/s $\color{#d91a1a}-1.09\%$
test_compile_indexing[tensor-tensordict-compile] 0.1591ms 0.1080ms 9.2561 KOps/s 9.3371 KOps/s $\color{#d91a1a}-0.87\%$
test_compile_indexing[tensor-tensordict-eager] 0.1915ms 60.1702μs 16.6195 KOps/s 16.0320 KOps/s $\color{#35bf28}+3.66\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1409ms 0.1009ms 9.9061 KOps/s 9.7728 KOps/s $\color{#35bf28}+1.36\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1720ms 42.5894μs 23.4800 KOps/s 23.3756 KOps/s $\color{#35bf28}+0.45\%$
test_compile_indexing[tensor-pytree-compile] 0.1496ms 0.1052ms 9.5035 KOps/s 9.8887 KOps/s $\color{#d91a1a}-3.90\%$
test_compile_indexing[tensor-pytree-eager] 87.9610μs 42.5063μs 23.5259 KOps/s 22.6068 KOps/s $\color{#35bf28}+4.07\%$
test_compile_indexing[slice-tensordict-compile] 0.2059ms 0.1386ms 7.2144 KOps/s 7.5614 KOps/s $\color{#d91a1a}-4.59\%$
test_compile_indexing[slice-tensordict-eager] 0.1555ms 24.0468μs 41.5855 KOps/s 41.6255 KOps/s $\color{#d91a1a}-0.10\%$
test_compile_indexing[slice-tensorclass-compile] 0.2071ms 0.1276ms 7.8378 KOps/s 7.9461 KOps/s $\color{#d91a1a}-1.36\%$
test_compile_indexing[slice-tensorclass-eager] 70.9410μs 20.1958μs 49.5153 KOps/s 49.5986 KOps/s $\color{#d91a1a}-0.17\%$
test_compile_indexing[slice-pytree-compile] 0.1850ms 0.1281ms 7.8090 KOps/s 7.8376 KOps/s $\color{#d91a1a}-0.36\%$
test_compile_indexing[slice-pytree-eager] 57.3710μs 20.3708μs 49.0899 KOps/s 50.5460 KOps/s $\color{#d91a1a}-2.88\%$
test_compile_indexing[int-tensordict-compile] 0.1826ms 0.1344ms 7.4409 KOps/s 7.5187 KOps/s $\color{#d91a1a}-1.03\%$
test_compile_indexing[int-tensordict-eager] 0.5111ms 23.8349μs 41.9553 KOps/s 41.2057 KOps/s $\color{#35bf28}+1.82\%$
test_compile_indexing[int-tensorclass-compile] 0.1634ms 0.1280ms 7.8110 KOps/s 7.8787 KOps/s $\color{#d91a1a}-0.86\%$
test_compile_indexing[int-tensorclass-eager] 0.1199ms 20.8187μs 48.0337 KOps/s 49.8992 KOps/s $\color{#d91a1a}-3.74\%$
test_compile_indexing[int-pytree-compile] 0.1906ms 0.1280ms 7.8128 KOps/s 7.8503 KOps/s $\color{#d91a1a}-0.48\%$
test_compile_indexing[int-pytree-eager] 52.1010μs 20.2113μs 49.4772 KOps/s 49.7982 KOps/s $\color{#d91a1a}-0.64\%$
test_mod_add[eager] 71.8010μs 31.5597μs 31.6860 KOps/s 33.9726 KOps/s $\textbf{\color{#d91a1a}-6.73\%}$
test_mod_add[compile] 0.1751ms 68.4185μs 14.6159 KOps/s 14.2624 KOps/s $\color{#35bf28}+2.48\%$
test_mod_add[compile-overhead] 0.2616ms 0.1330ms 7.5173 KOps/s 7.0118 KOps/s $\textbf{\color{#35bf28}+7.21\%}$
test_mod_wrap[eager] 0.3464ms 0.2443ms 4.0934 KOps/s 4.1686 KOps/s $\color{#d91a1a}-1.80\%$
test_mod_wrap[compile] 0.6390ms 0.2841ms 3.5197 KOps/s 3.4206 KOps/s $\color{#35bf28}+2.90\%$
test_mod_wrap[compile-overhead] 7.6480ms 4.1055ms 243.5728 Ops/s 246.0068 Ops/s $\color{#d91a1a}-0.99\%$
test_mod_wrap_and_backward[eager] 1.4965ms 1.3657ms 732.2291 Ops/s 679.5456 Ops/s $\textbf{\color{#35bf28}+7.75\%}$
test_mod_wrap_and_backward[compile] 1.5872ms 1.3106ms 763.0222 Ops/s 701.3656 Ops/s $\textbf{\color{#35bf28}+8.79\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3376ms 0.8990ms 1.1123 KOps/s 953.0049 Ops/s $\textbf{\color{#35bf28}+16.72\%}$
test_seq_add[eager] 0.1517ms 93.6108μs 10.6825 KOps/s 10.7992 KOps/s $\color{#d91a1a}-1.08\%$
test_seq_add[compile] 0.5246ms 77.2442μs 12.9460 KOps/s 12.4963 KOps/s $\color{#35bf28}+3.60\%$
test_seq_add[compile-overhead] 0.1603ms 0.1113ms 8.9854 KOps/s 9.0779 KOps/s $\color{#d91a1a}-1.02\%$
test_seq_wrap[eager] 0.4418ms 0.3799ms 2.6323 KOps/s 2.6760 KOps/s $\color{#d91a1a}-1.63\%$
test_seq_wrap[compile] 0.3778ms 0.3035ms 3.2944 KOps/s 3.2190 KOps/s $\color{#35bf28}+2.34\%$
test_seq_wrap[compile-overhead] 0.2592ms 0.2137ms 4.6786 KOps/s 4.5779 KOps/s $\color{#35bf28}+2.20\%$
test_func_call_runtime[False-eager] 0.8184ms 0.7347ms 1.3611 KOps/s 1.3064 KOps/s $\color{#35bf28}+4.19\%$
test_func_call_runtime[False-compile] 0.9276ms 0.7561ms 1.3226 KOps/s 1.2997 KOps/s $\color{#35bf28}+1.77\%$
test_func_call_runtime[False-compile-overhead] 0.4204ms 0.3523ms 2.8386 KOps/s 2.8019 KOps/s $\color{#35bf28}+1.31\%$
test_func_call_runtime[True-eager] 0.9610ms 0.8894ms 1.1243 KOps/s 1.1072 KOps/s $\color{#35bf28}+1.55\%$
test_func_call_runtime[True-compile] 0.9383ms 0.7758ms 1.2889 KOps/s 1.2685 KOps/s $\color{#35bf28}+1.61\%$
test_func_call_runtime[True-compile-overhead] 0.4892ms 0.3740ms 2.6739 KOps/s 2.6759 KOps/s $\color{#d91a1a}-0.07\%$
test_func_call_cm_runtime[False-eager] 0.7840ms 0.7266ms 1.3762 KOps/s 1.3613 KOps/s $\color{#35bf28}+1.10\%$
test_func_call_cm_runtime[False-compile] 0.8891ms 0.7570ms 1.3209 KOps/s 1.2953 KOps/s $\color{#35bf28}+1.98\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4412ms 0.3530ms 2.8329 KOps/s 2.8085 KOps/s $\color{#35bf28}+0.87\%$
test_func_call_cm_runtime[True-eager] 1.0686ms 0.9783ms 1.0222 KOps/s 1.0051 KOps/s $\color{#35bf28}+1.70\%$
test_func_call_cm_runtime[True-compile] 1.0260ms 0.8109ms 1.2332 KOps/s 1.2171 KOps/s $\color{#35bf28}+1.32\%$
test_func_call_cm_runtime[True-compile-overhead] 0.4414ms 0.3970ms 2.5189 KOps/s 2.4920 KOps/s $\color{#35bf28}+1.08\%$
test_vmap_func_call_cm_runtime[eager] 2.4962ms 2.0532ms 487.0368 Ops/s 483.0382 Ops/s $\color{#35bf28}+0.83\%$
test_vmap_func_call_cm_runtime[compile] 0.9428ms 0.8182ms 1.2221 KOps/s 1.1962 KOps/s $\color{#35bf28}+2.16\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4583ms 0.4050ms 2.4689 KOps/s 2.4694 KOps/s $\color{#d91a1a}-0.02\%$
test_distributed 2.7136ms 0.1894ms 5.2802 KOps/s 8.8202 KOps/s $\textbf{\color{#d91a1a}-40.14\%}$
test_tdmodule 0.2631ms 15.9217μs 62.8073 KOps/s 69.4944 KOps/s $\textbf{\color{#d91a1a}-9.62\%}$
test_tdmodule_dispatch 50.2400μs 30.5473μs 32.7361 KOps/s 37.5434 KOps/s $\textbf{\color{#d91a1a}-12.80\%}$
test_tdseq 25.3800μs 16.3703μs 61.0864 KOps/s 68.2745 KOps/s $\textbf{\color{#d91a1a}-10.53\%}$
test_tdseq_dispatch 53.9210μs 33.4116μs 29.9298 KOps/s 34.3292 KOps/s $\textbf{\color{#d91a1a}-12.82\%}$
test_instantiation_functorch 2.0047ms 1.8308ms 546.2184 Ops/s 548.5747 Ops/s $\color{#d91a1a}-0.43\%$
test_instantiation_td 1.7687ms 1.1685ms 855.8304 Ops/s 848.6527 Ops/s $\color{#35bf28}+0.85\%$
test_exec_functorch 0.3496ms 0.2110ms 4.7391 KOps/s 4.8194 KOps/s $\color{#d91a1a}-1.66\%$
test_exec_functional_call 0.3115ms 0.2234ms 4.4761 KOps/s 4.6685 KOps/s $\color{#d91a1a}-4.12\%$
test_exec_td 0.2789ms 0.2283ms 4.3795 KOps/s 4.3335 KOps/s $\color{#35bf28}+1.06\%$
test_exec_td_decorator 0.8523ms 0.2701ms 3.7022 KOps/s 3.8768 KOps/s $\color{#d91a1a}-4.50\%$
test_vmap_mlp_speed[True-True] 0.7666ms 0.6830ms 1.4642 KOps/s 1.4668 KOps/s $\color{#d91a1a}-0.18\%$
test_vmap_mlp_speed[True-False] 0.7614ms 0.6841ms 1.4618 KOps/s 1.4706 KOps/s $\color{#d91a1a}-0.60\%$
test_vmap_mlp_speed[False-True] 0.6612ms 0.5774ms 1.7320 KOps/s 1.7333 KOps/s $\color{#d91a1a}-0.07\%$
test_vmap_mlp_speed[False-False] 0.6420ms 0.5736ms 1.7435 KOps/s 1.7592 KOps/s $\color{#d91a1a}-0.89\%$
test_vmap_mlp_speed_decorator[True-True] 1.2335ms 0.6727ms 1.4864 KOps/s 1.5092 KOps/s $\color{#d91a1a}-1.50\%$
test_vmap_mlp_speed_decorator[True-False] 0.7840ms 0.6708ms 1.4908 KOps/s 1.5015 KOps/s $\color{#d91a1a}-0.71\%$
test_vmap_mlp_speed_decorator[False-True] 0.7272ms 0.5843ms 1.7114 KOps/s 1.6906 KOps/s $\color{#35bf28}+1.23\%$
test_vmap_mlp_speed_decorator[False-False] 0.6886ms 0.5854ms 1.7082 KOps/s 1.6720 KOps/s $\color{#35bf28}+2.16\%$
test_vmap_transformer_speed[True-True] 8.3700ms 8.2960ms 120.5400 Ops/s 120.1404 Ops/s $\color{#35bf28}+0.33\%$
test_vmap_transformer_speed[True-False] 8.6670ms 8.3971ms 119.0891 Ops/s 120.2400 Ops/s $\color{#d91a1a}-0.96\%$
test_vmap_transformer_speed[False-True] 8.4092ms 8.2957ms 120.5448 Ops/s 123.0963 Ops/s $\color{#d91a1a}-2.07\%$
test_vmap_transformer_speed[False-False] 8.5334ms 8.3108ms 120.3253 Ops/s 123.8357 Ops/s $\color{#d91a1a}-2.83\%$
test_vmap_transformer_speed_decorator[True-True] 20.4039ms 20.0035ms 49.9912 Ops/s 51.7147 Ops/s $\color{#d91a1a}-3.33\%$
test_vmap_transformer_speed_decorator[True-False] 20.1065ms 19.5820ms 51.0673 Ops/s 51.6551 Ops/s $\color{#d91a1a}-1.14\%$
test_vmap_transformer_speed_decorator[False-True] 19.6443ms 19.2568ms 51.9298 Ops/s 52.1132 Ops/s $\color{#d91a1a}-0.35\%$
test_vmap_transformer_speed_decorator[False-False] 19.3843ms 19.2839ms 51.8568 Ops/s 52.0436 Ops/s $\color{#d91a1a}-0.36\%$
test_to_module_speed[True] 1.3873ms 0.9345ms 1.0701 KOps/s 1.0678 KOps/s $\color{#35bf28}+0.21\%$
test_to_module_speed[False] 1.2673ms 0.8915ms 1.1217 KOps/s 1.0951 KOps/s $\color{#35bf28}+2.43\%$
test_tc_init 56.1710μs 34.6229μs 28.8826 KOps/s 30.2922 KOps/s $\color{#d91a1a}-4.65\%$
test_tc_init_nested 0.1158ms 72.5173μs 13.7898 KOps/s 15.0748 KOps/s $\textbf{\color{#d91a1a}-8.52\%}$
test_tc_first_layer_tensor 4.1316μs 0.6696μs 1.4933 MOps/s 1.4954 MOps/s $\color{#d91a1a}-0.14\%$
test_tc_first_layer_nontensor 26.3900μs 2.2010μs 454.3398 KOps/s 453.0034 KOps/s $\color{#35bf28}+0.30\%$
test_tc_second_layer_tensor 9.1050μs 1.3594μs 735.6153 KOps/s 741.9127 KOps/s $\color{#d91a1a}-0.85\%$
test_tc_second_layer_nontensor 30.1900μs 2.9209μs 342.3587 KOps/s 343.6494 KOps/s $\color{#d91a1a}-0.38\%$
test_unbind 0.1966s 12.3039ms 81.2754 Ops/s 93.5989 Ops/s $\textbf{\color{#d91a1a}-13.17\%}$
test_full_like 0.6571ms 0.5747ms 1.7401 KOps/s 1.7382 KOps/s $\color{#35bf28}+0.11\%$
test_zeros_like 0.2768ms 0.1979ms 5.0527 KOps/s 5.0536 KOps/s $\color{#d91a1a}-0.02\%$
test_ones_like 0.2333ms 0.1978ms 5.0568 KOps/s 5.0579 KOps/s $\color{#d91a1a}-0.02\%$
test_clone 0.4480ms 0.4144ms 2.4131 KOps/s 2.4139 KOps/s $\color{#d91a1a}-0.03\%$
test_squeeze 50.0610μs 9.6076μs 104.0845 KOps/s 101.5683 KOps/s $\color{#35bf28}+2.48\%$
test_unsqueeze 0.2193ms 73.4244μs 13.6194 KOps/s 14.0090 KOps/s $\color{#d91a1a}-2.78\%$
test_split 0.4298ms 0.1552ms 6.4441 KOps/s 6.4738 KOps/s $\color{#d91a1a}-0.46\%$
test_permute 0.2267ms 0.1730ms 5.7817 KOps/s 5.5896 KOps/s $\color{#35bf28}+3.44\%$
test_stack 1.2661ms 0.8626ms 1.1593 KOps/s 1.1424 KOps/s $\color{#35bf28}+1.48\%$
test_cat 1.2566ms 1.2314ms 812.0774 Ops/s 811.8455 Ops/s $\color{#35bf28}+0.03\%$
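
The table does not state how the Change column is derived. A minimal sketch, assuming it is the relative difference between the "Ops" and "Ops on Repo HEAD" columns; the helper name `relative_change` is hypothetical and not part of the benchmark tooling:

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percentage change of `ops` relative to `ops_head` (assumed formula)."""
    return (ops / ops_head - 1.0) * 100.0


# Values copied from rows of the table above. Both numbers in a pair must use
# the same unit (some rows mix Ops/s and KOps/s and need converting first).
rows = {
    "test_getitem[int]": (44.0210, 64.5042),      # KOps/s
    "test_serialize_weights": (7.7429, 7.0179),   # Ops/s
    "test_distributed": (5.2802, 8.8202),         # KOps/s
}

for name, (ops, ops_head) in rows.items():
    print(f"{name}: {relative_change(ops, ops_head):+.2f}%")
```

For these three rows the computed values agree with the reported Change column to two decimals, which supports the assumption without confirming it for every row.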

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Sep 25, 2024
ghstack-source-id: 69f4795d5cb81db7b79d9c98626414c4cc5ce886
Pull Request resolved: #1009
@vmoens vmoens merged commit f2e6dd3 into gh/vmoens/19/base Sep 25, 2024
51 checks passed
@vmoens vmoens deleted the gh/vmoens/19/head branch September 25, 2024 17:32
Labels:
bug (Something isn't working)
CLA Signed (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
Refactor (Refactoring code - not a new feature)