[BugFix] Fix memmap nontensor #676

Open
wants to merge 22 commits into
base: main

vmoens (Contributor) commented Feb 15, 2024

No description provided.

facebook-github-bot added the CLA Signed label Feb 15, 2024

github-actions bot commented Feb 15, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 126. Improved: $\large\color{#35bf28}19$. Worsened: $\large\color{#d91a1a}6$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 36.0170μs 17.6886μs 56.5337 KOps/s 59.0600 KOps/s $\color{#d91a1a}-4.28\%$
test_plain_set_stack_nested 38.9420μs 17.6580μs 56.6316 KOps/s 58.5451 KOps/s $\color{#d91a1a}-3.27\%$
test_plain_set_nested_inplace 56.8660μs 20.0460μs 49.8852 KOps/s 51.3049 KOps/s $\color{#d91a1a}-2.77\%$
test_plain_set_stack_nested_inplace 47.7290μs 19.8088μs 50.4827 KOps/s 51.4766 KOps/s $\color{#d91a1a}-1.93\%$
test_items 39.2620μs 2.4353μs 410.6276 KOps/s 412.5595 KOps/s $\color{#d91a1a}-0.47\%$
test_items_nested 1.1821ms 0.2681ms 3.7293 KOps/s 3.5597 KOps/s $\color{#35bf28}+4.76\%$
test_items_nested_locked 0.5525ms 0.2673ms 3.7411 KOps/s 3.6242 KOps/s $\color{#35bf28}+3.23\%$
test_items_nested_leaf 0.4837ms 0.1657ms 6.0334 KOps/s 5.9370 KOps/s $\color{#35bf28}+1.62\%$
test_items_stack_nested 0.7434ms 0.2698ms 3.7069 KOps/s 3.5750 KOps/s $\color{#35bf28}+3.69\%$
test_items_stack_nested_leaf 0.2867ms 0.1645ms 6.0777 KOps/s 5.7991 KOps/s $\color{#35bf28}+4.80\%$
test_items_stack_nested_locked 0.4147ms 0.2711ms 3.6889 KOps/s 3.5504 KOps/s $\color{#35bf28}+3.90\%$
test_keys 40.0740μs 3.8966μs 256.6373 KOps/s 262.0613 KOps/s $\color{#d91a1a}-2.07\%$
test_keys_nested 2.1539ms 0.1480ms 6.7576 KOps/s 6.5896 KOps/s $\color{#35bf28}+2.55\%$
test_keys_nested_locked 0.3403ms 0.1513ms 6.6090 KOps/s 6.3909 KOps/s $\color{#35bf28}+3.41\%$
test_keys_nested_leaf 39.7251ms 0.1374ms 7.2781 KOps/s 7.4425 KOps/s $\color{#d91a1a}-2.21\%$
test_keys_stack_nested 0.2078ms 0.1498ms 6.6774 KOps/s 6.5086 KOps/s $\color{#35bf28}+2.59\%$
test_keys_stack_nested_leaf 0.2330ms 0.1324ms 7.5518 KOps/s 7.4214 KOps/s $\color{#35bf28}+1.76\%$
test_keys_stack_nested_locked 0.2579ms 0.1555ms 6.4310 KOps/s 6.2883 KOps/s $\color{#35bf28}+2.27\%$
test_values 10.2415μs 1.1513μs 868.6135 KOps/s 829.0809 KOps/s $\color{#35bf28}+4.77\%$
test_values_nested 0.1246ms 52.1253μs 19.1846 KOps/s 18.9920 KOps/s $\color{#35bf28}+1.01\%$
test_values_nested_locked 0.1198ms 52.0013μs 19.2303 KOps/s 19.1940 KOps/s $\color{#35bf28}+0.19\%$
test_values_nested_leaf 0.1320ms 46.4633μs 21.5223 KOps/s 21.5686 KOps/s $\color{#d91a1a}-0.21\%$
test_values_stack_nested 0.1014ms 52.6154μs 19.0058 KOps/s 19.0602 KOps/s $\color{#d91a1a}-0.29\%$
test_values_stack_nested_leaf 0.1018ms 46.3039μs 21.5964 KOps/s 21.8043 KOps/s $\color{#d91a1a}-0.95\%$
test_values_stack_nested_locked 93.2530μs 52.6074μs 19.0087 KOps/s 18.9874 KOps/s $\color{#35bf28}+0.11\%$
test_membership 25.3170μs 1.3665μs 731.8171 KOps/s 727.8413 KOps/s $\color{#35bf28}+0.55\%$
test_membership_nested 0.1454ms 3.7580μs 266.1021 KOps/s 284.6205 KOps/s $\textbf{\color{#d91a1a}-6.51\%}$
test_membership_nested_leaf 28.4620μs 3.4714μs 288.0642 KOps/s 280.4851 KOps/s $\color{#35bf28}+2.70\%$
test_membership_stacked_nested 30.0160μs 3.4311μs 291.4494 KOps/s 287.3892 KOps/s $\color{#35bf28}+1.41\%$
test_membership_stacked_nested_leaf 33.2820μs 3.4579μs 289.1942 KOps/s 286.4770 KOps/s $\color{#35bf28}+0.95\%$
test_membership_nested_last 27.7010μs 4.2885μs 233.1836 KOps/s 150.2656 KOps/s $\textbf{\color{#35bf28}+55.18\%}$
test_membership_nested_leaf_last 40.0540μs 4.2793μs 233.6846 KOps/s 148.3670 KOps/s $\textbf{\color{#35bf28}+57.50\%}$
test_membership_stacked_nested_last 30.5460μs 4.8682μs 205.4162 KOps/s 149.5503 KOps/s $\textbf{\color{#35bf28}+37.36\%}$
test_membership_stacked_nested_leaf_last 29.5040μs 4.9179μs 203.3401 KOps/s 151.3027 KOps/s $\textbf{\color{#35bf28}+34.39\%}$
test_nested_getleaf 46.4760μs 10.3867μs 96.2773 KOps/s 94.9451 KOps/s $\color{#35bf28}+1.40\%$
test_nested_get 48.7000μs 9.9170μs 100.8372 KOps/s 100.6989 KOps/s $\color{#35bf28}+0.14\%$
test_stacked_getleaf 33.6130μs 10.2505μs 97.5559 KOps/s 96.5730 KOps/s $\color{#35bf28}+1.02\%$
test_stacked_get 30.9780μs 9.8310μs 101.7187 KOps/s 100.3840 KOps/s $\color{#35bf28}+1.33\%$
test_nested_getitemleaf 46.0260μs 10.7693μs 92.8564 KOps/s 83.9395 KOps/s $\textbf{\color{#35bf28}+10.62\%}$
test_nested_getitem 52.1770μs 10.3519μs 96.6005 KOps/s 87.3683 KOps/s $\textbf{\color{#35bf28}+10.57\%}$
test_stacked_getitemleaf 35.3950μs 10.8229μs 92.3969 KOps/s 83.4025 KOps/s $\textbf{\color{#35bf28}+10.78\%}$
test_stacked_getitem 33.5220μs 10.0960μs 99.0491 KOps/s 88.3304 KOps/s $\textbf{\color{#35bf28}+12.13\%}$
test_lock_nested 0.9962ms 0.3432ms 2.9138 KOps/s 2.9525 KOps/s $\color{#d91a1a}-1.31\%$
test_lock_stack_nested 0.4684ms 0.2944ms 3.3969 KOps/s 3.3655 KOps/s $\color{#35bf28}+0.93\%$
test_unlock_nested 76.5086ms 0.4136ms 2.4177 KOps/s 2.3706 KOps/s $\color{#35bf28}+1.99\%$
test_unlock_stack_nested 0.3878ms 0.3028ms 3.3023 KOps/s 3.2650 KOps/s $\color{#35bf28}+1.14\%$
test_flatten_speed 0.5880ms 0.2797ms 3.5746 KOps/s 2.6894 KOps/s $\textbf{\color{#35bf28}+32.91\%}$
test_unflatten_speed 0.8340ms 0.4026ms 2.4838 KOps/s 2.1927 KOps/s $\textbf{\color{#35bf28}+13.28\%}$
test_common_ops 1.3321ms 0.6924ms 1.4443 KOps/s 1.4253 KOps/s $\color{#35bf28}+1.33\%$
test_creation 48.2800μs 1.8554μs 538.9652 KOps/s 538.0319 KOps/s $\color{#35bf28}+0.17\%$
test_creation_empty 33.0410μs 11.1843μs 89.4108 KOps/s 97.2262 KOps/s $\textbf{\color{#d91a1a}-8.04\%}$
test_creation_nested_1 34.9150μs 13.7300μs 72.8330 KOps/s 78.2442 KOps/s $\textbf{\color{#d91a1a}-6.92\%}$
test_creation_nested_2 59.2200μs 16.8702μs 59.2760 KOps/s 61.8481 KOps/s $\color{#d91a1a}-4.16\%$
test_clone 55.1420μs 12.9924μs 76.9682 KOps/s 75.7549 KOps/s $\color{#35bf28}+1.60\%$
test_getitem[int] 32.0900μs 11.1826μs 89.4247 KOps/s 91.6301 KOps/s $\color{#d91a1a}-2.41\%$
test_getitem[slice_int] 54.6520μs 22.5285μs 44.3882 KOps/s 45.3959 KOps/s $\color{#d91a1a}-2.22\%$
test_getitem[range] 0.1238ms 39.6450μs 25.2239 KOps/s 23.9905 KOps/s $\textbf{\color{#35bf28}+5.14\%}$
test_getitem[tuple] 49.4220μs 18.5084μs 54.0296 KOps/s 54.9616 KOps/s $\color{#d91a1a}-1.70\%$
test_getitem[list] 0.1400ms 35.7411μs 27.9790 KOps/s 27.0691 KOps/s $\color{#35bf28}+3.36\%$
test_setitem_dim[int] 71.8830μs 34.2063μs 29.2344 KOps/s 33.7055 KOps/s $\textbf{\color{#d91a1a}-13.27\%}$
test_setitem_dim[slice_int] 94.8770μs 61.0108μs 16.3906 KOps/s 17.9437 KOps/s $\textbf{\color{#d91a1a}-8.66\%}$
test_setitem_dim[range] 0.1465ms 79.3541μs 12.6017 KOps/s 13.2282 KOps/s $\color{#d91a1a}-4.74\%$
test_setitem_dim[tuple] 89.7260μs 50.0103μs 19.9959 KOps/s 21.6495 KOps/s $\textbf{\color{#d91a1a}-7.64\%}$
test_setitem 74.6480μs 19.6313μs 50.9390 KOps/s 51.8625 KOps/s $\color{#d91a1a}-1.78\%$
test_set 48.3700μs 18.9520μs 52.7648 KOps/s 53.2925 KOps/s $\color{#d91a1a}-0.99\%$
test_set_shared 4.3888ms 0.1415ms 7.0686 KOps/s 7.0101 KOps/s $\color{#35bf28}+0.83\%$
test_update 0.1047ms 22.5894μs 44.2685 KOps/s 46.2170 KOps/s $\color{#d91a1a}-4.22\%$
test_update_nested 81.7620μs 29.9471μs 33.3922 KOps/s 33.7249 KOps/s $\color{#d91a1a}-0.99\%$
test_set_nested 69.2880μs 20.8765μs 47.9008 KOps/s 47.6488 KOps/s $\color{#35bf28}+0.53\%$
test_set_nested_new 74.0070μs 24.6474μs 40.5723 KOps/s 39.9891 KOps/s $\color{#35bf28}+1.46\%$
test_select 98.0820μs 37.8709μs 26.4055 KOps/s 26.4346 KOps/s $\color{#d91a1a}-0.11\%$
test_select_nested 0.1082ms 58.0848μs 17.2162 KOps/s 16.6912 KOps/s $\color{#35bf28}+3.15\%$
test_exclude_nested 0.2049ms 0.1167ms 8.5706 KOps/s 8.3632 KOps/s $\color{#35bf28}+2.48\%$
test_empty[True] 1.2681ms 0.4103ms 2.4370 KOps/s 2.3653 KOps/s $\color{#35bf28}+3.03\%$
test_empty[False] 5.7086μs 1.0919μs 915.7941 KOps/s 935.7048 KOps/s $\color{#d91a1a}-2.13\%$
test_unbind_speed 0.4383ms 0.2450ms 4.0812 KOps/s 4.1220 KOps/s $\color{#d91a1a}-0.99\%$
test_unbind_speed_stack0 0.3902ms 0.2416ms 4.1383 KOps/s 4.1762 KOps/s $\color{#d91a1a}-0.91\%$
test_unbind_speed_stack1 0.6703ms 0.5865ms 1.7049 KOps/s 1.4888 KOps/s $\textbf{\color{#35bf28}+14.51\%}$
test_split 0.1221s 1.6307ms 613.2429 Ops/s 605.1362 Ops/s $\color{#35bf28}+1.34\%$
test_chunk 1.7570ms 1.4433ms 692.8692 Ops/s 685.6746 Ops/s $\color{#35bf28}+1.05\%$
test_creation[device0] 3.9154ms 0.1047ms 9.5478 KOps/s 9.7770 KOps/s $\color{#d91a1a}-2.34\%$
test_creation_from_tensor 0.1903ms 81.6014μs 12.2547 KOps/s 12.0652 KOps/s $\color{#35bf28}+1.57\%$
test_add_one[memmap_tensor0] 0.1400ms 5.2159μs 191.7219 KOps/s 182.8035 KOps/s $\color{#35bf28}+4.88\%$
test_contiguous[memmap_tensor0] 9.2070μs 0.6375μs 1.5686 MOps/s 1.5813 MOps/s $\color{#d91a1a}-0.80\%$
test_stack[memmap_tensor0] 46.0160μs 3.5169μs 284.3385 KOps/s 276.3111 KOps/s $\color{#35bf28}+2.91\%$
test_memmaptd_index 1.1346ms 0.2382ms 4.1975 KOps/s 4.2787 KOps/s $\color{#d91a1a}-1.90\%$
test_memmaptd_index_astensor 0.6851ms 0.3010ms 3.3219 KOps/s 3.3894 KOps/s $\color{#d91a1a}-1.99\%$
test_memmaptd_index_op 1.3500ms 0.6000ms 1.6667 KOps/s 1.6905 KOps/s $\color{#d91a1a}-1.41\%$
test_serialize_model 0.2198s 0.1149s 8.7020 Ops/s 8.4232 Ops/s $\color{#35bf28}+3.31\%$
test_serialize_model_pickle 0.4624s 0.3766s 2.6550 Ops/s 2.6215 Ops/s $\color{#35bf28}+1.28\%$
test_serialize_weights 0.1018s 96.6751ms 10.3439 Ops/s 9.9789 Ops/s $\color{#35bf28}+3.66\%$
test_serialize_weights_returnearly 0.1403s 0.1241s 8.0598 Ops/s 8.0668 Ops/s $\color{#d91a1a}-0.09\%$
test_serialize_weights_pickle 0.4731s 0.4265s 2.3444 Ops/s 2.3345 Ops/s $\color{#35bf28}+0.42\%$
test_serialize_weights_filesystem 0.1017s 95.3461ms 10.4881 Ops/s 10.5956 Ops/s $\color{#d91a1a}-1.01\%$
test_serialize_model_filesystem 99.3034ms 93.8896ms 10.6508 Ops/s 10.5157 Ops/s $\color{#35bf28}+1.29\%$
test_reshape_pytree 71.5120μs 20.5470μs 48.6688 KOps/s 47.3944 KOps/s $\color{#35bf28}+2.69\%$
test_reshape_td 74.1770μs 30.7716μs 32.4975 KOps/s 32.2116 KOps/s $\color{#35bf28}+0.89\%$
test_view_pytree 46.8370μs 20.6097μs 48.5208 KOps/s 43.6868 KOps/s $\textbf{\color{#35bf28}+11.07\%}$
test_view_td 0.1201s 60.1109μs 16.6359 KOps/s 16.2837 KOps/s $\color{#35bf28}+2.16\%$
test_unbind_pytree 69.6800μs 24.2732μs 41.1976 KOps/s 40.8609 KOps/s $\color{#35bf28}+0.82\%$
test_unbind_td 0.1211ms 35.7640μs 27.9611 KOps/s 27.5855 KOps/s $\color{#35bf28}+1.36\%$
test_split_pytree 53.8710μs 23.7420μs 42.1195 KOps/s 41.3005 KOps/s $\color{#35bf28}+1.98\%$
test_split_td 0.1348ms 39.5086μs 25.3109 KOps/s 25.4573 KOps/s $\color{#d91a1a}-0.58\%$
test_add_pytree 66.1830μs 29.0564μs 34.4159 KOps/s 33.3400 KOps/s $\color{#35bf28}+3.23\%$
test_add_td 0.1175ms 51.7753μs 19.3142 KOps/s 18.8059 KOps/s $\color{#35bf28}+2.70\%$
test_distributed 0.1869ms 99.1261μs 10.0882 KOps/s 10.0048 KOps/s $\color{#35bf28}+0.83\%$
test_tdmodule 65.0810μs 17.4985μs 57.1479 KOps/s 45.3222 KOps/s $\textbf{\color{#35bf28}+26.09\%}$
test_tdmodule_dispatch 61.2340μs 33.2325μs 30.0911 KOps/s 23.4480 KOps/s $\textbf{\color{#35bf28}+28.33\%}$
test_tdseq 38.4910μs 20.5448μs 48.6740 KOps/s 38.7292 KOps/s $\textbf{\color{#35bf28}+25.68\%}$
test_tdseq_dispatch 58.4290μs 38.7597μs 25.8000 KOps/s 20.7722 KOps/s $\textbf{\color{#35bf28}+24.20\%}$
test_instantiation_functorch 1.7394ms 1.2754ms 784.0878 Ops/s 748.5801 Ops/s $\color{#35bf28}+4.74\%$
test_instantiation_td 1.4146ms 0.9844ms 1.0159 KOps/s 986.5724 Ops/s $\color{#35bf28}+2.97\%$
test_exec_functorch 0.2785ms 0.1541ms 6.4894 KOps/s 6.1613 KOps/s $\textbf{\color{#35bf28}+5.32\%}$
test_exec_functional_call 0.2703ms 0.1467ms 6.8167 KOps/s 6.6322 KOps/s $\color{#35bf28}+2.78\%$
test_exec_td 0.2146ms 0.1416ms 7.0612 KOps/s 6.8002 KOps/s $\color{#35bf28}+3.84\%$
test_exec_td_decorator 0.7543ms 0.1909ms 5.2385 KOps/s 5.1165 KOps/s $\color{#35bf28}+2.38\%$
test_vmap_mlp_speed[True-True] 0.5808ms 0.4669ms 2.1419 KOps/s 2.0867 KOps/s $\color{#35bf28}+2.64\%$
test_vmap_mlp_speed[True-False] 0.6863ms 0.4656ms 2.1475 KOps/s 2.1136 KOps/s $\color{#35bf28}+1.61\%$
test_vmap_mlp_speed[False-True] 0.7970ms 0.3827ms 2.6128 KOps/s 2.5947 KOps/s $\color{#35bf28}+0.70\%$
test_vmap_mlp_speed[False-False] 0.7007ms 0.3825ms 2.6147 KOps/s 2.5938 KOps/s $\color{#35bf28}+0.80\%$
test_vmap_mlp_speed_decorator[True-True] 1.0181ms 0.4928ms 2.0293 KOps/s 1.9070 KOps/s $\textbf{\color{#35bf28}+6.41\%}$
test_vmap_mlp_speed_decorator[True-False] 0.7614ms 0.5090ms 1.9646 KOps/s 1.9112 KOps/s $\color{#35bf28}+2.79\%$
test_vmap_mlp_speed_decorator[False-True] 0.6332ms 0.3986ms 2.5089 KOps/s 2.5037 KOps/s $\color{#35bf28}+0.21\%$
test_vmap_mlp_speed_decorator[False-False] 0.6789ms 0.3970ms 2.5188 KOps/s 2.5108 KOps/s $\color{#35bf28}+0.32\%$
test_to_module_speed[True] 2.1763ms 1.3464ms 742.7155 Ops/s 715.5525 Ops/s $\color{#35bf28}+3.80\%$
test_to_module_speed[False] 1.8648ms 1.3230ms 755.8560 Ops/s 727.5585 Ops/s $\color{#35bf28}+3.89\%$
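
For readers checking the table by hand: the Change column appears to be the relative difference between Ops and Ops on Repo HEAD, and the Improved/Worsened totals appear to count only the bold rows, which all sit at or beyond roughly ±5%. Ops itself looks like the reciprocal of Mean (for example, 1 / 17.6886μs is about 56.53 KOps/s). The sketch below reproduces a few rows under those assumptions. The function names and the 5% cutoff are hypothetical, inferred from the numbers above rather than taken from the benchmark workflow.

```python
# Sketch only: the formula and the 5% cutoff are inferred from the table,
# not taken from the benchmark workflow that produced this comment.


def relative_change(ops: float, ops_head: float) -> float:
    """Percent change in throughput of this PR relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; `threshold` (in percent) is an assumed cutoff."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) in KOps/s, copied from rows of the CPU table.
rows = {
    "test_plain_set_nested": (56.5337, 59.0600),
    "test_membership_nested": (266.1021, 284.6205),
    "test_membership_nested_last": (233.1836, 150.2656),
}

for name, (ops, ops_head) in rows.items():
    pct = relative_change(ops, ops_head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Under these assumptions, rows with a small delta such as test_plain_set_nested are not counted in either total, which would explain why only 25 of the 126 CPU benchmarks are reported as improved or worsened.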

github-actions bot commented Feb 15, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 134. Improved: $\large\color{#35bf28}17$. Worsened: $\large\color{#d91a1a}8$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 38.4700μs 13.0891μs 76.3995 KOps/s 77.0339 KOps/s $\color{#d91a1a}-0.82\%$
test_plain_set_stack_nested 35.0410μs 13.4551μs 74.3213 KOps/s 76.5379 KOps/s $\color{#d91a1a}-2.90\%$
test_plain_set_nested_inplace 47.9410μs 14.4164μs 69.3654 KOps/s 69.5966 KOps/s $\color{#d91a1a}-0.33\%$
test_plain_set_stack_nested_inplace 34.7710μs 14.4132μs 69.3809 KOps/s 69.4812 KOps/s $\color{#d91a1a}-0.14\%$
test_items 34.6800μs 4.7262μs 211.5860 KOps/s 211.3034 KOps/s $\color{#35bf28}+0.13\%$
test_items_nested 0.3721ms 0.3395ms 2.9453 KOps/s 2.9340 KOps/s $\color{#35bf28}+0.38\%$
test_items_nested_locked 0.3820ms 0.3439ms 2.9080 KOps/s 2.9173 KOps/s $\color{#d91a1a}-0.32\%$
test_items_nested_leaf 0.2252ms 0.1993ms 5.0180 KOps/s 4.9370 KOps/s $\color{#35bf28}+1.64\%$
test_items_stack_nested 0.3745ms 0.3437ms 2.9098 KOps/s 2.9329 KOps/s $\color{#d91a1a}-0.79\%$
test_items_stack_nested_leaf 0.2417ms 0.2008ms 4.9812 KOps/s 5.0156 KOps/s $\color{#d91a1a}-0.69\%$
test_items_stack_nested_locked 0.3699ms 0.3443ms 2.9044 KOps/s 2.8814 KOps/s $\color{#35bf28}+0.80\%$
test_keys 19.5210μs 4.5590μs 219.3456 KOps/s 218.1136 KOps/s $\color{#35bf28}+0.56\%$
test_keys_nested 46.0819ms 0.1002ms 9.9821 KOps/s 10.4730 KOps/s $\color{#d91a1a}-4.69\%$
test_keys_nested_locked 0.1431ms 97.7997μs 10.2250 KOps/s 10.1904 KOps/s $\color{#35bf28}+0.34\%$
test_keys_nested_leaf 0.1128ms 77.5820μs 12.8896 KOps/s 12.7406 KOps/s $\color{#35bf28}+1.17\%$
test_keys_stack_nested 0.1228ms 93.8116μs 10.6597 KOps/s 10.6649 KOps/s $\color{#d91a1a}-0.05\%$
test_keys_stack_nested_leaf 98.4610μs 77.6266μs 12.8822 KOps/s 12.8790 KOps/s $\color{#35bf28}+0.02\%$
test_keys_stack_nested_locked 0.1658ms 98.3061μs 10.1723 KOps/s 10.1845 KOps/s $\color{#d91a1a}-0.12\%$
test_values 9.0700μs 1.9051μs 524.8980 KOps/s 528.3217 KOps/s $\color{#d91a1a}-0.65\%$
test_values_nested 75.8810μs 45.1812μs 22.1331 KOps/s 22.0002 KOps/s $\color{#35bf28}+0.60\%$
test_values_nested_locked 68.1720μs 47.2204μs 21.1773 KOps/s 20.8622 KOps/s $\color{#35bf28}+1.51\%$
test_values_nested_leaf 61.8210μs 39.4812μs 25.3285 KOps/s 25.1458 KOps/s $\color{#35bf28}+0.73\%$
test_values_stack_nested 75.3210μs 46.0767μs 21.7030 KOps/s 21.4702 KOps/s $\color{#35bf28}+1.08\%$
test_values_stack_nested_leaf 63.5710μs 39.8511μs 25.0934 KOps/s 25.2952 KOps/s $\color{#d91a1a}-0.80\%$
test_values_stack_nested_locked 83.6620μs 47.8289μs 20.9079 KOps/s 20.5981 KOps/s $\color{#35bf28}+1.50\%$
test_membership 14.0510μs 1.0490μs 953.3282 KOps/s 951.7444 KOps/s $\color{#35bf28}+0.17\%$
test_membership_nested 18.0510μs 2.8665μs 348.8524 KOps/s 340.9759 KOps/s $\color{#35bf28}+2.31\%$
test_membership_nested_leaf 35.2910μs 2.8983μs 345.0280 KOps/s 339.6744 KOps/s $\color{#35bf28}+1.58\%$
test_membership_stacked_nested 85.1630μs 2.9044μs 344.3036 KOps/s 344.0749 KOps/s $\color{#35bf28}+0.07\%$
test_membership_stacked_nested_leaf 70.0110μs 2.8950μs 345.4258 KOps/s 344.7264 KOps/s $\color{#35bf28}+0.20\%$
test_membership_nested_last 36.7310μs 3.5768μs 279.5833 KOps/s 187.8764 KOps/s $\textbf{\color{#35bf28}+48.81\%}$
test_membership_nested_leaf_last 20.4710μs 3.5882μs 278.6885 KOps/s 186.5167 KOps/s $\textbf{\color{#35bf28}+49.42\%}$
test_membership_stacked_nested_last 37.4420μs 4.4762μs 223.4017 KOps/s 79.2164 KOps/s $\textbf{\color{#35bf28}+182.01\%}$
test_membership_stacked_nested_leaf_last 25.7110μs 4.4655μs 223.9378 KOps/s 79.3248 KOps/s $\textbf{\color{#35bf28}+182.30\%}$
test_nested_getleaf 0.1726ms 8.4255μs 118.6880 KOps/s 118.4140 KOps/s $\color{#35bf28}+0.23\%$
test_nested_get 34.6410μs 7.9522μs 125.7512 KOps/s 126.1023 KOps/s $\color{#d91a1a}-0.28\%$
test_stacked_getleaf 26.5500μs 8.3639μs 119.5610 KOps/s 119.2537 KOps/s $\color{#35bf28}+0.26\%$
test_stacked_get 40.4110μs 7.8959μs 126.6480 KOps/s 126.2268 KOps/s $\color{#35bf28}+0.33\%$
test_nested_getitemleaf 25.7110μs 8.6828μs 115.1708 KOps/s 102.7635 KOps/s $\textbf{\color{#35bf28}+12.07\%}$
test_nested_getitem 40.8900μs 8.2088μs 121.8203 KOps/s 107.8377 KOps/s $\textbf{\color{#35bf28}+12.97\%}$
test_stacked_getitemleaf 42.6410μs 8.6690μs 115.3537 KOps/s 102.5830 KOps/s $\textbf{\color{#35bf28}+12.45\%}$
test_stacked_getitem 23.2610μs 8.2464μs 121.2653 KOps/s 106.9364 KOps/s $\textbf{\color{#35bf28}+13.40\%}$
test_lock_nested 2.5446ms 0.3509ms 2.8501 KOps/s 2.7946 KOps/s $\color{#35bf28}+1.99\%$
test_lock_stack_nested 0.4406ms 0.3081ms 3.2458 KOps/s 3.3125 KOps/s $\color{#d91a1a}-2.01\%$
test_unlock_nested 0.7251ms 0.3496ms 2.8604 KOps/s 2.8290 KOps/s $\color{#35bf28}+1.11\%$
test_unlock_stack_nested 0.3476ms 0.3165ms 3.1598 KOps/s 3.1915 KOps/s $\color{#d91a1a}-0.99\%$
test_flatten_speed 0.2981ms 0.1955ms 5.1139 KOps/s 3.8387 KOps/s $\textbf{\color{#35bf28}+33.22\%}$
test_unflatten_speed 0.3659ms 0.3235ms 3.0917 KOps/s 2.8119 KOps/s $\textbf{\color{#35bf28}+9.95\%}$
test_common_ops 1.0181ms 0.5811ms 1.7209 KOps/s 1.6486 KOps/s $\color{#35bf28}+4.39\%$
test_creation 33.5320μs 1.5902μs 628.8506 KOps/s 635.1066 KOps/s $\color{#d91a1a}-0.99\%$
test_creation_empty 34.6410μs 7.1529μs 139.8035 KOps/s 142.5353 KOps/s $\color{#d91a1a}-1.92\%$
test_creation_nested_1 24.0800μs 8.8945μs 112.4292 KOps/s 114.5780 KOps/s $\color{#d91a1a}-1.88\%$
test_creation_nested_2 36.9220μs 11.4161μs 87.5956 KOps/s 89.9871 KOps/s $\color{#d91a1a}-2.66\%$
test_clone 0.1596ms 13.7703μs 72.6201 KOps/s 72.8224 KOps/s $\color{#d91a1a}-0.28\%$
test_getitem[int] 25.0010μs 10.8639μs 92.0480 KOps/s 89.4558 KOps/s $\color{#35bf28}+2.90\%$
test_getitem[slice_int] 0.1667ms 21.8214μs 45.8266 KOps/s 45.7022 KOps/s $\color{#35bf28}+0.27\%$
test_getitem[range] 71.9820μs 51.1372μs 19.5552 KOps/s 19.4315 KOps/s $\color{#35bf28}+0.64\%$
test_getitem[tuple] 42.9210μs 18.8655μs 53.0069 KOps/s 51.7155 KOps/s $\color{#35bf28}+2.50\%$
test_getitem[list] 0.1576ms 38.1497μs 26.2125 KOps/s 26.1837 KOps/s $\color{#35bf28}+0.11\%$
test_setitem_dim[int] 45.2100μs 28.3496μs 35.2738 KOps/s 38.7855 KOps/s $\textbf{\color{#d91a1a}-9.05\%}$
test_setitem_dim[slice_int] 94.8920μs 51.4778μs 19.4258 KOps/s 21.1027 KOps/s $\textbf{\color{#d91a1a}-7.95\%}$
test_setitem_dim[range] 92.8330μs 69.8978μs 14.3066 KOps/s 14.7717 KOps/s $\color{#d91a1a}-3.15\%$
test_setitem_dim[tuple] 71.9920μs 43.8028μs 22.8296 KOps/s 24.4094 KOps/s $\textbf{\color{#d91a1a}-6.47\%}$
test_setitem 0.1236ms 17.7833μs 56.2324 KOps/s 55.4984 KOps/s $\color{#35bf28}+1.32\%$
test_set 0.3686ms 17.4919μs 57.1693 KOps/s 56.7895 KOps/s $\color{#35bf28}+0.67\%$
test_set_shared 1.4519ms 0.1031ms 9.7033 KOps/s 9.6751 KOps/s $\color{#35bf28}+0.29\%$
test_update 0.1485ms 19.2784μs 51.8715 KOps/s 52.1047 KOps/s $\color{#d91a1a}-0.45\%$
test_update_nested 66.9810μs 25.5513μs 39.1370 KOps/s 39.1347 KOps/s $+0.01\%$
test_set_nested 52.7220μs 18.4673μs 54.1497 KOps/s 53.9997 KOps/s $\color{#35bf28}+0.28\%$
test_set_nested_new 56.0710μs 21.3085μs 46.9297 KOps/s 46.5483 KOps/s $\color{#35bf28}+0.82\%$
test_select 73.3210μs 34.3308μs 29.1284 KOps/s 28.5236 KOps/s $\color{#35bf28}+2.12\%$
test_select_nested 68.6110μs 52.7721μs 18.9494 KOps/s 18.9063 KOps/s $\color{#35bf28}+0.23\%$
test_exclude_nested 0.1401ms 0.1127ms 8.8722 KOps/s 8.7732 KOps/s $\color{#35bf28}+1.13\%$
test_empty[True] 0.9173ms 0.3900ms 2.5640 KOps/s 2.5530 KOps/s $\color{#35bf28}+0.43\%$
test_empty[False] 3.5520μs 0.8578μs 1.1657 MOps/s 1.1703 MOps/s $\color{#d91a1a}-0.39\%$
test_to 83.0920μs 63.9856μs 15.6285 KOps/s 17.5336 KOps/s $\textbf{\color{#d91a1a}-10.87\%}$
test_to_nonblocking 72.3410μs 36.8803μs 27.1147 KOps/s 27.8199 KOps/s $\color{#d91a1a}-2.53\%$
test_unbind_speed 0.8641ms 0.2624ms 3.8105 KOps/s 3.7414 KOps/s $\color{#35bf28}+1.85\%$
test_unbind_speed_stack0 0.3104ms 0.2646ms 3.7800 KOps/s 3.8064 KOps/s $\color{#d91a1a}-0.69\%$
test_unbind_speed_stack1 0.1422s 0.8864ms 1.1281 KOps/s 1.3065 KOps/s $\textbf{\color{#d91a1a}-13.65\%}$
test_split 1.5937ms 1.5308ms 653.2396 Ops/s 648.6236 Ops/s $\color{#35bf28}+0.71\%$
test_chunk 1.5994ms 1.5349ms 651.4879 Ops/s 651.2154 Ops/s $\color{#35bf28}+0.04\%$
test_creation[device0] 0.1403ms 74.2077μs 13.4757 KOps/s 13.5900 KOps/s $\color{#d91a1a}-0.84\%$
test_creation_from_tensor 0.1322ms 53.9560μs 18.5336 KOps/s 18.3391 KOps/s $\color{#35bf28}+1.06\%$
test_add_one[memmap_tensor0] 86.2720μs 6.7512μs 148.1215 KOps/s 144.4036 KOps/s $\color{#35bf28}+2.57\%$
test_contiguous[memmap_tensor0] 14.8000μs 0.6406μs 1.5609 MOps/s 1.5476 MOps/s $\color{#35bf28}+0.86\%$
test_stack[memmap_tensor0] 30.1210μs 4.4910μs 222.6692 KOps/s 212.3464 KOps/s $\color{#35bf28}+4.86\%$
test_memmaptd_index 1.0729ms 0.2625ms 3.8095 KOps/s 3.8041 KOps/s $\color{#35bf28}+0.14\%$
test_memmaptd_index_astensor 0.6413ms 0.3230ms 3.0961 KOps/s 3.0899 KOps/s $\color{#35bf28}+0.20\%$
test_memmaptd_index_op 0.8836ms 0.5963ms 1.6770 KOps/s 1.6887 KOps/s $\color{#d91a1a}-0.69\%$
test_serialize_model 0.2381s 0.1052s 9.5025 Ops/s 10.6867 Ops/s $\textbf{\color{#d91a1a}-11.08\%}$
test_serialize_model_pickle 1.3486s 1.2364s 0.8088 Ops/s 0.8082 Ops/s $\color{#35bf28}+0.07\%$
test_serialize_weights 89.3876ms 86.8400ms 11.5154 Ops/s 10.9645 Ops/s $\textbf{\color{#35bf28}+5.02\%}$
test_serialize_weights_returnearly 61.1603ms 54.6436ms 18.3004 Ops/s 11.2308 Ops/s $\textbf{\color{#35bf28}+62.95\%}$
test_serialize_weights_pickle 1.3466s 1.2358s 0.8092 Ops/s 0.8088 Ops/s $\color{#35bf28}+0.05\%$
test_reshape_pytree 58.1010μs 25.3487μs 39.4497 KOps/s 38.8896 KOps/s $\color{#35bf28}+1.44\%$
test_reshape_td 65.9220μs 30.2805μs 33.0245 KOps/s 30.7828 KOps/s $\textbf{\color{#35bf28}+7.28\%}$
test_view_pytree 0.2150ms 24.6135μs 40.6281 KOps/s 40.2407 KOps/s $\color{#35bf28}+0.96\%$
test_view_td 0.1553s 60.0579μs 16.6506 KOps/s 20.7708 KOps/s $\textbf{\color{#d91a1a}-19.84\%}$
test_unbind_pytree 70.3010μs 30.3782μs 32.9184 KOps/s 31.9113 KOps/s $\color{#35bf28}+3.16\%$
test_unbind_td 0.1248ms 39.6433μs 25.2249 KOps/s 24.5661 KOps/s $\color{#35bf28}+2.68\%$
test_split_pytree 63.0010μs 29.1633μs 34.2896 KOps/s 34.5048 KOps/s $\color{#d91a1a}-0.62\%$
test_split_td 0.1131ms 39.5014μs 25.3155 KOps/s 24.4696 KOps/s $\color{#35bf28}+3.46\%$
test_add_pytree 0.1144ms 36.3838μs 27.4848 KOps/s 27.6543 KOps/s $\color{#d91a1a}-0.61\%$
test_add_td 0.1291ms 47.3441μs 21.1220 KOps/s 21.0502 KOps/s $\color{#35bf28}+0.34\%$
test_distributed 7.9686ms 0.1031ms 9.6997 KOps/s 13.3584 KOps/s $\textbf{\color{#d91a1a}-27.39\%}$
test_tdmodule 28.4500μs 13.5156μs 73.9886 KOps/s 57.8654 KOps/s $\textbf{\color{#35bf28}+27.86\%}$
test_tdmodule_dispatch 45.0120μs 25.9474μs 38.5395 KOps/s 28.0803 KOps/s $\textbf{\color{#35bf28}+37.25\%}$
test_tdseq 32.9600μs 17.0765μs 58.5602 KOps/s 50.2814 KOps/s $\textbf{\color{#35bf28}+16.46\%}$
test_tdseq_dispatch 47.5910μs 31.4840μs 31.7621 KOps/s 26.6488 KOps/s $\textbf{\color{#35bf28}+19.19\%}$
test_instantiation_functorch 1.7714ms 1.6491ms 606.3859 Ops/s 599.3161 Ops/s $\color{#35bf28}+1.18\%$
test_instantiation_td 1.6738ms 1.1660ms 857.6498 Ops/s 867.2956 Ops/s $\color{#d91a1a}-1.11\%$
test_exec_functorch 0.2085ms 0.1616ms 6.1871 KOps/s 6.2637 KOps/s $\color{#d91a1a}-1.22\%$
test_exec_functional_call 0.2550ms 0.1587ms 6.3010 KOps/s 6.3598 KOps/s $\color{#d91a1a}-0.92\%$
test_exec_td 0.1971ms 0.1486ms 6.7316 KOps/s 6.6685 KOps/s $\color{#35bf28}+0.95\%$
test_exec_td_decorator 0.7744ms 0.1959ms 5.1034 KOps/s 5.1701 KOps/s $\color{#d91a1a}-1.29\%$
test_vmap_mlp_speed[True-True] 0.7199ms 0.6245ms 1.6013 KOps/s 1.6322 KOps/s $\color{#d91a1a}-1.89\%$
test_vmap_mlp_speed[True-False] 0.7380ms 0.6178ms 1.6186 KOps/s 1.6398 KOps/s $\color{#d91a1a}-1.30\%$
test_vmap_mlp_speed[False-True] 0.6080ms 0.5512ms 1.8143 KOps/s 1.8408 KOps/s $\color{#d91a1a}-1.44\%$
test_vmap_mlp_speed[False-False] 0.7219ms 0.5617ms 1.7803 KOps/s 1.8437 KOps/s $\color{#d91a1a}-3.44\%$
test_vmap_mlp_speed_decorator[True-True] 1.1068ms 0.6534ms 1.5305 KOps/s 1.5401 KOps/s $\color{#d91a1a}-0.62\%$
test_vmap_mlp_speed_decorator[True-False] 0.7552ms 0.6335ms 1.5785 KOps/s 1.5160 KOps/s $\color{#35bf28}+4.12\%$
test_vmap_mlp_speed_decorator[False-True] 0.8537ms 0.5649ms 1.7703 KOps/s 1.8041 KOps/s $\color{#d91a1a}-1.87\%$
test_vmap_mlp_speed_decorator[False-False] 0.7032ms 0.5616ms 1.7805 KOps/s 1.7978 KOps/s $\color{#d91a1a}-0.96\%$
test_vmap_transformer_speed[True-True] 8.4359ms 8.2829ms 120.7306 Ops/s 121.4691 Ops/s $\color{#d91a1a}-0.61\%$
test_vmap_transformer_speed[True-False] 8.6817ms 8.2899ms 120.6282 Ops/s 121.7357 Ops/s $\color{#d91a1a}-0.91\%$
test_vmap_transformer_speed[False-True] 8.2971ms 8.2161ms 121.7129 Ops/s 122.7097 Ops/s $\color{#d91a1a}-0.81\%$
test_vmap_transformer_speed[False-False] 8.3542ms 8.1849ms 122.1768 Ops/s 122.5818 Ops/s $\color{#d91a1a}-0.33\%$
test_vmap_transformer_speed_decorator[True-True] 19.5492ms 19.4206ms 51.4917 Ops/s 50.8955 Ops/s $\color{#35bf28}+1.17\%$
test_vmap_transformer_speed_decorator[True-False] 19.5366ms 19.4301ms 51.4665 Ops/s 50.9372 Ops/s $\color{#35bf28}+1.04\%$
test_vmap_transformer_speed_decorator[False-True] 20.2515ms 19.3589ms 51.6558 Ops/s 51.8064 Ops/s $\color{#d91a1a}-0.29\%$
test_vmap_transformer_speed_decorator[False-False] 19.5010ms 19.3209ms 51.7574 Ops/s 51.8966 Ops/s $\color{#d91a1a}-0.27\%$
test_to_module_speed[True] 3.0186ms 1.2499ms 800.0401 Ops/s 790.8114 Ops/s $\color{#35bf28}+1.17\%$
test_to_module_speed[False] 1.3239ms 1.1999ms 833.4291 Ops/s 814.4767 Ops/s $\color{#35bf28}+2.33\%$

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

2 participants