[Refactor] Faster instantiation #550

Merged
merged 1 commit into from
Nov 3, 2023

vmoens
Contributor

@vmoens vmoens commented Nov 2, 2023

No description provided.

@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Nov 2, 2023
@vmoens added the Refactor label (refactoring code, not a new feature) Nov 2, 2023

github-actions bot commented Nov 2, 2023

$\color{#35bf28}\textsf{\Large✔\kern{0.2cm}\normalsize OK}$ Result of CPU Benchmark Tests

Total Benchmarks: 105. Improved: $\large\color{#35bf28}47$. Worsened: $\large\color{#d91a1a}0$.

Detailed results

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 56.0350μs | 15.2140μs | 65.7291 KOps/s | 65.8717 KOps/s | $\color{#d91a1a}-0.22\%$ |
| test_plain_set_stack_nested | 0.2698ms | 0.1423ms | 7.0250 KOps/s | 7.1038 KOps/s | $\color{#d91a1a}-1.11\%$ |
| test_plain_set_nested_inplace | 46.5170μs | 18.1511μs | 55.0932 KOps/s | 55.4405 KOps/s | $\color{#d91a1a}-0.63\%$ |
| test_plain_set_stack_nested_inplace | 0.2331ms | 0.1708ms | 5.8558 KOps/s | 5.9144 KOps/s | $\color{#d91a1a}-0.99\%$ |
| test_items | 38.9030μs | 2.4179μs | 413.5894 KOps/s | 388.8505 KOps/s | $\textbf{\color{#35bf28}+6.36\%}$ |
| test_items_nested | 0.6022ms | 0.2656ms | 3.7647 KOps/s | 3.6904 KOps/s | $\color{#35bf28}+2.02\%$ |
| test_items_nested_locked | 0.4476ms | 0.2668ms | 3.7477 KOps/s | 3.7076 KOps/s | $\color{#35bf28}+1.08\%$ |
| test_items_nested_leaf | 0.5749ms | 0.1650ms | 6.0621 KOps/s | 6.0240 KOps/s | $\color{#35bf28}+0.63\%$ |
| test_items_stack_nested | 2.3944ms | 1.3860ms | 721.5018 Ops/s | 671.5397 Ops/s | $\textbf{\color{#35bf28}+7.44\%}$ |
| test_items_stack_nested_leaf | 1.5137ms | 1.2583ms | 794.6927 Ops/s | 745.6299 Ops/s | $\textbf{\color{#35bf28}+6.58\%}$ |
| test_items_stack_nested_locked | 0.8473ms | 0.7495ms | 1.3343 KOps/s | 1.3371 KOps/s | $\color{#d91a1a}-0.21\%$ |
| test_keys | 17.8130μs | 3.8384μs | 260.5233 KOps/s | 258.1388 KOps/s | $\color{#35bf28}+0.92\%$ |
| test_keys_nested | 1.4759ms | 0.1371ms | 7.2915 KOps/s | 6.9607 KOps/s | $\color{#35bf28}+4.75\%$ |
| test_keys_nested_locked | 0.1967ms | 0.1372ms | 7.2891 KOps/s | 7.3779 KOps/s | $\color{#d91a1a}-1.20\%$ |
| test_keys_nested_leaf | 0.2504ms | 0.1352ms | 7.3939 KOps/s | 7.4647 KOps/s | $\color{#d91a1a}-0.95\%$ |
| test_keys_stack_nested | 5.1950ms | 1.3304ms | 751.6449 Ops/s | 722.5429 Ops/s | $\color{#35bf28}+4.03\%$ |
| test_keys_stack_nested_leaf | 1.4026ms | 1.2750ms | 784.3283 Ops/s | 718.3891 Ops/s | $\textbf{\color{#35bf28}+9.18\%}$ |
| test_keys_stack_nested_locked | 0.9216ms | 0.6293ms | 1.5891 KOps/s | 1.6131 KOps/s | $\color{#d91a1a}-1.49\%$ |
| test_values | 11.4237μs | 1.1851μs | 843.8122 KOps/s | 840.0529 KOps/s | $\color{#35bf28}+0.45\%$ |
| test_values_nested | 97.1920μs | 46.9313μs | 21.3078 KOps/s | 20.5361 KOps/s | $\color{#35bf28}+3.76\%$ |
| test_values_nested_locked | 0.1006ms | 47.5879μs | 21.0137 KOps/s | 20.6198 KOps/s | $\color{#35bf28}+1.91\%$ |
| test_values_nested_leaf | 96.0510μs | 42.2372μs | 23.6758 KOps/s | 22.8973 KOps/s | $\color{#35bf28}+3.40\%$ |
| test_values_stack_nested | 1.7276ms | 1.1130ms | 898.4781 Ops/s | 829.3066 Ops/s | $\textbf{\color{#35bf28}+8.34\%}$ |
| test_values_stack_nested_leaf | 1.2550ms | 1.1004ms | 908.7589 Ops/s | 841.0574 Ops/s | $\textbf{\color{#35bf28}+8.05\%}$ |
| test_values_stack_nested_locked | 0.8348ms | 0.4954ms | 2.0185 KOps/s | 2.0759 KOps/s | $\color{#d91a1a}-2.77\%$ |
| test_membership | 15.5990μs | 1.3404μs | 746.0672 KOps/s | 741.6596 KOps/s | $\color{#35bf28}+0.59\%$ |
| test_membership_nested | 23.7540μs | 2.7659μs | 361.5422 KOps/s | 358.1436 KOps/s | $\color{#35bf28}+0.95\%$ |
| test_membership_nested_leaf | 42.1790μs | 2.7848μs | 359.0900 KOps/s | 360.5792 KOps/s | $\color{#d91a1a}-0.41\%$ |
| test_membership_stacked_nested | 39.6240μs | 11.4176μs | 87.5843 KOps/s | 88.0807 KOps/s | $\color{#d91a1a}-0.56\%$ |
| test_membership_stacked_nested_leaf | 55.9850μs | 11.3277μs | 88.2794 KOps/s | 88.2459 KOps/s | $\color{#35bf28}+0.04\%$ |
| test_membership_nested_last | 24.7760μs | 5.7278μs | 174.5857 KOps/s | 172.2179 KOps/s | $\color{#35bf28}+1.37\%$ |
| test_membership_nested_leaf_last | 21.0800μs | 5.7247μs | 174.6813 KOps/s | 166.7107 KOps/s | $\color{#35bf28}+4.78\%$ |
| test_membership_stacked_nested_last | 0.3913ms | 0.1772ms | 5.6439 KOps/s | 5.6516 KOps/s | $\color{#d91a1a}-0.14\%$ |
| test_membership_stacked_nested_leaf_last | 40.9160μs | 13.4471μs | 74.3653 KOps/s | 75.3878 KOps/s | $\color{#d91a1a}-1.36\%$ |
| test_nested_getleaf | 45.4860μs | 11.8564μs | 84.3427 KOps/s | 84.0394 KOps/s | $\color{#35bf28}+0.36\%$ |
| test_nested_get | 46.6350μs | 11.1719μs | 89.5101 KOps/s | 89.0916 KOps/s | $\color{#35bf28}+0.47\%$ |
| test_stacked_getleaf | 0.9925ms | 0.5792ms | 1.7264 KOps/s | 1.5247 KOps/s | $\textbf{\color{#35bf28}+13.23\%}$ |
| test_stacked_get | 0.6278ms | 0.5502ms | 1.8177 KOps/s | 1.5966 KOps/s | $\textbf{\color{#35bf28}+13.85\%}$ |
| test_nested_getitemleaf | 34.9350μs | 11.9062μs | 83.9896 KOps/s | 83.6307 KOps/s | $\color{#35bf28}+0.43\%$ |
| test_nested_getitem | 51.2670μs | 11.2211μs | 89.1174 KOps/s | 88.4688 KOps/s | $\color{#35bf28}+0.73\%$ |
| test_stacked_getitemleaf | 0.8596ms | 0.5785ms | 1.7285 KOps/s | 1.5282 KOps/s | $\textbf{\color{#35bf28}+13.10\%}$ |
| test_stacked_getitem | 0.9713ms | 0.5524ms | 1.8102 KOps/s | 1.5956 KOps/s | $\textbf{\color{#35bf28}+13.45\%}$ |
| test_lock_nested | 53.9103ms | 0.9434ms | 1.0600 KOps/s | 896.1367 Ops/s | $\textbf{\color{#35bf28}+18.29\%}$ |
| test_lock_stack_nested | 67.9741ms | 12.5843ms | 79.4640 Ops/s | 66.9588 Ops/s | $\textbf{\color{#35bf28}+18.68\%}$ |
| test_unlock_nested | 54.3506ms | 0.9564ms | 1.0456 KOps/s | 842.7497 Ops/s | $\textbf{\color{#35bf28}+24.07\%}$ |
| test_unlock_stack_nested | 72.7516ms | 13.0772ms | 76.4688 Ops/s | 63.6767 Ops/s | $\textbf{\color{#35bf28}+20.09\%}$ |
| test_flatten_speed | 0.7630ms | 0.6886ms | 1.4521 KOps/s | 1.2708 KOps/s | $\textbf{\color{#35bf28}+14.27\%}$ |
| test_unflatten_speed | 1.7833ms | 1.1957ms | 836.3437 Ops/s | 715.1534 Ops/s | $\textbf{\color{#35bf28}+16.95\%}$ |
| test_common_ops | 0.7243ms | 0.6222ms | 1.6072 KOps/s | 1.1750 KOps/s | $\textbf{\color{#35bf28}+36.79\%}$ |
| test_creation | 18.2640μs | 2.1775μs | 459.2436 KOps/s | 215.6383 KOps/s | $\textbf{\color{#35bf28}+112.97\%}$ |
| test_creation_empty | 32.0400μs | 7.4258μs | 134.6649 KOps/s | 96.1951 KOps/s | $\textbf{\color{#35bf28}+39.99\%}$ |
| test_creation_nested_1 | 30.8480μs | 11.5200μs | 86.8053 KOps/s | 53.9119 KOps/s | $\textbf{\color{#35bf28}+61.01\%}$ |
| test_creation_nested_2 | 46.4470μs | 13.8338μs | 72.2867 KOps/s | 48.1362 KOps/s | $\textbf{\color{#35bf28}+50.17\%}$ |
| test_clone | 73.2880μs | 10.6714μs | 93.7084 KOps/s | 56.1069 KOps/s | $\textbf{\color{#35bf28}+67.02\%}$ |
| test_getitem[int] | 52.4790μs | 13.1137μs | 76.2561 KOps/s | 48.0495 KOps/s | $\textbf{\color{#35bf28}+58.70\%}$ |
| test_getitem[slice_int] | 85.6310μs | 29.4133μs | 33.9982 KOps/s | 24.3198 KOps/s | $\textbf{\color{#35bf28}+39.80\%}$ |
| test_getitem[range] | 0.1335ms | 55.4246μs | 18.0425 KOps/s | 14.9624 KOps/s | $\textbf{\color{#35bf28}+20.59\%}$ |
| test_getitem[tuple] | 55.9350μs | 23.3660μs | 42.7973 KOps/s | 29.9441 KOps/s | $\textbf{\color{#35bf28}+42.92\%}$ |
| test_getitem[list] | 0.2219ms | 49.6428μs | 20.1439 KOps/s | 16.1251 KOps/s | $\textbf{\color{#35bf28}+24.92\%}$ |
| test_setitem_dim[int] | 52.2680μs | 26.3926μs | 37.8894 KOps/s | 37.7733 KOps/s | $\color{#35bf28}+0.31\%$ |
| test_setitem_dim[slice_int] | 95.0680μs | 49.8568μs | 20.0574 KOps/s | 19.2580 KOps/s | $\color{#35bf28}+4.15\%$ |
| test_setitem_dim[range] | 0.1633ms | 73.1597μs | 13.6687 KOps/s | 14.0077 KOps/s | $\color{#d91a1a}-2.42\%$ |
| test_setitem_dim[tuple] | 74.6800μs | 39.4273μs | 25.3631 KOps/s | 24.5337 KOps/s | $\color{#35bf28}+3.38\%$ |
| test_setitem | 97.2620μs | 14.7664μs | 67.7212 KOps/s | 43.1305 KOps/s | $\textbf{\color{#35bf28}+57.01\%}$ |
| test_set | 0.1010ms | 14.1035μs | 70.9042 KOps/s | 44.2877 KOps/s | $\textbf{\color{#35bf28}+60.10\%}$ |
| test_set_shared | 1.6091ms | 0.1625ms | 6.1551 KOps/s | 6.0987 KOps/s | $\color{#35bf28}+0.92\%$ |
| test_update | 0.2170ms | 18.9540μs | 52.7592 KOps/s | 38.5237 KOps/s | $\textbf{\color{#35bf28}+36.95\%}$ |
| test_update_nested | 0.1171ms | 28.4260μs | 35.1790 KOps/s | 25.6557 KOps/s | $\textbf{\color{#35bf28}+37.12\%}$ |
| test_set_nested | 77.7060μs | 16.3969μs | 60.9873 KOps/s | 40.6194 KOps/s | $\textbf{\color{#35bf28}+50.14\%}$ |
| test_set_nested_new | 0.1244ms | 22.3930μs | 44.6568 KOps/s | 25.3488 KOps/s | $\textbf{\color{#35bf28}+76.17\%}$ |
| test_select | 0.1726ms | 47.1530μs | 21.2075 KOps/s | 12.9721 KOps/s | $\textbf{\color{#35bf28}+63.49\%}$ |
| test_unbind_speed | 0.5076ms | 0.2918ms | 3.4266 KOps/s | 2.0365 KOps/s | $\textbf{\color{#35bf28}+68.26\%}$ |
| test_unbind_speed_stack0 | 63.7672ms | 4.6376ms | 215.6292 Ops/s | 158.3695 Ops/s | $\textbf{\color{#35bf28}+36.16\%}$ |
| test_unbind_speed_stack1 | 2.0544μs | 0.6130μs | 1.6313 MOps/s | 1.6486 MOps/s | $\color{#d91a1a}-1.05\%$ |
| test_creation[device0] | 0.4010ms | 0.2868ms | 3.4862 KOps/s | 3.5075 KOps/s | $\color{#d91a1a}-0.61\%$ |
| test_creation_from_tensor | 3.3112ms | 0.3215ms | 3.1104 KOps/s | 3.1271 KOps/s | $\color{#d91a1a}-0.53\%$ |
| test_add_one[memmap_tensor0] | 0.4685ms | 24.5745μs | 40.6926 KOps/s | 40.8964 KOps/s | $\color{#d91a1a}-0.50\%$ |
| test_contiguous[memmap_tensor0] | 27.1310μs | 5.6998μs | 175.4451 KOps/s | 174.9651 KOps/s | $\color{#35bf28}+0.27\%$ |
| test_stack[memmap_tensor0] | 53.1800μs | 18.8392μs | 53.0808 KOps/s | 54.4446 KOps/s | $\color{#d91a1a}-2.50\%$ |
| test_memmaptd_index | 0.4509ms | 0.2383ms | 4.1958 KOps/s | 4.1151 KOps/s | $\color{#35bf28}+1.96\%$ |
| test_memmaptd_index_astensor | 1.1044ms | 0.9109ms | 1.0978 KOps/s | 1.0681 KOps/s | $\color{#35bf28}+2.79\%$ |
| test_memmaptd_index_op | 2.8473ms | 2.1405ms | 467.1745 Ops/s | 451.7064 Ops/s | $\color{#35bf28}+3.42\%$ |
| test_reshape_pytree | 77.4050μs | 23.3870μs | 42.7588 KOps/s | 43.2620 KOps/s | $\color{#d91a1a}-1.16\%$ |
| test_reshape_td | 54.6630μs | 21.0596μs | 47.4842 KOps/s | 32.8756 KOps/s | $\textbf{\color{#35bf28}+44.44\%}$ |
| test_view_pytree | 63.7090μs | 23.2909μs | 42.9352 KOps/s | 43.2808 KOps/s | $\color{#d91a1a}-0.80\%$ |
| test_view_td | 19.0550μs | 4.1221μs | 242.5935 KOps/s | 155.4520 KOps/s | $\textbf{\color{#35bf28}+56.06\%}$ |
| test_unbind_pytree | 68.3880μs | 26.4016μs | 37.8764 KOps/s | 37.9928 KOps/s | $\color{#d91a1a}-0.31\%$ |
| test_unbind_td | 82.6140μs | 40.7063μs | 24.5662 KOps/s | 13.7284 KOps/s | $\textbf{\color{#35bf28}+78.94\%}$ |
| test_split_pytree | 91.3110μs | 26.4871μs | 37.7542 KOps/s | 38.5426 KOps/s | $\color{#d91a1a}-2.05\%$ |
| test_split_td | 0.5411ms | 55.1011μs | 18.1485 KOps/s | 12.0454 KOps/s | $\textbf{\color{#35bf28}+50.67\%}$ |
| test_add_pytree | 73.0060μs | 32.3796μs | 30.8836 KOps/s | 31.0878 KOps/s | $\color{#d91a1a}-0.66\%$ |
| test_add_td | 2.7767ms | 43.0637μs | 23.2214 KOps/s | 17.3195 KOps/s | $\textbf{\color{#35bf28}+34.08\%}$ |
| test_distributed | 26.4700μs | 6.0095μs | 166.4025 KOps/s | 161.1689 KOps/s | $\color{#35bf28}+3.25\%$ |
| test_tdmodule | 0.1687ms | 21.0695μs | 47.4621 KOps/s | 44.7298 KOps/s | $\textbf{\color{#35bf28}+6.11\%}$ |
| test_tdmodule_dispatch | 0.2111ms | 37.3291μs | 26.7887 KOps/s | 24.3657 KOps/s | $\textbf{\color{#35bf28}+9.94\%}$ |
| test_tdseq | 0.1163ms | 24.0006μs | 41.6656 KOps/s | 41.9140 KOps/s | $\color{#d91a1a}-0.59\%$ |
| test_tdseq_dispatch | 0.1414ms | 42.4701μs | 23.5460 KOps/s | 21.5663 KOps/s | $\textbf{\color{#35bf28}+9.18\%}$ |
| test_instantiation_functorch | 1.4210ms | 1.2976ms | 770.6753 Ops/s | 723.8403 Ops/s | $\textbf{\color{#35bf28}+6.47\%}$ |
| test_instantiation_td | 1.6292ms | 1.0503ms | 952.0721 Ops/s | 948.1749 Ops/s | $\color{#35bf28}+0.41\%$ |
| test_exec_functorch | 0.2145ms | 0.1435ms | 6.9695 KOps/s | 6.9405 KOps/s | $\color{#35bf28}+0.42\%$ |
| test_exec_td | 0.2186ms | 0.1407ms | 7.1062 KOps/s | 6.9808 KOps/s | $\color{#35bf28}+1.80\%$ |
| test_vmap_mlp_speed[True-True] | 1.2700ms | 0.8467ms | 1.1810 KOps/s | 1.0789 KOps/s | $\textbf{\color{#35bf28}+9.47\%}$ |
| test_vmap_mlp_speed[True-False] | 0.7000ms | 0.4569ms | 2.1888 KOps/s | 2.1132 KOps/s | $\color{#35bf28}+3.58\%$ |
| test_vmap_mlp_speed[False-True] | 1.0869ms | 0.7420ms | 1.3477 KOps/s | 1.2335 KOps/s | $\textbf{\color{#35bf28}+9.26\%}$ |
| test_vmap_mlp_speed[False-False] | 0.5978ms | 0.3792ms | 2.6371 KOps/s | 2.6389 KOps/s | $\color{#d91a1a}-0.07\%$ |
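
The Change column appears to be the relative difference in throughput between this branch (Ops) and the repository HEAD (Ops on Repo HEAD). This is inferred from the reported numbers, not taken from the benchmark action's source, so treat the sketch below as an assumption. The helper names `to_ops_per_second` and `percent_change` are made up for illustration.

```python
# Assumed formula: change = (ops / ops_on_head - 1) * 100, after unit conversion.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def to_ops_per_second(text: str) -> float:
    """Convert a table cell such as '459.2436 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]


def percent_change(ops: str, ops_on_head: str) -> float:
    """Relative throughput change of the PR branch versus HEAD, in percent."""
    return (to_ops_per_second(ops) / to_ops_per_second(ops_on_head) - 1.0) * 100.0


if __name__ == "__main__":
    # test_creation row from the table above
    print(f"{percent_change('459.2436 KOps/s', '215.6383 KOps/s'):+.2f}%")  # +112.97%
```

Rows whose two throughput cells use different units (for example test_lock_nested, 1.0600 KOps/s against 896.1367 Ops/s) need the unit conversion before dividing. Because the table rounds its values, the recomputed figure can differ from the reported one in the last digit.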

@vmoens
Contributor Author

vmoens commented Nov 2, 2023

woohoo! +112% on creation speed!
@albertbou92 @matteobettini @BY571 @skandermolla
In practice, for torchrl this gives a further >10% speedup in data collection (and it also affects all other modules).
Can anyone review this?

@albertbou92
Contributor

I see, so defining the attributes directly as class-level attributes is faster at runtime than assigning them in the `__new__` method, is that it?

@vmoens
Contributor Author

vmoens commented Nov 3, 2023

Yes, that and removing `__slots__`
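
A minimal sketch of the pattern discussed in this exchange, not the actual tensordict change: the class names and the `_is_shared`, `_is_memmap`, `_is_locked` flags are hypothetical stand-ins. It contrasts writing defaults onto every instance in `__new__` with declaring them once on the class, and shows that CPython refuses a class-level default for a name that is also listed in `__slots__`. That restriction may be why the two changes go together, though the PR does not say so.

```python
import timeit


class AssignInNew:
    """Writes its default flags onto every instance inside __new__."""

    def __new__(cls, *args, **kwargs):
        self = super().__new__(cls)
        self._is_shared = False
        self._is_memmap = False
        self._is_locked = False
        return self

    def __init__(self, data=None):
        self.data = data


class ClassLevelDefaults:
    """Declares the same defaults once, on the class itself."""

    _is_shared = False
    _is_memmap = False
    _is_locked = False

    def __init__(self, data=None):
        self.data = data


# A name cannot be both a slot and a class-level default: the class
# statement itself raises ValueError.
try:
    class SlottedWithDefault:
        __slots__ = ("_is_locked",)
        _is_locked = False
except ValueError as err:
    slots_conflict = str(err)


def time_creation(cls, number=100_000):
    """Total seconds spent creating `number` instances of `cls`."""
    return timeit.timeit(lambda: cls(1), number=number)


if __name__ == "__main__":
    a, b = AssignInNew(1), ClassLevelDefaults(1)
    print(sorted(vars(a)))  # the three flags plus 'data'
    print(sorted(vars(b)))  # only 'data'; the flags are read from the class
    print(slots_conflict)
    for cls in (AssignInNew, ClassLevelDefaults):
        print(cls.__name__, time_creation(cls))
```

Assigning to a flag later (`b._is_locked = True`) only shadows the class default on that one instance, so behavior is unchanged for code that mutates the flags. Timings depend on the interpreter and machine, so it is worth measuring rather than assuming which version wins.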

@albertbou92
Contributor

> Yes, that and removing `__slots__`

Looking forward to seeing how it affects torchrl speed! :)

@vmoens vmoens merged commit a3595f4 into main Nov 3, 2023
31 of 43 checks passed
@vmoens vmoens deleted the faster_construction branch November 3, 2023 08:09
3 participants