[Refactor] MemoryMappedTensor #541

Merged
merged 53 commits into from
Nov 14, 2023

@vmoens commented Oct 10, 2023

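This PR refactors MemmapTensor into a new Tensor-based class, MemoryMappedTensor.
This should be considerably faster: in the CPU benchmark results below, throughput improves by +215.94% for test_memmaptd_index_astensor and by +252.58% for test_memmaptd_index_op relative to the repo HEAD.
MemmapTensor remains in the library as a separate class that raises a deprecation warning when instantiated.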
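A deprecation warning on construction is usually raised from `__init__` with Python's `warnings` module. The sketch below shows that pattern on a hypothetical `LegacyMemmapTensor` class; it is not tensordict's actual `MemmapTensor` code, and the warning category and message are assumptions.

```python
import warnings


class LegacyMemmapTensor:
    """Hypothetical stand-in for a deprecated class (not tensordict's MemmapTensor)."""

    def __init__(self, shape):
        # Warn on every construction; stacklevel=2 attributes the warning
        # to the caller's line rather than to this __init__.
        warnings.warn(
            "LegacyMemmapTensor is deprecated; use the Tensor-based replacement instead.",
            category=DeprecationWarning,
            stacklevel=2,
        )
        self.shape = tuple(shape)


with warnings.catch_warnings(record=True) as caught:
    # Record every warning, including ones the default filters would ignore.
    warnings.simplefilter("always")
    legacy = LegacyMemmapTensor((3, 4))

print(caught[0].category.__name__)  # DeprecationWarning
```

Keeping the old class importable while warning at construction time lets downstream code keep running during the migration window.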

This change is BC-breaking in a subtle way:
when a memory-mapped tensordict is created, its backend is now MemoryMappedTensor.
When indexed, a MemoryMappedTensor returns an object of the same class only if the result shares the storage of the original tensor (MemmapTensor, by contrast, always returned a MemmapTensor with a lazy index).
In turn, indexing a tensordict with, say, a tensor now returns a tensordict of regular tensors rather than memory-mapped tensors. For slices and other indices that do not create a new storage, nothing changes.
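The storage rule can be checked with plain PyTorch: basic indexing (slices, integers) returns a view on the same storage, while advanced indexing with an index tensor or a boolean mask copies the data into a new storage. The sketch below uses an ordinary in-memory tensor in place of a MemoryMappedTensor, and the `indexed_kind` helper is hypothetical: it only reports which class the rule would pick and is not tensordict's implementation. It assumes PyTorch 2.0 or later for `Tensor.untyped_storage()`.

```python
import torch


def shares_storage(base: torch.Tensor, result: torch.Tensor) -> bool:
    """True if `result` is backed by the same underlying storage as `base`."""
    return base.untyped_storage().data_ptr() == result.untyped_storage().data_ptr()


def indexed_kind(base: torch.Tensor, index) -> str:
    """Hypothetical helper: name the class the rule would assign to base[index]."""
    result = base[index]
    return "MemoryMappedTensor" if shares_storage(base, result) else "Tensor"


base = torch.arange(12).reshape(4, 3)

print(indexed_kind(base, slice(1, 3)))          # view   -> MemoryMappedTensor
print(indexed_kind(base, 2))                    # view   -> MemoryMappedTensor
print(indexed_kind(base, torch.tensor([0, 2]))) # copy   -> Tensor
print(indexed_kind(base, base[:, 0] > 3))       # copy   -> Tensor
```

Comparing storage pointers rather than index types keeps the rule independent of how the index was spelled: any index that happens to yield a view keeps the memory-mapped class.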

@facebook-github-bot added the CLA Signed label Oct 10, 2023
@github-actions bot commented Oct 12, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 105. Improved: $\large\color{#35bf28}14$. Worsened: $\large\color{#d91a1a}3$.
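The Improved and Worsened counts appear to include only changes larger than 5% in magnitude: exactly 14 rows show a bold green change and 3 rows a bold red change in the detailed results, every bold value exceeds 5% in magnitude, and no plain value does. A minimal sketch of that classification, assuming a 5% threshold inferred from the table (it is not documented in the PR) and using a few values copied from the Change column:

```python
# A few values (in %) copied from the Change column of the detailed results.
changes = {
    "test_items_nested_locked": -8.26,
    "test_lock_nested": -4.64,
    "test_keys": 5.07,
    "test_unbind_speed": 4.98,
    "test_memmaptd_index_op": 252.58,
}

# Assumption: inferred from which rows are bolded, not stated by the benchmark bot.
THRESHOLD = 5.0


def classify(change: float, threshold: float = THRESHOLD) -> str:
    """Label a percentage change as improved, worsened, or unchanged."""
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "unchanged"


def summarize(values: dict[str, float]) -> dict[str, int]:
    """Count how many benchmarks fall into each category."""
    counts = {"improved": 0, "worsened": 0, "unchanged": 0}
    for value in values.values():
        counts[classify(value)] += 1
    return counts


print(summarize(changes))  # {'improved': 2, 'worsened': 1, 'unchanged': 2}
```

Applied to all 105 rows, this rule reproduces the 14 improved and 3 worsened benchmarks reported above.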

| Name | Max time | Mean time | Ops (this PR) | Ops (repo HEAD) | Change |
|---|---:|---:|---:|---:|---:|
| test_plain_set_nested | 30.8990μs | 19.8620μs | 50.3475 KOps/s | 49.5428 KOps/s | $\color{#35bf28}+1.62\%$ |
| test_plain_set_stack_nested | 0.2031ms | 0.1816ms | 5.5068 KOps/s | 5.3938 KOps/s | $\color{#35bf28}+2.09\%$ |
| test_plain_set_nested_inplace | 49.1990μs | 23.6753μs | 42.2380 KOps/s | 42.2575 KOps/s | $\color{#d91a1a}-0.05\%$ |
| test_plain_set_stack_nested_inplace | 0.2499ms | 0.2213ms | 4.5179 KOps/s | 4.5202 KOps/s | $\color{#d91a1a}-0.05\%$ |
| test_items | 0.1365ms | 3.0734μs | 325.3702 KOps/s | 329.0883 KOps/s | $\color{#d91a1a}-1.13\%$ |
| test_items_nested | 0.4348ms | 0.3967ms | 2.5211 KOps/s | 2.6157 KOps/s | $\color{#d91a1a}-3.62\%$ |
| test_items_nested_locked | 0.4206ms | 0.3966ms | 2.5212 KOps/s | 2.7483 KOps/s | $\textbf{\color{#d91a1a}-8.26\%}$ |
| test_items_nested_leaf | 1.2236ms | 0.2370ms | 4.2191 KOps/s | 4.5363 KOps/s | $\textbf{\color{#d91a1a}-6.99\%}$ |
| test_items_stack_nested | 1.8610ms | 1.7989ms | 555.8982 Ops/s | 548.8150 Ops/s | $\color{#35bf28}+1.29\%$ |
| test_items_stack_nested_leaf | 1.6357ms | 1.6015ms | 624.4084 Ops/s | 605.7387 Ops/s | $\color{#35bf28}+3.08\%$ |
| test_items_stack_nested_locked | 1.0711ms | 0.9864ms | 1.0138 KOps/s | 1.0435 KOps/s | $\color{#d91a1a}-2.85\%$ |
| test_keys | 37.8990μs | 4.6576μs | 214.7023 KOps/s | 204.3353 KOps/s | $\textbf{\color{#35bf28}+5.07\%}$ |
| test_keys_nested | 1.5733ms | 0.1737ms | 5.7584 KOps/s | 5.3897 KOps/s | $\textbf{\color{#35bf28}+6.84\%}$ |
| test_keys_nested_locked | 0.2180ms | 0.1727ms | 5.7888 KOps/s | 5.7669 KOps/s | $\color{#35bf28}+0.38\%$ |
| test_keys_nested_leaf | 1.5901ms | 0.1703ms | 5.8728 KOps/s | 5.8245 KOps/s | $\color{#35bf28}+0.83\%$ |
| test_keys_stack_nested | 1.6947ms | 1.6046ms | 623.1981 Ops/s | 597.0795 Ops/s | $\color{#35bf28}+4.37\%$ |
| test_keys_stack_nested_leaf | 1.7368ms | 1.6032ms | 623.7371 Ops/s | 597.8020 Ops/s | $\color{#35bf28}+4.34\%$ |
| test_keys_stack_nested_locked | 1.0023ms | 0.7785ms | 1.2846 KOps/s | 1.2710 KOps/s | $\color{#35bf28}+1.07\%$ |
| test_values | 20.1000μs | 1.3236μs | 755.5340 KOps/s | 767.0575 KOps/s | $\color{#d91a1a}-1.50\%$ |
| test_values_nested | 88.4990μs | 64.4041μs | 15.5270 KOps/s | 14.8769 KOps/s | $\color{#35bf28}+4.37\%$ |
| test_values_nested_locked | 85.7000μs | 64.8049μs | 15.4309 KOps/s | 14.8543 KOps/s | $\color{#35bf28}+3.88\%$ |
| test_values_nested_leaf | 77.4990μs | 56.3876μs | 17.7344 KOps/s | 16.7470 KOps/s | $\textbf{\color{#35bf28}+5.90\%}$ |
| test_values_stack_nested | 1.4905ms | 1.4025ms | 712.9937 Ops/s | 654.9123 Ops/s | $\textbf{\color{#35bf28}+8.87\%}$ |
| test_values_stack_nested_leaf | 1.4241ms | 1.3926ms | 718.0730 Ops/s | 680.9733 Ops/s | $\textbf{\color{#35bf28}+5.45\%}$ |
| test_values_stack_nested_locked | 0.7160ms | 0.6306ms | 1.5858 KOps/s | 1.5678 KOps/s | $\color{#35bf28}+1.15\%$ |
| test_membership | 54.3000μs | 1.8802μs | 531.8626 KOps/s | 496.1601 KOps/s | $\textbf{\color{#35bf28}+7.20\%}$ |
| test_membership_nested | 25.4000μs | 3.7995μs | 263.1918 KOps/s | 260.2904 KOps/s | $\color{#35bf28}+1.11\%$ |
| test_membership_nested_leaf | 24.3000μs | 3.7869μs | 264.0692 KOps/s | 259.8244 KOps/s | $\color{#35bf28}+1.63\%$ |
| test_membership_stacked_nested | 40.5000μs | 14.9136μs | 67.0530 KOps/s | 61.5273 KOps/s | $\textbf{\color{#35bf28}+8.98\%}$ |
| test_membership_stacked_nested_leaf | 36.1000μs | 14.8780μs | 67.2135 KOps/s | 61.2329 KOps/s | $\textbf{\color{#35bf28}+9.77\%}$ |
| test_membership_nested_last | 31.1000μs | 7.6781μs | 130.2397 KOps/s | 127.9960 KOps/s | $\color{#35bf28}+1.75\%$ |
| test_membership_nested_leaf_last | 28.2000μs | 7.6658μs | 130.4503 KOps/s | 128.1257 KOps/s | $\color{#35bf28}+1.81\%$ |
| test_membership_stacked_nested_last | 0.2507ms | 0.2271ms | 4.4026 KOps/s | 4.3793 KOps/s | $\color{#35bf28}+0.53\%$ |
| test_membership_stacked_nested_leaf_last | 42.4000μs | 17.1021μs | 58.4723 KOps/s | 54.0353 KOps/s | $\textbf{\color{#35bf28}+8.21\%}$ |
| test_nested_getleaf | 38.6000μs | 15.7113μs | 63.6486 KOps/s | 63.7431 KOps/s | $\color{#d91a1a}-0.15\%$ |
| test_nested_get | 37.8000μs | 14.9417μs | 66.9267 KOps/s | 66.7999 KOps/s | $\color{#35bf28}+0.19\%$ |
| test_stacked_getleaf | 0.8172ms | 0.7248ms | 1.3797 KOps/s | 1.3307 KOps/s | $\color{#35bf28}+3.69\%$ |
| test_stacked_get | 3.0511ms | 0.7057ms | 1.4171 KOps/s | 1.3817 KOps/s | $\color{#35bf28}+2.56\%$ |
| test_nested_getitemleaf | 0.1785ms | 15.7278μs | 63.5819 KOps/s | 63.0654 KOps/s | $\color{#35bf28}+0.82\%$ |
| test_nested_getitem | 39.3000μs | 14.9090μs | 67.0738 KOps/s | 66.7099 KOps/s | $\color{#35bf28}+0.55\%$ |
| test_stacked_getitemleaf | 0.8108ms | 0.7239ms | 1.3813 KOps/s | 1.3315 KOps/s | $\color{#35bf28}+3.75\%$ |
| test_stacked_getitem | 0.7621ms | 0.6932ms | 1.4427 KOps/s | 1.3894 KOps/s | $\color{#35bf28}+3.84\%$ |
| test_lock_nested | 56.2092ms | 1.1674ms | 856.5861 Ops/s | 898.2715 Ops/s | $\color{#d91a1a}-4.64\%$ |
| test_lock_stack_nested | 75.9773ms | 15.4593ms | 64.6858 Ops/s | 64.7886 Ops/s | $\color{#d91a1a}-0.16\%$ |
| test_unlock_nested | 52.2380ms | 1.1721ms | 853.1495 Ops/s | 857.0762 Ops/s | $\color{#d91a1a}-0.46\%$ |
| test_unlock_stack_nested | 73.9057ms | 15.9683ms | 62.6241 Ops/s | 62.5045 Ops/s | $\color{#35bf28}+0.19\%$ |
| test_flatten_speed | 0.9157ms | 0.8464ms | 1.1814 KOps/s | 1.1403 KOps/s | $\color{#35bf28}+3.60\%$ |
| test_unflatten_speed | 1.5477ms | 1.4581ms | 685.8198 Ops/s | 679.7259 Ops/s | $\color{#35bf28}+0.90\%$ |
| test_common_ops | 3.0637ms | 0.7620ms | 1.3124 KOps/s | 1.2933 KOps/s | $\color{#35bf28}+1.47\%$ |
| test_creation | 58.7990μs | 3.0033μs | 332.9638 KOps/s | 338.7230 KOps/s | $\color{#d91a1a}-1.70\%$ |
| test_creation_empty | 30.4000μs | 9.4507μs | 105.8122 KOps/s | 103.1488 KOps/s | $\color{#35bf28}+2.58\%$ |
| test_creation_nested_1 | 38.2000μs | 14.0648μs | 71.0993 KOps/s | 68.7925 KOps/s | $\color{#35bf28}+3.35\%$ |
| test_creation_nested_2 | 77.3990μs | 17.5322μs | 57.0378 KOps/s | 56.2737 KOps/s | $\color{#35bf28}+1.36\%$ |
| test_clone | 62.3000μs | 14.7359μs | 67.8616 KOps/s | 67.1602 KOps/s | $\color{#35bf28}+1.04\%$ |
| test_getitem[int] | 42.6000μs | 17.6245μs | 56.7392 KOps/s | 56.5610 KOps/s | $\color{#35bf28}+0.32\%$ |
| test_getitem[slice_int] | 80.4990μs | 37.9480μs | 26.3518 KOps/s | 26.8056 KOps/s | $\color{#d91a1a}-1.69\%$ |
| test_getitem[range] | 0.1490ms | 61.5293μs | 16.2524 KOps/s | 16.4511 KOps/s | $\color{#d91a1a}-1.21\%$ |
| test_getitem[tuple] | 65.1990μs | 31.9996μs | 31.2504 KOps/s | 31.3216 KOps/s | $\color{#d91a1a}-0.23\%$ |
| test_getitem[list] | 0.3077ms | 56.6766μs | 17.6440 KOps/s | 17.7100 KOps/s | $\color{#d91a1a}-0.37\%$ |
| test_setitem_dim[int] | 42.9000μs | 32.4730μs | 30.7948 KOps/s | 30.3910 KOps/s | $\color{#35bf28}+1.33\%$ |
| test_setitem_dim[slice_int] | 69.7000μs | 58.6333μs | 17.0552 KOps/s | 17.0206 KOps/s | $\color{#35bf28}+0.20\%$ |
| test_setitem_dim[range] | 94.9000μs | 77.5515μs | 12.8947 KOps/s | 12.8177 KOps/s | $\color{#35bf28}+0.60\%$ |
| test_setitem_dim[tuple] | 64.6000μs | 49.5589μs | 20.1780 KOps/s | 20.4398 KOps/s | $\color{#d91a1a}-1.28\%$ |
| test_setitem | 97.1990μs | 19.5600μs | 51.1247 KOps/s | 50.2497 KOps/s | $\color{#35bf28}+1.74\%$ |
| test_set | 95.0000μs | 18.9636μs | 52.7326 KOps/s | 51.7950 KOps/s | $\color{#35bf28}+1.81\%$ |
| test_set_shared | 2.5620ms | 0.1613ms | 6.1990 KOps/s | 6.2644 KOps/s | $\color{#d91a1a}-1.04\%$ |
| test_update | 0.1102ms | 24.5620μs | 40.7133 KOps/s | 40.2141 KOps/s | $\color{#35bf28}+1.24\%$ |
| test_update_nested | 0.1092ms | 34.5724μs | 28.9248 KOps/s | 28.8636 KOps/s | $\color{#35bf28}+0.21\%$ |
| test_set_nested | 86.1000μs | 20.6430μs | 48.4427 KOps/s | 47.6956 KOps/s | $\color{#35bf28}+1.57\%$ |
| test_set_nested_new | 87.5990μs | 28.1364μs | 35.5411 KOps/s | 34.5961 KOps/s | $\color{#35bf28}+2.73\%$ |
| test_select | 0.1231ms | 59.4096μs | 16.8323 KOps/s | 17.0197 KOps/s | $\color{#d91a1a}-1.10\%$ |
| test_unbind_speed | 0.4211ms | 0.3689ms | 2.7106 KOps/s | 2.5821 KOps/s | $\color{#35bf28}+4.98\%$ |
| test_unbind_speed_stack0 | 59.6449ms | 5.2735ms | 189.6275 Ops/s | 180.9227 Ops/s | $\color{#35bf28}+4.81\%$ |
| test_unbind_speed_stack1 | 13.7998μs | 0.9610μs | 1.0406 MOps/s | 872.9610 KOps/s | $\textbf{\color{#35bf28}+19.21\%}$ |
| test_creation[device0] | 2.0331ms | 0.3531ms | 2.8324 KOps/s | 2.9005 KOps/s | $\color{#d91a1a}-2.35\%$ |
| test_creation_from_tensor | 57.4586ms | 0.4327ms | 2.3113 KOps/s | 2.6088 KOps/s | $\textbf{\color{#d91a1a}-11.40\%}$ |
| test_add_one[memmap_tensor0] | 0.1614ms | 30.4083μs | 32.8858 KOps/s | 32.3060 KOps/s | $\color{#35bf28}+1.79\%$ |
| test_contiguous[memmap_tensor0] | 30.9000μs | 8.3491μs | 119.7740 KOps/s | 115.7872 KOps/s | $\color{#35bf28}+3.44\%$ |
| test_stack[memmap_tensor0] | 62.1990μs | 25.6047μs | 39.0553 KOps/s | 38.7801 KOps/s | $\color{#35bf28}+0.71\%$ |
| test_memmaptd_index | 0.3390ms | 0.2737ms | 3.6534 KOps/s | 3.4439 KOps/s | $\textbf{\color{#35bf28}+6.08\%}$ |
| test_memmaptd_index_astensor | 0.4017ms | 0.3438ms | 2.9085 KOps/s | 920.5795 Ops/s | $\textbf{\color{#35bf28}+215.94\%}$ |
| test_memmaptd_index_op | 0.7338ms | 0.6646ms | 1.5047 KOps/s | 426.7540 Ops/s | $\textbf{\color{#35bf28}+252.58\%}$ |
| test_reshape_pytree | 0.1094ms | 31.8367μs | 31.4103 KOps/s | 31.0341 KOps/s | $\color{#35bf28}+1.21\%$ |
| test_reshape_td | 54.8000μs | 28.1369μs | 35.5406 KOps/s | 34.6788 KOps/s | $\color{#35bf28}+2.49\%$ |
| test_view_pytree | 77.8000μs | 31.2610μs | 31.9887 KOps/s | 31.6496 KOps/s | $\color{#35bf28}+1.07\%$ |
| test_view_td | 17.0000μs | 5.5344μs | 180.6868 KOps/s | 182.5779 KOps/s | $\color{#d91a1a}-1.04\%$ |
| test_unbind_pytree | 96.2990μs | 37.3767μs | 26.7547 KOps/s | 26.8546 KOps/s | $\color{#d91a1a}-0.37\%$ |
| test_unbind_td | 98.6990μs | 53.4983μs | 18.6922 KOps/s | 18.6957 KOps/s | $\color{#d91a1a}-0.02\%$ |
| test_split_pytree | 62.9000μs | 36.7675μs | 27.1979 KOps/s | 27.4156 KOps/s | $\color{#d91a1a}-0.79\%$ |
| test_split_td | 0.1783ms | 96.6925μs | 10.3421 KOps/s | 10.2860 KOps/s | $\color{#35bf28}+0.55\%$ |
| test_add_pytree | 78.8000μs | 45.0505μs | 22.1973 KOps/s | 22.2857 KOps/s | $\color{#d91a1a}-0.40\%$ |
| test_add_td | 81.0000μs | 55.9716μs | 17.8662 KOps/s | 17.7740 KOps/s | $\color{#35bf28}+0.52\%$ |
| test_distributed | 54.0000μs | 8.2250μs | 121.5802 KOps/s | 122.6607 KOps/s | $\color{#d91a1a}-0.88\%$ |
| test_tdmodule | 0.1141ms | 24.4962μs | 40.8227 KOps/s | 40.2242 KOps/s | $\color{#35bf28}+1.49\%$ |
| test_tdmodule_dispatch | 0.2154ms | 44.5125μs | 22.4656 KOps/s | 22.6101 KOps/s | $\color{#d91a1a}-0.64\%$ |
| test_tdseq | 0.1390ms | 26.4022μs | 37.8757 KOps/s | 37.8803 KOps/s | $\color{#d91a1a}-0.01\%$ |
| test_tdseq_dispatch | 0.5242ms | 46.5668μs | 21.4745 KOps/s | 21.6141 KOps/s | $\color{#d91a1a}-0.65\%$ |
| test_instantiation_functorch | 1.6548ms | 1.5387ms | 649.8968 Ops/s | 648.0128 Ops/s | $\color{#35bf28}+0.29\%$ |
| test_instantiation_td | 1.8319ms | 1.2300ms | 813.0208 Ops/s | 756.6224 Ops/s | $\textbf{\color{#35bf28}+7.45\%}$ |
| test_exec_functorch | 0.2683ms | 0.1860ms | 5.3768 KOps/s | 5.4536 KOps/s | $\color{#d91a1a}-1.41\%$ |
| test_exec_td | 0.2141ms | 0.1773ms | 5.6397 KOps/s | 5.7691 KOps/s | $\color{#d91a1a}-2.24\%$ |
| test_vmap_mlp_speed[True-True] | 6.8421ms | 0.9894ms | 1.0107 KOps/s | 1.0174 KOps/s | $\color{#d91a1a}-0.67\%$ |
| test_vmap_mlp_speed[True-False] | 6.3454ms | 0.5304ms | 1.8853 KOps/s | 1.9025 KOps/s | $\color{#d91a1a}-0.91\%$ |
| test_vmap_mlp_speed[False-True] | 1.1983ms | 0.8495ms | 1.1772 KOps/s | 1.1630 KOps/s | $\color{#35bf28}+1.22\%$ |
| test_vmap_mlp_speed[False-False] | 6.2969ms | 0.4366ms | 2.2906 KOps/s | 2.3035 KOps/s | $\color{#d91a1a}-0.56\%$ |
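The Change column appears to be the relative throughput difference between this PR and the repo HEAD, and the Ops column appears to be the reciprocal of the Mean time; both relations can be checked against the rows of the detailed results. A minimal sketch, assuming only the unit suffixes that occur in the table (Ops/s, KOps/s, MOps/s, ms, μs); the helper names are hypothetical and not part of the benchmark tooling.

```python
# Hypothetical helpers for checking the benchmark table by hand;
# they are not part of the bot that produced the results.
RATE_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}
TIME_SCALE = {"ms": 1e-3, "μs": 1e-6}


def parse_rate(text: str) -> float:
    """Convert a cell such as '2.9085 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * RATE_SCALE[unit]


def parse_time(text: str) -> float:
    """Convert a cell such as '0.3438ms' or '19.8620μs' to seconds."""
    for suffix, scale in TIME_SCALE.items():
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * scale
    raise ValueError(f"unknown time unit in {text!r}")


def percent_change(ops: str, ops_head: str) -> float:
    """Relative throughput change of this PR versus the repo HEAD, in percent."""
    new, old = parse_rate(ops), parse_rate(ops_head)
    return (new - old) / old * 100.0


# test_memmaptd_index_astensor: 2.9085 KOps/s vs 920.5795 Ops/s on HEAD.
print(f"{percent_change('2.9085 KOps/s', '920.5795 Ops/s'):+.2f}%")  # +215.94%

# test_plain_set_nested: a mean of 19.8620μs per call corresponds to the Ops column.
print(f"{1 / parse_time('19.8620μs') / 1e3:.2f} KOps/s")  # 50.35 KOps/s
```

Recomputing from the rounded cells can differ from the published Change in the last digit (test_memmaptd_index_op gives about +252.59% instead of +252.58%), presumably because the published figures are computed from unrounded measurements.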

@vmoens added the enhancement and Refactor labels Oct 25, 2023
@vmoens marked this pull request as ready for review November 14, 2023 15:49
vmoens added a commit to pytorch/rl that referenced this pull request Nov 14, 2023
@vmoens merged commit f601dfa into main Nov 14, 2023
40 of 43 checks passed
@vmoens deleted the memmap_tensor_refact branch November 14, 2023 21:11
Labels: BC-breaking, CLA Signed, enhancement, Refactor