[BugFix] Extend RB with lazy stack (revamp) #2454

Merged
vmoens merged 1 commit into gh/vmoens/30/base from gh/vmoens/30/head
Sep 25, 2024

@vmoens commented Sep 25, 2024

[ghstack-poisoned]

pytorch-bot bot commented Sep 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2454

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 19 Unrelated Failures

As of commit 94d5626 with merge base 1aca00e (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens added a commit that referenced this pull request Sep 25, 2024
ghstack-source-id: df397d09166d8fb61eceacb5fe8659e0295ca414
Pull Request resolved: #2454
@facebook-github-bot added the CLA Signed label Sep 25, 2024
@vmoens merged commit 94d5626 into gh/vmoens/30/base Sep 25, 2024
27 of 45 checks passed
vmoens added a commit that referenced this pull request Sep 25, 2024
ghstack-source-id: df397d09166d8fb61eceacb5fe8659e0295ca414
Pull Request resolved: #2454
@vmoens deleted the gh/vmoens/30/head branch September 25, 2024 05:44

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 146. Improved: 17. Worsened: 19.

Name Max Mean Ops Ops on Repo HEAD Change
test_single 59.7008ms 59.0162ms 16.9445 Ops/s 16.8659 Ops/s $\color{#35bf28}+0.47\%$
test_sync 50.9551ms 33.5245ms 29.8289 Ops/s 30.9393 Ops/s $\color{#d91a1a}-3.59\%$
test_async 60.2890ms 31.2443ms 32.0059 Ops/s 31.9267 Ops/s $\color{#35bf28}+0.25\%$
test_simple 0.4953s 0.4242s 2.3575 Ops/s 2.4468 Ops/s $\color{#d91a1a}-3.65\%$
test_transformed 0.5593s 0.5580s 1.7921 Ops/s 1.7788 Ops/s $\color{#35bf28}+0.75\%$
test_serial 1.2679s 1.2627s 0.7919 Ops/s 0.7732 Ops/s $\color{#35bf28}+2.42\%$
test_parallel 1.2083s 1.1327s 0.8829 Ops/s 0.8875 Ops/s $\color{#d91a1a}-0.53\%$
test_step_mdp_speed[True-True-True-True-True] 0.2043ms 28.8817μs 34.6240 KOps/s 36.4023 KOps/s $\color{#d91a1a}-4.89\%$
test_step_mdp_speed[True-True-True-True-False] 41.6380μs 16.6464μs 60.0732 KOps/s 62.3295 KOps/s $\color{#d91a1a}-3.62\%$
test_step_mdp_speed[True-True-True-False-True] 66.3320μs 16.1267μs 62.0089 KOps/s 63.4118 KOps/s $\color{#d91a1a}-2.21\%$
test_step_mdp_speed[True-True-True-False-False] 34.1640μs 9.4612μs 105.6946 KOps/s 107.4701 KOps/s $\color{#d91a1a}-1.65\%$
test_step_mdp_speed[True-True-False-True-True] 83.3350μs 30.6884μs 32.5856 KOps/s 34.2959 KOps/s $\color{#d91a1a}-4.99\%$
test_step_mdp_speed[True-True-False-True-False] 44.1520μs 18.4885μs 54.0877 KOps/s 56.5557 KOps/s $\color{#d91a1a}-4.36\%$
test_step_mdp_speed[True-True-False-False-True] 52.4080μs 18.1858μs 54.9879 KOps/s 57.8660 KOps/s $\color{#d91a1a}-4.97\%$
test_step_mdp_speed[True-True-False-False-False] 41.8680μs 11.3398μs 88.1847 KOps/s 91.6627 KOps/s $\color{#d91a1a}-3.79\%$
test_step_mdp_speed[True-False-True-True-True] 76.9130μs 32.1581μs 31.0964 KOps/s 32.3224 KOps/s $\color{#d91a1a}-3.79\%$
test_step_mdp_speed[True-False-True-True-False] 67.9380μs 20.1698μs 49.5791 KOps/s 52.1619 KOps/s $\color{#d91a1a}-4.95\%$
test_step_mdp_speed[True-False-True-False-True] 71.4330μs 17.8931μs 55.8876 KOps/s 57.8386 KOps/s $\color{#d91a1a}-3.37\%$
test_step_mdp_speed[True-False-True-False-False] 42.8900μs 11.2165μs 89.1547 KOps/s 92.6168 KOps/s $\color{#d91a1a}-3.74\%$
test_step_mdp_speed[True-False-False-True-True] 79.7990μs 34.2264μs 29.2172 KOps/s 31.1384 KOps/s $\textbf{\color{#d91a1a}-6.17\%}$
test_step_mdp_speed[True-False-False-True-False] 0.1534ms 22.7357μs 43.9837 KOps/s 48.1478 KOps/s $\textbf{\color{#d91a1a}-8.65\%}$
test_step_mdp_speed[True-False-False-False-True] 0.1558ms 20.4174μs 48.9778 KOps/s 53.3092 KOps/s $\textbf{\color{#d91a1a}-8.13\%}$
test_step_mdp_speed[True-False-False-False-False] 46.5270μs 13.1516μs 76.0366 KOps/s 81.0268 KOps/s $\textbf{\color{#d91a1a}-6.16\%}$
test_step_mdp_speed[False-True-True-True-True] 74.9000μs 32.4615μs 30.8058 KOps/s 32.8851 KOps/s $\textbf{\color{#d91a1a}-6.32\%}$
test_step_mdp_speed[False-True-True-True-False] 51.2150μs 20.3499μs 49.1403 KOps/s 51.5348 KOps/s $\color{#d91a1a}-4.65\%$
test_step_mdp_speed[False-True-True-False-True] 58.5900μs 21.0098μs 47.5968 KOps/s 50.5592 KOps/s $\textbf{\color{#d91a1a}-5.86\%}$
test_step_mdp_speed[False-True-True-False-False] 41.1170μs 12.5358μs 79.7715 KOps/s 83.2557 KOps/s $\color{#d91a1a}-4.18\%$
test_step_mdp_speed[False-True-False-True-True] 0.1846ms 34.1821μs 29.2551 KOps/s 31.3256 KOps/s $\textbf{\color{#d91a1a}-6.61\%}$
test_step_mdp_speed[False-True-False-True-False] 68.0670μs 22.3207μs 44.8014 KOps/s 47.6787 KOps/s $\textbf{\color{#d91a1a}-6.03\%}$
test_step_mdp_speed[False-True-False-False-True] 2.7544ms 22.3982μs 44.6465 KOps/s 46.8903 KOps/s $\color{#d91a1a}-4.79\%$
test_step_mdp_speed[False-True-False-False-False] 41.1870μs 14.1871μs 70.4865 KOps/s 73.5938 KOps/s $\color{#d91a1a}-4.22\%$
test_step_mdp_speed[False-False-True-True-True] 76.9930μs 35.8484μs 27.8952 KOps/s 29.5235 KOps/s $\textbf{\color{#d91a1a}-5.52\%}$
test_step_mdp_speed[False-False-True-True-False] 0.1216ms 23.8869μs 41.8640 KOps/s 44.6134 KOps/s $\textbf{\color{#d91a1a}-6.16\%}$
test_step_mdp_speed[False-False-True-False-True] 48.4600μs 22.3431μs 44.7565 KOps/s 47.1149 KOps/s $\textbf{\color{#d91a1a}-5.01\%}$
test_step_mdp_speed[False-False-True-False-False] 40.5660μs 14.3057μs 69.9021 KOps/s 73.9854 KOps/s $\textbf{\color{#d91a1a}-5.52\%}$
test_step_mdp_speed[False-False-False-True-True] 82.4630μs 37.2956μs 26.8128 KOps/s 28.6316 KOps/s $\textbf{\color{#d91a1a}-6.35\%}$
test_step_mdp_speed[False-False-False-True-False] 95.2480μs 25.1453μs 39.7689 KOps/s 42.0943 KOps/s $\textbf{\color{#d91a1a}-5.52\%}$
test_step_mdp_speed[False-False-False-False-True] 58.0990μs 23.7317μs 42.1377 KOps/s 44.4648 KOps/s $\textbf{\color{#d91a1a}-5.23\%}$
test_step_mdp_speed[False-False-False-False-False] 47.9890μs 15.9290μs 62.7785 KOps/s 66.6134 KOps/s $\textbf{\color{#d91a1a}-5.76\%}$
test_values[generalized_advantage_estimate-True-True] 10.5180ms 9.5965ms 104.2047 Ops/s 103.5642 Ops/s $\color{#35bf28}+0.62\%$
test_values[vec_generalized_advantage_estimate-True-True] 39.1229ms 35.5597ms 28.1217 Ops/s 28.1936 Ops/s $\color{#d91a1a}-0.26\%$
test_values[td0_return_estimate-False-False] 0.2255ms 0.1654ms 6.0468 KOps/s 5.6812 KOps/s $\textbf{\color{#35bf28}+6.43\%}$
test_values[td1_return_estimate-False-False] 25.9534ms 23.5516ms 42.4600 Ops/s 42.0268 Ops/s $\color{#35bf28}+1.03\%$
test_values[vec_td1_return_estimate-False-False] 40.8499ms 35.9967ms 27.7803 Ops/s 28.0167 Ops/s $\color{#d91a1a}-0.84\%$
test_values[td_lambda_return_estimate-True-False] 36.1039ms 34.1112ms 29.3159 Ops/s 29.0100 Ops/s $\color{#35bf28}+1.05\%$
test_values[vec_td_lambda_return_estimate-True-False] 37.9263ms 35.9013ms 27.8541 Ops/s 28.1810 Ops/s $\color{#d91a1a}-1.16\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 12.2678ms 8.3839ms 119.2761 Ops/s 119.2263 Ops/s $\color{#35bf28}+0.04\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.5708ms 2.0225ms 494.4432 Ops/s 492.8853 Ops/s $\color{#35bf28}+0.32\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4166ms 0.3562ms 2.8073 KOps/s 2.8152 KOps/s $\color{#d91a1a}-0.28\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 48.1637ms 46.4082ms 21.5479 Ops/s 19.9803 Ops/s $\textbf{\color{#35bf28}+7.85\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 4.1059ms 3.0287ms 330.1788 Ops/s 330.4802 Ops/s $\color{#d91a1a}-0.09\%$
test_dqn_speed[False-None] 6.3770ms 1.3175ms 759.0085 Ops/s 761.7204 Ops/s $\color{#d91a1a}-0.36\%$
test_dqn_speed[False-backward] 1.8742ms 1.7881ms 559.2654 Ops/s 562.6638 Ops/s $\color{#d91a1a}-0.60\%$
test_dqn_speed[True-None] 0.6517ms 0.4570ms 2.1882 KOps/s 2.1949 KOps/s $\color{#d91a1a}-0.30\%$
test_dqn_speed[True-backward] 0.9377ms 0.8630ms 1.1587 KOps/s 1.1389 KOps/s $\color{#35bf28}+1.74\%$
test_dqn_speed[reduce-overhead-None] 0.6001ms 0.4579ms 2.1839 KOps/s 2.1754 KOps/s $\color{#35bf28}+0.39\%$
test_dqn_speed[reduce-overhead-backward] 0.9235ms 0.8650ms 1.1560 KOps/s 1.1520 KOps/s $\color{#35bf28}+0.35\%$
test_ddpg_speed[False-None] 3.4691ms 2.7348ms 365.6627 Ops/s 363.8044 Ops/s $\color{#35bf28}+0.51\%$
test_ddpg_speed[False-backward] 4.1752ms 3.8763ms 257.9773 Ops/s 257.3192 Ops/s $\color{#35bf28}+0.26\%$
test_ddpg_speed[True-None] 1.2307ms 0.9976ms 1.0024 KOps/s 982.4358 Ops/s $\color{#35bf28}+2.03\%$
test_ddpg_speed[True-backward] 1.9419ms 1.8706ms 534.5834 Ops/s 456.8015 Ops/s $\textbf{\color{#35bf28}+17.03\%}$
test_ddpg_speed[reduce-overhead-None] 1.3439ms 0.9853ms 1.0149 KOps/s 982.9051 Ops/s $\color{#35bf28}+3.26\%$
test_ddpg_speed[reduce-overhead-backward] 2.0191ms 1.8906ms 528.9437 Ops/s 529.7543 Ops/s $\color{#d91a1a}-0.15\%$
test_sac_speed[False-None] 9.7064ms 7.7600ms 128.8661 Ops/s 128.1357 Ops/s $\color{#35bf28}+0.57\%$
test_sac_speed[False-backward] 10.7264ms 10.3918ms 96.2294 Ops/s 94.0116 Ops/s $\color{#35bf28}+2.36\%$
test_sac_speed[True-None] 2.3036ms 1.8314ms 546.0182 Ops/s 539.9378 Ops/s $\color{#35bf28}+1.13\%$
test_sac_speed[True-backward] 3.5573ms 3.4914ms 286.4140 Ops/s 273.1227 Ops/s $\color{#35bf28}+4.87\%$
test_sac_speed[reduce-overhead-None] 2.2958ms 1.8242ms 548.1893 Ops/s 529.5642 Ops/s $\color{#35bf28}+3.52\%$
test_sac_speed[reduce-overhead-backward] 3.5664ms 3.4918ms 286.3885 Ops/s 278.0525 Ops/s $\color{#35bf28}+3.00\%$
test_redq_speed[False-None] 14.1767ms 12.3967ms 80.6666 Ops/s 76.1387 Ops/s $\textbf{\color{#35bf28}+5.95\%}$
test_redq_speed[False-backward] 24.3366ms 21.8858ms 45.6916 Ops/s 44.6358 Ops/s $\color{#35bf28}+2.37\%$
test_redq_speed[True-None] 5.1236ms 4.4427ms 225.0907 Ops/s 212.7361 Ops/s $\textbf{\color{#35bf28}+5.81\%}$
test_redq_speed[True-backward] 13.5682ms 11.7347ms 85.2175 Ops/s 78.6893 Ops/s $\textbf{\color{#35bf28}+8.30\%}$
test_redq_speed[reduce-overhead-None] 5.0364ms 4.4477ms 224.8349 Ops/s 213.0392 Ops/s $\textbf{\color{#35bf28}+5.54\%}$
test_redq_speed[reduce-overhead-backward] 13.7050ms 11.7988ms 84.7544 Ops/s 82.0781 Ops/s $\color{#35bf28}+3.26\%$
test_redq_deprec_speed[False-None] 14.2965ms 12.3491ms 80.9778 Ops/s 79.7682 Ops/s $\color{#35bf28}+1.52\%$
test_redq_deprec_speed[False-backward] 19.5050ms 18.0120ms 55.5186 Ops/s 53.8551 Ops/s $\color{#35bf28}+3.09\%$
test_redq_deprec_speed[True-None] 4.2065ms 3.5117ms 284.7641 Ops/s 278.6217 Ops/s $\color{#35bf28}+2.20\%$
test_redq_deprec_speed[True-backward] 8.9014ms 7.8844ms 126.8328 Ops/s 121.7378 Ops/s $\color{#35bf28}+4.19\%$
test_redq_deprec_speed[reduce-overhead-None] 3.9460ms 3.5066ms 285.1773 Ops/s 276.6437 Ops/s $\color{#35bf28}+3.08\%$
test_redq_deprec_speed[reduce-overhead-backward] 7.9442ms 7.8454ms 127.4626 Ops/s 122.9452 Ops/s $\color{#35bf28}+3.67\%$
test_td3_speed[False-None] 32.8361ms 7.8319ms 127.6834 Ops/s 128.2225 Ops/s $\color{#d91a1a}-0.42\%$
test_td3_speed[False-backward] 11.6207ms 9.9746ms 100.2551 Ops/s 97.5034 Ops/s $\color{#35bf28}+2.82\%$
test_td3_speed[True-None] 2.0948ms 1.8925ms 528.3997 Ops/s 501.5571 Ops/s $\textbf{\color{#35bf28}+5.35\%}$
test_td3_speed[True-backward] 3.5177ms 3.4781ms 287.5171 Ops/s 275.9944 Ops/s $\color{#35bf28}+4.17\%$
test_td3_speed[reduce-overhead-None] 2.1102ms 1.8974ms 527.0309 Ops/s 501.6961 Ops/s $\textbf{\color{#35bf28}+5.05\%}$
test_td3_speed[reduce-overhead-backward] 3.5435ms 3.4733ms 287.9105 Ops/s 279.8449 Ops/s $\color{#35bf28}+2.88\%$
test_cql_speed[False-None] 37.8800ms 35.1273ms 28.4678 Ops/s 27.6976 Ops/s $\color{#35bf28}+2.78\%$
test_cql_speed[False-backward] 45.3033ms 43.8337ms 22.8135 Ops/s 21.3169 Ops/s $\textbf{\color{#35bf28}+7.02\%}$
test_cql_speed[True-None] 16.3302ms 15.2145ms 65.7266 Ops/s 62.7233 Ops/s $\color{#35bf28}+4.79\%$
test_cql_speed[True-backward] 23.2094ms 21.5542ms 46.3946 Ops/s 42.3446 Ops/s $\textbf{\color{#35bf28}+9.56\%}$
test_cql_speed[reduce-overhead-None] 18.8605ms 15.5985ms 64.1087 Ops/s 61.7505 Ops/s $\color{#35bf28}+3.82\%$
test_cql_speed[reduce-overhead-backward] 22.6793ms 21.4498ms 46.6206 Ops/s 45.2072 Ops/s $\color{#35bf28}+3.13\%$
test_a2c_speed[False-None] 9.1138ms 7.0078ms 142.6984 Ops/s 139.0345 Ops/s $\color{#35bf28}+2.64\%$
test_a2c_speed[False-backward] 14.9763ms 13.8517ms 72.1931 Ops/s 70.4961 Ops/s $\color{#35bf28}+2.41\%$
test_a2c_speed[True-None] 3.5670ms 3.2688ms 305.9187 Ops/s 298.1687 Ops/s $\color{#35bf28}+2.60\%$
test_a2c_speed[True-backward] 10.1304ms 9.6472ms 103.6575 Ops/s 101.7842 Ops/s $\color{#35bf28}+1.84\%$
test_a2c_speed[reduce-overhead-None] 3.6111ms 3.2848ms 304.4354 Ops/s 300.2094 Ops/s $\color{#35bf28}+1.41\%$
test_a2c_speed[reduce-overhead-backward] 10.0560ms 9.6340ms 103.7995 Ops/s 99.2324 Ops/s $\color{#35bf28}+4.60\%$
test_ppo_speed[False-None] 10.0193ms 7.2986ms 137.0132 Ops/s 132.0465 Ops/s $\color{#35bf28}+3.76\%$
test_ppo_speed[False-backward] 15.7692ms 14.2016ms 70.4148 Ops/s 66.7202 Ops/s $\textbf{\color{#35bf28}+5.54\%}$
test_ppo_speed[True-None] 4.1647ms 3.6677ms 272.6504 Ops/s 267.3288 Ops/s $\color{#35bf28}+1.99\%$
test_ppo_speed[True-backward] 9.8184ms 9.5014ms 105.2473 Ops/s 103.9136 Ops/s $\color{#35bf28}+1.28\%$
test_ppo_speed[reduce-overhead-None] 4.3656ms 3.6640ms 272.9255 Ops/s 264.7567 Ops/s $\color{#35bf28}+3.09\%$
test_ppo_speed[reduce-overhead-backward] 10.7009ms 9.5358ms 104.8683 Ops/s 103.4278 Ops/s $\color{#35bf28}+1.39\%$
test_reinforce_speed[False-None] 8.2065ms 6.3845ms 156.6287 Ops/s 154.1112 Ops/s $\color{#35bf28}+1.63\%$
test_reinforce_speed[False-backward] 10.5049ms 9.5750ms 104.4386 Ops/s 103.1643 Ops/s $\color{#35bf28}+1.24\%$
test_reinforce_speed[True-None] 3.1674ms 2.6007ms 384.5149 Ops/s 374.4973 Ops/s $\color{#35bf28}+2.67\%$
test_reinforce_speed[True-backward] 8.8513ms 8.4804ms 117.9193 Ops/s 115.7743 Ops/s $\color{#35bf28}+1.85\%$
test_reinforce_speed[reduce-overhead-None] 3.0220ms 2.6061ms 383.7171 Ops/s 375.6870 Ops/s $\color{#35bf28}+2.14\%$
test_reinforce_speed[reduce-overhead-backward] 9.2844ms 8.5140ms 117.4534 Ops/s 116.1830 Ops/s $\color{#35bf28}+1.09\%$
test_iql_speed[False-None] 32.6484ms 31.4172ms 31.8297 Ops/s 31.9827 Ops/s $\color{#d91a1a}-0.48\%$
test_iql_speed[False-backward] 45.2518ms 43.7977ms 22.8322 Ops/s 22.7793 Ops/s $\color{#35bf28}+0.23\%$
test_iql_speed[True-None] 15.2359ms 13.0279ms 76.7581 Ops/s 75.1779 Ops/s $\color{#35bf28}+2.10\%$
test_iql_speed[True-backward] 25.0131ms 23.7066ms 42.1824 Ops/s 40.9696 Ops/s $\color{#35bf28}+2.96\%$
test_iql_speed[reduce-overhead-None] 14.1194ms 12.9980ms 76.9351 Ops/s 72.3799 Ops/s $\textbf{\color{#35bf28}+6.29\%}$
test_iql_speed[reduce-overhead-backward] 24.8800ms 23.6639ms 42.2585 Ops/s 40.1365 Ops/s $\textbf{\color{#35bf28}+5.29\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.3662ms 4.9981ms 200.0769 Ops/s 196.5500 Ops/s $\color{#35bf28}+1.79\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.1263ms 0.4723ms 2.1174 KOps/s 2.0761 KOps/s $\color{#35bf28}+1.99\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6543ms 0.4461ms 2.2418 KOps/s 2.2155 KOps/s $\color{#35bf28}+1.19\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 7.1788ms 4.9545ms 201.8380 Ops/s 200.8335 Ops/s $\color{#35bf28}+0.50\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.6490ms 0.4666ms 2.1433 KOps/s 2.1227 KOps/s $\color{#35bf28}+0.97\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6529ms 0.4426ms 2.2595 KOps/s 2.2223 KOps/s $\color{#35bf28}+1.67\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 2.2908ms 1.5897ms 629.0635 Ops/s 619.6046 Ops/s $\color{#35bf28}+1.53\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 2.1048ms 1.5014ms 666.0585 Ops/s 658.6355 Ops/s $\color{#35bf28}+1.13\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.5653ms 5.0904ms 196.4472 Ops/s 183.0738 Ops/s $\textbf{\color{#35bf28}+7.30\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.1770ms 0.6026ms 1.6595 KOps/s 1.6078 KOps/s $\color{#35bf28}+3.21\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8452ms 0.5761ms 1.7357 KOps/s 1.6838 KOps/s $\color{#35bf28}+3.08\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.3175ms 5.0013ms 199.9487 Ops/s 194.8685 Ops/s $\color{#35bf28}+2.61\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.4039ms 0.4745ms 2.1074 KOps/s 2.0710 KOps/s $\color{#35bf28}+1.76\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6265ms 0.4465ms 2.2396 KOps/s 2.1621 KOps/s $\color{#35bf28}+3.58\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.1214ms 4.8639ms 205.5975 Ops/s 196.5083 Ops/s $\color{#35bf28}+4.63\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0647ms 0.4691ms 2.1317 KOps/s 2.1144 KOps/s $\color{#35bf28}+0.82\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6736ms 0.4413ms 2.2659 KOps/s 2.2104 KOps/s $\color{#35bf28}+2.51\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.2305ms 5.0823ms 196.7594 Ops/s 191.0756 Ops/s $\color{#35bf28}+2.97\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.3155ms 0.6076ms 1.6458 KOps/s 1.6069 KOps/s $\color{#35bf28}+2.42\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7734ms 0.5744ms 1.7411 KOps/s 1.6971 KOps/s $\color{#35bf28}+2.59\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 5.8774ms 4.1869ms 238.8411 Ops/s 231.2569 Ops/s $\color{#35bf28}+3.28\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 7.8082ms 2.2320ms 448.0239 Ops/s 532.3904 Ops/s $\textbf{\color{#d91a1a}-15.85\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.7339ms 1.2326ms 811.2652 Ops/s 738.9698 Ops/s $\textbf{\color{#35bf28}+9.78\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.3481s 11.0305ms 90.6578 Ops/s 238.8695 Ops/s $\textbf{\color{#d91a1a}-62.05\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 5.9160ms 2.1681ms 461.2272 Ops/s 462.7944 Ops/s $\color{#d91a1a}-0.34\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 6.2493ms 1.4226ms 702.9195 Ops/s 775.1260 Ops/s $\textbf{\color{#d91a1a}-9.32\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 7.8464ms 4.3686ms 228.9038 Ops/s 228.9735 Ops/s $\color{#d91a1a}-0.03\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 3.4092ms 2.1046ms 475.1500 Ops/s 431.2705 Ops/s $\textbf{\color{#35bf28}+10.17\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 5.9056ms 1.4571ms 686.2988 Ops/s 673.4858 Ops/s $\color{#35bf28}+1.90\%$
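
For readers checking the table, the Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column, after both are converted to the same unit. The sketch below is not the benchmark bot's code. The helper names are hypothetical and the formula is inferred from the rows above, three of which it reproduces.

```python
# Multipliers for the two throughput units that appear in the table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def to_ops_per_second(value: float, unit: str) -> float:
    """Convert a reading such as (1.0024, 'KOps/s') to plain Ops/s."""
    return value * UNIT_SCALE[unit]


def relative_change(pr_ops: float, head_ops: float) -> float:
    """Percentage change of the PR throughput relative to repo HEAD.

    Assumed formula (inferred from the table, not taken from the bot).
    """
    return (pr_ops - head_ops) / head_ops * 100.0


# Values copied from three rows of the CPU table.
single = relative_change(
    to_ops_per_second(16.9445, "Ops/s"),
    to_ops_per_second(16.8659, "Ops/s"),
)  # test_single
ddpg = relative_change(
    to_ops_per_second(1.0024, "KOps/s"),
    to_ops_per_second(982.4358, "Ops/s"),
)  # test_ddpg_speed[True-None], mixed units
populate = relative_change(
    to_ops_per_second(90.6578, "Ops/s"),
    to_ops_per_second(238.8695, "Ops/s"),
)  # test_rb_populate[...-ListStorage-SamplerWithoutReplacement-400]

print(f"{single:+.2f}% {ddpg:+.2f}% {populate:+.2f}%")
```

These three values round to the +0.47%, +2.03% and -62.05% reported in the table. In the last row the Max (0.3481s) is far above the Mean (11.0305ms), so the large drop may be driven by a single slow iteration rather than a steady regression.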


Result of GPU Benchmark Tests

Name Max Mean Ops
test_single 0.1009s 99.3336ms 10.0671 Ops/s
test_sync 91.4315ms 87.2442ms 11.4621 Ops/s
test_async 0.1667s 82.1971ms 12.1659 Ops/s
test_single_pixels 0.1072s 0.1070s 9.3468 Ops/s
test_sync_pixels 71.3487ms 69.5045ms 14.3876 Ops/s
test_async_pixels 0.1228s 66.2282ms 15.0993 Ops/s
test_simple 0.7081s 0.7078s 1.4129 Ops/s
test_transformed 0.9446s 0.9398s 1.0641 Ops/s
test_serial 2.0275s 2.0249s 0.4938 Ops/s
test_parallel 1.8221s 1.7742s 0.5636 Ops/s
test_step_mdp_speed[True-True-True-True-True] 0.2143ms 34.9301μs 28.6286 KOps/s
test_step_mdp_speed[True-True-True-True-False] 66.3610μs 20.1673μs 49.5852 KOps/s
test_step_mdp_speed[True-True-True-False-True] 51.2910μs 20.0929μs 49.7688 KOps/s
test_step_mdp_speed[True-True-True-False-False] 77.3010μs 11.5527μs 86.5600 KOps/s
test_step_mdp_speed[True-True-False-True-True] 72.5710μs 37.0481μs 26.9920 KOps/s
test_step_mdp_speed[True-True-False-True-False] 50.4500μs 21.7859μs 45.9013 KOps/s
test_step_mdp_speed[True-True-False-False-True] 49.2210μs 21.8756μs 45.7130 KOps/s
test_step_mdp_speed[True-True-False-False-False] 45.0010μs 13.3924μs 74.6690 KOps/s
test_step_mdp_speed[True-False-True-True-True] 81.8310μs 38.8043μs 25.7704 KOps/s
test_step_mdp_speed[True-False-True-True-False] 58.9710μs 23.6121μs 42.3511 KOps/s
test_step_mdp_speed[True-False-True-False-True] 49.9100μs 21.5844μs 46.3299 KOps/s
test_step_mdp_speed[True-False-True-False-False] 31.1310μs 13.3497μs 74.9078 KOps/s
test_step_mdp_speed[True-False-False-True-True] 71.0710μs 40.6660μs 24.5906 KOps/s
test_step_mdp_speed[True-False-False-True-False] 51.9310μs 25.5938μs 39.0720 KOps/s
test_step_mdp_speed[True-False-False-False-True] 57.6710μs 23.5141μs 42.5277 KOps/s
test_step_mdp_speed[True-False-False-False-False] 37.9500μs 15.2654μs 65.5077 KOps/s
test_step_mdp_speed[False-True-True-True-True] 67.4310μs 39.0696μs 25.5953 KOps/s
test_step_mdp_speed[False-True-True-True-False] 52.5710μs 24.0951μs 41.5023 KOps/s
test_step_mdp_speed[False-True-True-False-True] 49.9610μs 24.6043μs 40.6433 KOps/s
test_step_mdp_speed[False-True-True-False-False] 36.7900μs 14.8867μs 67.1740 KOps/s
test_step_mdp_speed[False-True-False-True-True] 66.7110μs 40.5027μs 24.6897 KOps/s
test_step_mdp_speed[False-True-False-True-False] 52.8710μs 25.6991μs 38.9118 KOps/s
test_step_mdp_speed[False-True-False-False-True] 3.6094ms 26.3423μs 37.9617 KOps/s
test_step_mdp_speed[False-True-False-False-False] 44.9710μs 16.7224μs 59.8002 KOps/s
test_step_mdp_speed[False-False-True-True-True] 79.0710μs 42.8086μs 23.3598 KOps/s
test_step_mdp_speed[False-False-True-True-False] 62.7110μs 27.7989μs 35.9727 KOps/s
test_step_mdp_speed[False-False-True-False-True] 57.3500μs 26.3660μs 37.9276 KOps/s
test_step_mdp_speed[False-False-True-False-False] 56.4410μs 16.6910μs 59.9124 KOps/s
test_step_mdp_speed[False-False-False-True-True] 0.1299ms 44.0417μs 22.7058 KOps/s
test_step_mdp_speed[False-False-False-True-False] 71.2910μs 29.5897μs 33.7955 KOps/s
test_step_mdp_speed[False-False-False-False-True] 56.2910μs 28.1413μs 35.5350 KOps/s
test_step_mdp_speed[False-False-False-False-False] 49.5510μs 18.5578μs 53.8856 KOps/s
test_values[generalized_advantage_estimate-True-True] 24.3434ms 23.5295ms 42.4998 Ops/s
test_values[vec_generalized_advantage_estimate-True-True] 0.1079s 3.0287ms 330.1753 Ops/s
test_values[td0_return_estimate-False-False] 88.6510μs 63.5187μs 15.7434 KOps/s
test_values[td1_return_estimate-False-False] 54.0826ms 52.9985ms 18.8685 Ops/s
test_values[vec_td1_return_estimate-False-False] 1.3846ms 1.0482ms 954.0059 Ops/s
test_values[td_lambda_return_estimate-True-False] 83.6739ms 83.2780ms 12.0080 Ops/s
test_values[vec_td_lambda_return_estimate-True-False] 1.4102ms 1.0473ms 954.8564 Ops/s
test_gae_speed[generalized_advantage_estimate-False-1-512] 24.9158ms 23.8128ms 41.9942 Ops/s
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 0.9668ms 0.7214ms 1.3862 KOps/s
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 1.0267ms 0.6564ms 1.5234 KOps/s
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.7820ms 1.4432ms 692.9159 Ops/s
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7561ms 0.6530ms 1.5313 KOps/s
test_dqn_speed[False-None] 7.5625ms 1.2943ms 772.5891 Ops/s
test_dqn_speed[False-backward] 1.8353ms 1.7839ms 560.5714 Ops/s
test_dqn_speed[True-None] 0.8540ms 0.5552ms 1.8012 KOps/s
test_dqn_speed[True-backward] 1.0851ms 1.0018ms 998.1749 Ops/s
test_dqn_speed[reduce-overhead-None] 0.6807ms 0.5519ms 1.8120 KOps/s
test_dqn_speed[reduce-overhead-backward] 1.0375ms 0.9849ms 1.0153 KOps/s
test_ddpg_speed[False-None] 3.3386ms 2.6655ms 375.1668 Ops/s
test_ddpg_speed[False-backward] 4.1169ms 3.8716ms 258.2922 Ops/s
test_ddpg_speed[True-None] 1.5653ms 1.2506ms 799.6165 Ops/s
test_ddpg_speed[True-backward] 2.2482ms 2.1974ms 455.0819 Ops/s
test_ddpg_speed[reduce-overhead-None] 1.5523ms 1.2490ms 800.6490 Ops/s
test_ddpg_speed[reduce-overhead-backward] 2.2526ms 2.1929ms 456.0207 Ops/s
test_sac_speed[False-None] 8.2897ms 7.3462ms 136.1252 Ops/s
test_sac_speed[False-backward] 10.9939ms 10.4400ms 95.7854 Ops/s
test_sac_speed[True-None] 2.3646ms 2.0202ms 495.0048 Ops/s
test_sac_speed[True-backward] 4.2720ms 3.9192ms 255.1534 Ops/s
test_sac_speed[reduce-overhead-None] 2.3655ms 2.0443ms 489.1588 Ops/s
test_sac_speed[reduce-overhead-backward] 3.9834ms 3.9108ms 255.7041 Ops/s
test_redq_speed[False-None] 11.6253ms 9.9559ms 100.4430 Ops/s
test_redq_speed[False-backward] 18.2884ms 17.4260ms 57.3857 Ops/s
test_redq_speed[True-None] 3.9185ms 3.4918ms 286.3825 Ops/s
test_redq_speed[True-backward] 8.6862ms 8.3039ms 120.4261 Ops/s
test_redq_speed[reduce-overhead-None] 3.7116ms 3.4316ms 291.4078 Ops/s
test_redq_speed[reduce-overhead-backward] 8.6628ms 8.3306ms 120.0389 Ops/s
test_redq_deprec_speed[False-None] 11.7026ms 10.1543ms 98.4804 Ops/s
test_redq_deprec_speed[False-backward] 15.4948ms 14.6778ms 68.1302 Ops/s
test_redq_deprec_speed[True-None] 3.5557ms 3.1468ms 317.7839 Ops/s
test_redq_deprec_speed[True-backward] 6.9701ms 6.7235ms 148.7315 Ops/s
test_redq_deprec_speed[reduce-overhead-None] 3.2942ms 3.1273ms 319.7690 Ops/s
test_redq_deprec_speed[reduce-overhead-backward] 7.1048ms 6.6813ms 149.6723 Ops/s
test_td3_speed[False-None] 7.5337ms 7.3273ms 136.4751 Ops/s
test_td3_speed[False-backward] 10.3968ms 10.1699ms 98.3297 Ops/s
test_td3_speed[True-None] 2.1790ms 2.0671ms 483.7783 Ops/s
test_td3_speed[True-backward] 3.9232ms 3.8387ms 260.5018 Ops/s
test_td3_speed[reduce-overhead-None] 2.0901ms 2.0495ms 487.9153 Ops/s
test_td3_speed[reduce-overhead-backward] 4.0648ms 3.8458ms 260.0268 Ops/s
test_cql_speed[False-None] 27.8563ms 24.6052ms 40.6418 Ops/s
test_cql_speed[False-backward] 36.5799ms 32.8448ms 30.4462 Ops/s
test_cql_speed[True-None] 11.6509ms 10.8262ms 92.3685 Ops/s
test_cql_speed[True-backward] 16.7880ms 16.4828ms 60.6692 Ops/s
test_cql_speed[reduce-overhead-None] 11.1745ms 10.8732ms 91.9690 Ops/s
test_cql_speed[reduce-overhead-backward] 16.6944ms 16.4450ms 60.8086 Ops/s
test_a2c_speed[False-None] 7.5038ms 5.2312ms 191.1604 Ops/s
test_a2c_speed[False-backward] 11.8271ms 11.4600ms 87.2597 Ops/s
test_a2c_speed[True-None] 3.2562ms 3.0749ms 325.2180 Ops/s
test_a2c_speed[True-backward] 8.9322ms 8.5151ms 117.4381 Ops/s
test_a2c_speed[reduce-overhead-None] 3.3491ms 3.0748ms 325.2217 Ops/s
test_a2c_speed[reduce-overhead-backward] 8.8716ms 8.5355ms 117.1582 Ops/s
test_ppo_speed[False-None] 5.7899ms 5.5240ms 181.0283 Ops/s
test_ppo_speed[False-backward] 12.2771ms 11.8677ms 84.2623 Ops/s
test_ppo_speed[True-None] 3.8295ms 3.4447ms 290.3030 Ops/s
test_ppo_speed[True-backward] 8.4686ms 8.1857ms 122.1649 Ops/s
test_ppo_speed[reduce-overhead-None] 3.7645ms 3.4494ms 289.9062 Ops/s
test_ppo_speed[reduce-overhead-backward] 8.5683ms 8.3284ms 120.0706 Ops/s
test_reinforce_speed[False-None] 6.2605ms 4.4072ms 226.9040 Ops/s
test_reinforce_speed[False-backward] 7.5426ms 7.1135ms 140.5786 Ops/s
test_reinforce_speed[True-None] 2.3610ms 2.2188ms 450.7036 Ops/s
test_reinforce_speed[True-backward] 7.7044ms 7.1484ms 139.8917 Ops/s
test_reinforce_speed[reduce-overhead-None] 2.6193ms 2.2066ms 453.1813 Ops/s
test_reinforce_speed[reduce-overhead-backward] 7.5301ms 7.1295ms 140.2623 Ops/s
test_iql_speed[False-None] 19.9566ms 19.3541ms 51.6686 Ops/s
test_iql_speed[False-backward] 30.3829ms 29.5628ms 33.8263 Ops/s
test_iql_speed[True-None] 8.4571ms 7.8025ms 128.1633 Ops/s
test_iql_speed[True-backward] 16.6596ms 16.2742ms 61.4468 Ops/s
test_iql_speed[reduce-overhead-None] 8.1236ms 7.8476ms 127.4268 Ops/s
test_iql_speed[reduce-overhead-backward] 16.7204ms 16.3163ms 61.2884 Ops/s
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.6604ms 6.4902ms 154.0790 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6184ms 0.3334ms 2.9993 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5497ms 0.3181ms 3.1433 KOps/s
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.5965ms 6.3560ms 157.3328 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.2307ms 0.3287ms 3.0423 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5391ms 0.3126ms 3.1985 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6639ms 1.3710ms 729.3841 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6016ms 1.2498ms 800.1319 Ops/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.6917ms 6.5894ms 151.7595 Ops/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.7955ms 0.4723ms 2.1175 KOps/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7471ms 0.4525ms 2.2099 KOps/s
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.6727ms 6.4932ms 154.0081 Ops/s
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.0518ms 0.3131ms 3.1943 KOps/s
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 1.0824ms 0.3024ms 3.3067 KOps/s
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.6769ms 6.3628ms 157.1630 Ops/s
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.0421ms 0.2412ms 4.1451 KOps/s
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4113ms 0.2171ms 4.6059 KOps/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.7647ms 6.6396ms 150.6125 Ops/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.0417ms 0.3769ms 2.6530 KOps/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.5652ms 0.3628ms 2.7560 KOps/s
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.8082ms 5.3078ms 188.4008 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 7.7570ms 2.1728ms 460.2297 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 7.6126ms 1.2639ms 791.2201 Ops/s
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.4067s 13.3256ms 75.0433 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.7051ms 1.4775ms 676.8267 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 6.8976ms 1.2368ms 808.5363 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 7.1861ms 5.4950ms 181.9824 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 10.5358ms 2.1697ms 460.8983 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 3.5882ms 1.3027ms 767.6530 Ops/s
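
The GPU table has no comparison column, but its Ops column appears to be the reciprocal of the Mean column once the time unit is taken into account. The sketch below checks that reading against two rows. The function name is hypothetical, and the dictionary key `"us"` stands in for the table's μs.

```python
# Seconds per unit for the time units used in the Mean column.
TIME_SCALE = {"s": 1.0, "ms": 1e-3, "us": 1e-6}


def ops_from_mean(mean: float, unit: str) -> float:
    """Throughput in Ops/s implied by a mean duration (assumed: 1 / mean)."""
    return 1.0 / (mean * TIME_SCALE[unit])


# test_single: Mean 99.3336ms, reported 10.0671 Ops/s.
single_ops = ops_from_mean(99.3336, "ms")

# test_step_mdp_speed[True-True-True-True-True]: Mean 34.9301μs,
# reported 28.6286 KOps/s.
step_mdp_kops = ops_from_mean(34.9301, "us") / 1e3

print(f"{single_ops:.4f} Ops/s, {step_mdp_kops:.4f} KOps/s")
```

Small mismatches in the last digit are expected for other rows, because the Mean values in the table are already rounded.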
