[Performance] Lazy stack optimization for collector-to-buffer writes #3438
🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3438

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 1 Unrelated Failure

As of commit cddbc62 with merge base eb7a1e4:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.
| Prefix | Label Applied | Example |
|---|---|---|
| [BugFix] | BugFix | [BugFix] Fix memory leak in collector |
| [Feature] | Feature | [Feature] Add new optimizer |
| [Doc] or [Docs] | Documentation | [Doc] Update installation guide |
| [Refactor] | Refactoring | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Tests | [Tests] Add unit tests for buffer |
| [Environment] or [Environments] | Environments | [Environments] Add Gymnasium support |
| [Data] | Data | [Data] Fix replay buffer sampling |
| [Performance] or [Perf] | Performance | [Performance] Optimize tensor ops |
| [BC-Breaking] | bc breaking | [BC-Breaking] Remove deprecated API |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
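For illustration only, the prefix-to-label mapping above could be expressed as a small lookup. This is a sketch, not the labeling bot's actual code: the names `PREFIX_TO_LABEL` and `label_for_title` are hypothetical, and the case-insensitive match is an assumption that the table does not confirm.

```python
import re
from typing import Optional

# Prefix -> label pairs copied from the table above.
# Keys are lowercased; case-insensitive matching is an assumption.
PREFIX_TO_LABEL = {
    "bugfix": "BugFix",
    "feature": "Feature",
    "doc": "Documentation",
    "docs": "Documentation",
    "refactor": "Refactoring",
    "ci": "CI",
    "test": "Tests",
    "tests": "Tests",
    "environment": "Environments",
    "environments": "Environments",
    "data": "Data",
    "performance": "Performance",
    "perf": "Performance",
    "bc-breaking": "bc breaking",
    "deprecation": "Deprecation",
}

# Matches a leading bracketed tag such as "[Performance]".
_PREFIX_RE = re.compile(r"^\s*\[([^\]]+)\]")


def label_for_title(title: str) -> Optional[str]:
    """Return the label for a PR title, or None if no known prefix is found."""
    match = _PREFIX_RE.match(title)
    if match is None:
        return None
    return PREFIX_TO_LABEL.get(match.group(1).strip().lower())


title = "[Performance] Lazy stack optimization for collector-to-buffer writes"
print(label_for_title(title))
```

Under these assumptions, the title of this PR would resolve to the Performance label, and a title without a bracketed prefix would get no label.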
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 82.5609μs | 81.6652μs | 12.2451 KOps/s | 12.4090 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1408ms | 0.1401ms | 7.1369 KOps/s | 7.1825 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1342s | 0.1334s | 7.4977 Ops/s | 7.8426 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.8000μs | 2.7731μs | 360.6117 KOps/s | 371.2428 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 38.7528μs | 38.4387μs | 26.0154 KOps/s | 27.7441 KOps/s | |
| test_simple | 0.9237s | 0.8318s | 1.2021 Ops/s | 1.2047 Ops/s | |
| test_transformed | 1.5646s | 1.4810s | 0.6752 Ops/s | 0.6840 Ops/s | |
| test_serial | 2.4738s | 2.3834s | 0.4196 Ops/s | 0.4285 Ops/s | |
| test_parallel | 2.0306s | 1.9559s | 0.5113 Ops/s | 0.5170 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.3331ms | 45.5816μs | 21.9387 KOps/s | 22.0546 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 58.1320μs | 25.2313μs | 39.6334 KOps/s | 39.4378 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 83.2830μs | 25.0565μs | 39.9098 KOps/s | 39.0135 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 42.2320μs | 13.8860μs | 72.0149 KOps/s | 72.0465 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 92.5530μs | 47.5380μs | 21.0358 KOps/s | 21.0664 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 56.9920μs | 28.1629μs | 35.5077 KOps/s | 35.6896 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 60.4020μs | 27.8838μs | 35.8631 KOps/s | 35.6908 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 54.4120μs | 16.9506μs | 58.9949 KOps/s | 59.1610 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 0.1265ms | 50.8386μs | 19.6701 KOps/s | 19.2570 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 69.5120μs | 31.0308μs | 32.2261 KOps/s | 31.7400 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 57.1820μs | 27.6497μs | 36.1668 KOps/s | 35.1565 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 43.7920μs | 16.7693μs | 59.6327 KOps/s | 58.9501 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 88.3730μs | 53.4010μs | 18.7263 KOps/s | 19.0123 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 73.1320μs | 33.6426μs | 29.7242 KOps/s | 29.7735 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 61.0620μs | 30.5216μs | 32.7637 KOps/s | 32.9287 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 59.2420μs | 19.5488μs | 51.1539 KOps/s | 50.6580 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 99.9430μs | 51.2706μs | 19.5044 KOps/s | 20.0951 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 69.6020μs | 30.3911μs | 32.9044 KOps/s | 32.3169 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 65.7520μs | 31.6289μs | 31.6167 KOps/s | 31.2419 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 49.8620μs | 18.6783μs | 53.5381 KOps/s | 54.4276 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 2.6119ms | 54.3592μs | 18.3962 KOps/s | 18.6500 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 66.4620μs | 33.6886μs | 29.6836 KOps/s | 29.4274 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 70.7630μs | 34.5816μs | 28.9171 KOps/s | 29.2924 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 49.6410μs | 21.1577μs | 47.2641 KOps/s | 47.4871 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 0.1019ms | 55.5093μs | 18.0150 KOps/s | 17.8621 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 77.4520μs | 36.9224μs | 27.0838 KOps/s | 27.3222 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 73.8130μs | 34.7936μs | 28.7409 KOps/s | 28.9086 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 57.6320μs | 21.1730μs | 47.2299 KOps/s | 47.5436 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 0.1059ms | 57.6433μs | 17.3481 KOps/s | 17.0978 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 74.9530μs | 39.0653μs | 25.5981 KOps/s | 25.2592 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 78.3130μs | 37.1615μs | 26.9096 KOps/s | 27.0238 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 64.7320μs | 23.7667μs | 42.0758 KOps/s | 41.5881 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8796s | 0.7772s | 1.2867 Ops/s | 1.2711 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7311s | 0.6382s | 1.5668 Ops/s | 1.5383 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7979s | 1.7164s | 0.5826 Ops/s | 0.5817 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5504s | 1.4800s | 0.6757 Ops/s | 0.6745 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 2.0422s | 1.9726s | 0.5069 Ops/s | 0.5079 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.8130s | 1.7396s | 0.5748 Ops/s | 0.5734 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.7818s | 4.6887s | 0.2133 Ops/s | 0.2124 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.6534s | 4.5146s | 0.2215 Ops/s | 0.2233 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 2.0375s | 1.9781s | 0.5055 Ops/s | 0.5060 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.8762s | 1.7351s | 0.5763 Ops/s | 0.5890 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 22.7895ms | 21.8448ms | 45.7774 Ops/s | 47.6312 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1390s | 3.7191ms | 268.8820 Ops/s | 262.2745 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.1107ms | 87.6554μs | 11.4083 KOps/s | 11.6354 KOps/s | |
| test_values[td1_return_estimate-False-False] | 52.8873ms | 51.7450ms | 19.3255 Ops/s | 20.1711 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 1.3778ms | 1.1125ms | 898.8859 Ops/s | 910.6630 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 85.9720ms | 84.2263ms | 11.8728 Ops/s | 12.3443 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.3511ms | 1.1066ms | 903.6336 Ops/s | 913.2117 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 22.7706ms | 21.9534ms | 45.5510 Ops/s | 48.1032 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0608ms | 0.7841ms | 1.2754 KOps/s | 1.3065 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7443ms | 0.7002ms | 1.4282 KOps/s | 1.4583 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.6190ms | 1.5263ms | 655.2005 Ops/s | 665.3428 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.8304ms | 0.7190ms | 1.3908 KOps/s | 1.4139 KOps/s | |
| test_dqn_speed[False-None] | 1.7035ms | 1.5697ms | 637.0633 Ops/s | 633.9639 Ops/s | |
| test_dqn_speed[False-backward] | 2.3261ms | 2.2199ms | 450.4721 Ops/s | 455.9845 Ops/s | |
| test_dqn_speed[True-None] | 0.7117ms | 0.5844ms | 1.7110 KOps/s | 1.8217 KOps/s | |
| test_dqn_speed[True-backward] | 1.2981ms | 1.2216ms | 818.6209 Ops/s | 845.9448 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.6559ms | 0.5780ms | 1.7300 KOps/s | 1.6798 KOps/s | |
| test_ddpg_speed[False-None] | 3.5782ms | 2.9772ms | 335.8840 Ops/s | 340.4654 Ops/s | |
| test_ddpg_speed[False-backward] | 4.6363ms | 4.4225ms | 226.1141 Ops/s | 229.6927 Ops/s | |
| test_ddpg_speed[True-None] | 1.4449ms | 1.3227ms | 756.0499 Ops/s | 774.5700 Ops/s | |
| test_ddpg_speed[True-backward] | 2.6812ms | 2.5770ms | 388.0521 Ops/s | 400.4700 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.6230ms | 1.3495ms | 741.0409 Ops/s | 755.2130 Ops/s | |
| test_sac_speed[False-None] | 9.1066ms | 8.6344ms | 115.8162 Ops/s | 118.2568 Ops/s | |
| test_sac_speed[False-backward] | 12.4077ms | 11.9023ms | 84.0172 Ops/s | 84.5687 Ops/s | |
| test_sac_speed[True-None] | 2.1625ms | 1.8358ms | 544.7129 Ops/s | 556.1260 Ops/s | |
| test_sac_speed[True-backward] | 3.7751ms | 3.6478ms | 274.1407 Ops/s | 279.0405 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 18.7499ms | 10.4789ms | 95.4296 Ops/s | 93.8490 Ops/s | |
| test_redq_deprec_speed[False-None] | 10.4665ms | 9.5536ms | 104.6724 Ops/s | 106.0408 Ops/s | |
| test_redq_deprec_speed[False-backward] | 13.4714ms | 13.0305ms | 76.7430 Ops/s | 79.3033 Ops/s | |
| test_redq_deprec_speed[True-None] | 2.8188ms | 2.5646ms | 389.9239 Ops/s | 401.5192 Ops/s | |
| test_redq_deprec_speed[True-backward] | 4.8273ms | 4.3912ms | 227.7275 Ops/s | 233.2272 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 15.2756ms | 9.5027ms | 105.2332 Ops/s | 88.6407 Ops/s | |
| test_td3_speed[False-None] | 8.6424ms | 8.4083ms | 118.9298 Ops/s | 119.8573 Ops/s | |
| test_td3_speed[False-backward] | 11.5241ms | 11.0030ms | 90.8844 Ops/s | 90.4454 Ops/s | |
| test_td3_speed[True-None] | 1.6923ms | 1.6496ms | 606.1989 Ops/s | 624.7728 Ops/s | |
| test_td3_speed[True-backward] | 3.3637ms | 3.2734ms | 305.4932 Ops/s | 320.6952 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 65.2657ms | 23.2247ms | 43.0575 Ops/s | 42.2562 Ops/s | |
| test_cql_speed[False-None] | 18.3718ms | 17.7461ms | 56.3505 Ops/s | 56.9199 Ops/s | |
| test_cql_speed[False-backward] | 24.0958ms | 23.6379ms | 42.3050 Ops/s | 43.7598 Ops/s | |
| test_cql_speed[True-None] | 3.6122ms | 3.3120ms | 301.9365 Ops/s | 310.7384 Ops/s | |
| test_cql_speed[True-backward] | 5.9664ms | 5.5915ms | 178.8435 Ops/s | 188.5120 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 18.4533ms | 11.6673ms | 85.7095 Ops/s | 85.8677 Ops/s | |
| test_a2c_speed[False-None] | 4.3361ms | 3.3440ms | 299.0387 Ops/s | 304.3224 Ops/s | |
| test_a2c_speed[False-backward] | 6.7295ms | 6.5143ms | 153.5077 Ops/s | 161.2003 Ops/s | |
| test_a2c_speed[True-None] | 1.3966ms | 1.3337ms | 749.7931 Ops/s | 754.7561 Ops/s | |
| test_a2c_speed[True-backward] | 3.1769ms | 3.1245ms | 320.0477 Ops/s | 326.5576 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 1.0464ms | 0.9674ms | 1.0337 KOps/s | 1.0352 KOps/s | |
| test_ppo_speed[False-None] | 4.0513ms | 3.9196ms | 255.1273 Ops/s | 256.8064 Ops/s | |
| test_ppo_speed[False-backward] | 7.7365ms | 7.3162ms | 136.6832 Ops/s | 138.3801 Ops/s | |
| test_ppo_speed[True-None] | 1.5489ms | 1.4287ms | 699.9293 Ops/s | 714.9976 Ops/s | |
| test_ppo_speed[True-backward] | 3.4186ms | 3.2824ms | 304.6526 Ops/s | 307.2243 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 1.1298ms | 1.0419ms | 959.7812 Ops/s | 953.2886 Ops/s | |
| test_reinforce_speed[False-None] | 2.4223ms | 2.3157ms | 431.8266 Ops/s | 431.1167 Ops/s | |
| test_reinforce_speed[False-backward] | 3.9281ms | 3.4967ms | 285.9821 Ops/s | 291.0403 Ops/s | |
| test_reinforce_speed[True-None] | 1.4461ms | 1.2675ms | 788.9494 Ops/s | 806.1003 Ops/s | |
| test_reinforce_speed[True-backward] | 3.2751ms | 3.1542ms | 317.0393 Ops/s | 325.9236 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 16.3020ms | 9.0839ms | 110.0854 Ops/s | 98.0720 Ops/s | |
| test_iql_speed[False-None] | 10.2912ms | 9.6928ms | 103.1690 Ops/s | 103.2568 Ops/s | |
| test_iql_speed[False-backward] | 14.5589ms | 13.7718ms | 72.6120 Ops/s | 72.0262 Ops/s | |
| test_iql_speed[True-None] | 2.3327ms | 2.1866ms | 457.3387 Ops/s | 464.8715 Ops/s | |
| test_iql_speed[True-backward] | 4.9754ms | 4.8807ms | 204.8868 Ops/s | 207.5553 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 16.9964ms | 10.0065ms | 99.9354 Ops/s | 97.8589 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.6030ms | 6.0613ms | 164.9821 Ops/s | 162.3710 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.0190ms | 0.3189ms | 3.1358 KOps/s | 2.6525 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6154ms | 0.2951ms | 3.3883 KOps/s | 3.3316 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.1533ms | 5.8160ms | 171.9394 Ops/s | 166.9571 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.1444ms | 0.3369ms | 2.9685 KOps/s | 2.8707 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6471ms | 0.2999ms | 3.3343 KOps/s | 3.3154 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.7317ms | 1.3815ms | 723.8446 Ops/s | 687.2524 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.5111ms | 1.3090ms | 763.9331 Ops/s | 724.6747 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.0502ms | 5.9231ms | 168.8312 Ops/s | 162.1238 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8187ms | 0.4892ms | 2.0442 KOps/s | 2.1508 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8681ms | 0.4678ms | 2.1378 KOps/s | 2.3010 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.0620ms | 5.8291ms | 171.5543 Ops/s | 165.9214 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.1316ms | 0.2898ms | 3.4508 KOps/s | 2.6249 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.4811ms | 0.2727ms | 3.6670 KOps/s | 2.7693 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.1624ms | 5.7756ms | 173.1428 Ops/s | 166.5402 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2.1346ms | 0.3776ms | 2.6481 KOps/s | 2.7006 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5112ms | 0.3596ms | 2.7808 KOps/s | 2.8572 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.1003ms | 5.9746ms | 167.3758 Ops/s | 163.5469 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2.1695ms | 0.5455ms | 1.8331 KOps/s | 1.9647 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7562ms | 0.5295ms | 1.8887 KOps/s | 2.1633 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.6225ms | 5.1635ms | 193.6688 Ops/s | 48.4662 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 8.9703ms | 2.2595ms | 442.5797 Ops/s | 529.6709 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 10.1627ms | 1.3319ms | 750.8159 Ops/s | 1.0432 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.5848s | 16.7784ms | 59.6004 Ops/s | 192.4404 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 4.0202ms | 1.8689ms | 535.0668 Ops/s | 531.2042 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 9.0428ms | 1.2643ms | 790.9388 Ops/s | 782.7137 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 6.8963ms | 5.3996ms | 185.1995 Ops/s | 186.3062 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 11.6834ms | 2.2207ms | 450.3108 Ops/s | 498.6093 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 3.9924ms | 1.1726ms | 852.8300 Ops/s | 849.4931 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 39.5790ms | 36.8974ms | 27.1022 Ops/s | 26.8297 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 20.0624ms | 18.5696ms | 53.8515 Ops/s | 53.0097 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 41.6254ms | 38.1984ms | 26.1791 Ops/s | 25.9251 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 20.6609ms | 18.9919ms | 52.6541 Ops/s | 52.5124 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 40.9988ms | 39.6181ms | 25.2410 Ops/s | 25.0289 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 21.9697ms | 20.2432ms | 49.3994 Ops/s | 49.0797 Ops/s |
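The Change column is empty in this capture, and the Ops cells mix units (`Ops/s` and `KOps/s`), so rows cannot be compared by eye. How the benchmark bot fills that column is not shown here. As a rough sketch, assuming Change means the relative difference between "Ops" and "Ops on Repo HEAD", the comparison could be reproduced as follows (the helper names `parse_ops` and `relative_change` are hypothetical):

```python
# Scale factors for the two throughput units that appear in the table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '1.0432 KOps/s' into operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops_pr: str, ops_head: str) -> float:
    """Relative throughput difference of the PR versus repo HEAD.

    Positive means the PR run was faster. This definition is an assumption,
    not necessarily what the benchmark bot reports.
    """
    pr = parse_ops(ops_pr)
    head = parse_ops(ops_head)
    return (pr - head) / head


# Row test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]:
# the two cells use different units, so they must be normalized first.
change = relative_change("750.8159 Ops/s", "1.0432 KOps/s")
print(f"{change:+.1%}")
```

A single run per side does not capture run-to-run variance, so a figure computed this way is a starting point for investigation rather than evidence of a regression or speedup.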
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 83.1249μs | 81.8303μs | 12.2204 KOps/s | 12.3131 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1424ms | 0.1416ms | 7.0597 KOps/s | 7.1729 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1260s | 0.1255s | 7.9676 Ops/s | 8.5860 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.7608μs | 2.7527μs | 363.2762 KOps/s | 376.5348 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 40.0430μs | 39.4921μs | 25.3215 KOps/s | 25.8995 KOps/s | |
| test_simple | 0.5665s | 0.5649s | 1.7702 Ops/s | 1.7107 Ops/s | |
| test_transformed | 1.2816s | 1.1852s | 0.8438 Ops/s | 0.8447 Ops/s | |
| test_serial | 1.7071s | 1.7023s | 0.5874 Ops/s | 0.5797 Ops/s | |
| test_parallel | 1.2518s | 1.1446s | 0.8737 Ops/s | 0.8787 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.3135ms | 45.5695μs | 21.9445 KOps/s | 22.3578 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 50.1800μs | 25.6568μs | 38.9760 KOps/s | 38.6807 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 52.4010μs | 25.5128μs | 39.1960 KOps/s | 39.2066 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 40.0610μs | 13.8772μs | 72.0606 KOps/s | 71.0243 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 82.9310μs | 48.7441μs | 20.5153 KOps/s | 20.8427 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 52.0010μs | 28.3848μs | 35.2301 KOps/s | 36.0186 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 54.5010μs | 28.4400μs | 35.1618 KOps/s | 35.3258 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 47.6910μs | 16.6906μs | 59.9138 KOps/s | 58.9862 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 81.9110μs | 51.1710μs | 19.5423 KOps/s | 19.5781 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 65.1620μs | 30.5569μs | 32.7258 KOps/s | 32.4217 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 58.0210μs | 28.1203μs | 35.5615 KOps/s | 36.2835 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 41.0910μs | 16.5066μs | 60.5817 KOps/s | 59.4238 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 83.6110μs | 53.2307μs | 18.7862 KOps/s | 18.4301 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 61.8310μs | 33.2951μs | 30.0344 KOps/s | 29.6171 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 56.7810μs | 30.9146μs | 32.3472 KOps/s | 31.9944 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 44.1710μs | 19.3205μs | 51.7584 KOps/s | 50.2879 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 93.2720μs | 50.7789μs | 19.6932 KOps/s | 19.4527 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 55.4010μs | 30.9580μs | 32.3018 KOps/s | 31.8329 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.3809ms | 32.9563μs | 30.3432 KOps/s | 30.4945 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 46.9610μs | 18.6236μs | 53.6953 KOps/s | 52.6439 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 88.0020μs | 53.9715μs | 18.5283 KOps/s | 18.2946 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 68.4310μs | 33.7879μs | 29.5964 KOps/s | 28.9921 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 74.2620μs | 35.2407μs | 28.3763 KOps/s | 28.6374 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 78.1510μs | 20.7062μs | 48.2947 KOps/s | 46.5609 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 98.2820μs | 56.6191μs | 17.6619 KOps/s | 17.5347 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 58.1710μs | 36.3056μs | 27.5439 KOps/s | 26.6291 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 65.6820μs | 34.8109μs | 28.7266 KOps/s | 28.7120 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 48.2710μs | 21.3093μs | 46.9279 KOps/s | 46.4138 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 87.7420μs | 58.7828μs | 17.0118 KOps/s | 16.6080 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 63.1310μs | 38.7992μs | 25.7738 KOps/s | 25.4178 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 69.1110μs | 37.3482μs | 26.7750 KOps/s | 27.5911 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 73.1610μs | 23.7754μs | 42.0603 KOps/s | 42.0990 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.7618s | 0.7580s | 1.3193 Ops/s | 1.2575 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7406s | 0.6418s | 1.5581 Ops/s | 1.5399 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7752s | 1.6959s | 0.5897 Ops/s | 0.5853 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5447s | 1.4693s | 0.6806 Ops/s | 0.6730 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 2.0303s | 1.9466s | 0.5137 Ops/s | 0.5090 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.8082s | 1.7258s | 0.5794 Ops/s | 0.5742 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.8335s | 4.7478s | 0.2106 Ops/s | 0.2135 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.6224s | 4.5258s | 0.2210 Ops/s | 0.2194 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 2.1283s | 1.9709s | 0.5074 Ops/s | 0.5060 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.7510s | 1.6717s | 0.5982 Ops/s | 0.5893 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 11.8817ms | 10.8466ms | 92.1951 Ops/s | 94.3337 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 12.7238ms | 11.1318ms | 89.8331 Ops/s | 56.2763 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.2421ms | 0.1406ms | 7.1119 KOps/s | 7.9330 KOps/s | |
| test_values[td1_return_estimate-False-False] | 30.3634ms | 29.4540ms | 33.9513 Ops/s | 33.8013 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 11.3937ms | 11.1592ms | 89.6120 Ops/s | 56.3942 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 45.5190ms | 43.7966ms | 22.8328 Ops/s | 22.9114 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 12.2307ms | 11.1662ms | 89.5557 Ops/s | 56.0432 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 9.6741ms | 9.5319ms | 104.9108 Ops/s | 106.1474 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.7121ms | 1.5001ms | 666.6059 Ops/s | 659.2101 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5178ms | 0.4252ms | 2.3520 KOps/s | 2.3061 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 19.4366ms | 18.7902ms | 53.2192 Ops/s | 28.7191 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 2.1422ms | 1.7266ms | 579.1851 Ops/s | 584.0525 Ops/s | |
| test_dqn_speed[False-None] | 1.5183ms | 1.4141ms | 707.1437 Ops/s | 695.5854 Ops/s | |
| test_dqn_speed[False-backward] | 1.9760ms | 1.9362ms | 516.4777 Ops/s | 510.3646 Ops/s | |
| test_dqn_speed[True-None] | 0.7830ms | 0.5414ms | 1.8471 KOps/s | 1.8293 KOps/s | |
| test_dqn_speed[True-backward] | 1.0288ms | 0.9982ms | 1.0019 KOps/s | 850.0212 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.9418ms | 0.5271ms | 1.8971 KOps/s | 1.8300 KOps/s | |
| test_ddpg_speed[False-None] | 4.3221ms | 2.9643ms | 337.3507 Ops/s | 343.7276 Ops/s | |
| test_ddpg_speed[False-backward] | 4.5009ms | 4.1862ms | 238.8783 Ops/s | 240.1774 Ops/s | |
| test_ddpg_speed[True-None] | 1.6830ms | 1.4119ms | 708.2638 Ops/s | 697.3215 Ops/s | |
| test_ddpg_speed[True-backward] | 2.5150ms | 2.3931ms | 417.8633 Ops/s | 376.8703 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.4554ms | 1.3892ms | 719.8625 Ops/s | 698.1169 Ops/s | |
| test_sac_speed[False-None] | 8.8154ms | 8.1801ms | 122.2479 Ops/s | 122.1722 Ops/s | |
| test_sac_speed[False-backward] | 12.1178ms | 11.4394ms | 87.4171 Ops/s | 86.1405 Ops/s | |
| test_sac_speed[True-None] | 2.5504ms | 2.1354ms | 468.2932 Ops/s | 460.5500 Ops/s | |
| test_sac_speed[True-backward] | 4.5836ms | 4.1246ms | 242.4461 Ops/s | 244.4614 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 2.4701ms | 2.1311ms | 469.2432 Ops/s | 446.4885 Ops/s | |
| test_redq_speed[False-None] | 10.9619ms | 10.4670ms | 95.5381 Ops/s | 93.3354 Ops/s | |
| test_redq_speed[False-backward] | 18.5157ms | 17.8791ms | 55.9311 Ops/s | 56.0587 Ops/s | |
| test_redq_speed[True-None] | 4.7313ms | 4.4026ms | 227.1373 Ops/s | 229.7392 Ops/s | |
| test_redq_speed[True-backward] | 10.0385ms | 9.7333ms | 102.7405 Ops/s | 94.5727 Ops/s | |
| test_redq_speed[reduce-overhead-None] | 4.5754ms | 4.3225ms | 231.3452 Ops/s | 220.1939 Ops/s | |
| test_redq_deprec_speed[False-None] | 11.6039ms | 11.0649ms | 90.3762 Ops/s | 89.4540 Ops/s | |
| test_redq_deprec_speed[False-backward] | 16.2611ms | 15.7557ms | 63.4691 Ops/s | 62.3081 Ops/s | |
| test_redq_deprec_speed[True-None] | 4.4014ms | 3.6688ms | 272.5655 Ops/s | 267.7504 Ops/s | |
| test_redq_deprec_speed[True-backward] | 7.8198ms | 7.5692ms | 132.1147 Ops/s | 132.7005 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 3.9961ms | 3.6100ms | 277.0081 Ops/s | 279.3934 Ops/s | |
| test_td3_speed[False-None] | 8.3187ms | 8.1529ms | 122.6561 Ops/s | 122.4674 Ops/s | |
| test_td3_speed[False-backward] | 11.7888ms | 11.0677ms | 90.3533 Ops/s | 90.4446 Ops/s | |
| test_td3_speed[True-None] | 1.8418ms | 1.8180ms | 550.0530 Ops/s | 548.3735 Ops/s | |
| test_td3_speed[True-backward] | 4.0380ms | 3.6618ms | 273.0885 Ops/s | 252.2537 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 1.8489ms | 1.7900ms | 558.6661 Ops/s | 554.0323 Ops/s | |
| test_cql_speed[False-None] | 29.4037ms | 26.4670ms | 37.7829 Ops/s | 38.0585 Ops/s | |
| test_cql_speed[False-backward] | 40.4521ms | 36.1431ms | 27.6678 Ops/s | 27.8404 Ops/s | |
| test_cql_speed[True-None] | 12.6864ms | 12.4144ms | 80.5513 Ops/s | 76.2404 Ops/s | |
| test_cql_speed[True-backward] | 18.8409ms | 18.3689ms | 54.4398 Ops/s | 54.6072 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 12.7698ms | 12.4139ms | 80.5551 Ops/s | 79.2421 Ops/s | |
| test_a2c_speed[False-None] | 5.5748ms | 5.3549ms | 186.7445 Ops/s | 182.5535 Ops/s | |
| test_a2c_speed[False-backward] | 12.0977ms | 11.7408ms | 85.1730 Ops/s | 84.1387 Ops/s | |
| test_a2c_speed[True-None] | 4.1303ms | 3.7458ms | 266.9683 Ops/s | 260.7267 Ops/s | |
| test_a2c_speed[True-backward] | 8.9998ms | 8.5783ms | 116.5727 Ops/s | 116.0092 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 4.0865ms | 3.6976ms | 270.4430 Ops/s | 267.0233 Ops/s | |
| test_ppo_speed[False-None] | 6.3077ms | 5.9525ms | 167.9980 Ops/s | 165.1361 Ops/s | |
| test_ppo_speed[False-backward] | 13.2117ms | 12.6313ms | 79.1681 Ops/s | 78.5366 Ops/s | |
| test_ppo_speed[True-None] | 3.8158ms | 3.6237ms | 275.9637 Ops/s | 272.2137 Ops/s | |
| test_ppo_speed[True-backward] | 8.7543ms | 8.3835ms | 119.2813 Ops/s | 117.1816 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 3.8027ms | 3.6233ms | 275.9907 Ops/s | 274.0391 Ops/s | |
| test_reinforce_speed[False-None] | 4.9745ms | 4.6246ms | 216.2353 Ops/s | 214.7035 Ops/s | |
| test_reinforce_speed[False-backward] | 7.7310ms | 7.4260ms | 134.6620 Ops/s | 135.1591 Ops/s | |
| test_reinforce_speed[True-None] | 3.0943ms | 2.8731ms | 348.0514 Ops/s | 334.5105 Ops/s | |
| test_reinforce_speed[True-backward] | 8.2443ms | 7.7281ms | 129.3976 Ops/s | 116.8220 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 3.0146ms | 2.8582ms | 349.8695 Ops/s | 339.9249 Ops/s | |
| test_iql_speed[False-None] | 24.9990ms | 20.3817ms | 49.0637 Ops/s | 49.9612 Ops/s | |
| test_iql_speed[False-backward] | 35.7492ms | 30.8243ms | 32.4419 Ops/s | 32.7725 Ops/s | |
| test_iql_speed[True-None] | 8.7555ms | 8.5182ms | 117.3952 Ops/s | 111.1575 Ops/s | |
| test_iql_speed[True-backward] | 17.0398ms | 16.7205ms | 59.8068 Ops/s | 59.6576 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 8.8931ms | 8.6050ms | 116.2119 Ops/s | 113.4854 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.2626ms | 6.1170ms | 163.4800 Ops/s | 160.0243 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2.1514ms | 0.3076ms | 3.2506 KOps/s | 3.4421 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5301ms | 0.2953ms | 3.3861 KOps/s | 2.9051 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.1947ms | 5.9531ms | 167.9806 Ops/s | 166.8636 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2.3421ms | 0.3398ms | 2.9425 KOps/s | 2.9778 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5108ms | 0.3105ms | 3.2202 KOps/s | 2.9646 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.5089ms | 1.2922ms | 773.8705 Ops/s | 685.7287 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.5222ms | 1.2287ms | 813.8654 Ops/s | 724.0638 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 10.2244ms | 6.3318ms | 157.9318 Ops/s | 162.2306 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.1664ms | 0.5058ms | 1.9770 KOps/s | 1.9154 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7039ms | 0.4762ms | 2.0999 KOps/s | 1.9792 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.0671ms | 5.9860ms | 167.0576 Ops/s | 166.8224 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2.3386ms | 0.3382ms | 2.9571 KOps/s | 3.5428 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.4736ms | 0.2654ms | 3.7673 KOps/s | 3.7376 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.1289ms | 5.9370ms | 168.4364 Ops/s | 168.7031 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2.0518ms | 0.2973ms | 3.3636 KOps/s | 2.8816 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6059ms | 0.3094ms | 3.2321 KOps/s | 3.0186 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.5709ms | 6.1498ms | 162.6069 Ops/s | 163.0967 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.1564ms | 0.4664ms | 2.1443 KOps/s | 1.9899 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7145ms | 0.4799ms | 2.0836 KOps/s | 2.0099 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.6709s | 18.4775ms | 54.1200 Ops/s | 55.6767 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 11.3631ms | 2.0106ms | 497.3655 Ops/s | 521.5782 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 7.1116ms | 1.2219ms | 818.4024 Ops/s | 788.1009 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 6.4934ms | 5.1285ms | 194.9883 Ops/s | 192.4866 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 4.0254ms | 1.8000ms | 555.5658 Ops/s | 570.3471 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 2.9208ms | 1.1678ms | 856.3190 Ops/s | 1.1156 KOps/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.5233s | 15.7409ms | 63.5289 Ops/s | 56.2444 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 4.3081ms | 2.0418ms | 489.7536 Ops/s | 471.9836 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 12.1121ms | 1.5354ms | 651.3160 Ops/s | 951.4083 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 38.7000ms | 36.4806ms | 27.4118 Ops/s | 27.6035 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 20.1726ms | 18.5138ms | 54.0139 Ops/s | 54.4383 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 41.4421ms | 37.9187ms | 26.3722 Ops/s | 26.6670 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 20.9041ms | 19.1495ms | 52.2207 Ops/s | 52.9276 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 41.7349ms | 39.5742ms | 25.2690 Ops/s | 25.4669 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 22.4944ms | 20.6912ms | 48.3298 Ops/s | 47.5251 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.9244ms | 0.2268ms | 4.4090 KOps/s | 4.3519 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.5560ms | 1.3887ms | 720.1190 Ops/s | 710.1624 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.6261ms | 2.3293ms | 429.3095 Ops/s | 417.5578 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.0676ms | 2.9143ms | 343.1334 Ops/s | 341.1287 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.2150ms | 0.1400ms | 7.1444 KOps/s | 7.1779 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3590ms | 0.2006ms | 4.9842 KOps/s | 5.1440 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 1.9206ms | 1.7679ms | 565.6324 Ops/s | 588.9215 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.5159ms | 1.3199ms | 757.6285 Ops/s | 782.2691 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.2966ms | 1.1401ms | 877.1381 Ops/s | 877.4109 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 3.7205ms | 3.6168ms | 276.4860 Ops/s | 275.5326 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 10.8101ms | 5.8143ms | 171.9890 Ops/s | 173.1712 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.5840ms | 7.0208ms | 142.4333 Ops/s | 142.3719 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4733ms | 0.2753ms | 3.6327 KOps/s | 3.6063 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.6816ms | 1.4989ms | 667.1357 Ops/s | 657.2712 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.5618ms | 2.4193ms | 413.3453 Ops/s | 398.2725 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.3883ms | 3.1380ms | 318.6699 Ops/s | 318.1328 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 35.5111ms | 34.9737ms | 28.5929 Ops/s | 28.7912 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 69.5882ms | 68.6075ms | 14.5757 Ops/s | 14.4894 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 45.8965ms | 44.1824ms | 22.6335 Ops/s | 20.7459 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 99.3002ms | 98.7993ms | 10.1215 Ops/s | 10.3131 Ops/s |
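
To read the collector rows above as a speedup, the mean times of `test_collector_stack_then_write` and `test_collector_lazystack_then_write` can be divided pairwise. This is a minimal sketch with the mean values copied by hand from the table. The dictionary and variable names are illustrative only, and the numbers come from a single benchmark run.

```python
# Mean times in milliseconds, copied from the benchmark table above.
stack_then_write = {
    "small": 1.1401,
    "atari": 3.6168,
    "large_img": 5.8143,
    "large_batch": 7.0208,
}
lazystack_then_write = {
    "small": 0.2753,
    "atari": 1.4989,
    "large_img": 2.4193,
    "large_batch": 3.1380,
}

# Ratio > 1 means the lazy-stack variant had the lower mean time.
speedups = {
    name: stack_then_write[name] / lazystack_then_write[name]
    for name in stack_then_write
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

On these figures the ratio falls between roughly 2.2x and 4.1x for the isolated write benchmarks.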
Optimize data collection pipeline by using lazy stacks in collectors when a replay buffer is present, enabling single-write operations directly to storage instead of two separate write operations.

Before:
1. Collector: torch.stack(tensordicts, out=_final_rollout) -> Write 1
2. Storage: storage[cursor] = data -> Write 2

After:
1. Collector: LazyStackedTensorDict.lazy_stack(tensordicts) -> No write
2. Storage: torch.stack(lazy.unbind(), out=storage[cursor]) -> Single write

Changes:
- TensorStorage.set() now detects LazyStackedTensorDict and uses torch.stack(..., out=) to write directly to storage
- Collector.rollout() uses lazy_stack when replay buffer is present
- Added tests for storage and collector integration
- Added benchmarks to measure the improvement

Co-authored-by: Cursor <cursoragent@cursor.com>
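
The difference between the two paths described in this commit can be sketched with a toy model that only counts element copies. It uses plain Python lists, not the TorchRL or TensorDict API; `CountingStorage`, `eager_path` and `lazy_path` are hypothetical names introduced for illustration.

```python
class CountingStorage:
    """Toy stand-in for a preallocated buffer that counts element writes."""

    def __init__(self, size):
        self.data = [None] * size
        self.writes = 0

    def write(self, index, value):
        self.data[index] = value
        self.writes += 1


def eager_path(steps, rollout_buffer, storage, cursor):
    # Write 1: stack the per-step items into an intermediate rollout buffer.
    for i, step in enumerate(steps):
        rollout_buffer.write(i, step)
    # Write 2: copy the stacked rollout into the replay storage.
    for i, item in enumerate(rollout_buffer.data[: len(steps)]):
        storage.write(cursor + i, item)


def lazy_path(steps, storage, cursor):
    # "Lazy stack": only references to the per-step items are kept, no copy yet.
    lazy = list(steps)
    # Single write: each item goes straight to its slot in the storage.
    for i, item in enumerate(lazy):
        storage.write(cursor + i, item)


steps = [{"obs": t} for t in range(4)]
rollout, rb_a, rb_b = CountingStorage(4), CountingStorage(8), CountingStorage(8)

eager_path(steps, rollout, rb_a, cursor=2)
lazy_path(steps, rb_b, cursor=2)

total_eager = rollout.writes + rb_a.writes
total_lazy = rb_b.writes
print(total_eager, total_lazy)  # 8 4
```

Both paths leave the same contents in the storage; the lazy path just skips the intermediate buffer, which is the saving the commit is after.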
The torch.stack(..., out=) approach for TensorDict doesn't work correctly. Reverted to using the normal assignment path self._storage[cursor] = data, which handles lazy stacks through TensorDict's __setitem__. Also simplified the test to verify data integrity more reliably.

Co-authored-by: Cursor <cursoragent@cursor.com>

Instead of torch.stack(..., out=), iterate through the lazy stack's tensordicts and use update_() to write each directly to the corresponding storage location. This avoids creating an intermediate contiguous copy. The optimization only applies when stack_dim == 0 (the batch dimension), which is the common case for collector outputs.

Co-authored-by: Cursor <cursoragent@cursor.com>

For slice indices, storage[slice] returns a view, so we can use _stack_onto_ to copy directly from the lazy stack's tensordicts. For non-contiguous tensor indices, we continue to iterate and update each element individually since storage[tensor] returns a copy.

Co-authored-by: Cursor <cursoragent@cursor.com>
Extend the lazy stack optimization in TensorStorage.set() to handle any stack_dim, not just stack_dim=0. This is important for parallel environments where the storage is 2D [max_size, n_steps] and the lazy stack has stack_dim=1 (time dimension).

Changes:
- Use _stack_onto_ for slices with any stack_dim
- For tensor indices with stack_dim>0, check if contiguous and convert to slice
- Add tests for 2D storage with lazy stack (stack_dim=1)
- Add collector integration test with parallel envs

Co-authored-by: Cursor <cursoragent@cursor.com>
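
The "check if contiguous and convert to slice" step can be sketched as follows. This is an illustration in plain Python, not the code from the PR; `contiguous_indices_to_slice` is a hypothetical helper name, and the real implementation presumably operates on index tensors rather than lists.

```python
def contiguous_indices_to_slice(indices):
    """Return slice(start, stop) if indices are consecutive and ascending, else None."""
    indices = list(indices)
    if not indices:
        return None
    start = indices[0]
    for offset, idx in enumerate(indices):
        if idx != start + offset:
            # A gap, a repeat, or a wrap-around: no single slice covers these indices.
            return None
    return slice(start, start + len(indices))


as_slice = contiguous_indices_to_slice([3, 4, 5])
wrapped = contiguous_indices_to_slice([8, 9, 0, 1])
print(as_slice)  # slice(3, 6, None)
print(wrapped)   # None
```

When a slice is returned, the caller can take the view-based path; when `None` is returned (for example, indices that wrap around the end of the buffer), it falls back to updating each element individually.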
Summary
- `torch.stack` with `out=` parameter

Before
1. `torch.stack(tensordicts, out=_final_rollout)` → Write 1
2. `storage[cursor] = data` → Write 2

After
1. `LazyStackedTensorDict.lazy_stack(tensordicts)` → No write (lazy)
2. `torch.stack(lazy.unbind(), out=storage[cursor])` → Single write

Changes
- `storages.py`: Modified `TensorStorage.set()` to detect `LazyStackedTensorDict` input and use `torch.stack(..., out=storage[cursor])` instead of assignment
- `_single.py`: Modified `Collector.rollout()` to use lazy stack when replay buffer is present with `extend_buffer=True`
- Added tests `test_extend_lazystack_direct_write` and `test_collector_with_rb_uses_lazy_stack`
- Added benchmarks `test_single_with_rb` and `test_single_with_rb_pixels` to compare performance

Test plan
- `test_extend_lazystack_direct_write` to verify storage optimization works
- `test_collector_with_rb_uses_lazy_stack` to verify collector integration

Made with Cursor