[Refactor] Better handling of params and buffers in bytes #1059

Merged (1 commit) on Oct 24, 2024

vmoens (Contributor) commented on Oct 24, 2024

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 24, 2024
ghstack-source-id: 87945c47b376d223bb3dc33bd6ec7cb9bb047455
Pull Request resolved: #1059
@facebook-github-bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Oct 24, 2024
@vmoens added the "Refactor" label (refactoring code, not a new feature) on Oct 24, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 216. Improved: $\large\color{#35bf28}7$. Worsened: $\large\color{#d91a1a}16$.

Detailed results
Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_plain_set_nested 50.9850μs 25.2778μs 39.5604 KOps/s 36.2971 KOps/s $\textbf{\color{#35bf28}+8.99\%}$
test_plain_set_stack_nested 54.2310μs 25.4077μs 39.3581 KOps/s 39.2841 KOps/s $\color{#35bf28}+0.19\%$
test_plain_set_nested_inplace 0.1320ms 28.1206μs 35.5611 KOps/s 36.4272 KOps/s $\color{#d91a1a}-2.38\%$
test_plain_set_stack_nested_inplace 65.5130μs 27.8515μs 35.9047 KOps/s 36.7080 KOps/s $\color{#d91a1a}-2.19\%$
test_items 27.9720μs 4.5793μs 218.3757 KOps/s 240.3508 KOps/s $\textbf{\color{#d91a1a}-9.14\%}$
test_items_nested 0.5260ms 0.3823ms 2.6159 KOps/s 2.6618 KOps/s $\color{#d91a1a}-1.73\%$
test_items_nested_locked 0.6798ms 0.3822ms 2.6163 KOps/s 2.6594 KOps/s $\color{#d91a1a}-1.62\%$
test_items_nested_leaf 0.1630ms 81.4655μs 12.2751 KOps/s 12.3408 KOps/s $\color{#d91a1a}-0.53\%$
test_items_stack_nested 0.9544ms 0.3950ms 2.5319 KOps/s 2.6390 KOps/s $\color{#d91a1a}-4.06\%$
test_items_stack_nested_leaf 0.1924ms 85.4768μs 11.6991 KOps/s 12.3086 KOps/s $\color{#d91a1a}-4.95\%$
test_items_stack_nested_locked 0.5499ms 0.3833ms 2.6088 KOps/s 2.6326 KOps/s $\color{#d91a1a}-0.90\%$
test_keys 27.4520μs 3.5101μs 284.8948 KOps/s 251.5046 KOps/s $\textbf{\color{#35bf28}+13.28\%}$
test_keys_nested 0.5425ms 0.1380ms 7.2485 KOps/s 7.4542 KOps/s $\color{#d91a1a}-2.76\%$
test_keys_nested_locked 1.8447ms 0.1421ms 7.0375 KOps/s 7.1337 KOps/s $\color{#d91a1a}-1.35\%$
test_keys_nested_leaf 0.1978ms 0.1194ms 8.3742 KOps/s 8.5304 KOps/s $\color{#d91a1a}-1.83\%$
test_keys_stack_nested 0.2348ms 0.1357ms 7.3694 KOps/s 7.4286 KOps/s $\color{#d91a1a}-0.80\%$
test_keys_stack_nested_leaf 0.4376ms 0.1202ms 8.3180 KOps/s 8.4820 KOps/s $\color{#d91a1a}-1.93\%$
test_keys_stack_nested_locked 0.2421ms 0.1412ms 7.0826 KOps/s 7.1711 KOps/s $\color{#d91a1a}-1.23\%$
test_values 7.3396μs 1.0549μs 947.9499 KOps/s 958.9198 KOps/s $\color{#d91a1a}-1.14\%$
test_values_nested 0.1665ms 93.7434μs 10.6674 KOps/s 10.7122 KOps/s $\color{#d91a1a}-0.42\%$
test_values_nested_locked 0.1756ms 95.6591μs 10.4538 KOps/s 10.7122 KOps/s $\color{#d91a1a}-2.41\%$
test_values_nested_leaf 0.1361ms 79.4629μs 12.5845 KOps/s 12.6714 KOps/s $\color{#d91a1a}-0.69\%$
test_values_stack_nested 0.1697ms 92.4218μs 10.8200 KOps/s 10.0742 KOps/s $\textbf{\color{#35bf28}+7.40\%}$
test_values_stack_nested_leaf 0.1794ms 80.0854μs 12.4867 KOps/s 12.7826 KOps/s $\color{#d91a1a}-2.32\%$
test_values_stack_nested_locked 0.1730ms 92.5867μs 10.8007 KOps/s 10.6162 KOps/s $\color{#35bf28}+1.74\%$
test_membership 19.8060μs 0.8869μs 1.1275 MOps/s 1.1188 MOps/s $\color{#35bf28}+0.77\%$
test_membership_nested 37.8600μs 2.7578μs 362.6123 KOps/s 362.6169 KOps/s $-0.00\%$
test_membership_nested_leaf 38.5820μs 2.7750μs 360.3606 KOps/s 362.6622 KOps/s $\color{#d91a1a}-0.63\%$
test_membership_stacked_nested 22.2820μs 2.7589μs 362.4628 KOps/s 365.2297 KOps/s $\color{#d91a1a}-0.76\%$
test_membership_stacked_nested_leaf 21.6000μs 2.7399μs 364.9791 KOps/s 364.9493 KOps/s $+0.01\%$
test_membership_nested_last 40.6560μs 4.0848μs 244.8089 KOps/s 239.9212 KOps/s $\color{#35bf28}+2.04\%$
test_membership_nested_leaf_last 0.1354ms 4.2217μs 236.8733 KOps/s 235.5535 KOps/s $\color{#35bf28}+0.56\%$
test_membership_stacked_nested_last 29.9060μs 4.1591μs 240.4384 KOps/s 241.6389 KOps/s $\color{#d91a1a}-0.50\%$
test_membership_stacked_nested_leaf_last 20.2780μs 4.1523μs 240.8287 KOps/s 238.5202 KOps/s $\color{#35bf28}+0.97\%$
test_nested_getleaf 48.2200μs 10.8263μs 92.3674 KOps/s 93.7447 KOps/s $\color{#d91a1a}-1.47\%$
test_nested_get 54.5510μs 10.4548μs 95.6495 KOps/s 97.6538 KOps/s $\color{#d91a1a}-2.05\%$
test_stacked_getleaf 31.0680μs 10.7153μs 93.3244 KOps/s 93.9791 KOps/s $\color{#d91a1a}-0.70\%$
test_stacked_get 59.6510μs 10.0628μs 99.3764 KOps/s 98.6289 KOps/s $\color{#35bf28}+0.76\%$
test_nested_getitemleaf 0.2401ms 11.0028μs 90.8862 KOps/s 90.1356 KOps/s $\color{#35bf28}+0.83\%$
test_nested_getitem 38.2210μs 10.3936μs 96.2129 KOps/s 97.3860 KOps/s $\color{#d91a1a}-1.20\%$
test_stacked_getitemleaf 34.8150μs 11.0169μs 90.7693 KOps/s 91.8617 KOps/s $\color{#d91a1a}-1.19\%$
test_stacked_getitem 49.1190μs 10.0789μs 99.2169 KOps/s 96.6376 KOps/s $\color{#35bf28}+2.67\%$
test_lock_nested 2.0079ms 0.5070ms 1.9725 KOps/s 2.0036 KOps/s $\color{#d91a1a}-1.55\%$
test_lock_stack_nested 0.8646ms 0.4809ms 2.0795 KOps/s 2.0969 KOps/s $\color{#d91a1a}-0.83\%$
test_unlock_nested 0.7429ms 0.4235ms 2.3612 KOps/s 2.4004 KOps/s $\color{#d91a1a}-1.63\%$
test_unlock_stack_nested 0.7564ms 0.3963ms 2.5230 KOps/s 2.5634 KOps/s $\color{#d91a1a}-1.57\%$
test_flatten_speed 0.1967ms 0.1011ms 9.8886 KOps/s 10.0276 KOps/s $\color{#d91a1a}-1.39\%$
test_unflatten_speed 0.6113ms 0.5289ms 1.8908 KOps/s 1.9785 KOps/s $\color{#d91a1a}-4.44\%$
test_common_ops 2.3898ms 1.1640ms 859.1179 Ops/s 874.9428 Ops/s $\color{#d91a1a}-1.81\%$
test_creation 15.4390μs 2.0885μs 478.8170 KOps/s 473.6908 KOps/s $\color{#35bf28}+1.08\%$
test_creation_empty 56.3850μs 19.6430μs 50.9086 KOps/s 52.3445 KOps/s $\color{#d91a1a}-2.74\%$
test_creation_nested_1 63.5680μs 23.3505μs 42.8256 KOps/s 44.1723 KOps/s $\color{#d91a1a}-3.05\%$
test_creation_nested_2 0.1847ms 28.0021μs 35.7117 KOps/s 37.2542 KOps/s $\color{#d91a1a}-4.14\%$
test_clone 0.1069ms 17.4114μs 57.4336 KOps/s 57.7216 KOps/s $\color{#d91a1a}-0.50\%$
test_getitem[int] 1.0862ms 17.2734μs 57.8925 KOps/s 60.4099 KOps/s $\color{#d91a1a}-4.17\%$
test_getitem[slice_int] 0.1409ms 31.5720μs 31.6736 KOps/s 30.8119 KOps/s $\color{#35bf28}+2.80\%$
test_getitem[range] 0.1678ms 58.1938μs 17.1840 KOps/s 17.3759 KOps/s $\color{#d91a1a}-1.10\%$
test_getitem[tuple] 0.3387ms 27.1929μs 36.7743 KOps/s 40.1226 KOps/s $\textbf{\color{#d91a1a}-8.35\%}$
test_getitem[list] 0.2069ms 53.4016μs 18.7260 KOps/s 18.9127 KOps/s $\color{#d91a1a}-0.99\%$
test_setitem_dim[int] 72.4160μs 34.5786μs 28.9196 KOps/s 30.0335 KOps/s $\color{#d91a1a}-3.71\%$
test_setitem_dim[slice_int] 0.1146ms 62.7581μs 15.9342 KOps/s 16.0455 KOps/s $\color{#d91a1a}-0.69\%$
test_setitem_dim[range] 0.1274ms 84.9173μs 11.7762 KOps/s 11.8995 KOps/s $\color{#d91a1a}-1.04\%$
test_setitem_dim[tuple] 0.1261ms 51.3566μs 19.4717 KOps/s 20.4723 KOps/s $\color{#d91a1a}-4.89\%$
test_setitem 0.1131ms 31.1630μs 32.0893 KOps/s 32.3908 KOps/s $\color{#d91a1a}-0.93\%$
test_set 0.1893ms 32.7878μs 30.4991 KOps/s 33.4985 KOps/s $\textbf{\color{#d91a1a}-8.95\%}$
test_set_shared 3.8127ms 0.2184ms 4.5792 KOps/s 4.5462 KOps/s $\color{#35bf28}+0.73\%$
test_update 0.8563ms 40.1160μs 24.9277 KOps/s 25.7156 KOps/s $\color{#d91a1a}-3.06\%$
test_update_nested 0.1336ms 51.2992μs 19.4935 KOps/s 20.2399 KOps/s $\color{#d91a1a}-3.69\%$
test_update__nested 0.1362ms 45.8377μs 21.8161 KOps/s 22.3840 KOps/s $\color{#d91a1a}-2.54\%$
test_set_nested 0.1157ms 34.0409μs 29.3765 KOps/s 29.6126 KOps/s $\color{#d91a1a}-0.80\%$
test_set_nested_new 0.1245ms 39.1609μs 25.5357 KOps/s 26.3857 KOps/s $\color{#d91a1a}-3.22\%$
test_select 0.3712ms 56.8991μs 17.5750 KOps/s 18.0551 KOps/s $\color{#d91a1a}-2.66\%$
test_select_nested 0.1443ms 60.9780μs 16.3994 KOps/s 16.8010 KOps/s $\color{#d91a1a}-2.39\%$
test_exclude_nested 0.1848ms 75.8185μs 13.1894 KOps/s 13.4319 KOps/s $\color{#d91a1a}-1.81\%$
test_empty[True] 0.4256ms 0.3530ms 2.8325 KOps/s 2.8476 KOps/s $\color{#d91a1a}-0.53\%$
test_empty[False] 10.8920μs 1.2762μs 783.5608 KOps/s 815.8417 KOps/s $\color{#d91a1a}-3.96\%$
test_unbind_speed 0.4358ms 0.3069ms 3.2582 KOps/s 3.3564 KOps/s $\color{#d91a1a}-2.93\%$
test_unbind_speed_stack0 0.6287ms 0.3084ms 3.2424 KOps/s 3.3692 KOps/s $\color{#d91a1a}-3.76\%$
test_unbind_speed_stack1 0.1044s 0.8169ms 1.2242 KOps/s 1.3541 KOps/s $\textbf{\color{#d91a1a}-9.60\%}$
test_split 94.8092ms 2.2791ms 438.7619 Ops/s 465.1243 Ops/s $\textbf{\color{#d91a1a}-5.67\%}$
test_chunk 3.3165ms 2.0851ms 479.5905 Ops/s 463.8114 Ops/s $\color{#35bf28}+3.40\%$
test_creation[device0] 0.2061ms 0.1155ms 8.6578 KOps/s 8.5613 KOps/s $\color{#35bf28}+1.13\%$
test_creation_from_tensor 4.0363ms 0.1189ms 8.4086 KOps/s 8.5654 KOps/s $\color{#d91a1a}-1.83\%$
test_add_one[memmap_tensor0] 0.3127ms 7.4140μs 134.8805 KOps/s 131.3967 KOps/s $\color{#35bf28}+2.65\%$
test_contiguous[memmap_tensor0] 29.6450μs 1.8680μs 535.3351 KOps/s 528.9048 KOps/s $\color{#35bf28}+1.22\%$
test_stack[memmap_tensor0] 62.9080μs 5.8100μs 172.1184 KOps/s 178.8852 KOps/s $\color{#d91a1a}-3.78\%$
test_memmaptd_index 1.1172ms 0.4144ms 2.4131 KOps/s 2.4443 KOps/s $\color{#d91a1a}-1.28\%$
test_memmaptd_index_astensor 0.7936ms 0.5150ms 1.9416 KOps/s 1.9629 KOps/s $\color{#d91a1a}-1.09\%$
test_memmaptd_index_op 1.5181ms 1.0907ms 916.8726 Ops/s 946.5318 Ops/s $\color{#d91a1a}-3.13\%$
test_serialize_model 0.2190s 0.1360s 7.3525 Ops/s 8.5497 Ops/s $\textbf{\color{#d91a1a}-14.00\%}$
test_serialize_model_pickle 0.4498s 0.3910s 2.5575 Ops/s 2.5343 Ops/s $\color{#35bf28}+0.91\%$
test_serialize_weights 0.1226s 0.1149s 8.7015 Ops/s 7.5729 Ops/s $\textbf{\color{#35bf28}+14.90\%}$
test_serialize_weights_returnearly 0.2111s 0.1641s 6.0950 Ops/s 6.3326 Ops/s $\color{#d91a1a}-3.75\%$
test_serialize_weights_pickle 1.2492s 0.7413s 1.3490 Ops/s 1.1823 Ops/s $\textbf{\color{#35bf28}+14.11\%}$
test_serialize_weights_filesystem 0.1466s 0.1397s 7.1579 Ops/s 6.9977 Ops/s $\color{#35bf28}+2.29\%$
test_serialize_model_filesystem 0.1492s 0.1432s 6.9817 Ops/s 6.3504 Ops/s $\textbf{\color{#35bf28}+9.94\%}$
test_reshape_pytree 85.4800μs 39.1159μs 25.5651 KOps/s 25.8883 KOps/s $\color{#d91a1a}-1.25\%$
test_reshape_td 94.9270μs 46.5750μs 21.4707 KOps/s 21.0027 KOps/s $\color{#35bf28}+2.23\%$
test_view_pytree 0.1142ms 38.8931μs 25.7115 KOps/s 26.0437 KOps/s $\color{#d91a1a}-1.28\%$
test_view_td 0.1287ms 51.8198μs 19.2977 KOps/s 18.7811 KOps/s $\color{#35bf28}+2.75\%$
test_unbind_pytree 82.6140μs 35.5093μs 28.1616 KOps/s 27.6032 KOps/s $\color{#35bf28}+2.02\%$
test_unbind_td 0.3025ms 44.9891μs 22.2276 KOps/s 22.3532 KOps/s $\color{#d91a1a}-0.56\%$
test_split_pytree 82.3440μs 37.9234μs 26.3690 KOps/s 26.6139 KOps/s $\color{#d91a1a}-0.92\%$
test_split_td 0.4640ms 60.3392μs 16.5730 KOps/s 17.4408 KOps/s $\color{#d91a1a}-4.98\%$
test_add_pytree 0.1212ms 44.6161μs 22.4135 KOps/s 21.5931 KOps/s $\color{#35bf28}+3.80\%$
test_add_td 0.2418ms 89.8526μs 11.1293 KOps/s 11.6290 KOps/s $\color{#d91a1a}-4.30\%$
test_compile_add_one_nested[tensordict-compile] 0.1448ms 73.5814μs 13.5904 KOps/s 14.0955 KOps/s $\color{#d91a1a}-3.58\%$
test_compile_add_one_nested[tensordict-eager] 0.4239ms 0.2010ms 4.9752 KOps/s 4.9636 KOps/s $\color{#35bf28}+0.23\%$
test_compile_add_one_nested[pytree-compile] 0.1633ms 55.4518μs 18.0337 KOps/s 18.5034 KOps/s $\color{#d91a1a}-2.54\%$
test_compile_add_one_nested[pytree-eager] 0.4902ms 0.1476ms 6.7767 KOps/s 6.7221 KOps/s $\color{#35bf28}+0.81\%$
test_compile_copy_nested[tensordict-compile] 83.7860μs 28.5121μs 35.0728 KOps/s 36.8720 KOps/s $\color{#d91a1a}-4.88\%$
test_compile_copy_nested[tensordict-eager] 0.1509ms 76.7001μs 13.0378 KOps/s 13.0191 KOps/s $\color{#35bf28}+0.14\%$
test_compile_copy_nested[pytree-compile] 0.1952ms 78.2960μs 12.7720 KOps/s 12.6426 KOps/s $\color{#35bf28}+1.02\%$
test_compile_copy_nested[pytree-eager] 0.1348ms 66.7650μs 14.9779 KOps/s 14.7020 KOps/s $\color{#35bf28}+1.88\%$
test_compile_add_one_flat[tensordict-compile] 0.2593ms 0.1222ms 8.1805 KOps/s 8.2369 KOps/s $\color{#d91a1a}-0.68\%$
test_compile_add_one_flat[tensordict-eager] 0.5037ms 0.2460ms 4.0646 KOps/s 4.0690 KOps/s $\color{#d91a1a}-0.11\%$
test_compile_add_one_flat[tensorclass-compile] 0.1114ms 54.2452μs 18.4348 KOps/s 19.1453 KOps/s $\color{#d91a1a}-3.71\%$
test_compile_add_one_flat[tensorclass-eager] 0.2110ms 79.0778μs 12.6458 KOps/s 12.7849 KOps/s $\color{#d91a1a}-1.09\%$
test_compile_add_one_flat[pytree-compile] 0.2365ms 0.1130ms 8.8516 KOps/s 9.1545 KOps/s $\color{#d91a1a}-3.31\%$
test_compile_add_one_flat[pytree-eager] 0.4942ms 0.2994ms 3.3397 KOps/s 3.3683 KOps/s $\color{#d91a1a}-0.85\%$
test_compile_add_self_flat[tensordict-eager] 0.5194ms 0.2781ms 3.5955 KOps/s 3.6459 KOps/s $\color{#d91a1a}-1.38\%$
test_compile_add_self_flat[tensordict-compile] 0.2772ms 0.1258ms 7.9499 KOps/s 8.3687 KOps/s $\textbf{\color{#d91a1a}-5.00\%}$
test_compile_add_self_flat[tensorclass-eager] 0.1973ms 75.3485μs 13.2717 KOps/s 13.2071 KOps/s $\color{#35bf28}+0.49\%$
test_compile_add_self_flat[tensorclass-compile] 0.1058ms 53.7655μs 18.5993 KOps/s 18.6299 KOps/s $\color{#d91a1a}-0.16\%$
test_compile_add_self_flat[pytree-eager] 0.4140ms 0.2406ms 4.1561 KOps/s 4.0969 KOps/s $\color{#35bf28}+1.45\%$
test_compile_add_self_flat[pytree-compile] 0.1948ms 0.1123ms 8.9073 KOps/s 9.0839 KOps/s $\color{#d91a1a}-1.95\%$
test_compile_copy_flat[tensordict-compile] 63.8890μs 28.9546μs 34.5369 KOps/s 32.7889 KOps/s $\textbf{\color{#35bf28}+5.33\%}$
test_compile_copy_flat[tensordict-eager] 0.1950ms 78.3342μs 12.7658 KOps/s 13.1418 KOps/s $\color{#d91a1a}-2.86\%$
test_compile_copy_flat[pytree-compile] 0.1989ms 81.6671μs 12.2448 KOps/s 12.4928 KOps/s $\color{#d91a1a}-1.98\%$
test_compile_copy_flat[pytree-eager] 0.1449ms 69.6925μs 14.3488 KOps/s 14.8215 KOps/s $\color{#d91a1a}-3.19\%$
test_compile_assign_and_add[tensordict-compile] 0.3187ms 0.2141ms 4.6704 KOps/s 4.7908 KOps/s $\color{#d91a1a}-2.51\%$
test_compile_assign_and_add[tensordict-eager] 2.1096ms 1.8817ms 531.4270 Ops/s 550.4544 Ops/s $\color{#d91a1a}-3.46\%$
test_compile_assign_and_add[pytree-compile] 0.2975ms 0.2088ms 4.7882 KOps/s 4.8541 KOps/s $\color{#d91a1a}-1.36\%$
test_compile_assign_and_add[pytree-eager] 1.7969ms 1.1745ms 851.4409 Ops/s 862.2249 Ops/s $\color{#d91a1a}-1.25\%$
test_compile_assign_and_add_stack[compile] 0.5673ms 0.4644ms 2.1535 KOps/s 2.2151 KOps/s $\color{#d91a1a}-2.78\%$
test_compile_assign_and_add_stack[eager] 6.0429ms 4.2707ms 234.1532 Ops/s 238.7514 Ops/s $\color{#d91a1a}-1.93\%$
test_compile_indexing[tensor-tensordict-compile] 91.3400μs 43.3303μs 23.0786 KOps/s 23.4419 KOps/s $\color{#d91a1a}-1.55\%$
test_compile_indexing[tensor-tensordict-eager] 0.5126ms 49.6883μs 20.1255 KOps/s 20.2112 KOps/s $\color{#d91a1a}-0.42\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1262ms 36.7438μs 27.2155 KOps/s 27.1077 KOps/s $\color{#35bf28}+0.40\%$
test_compile_indexing[tensor-tensorclass-eager] 76.6030μs 29.6479μs 33.7292 KOps/s 34.2690 KOps/s $\color{#d91a1a}-1.58\%$
test_compile_indexing[tensor-pytree-compile] 98.8440μs 37.6640μs 26.5505 KOps/s 26.1505 KOps/s $\color{#35bf28}+1.53\%$
test_compile_indexing[tensor-pytree-eager] 82.7040μs 29.3167μs 34.1102 KOps/s 34.1464 KOps/s $\color{#d91a1a}-0.11\%$
test_compile_indexing[slice-tensordict-compile] 0.1989ms 78.0715μs 12.8088 KOps/s 12.7074 KOps/s $\color{#35bf28}+0.80\%$
test_compile_indexing[slice-tensordict-eager] 0.7490ms 29.9881μs 33.3466 KOps/s 36.0119 KOps/s $\textbf{\color{#d91a1a}-7.40\%}$
test_compile_indexing[slice-tensorclass-compile] 0.1316ms 70.7183μs 14.1406 KOps/s 14.1155 KOps/s $\color{#35bf28}+0.18\%$
test_compile_indexing[slice-tensorclass-eager] 74.7400μs 24.4762μs 40.8560 KOps/s 42.4356 KOps/s $\color{#d91a1a}-3.72\%$
test_compile_indexing[slice-pytree-compile] 0.1584ms 71.7133μs 13.9444 KOps/s 14.0314 KOps/s $\color{#d91a1a}-0.62\%$
test_compile_indexing[slice-pytree-eager] 64.5000μs 24.3573μs 41.0555 KOps/s 42.8857 KOps/s $\color{#d91a1a}-4.27\%$
test_compile_indexing[int-tensordict-compile] 0.1737ms 77.9945μs 12.8214 KOps/s 12.5016 KOps/s $\color{#35bf28}+2.56\%$
test_compile_indexing[int-tensordict-eager] 0.7960ms 29.7332μs 33.6324 KOps/s 35.2287 KOps/s $\color{#d91a1a}-4.53\%$
test_compile_indexing[int-tensorclass-compile] 0.1545ms 71.3077μs 14.0237 KOps/s 13.6522 KOps/s $\color{#35bf28}+2.72\%$
test_compile_indexing[int-tensorclass-eager] 74.4390μs 24.2440μs 41.2473 KOps/s 42.9394 KOps/s $\color{#d91a1a}-3.94\%$
test_compile_indexing[int-pytree-compile] 0.1636ms 71.2426μs 14.0365 KOps/s 13.9865 KOps/s $\color{#35bf28}+0.36\%$
test_compile_indexing[int-pytree-eager] 78.3860μs 24.1755μs 41.3642 KOps/s 42.4792 KOps/s $\color{#d91a1a}-2.62\%$
test_mod_add[eager] 0.1017ms 27.0468μs 36.9729 KOps/s 37.3886 KOps/s $\color{#d91a1a}-1.11\%$
test_mod_add[compile] 0.1128ms 43.7501μs 22.8571 KOps/s 22.2525 KOps/s $\color{#35bf28}+2.72\%$
test_mod_add[compile-overhead] 0.1297ms 45.2390μs 22.1048 KOps/s 21.8189 KOps/s $\color{#35bf28}+1.31\%$
test_mod_wrap[eager] 0.3530ms 0.2145ms 4.6621 KOps/s 4.5375 KOps/s $\color{#35bf28}+2.75\%$
test_mod_wrap[compile] 1.6844ms 0.2031ms 4.9234 KOps/s 4.8687 KOps/s $\color{#35bf28}+1.12\%$
test_mod_wrap[compile-overhead] 1.8272ms 0.2049ms 4.8803 KOps/s 4.9270 KOps/s $\color{#d91a1a}-0.95\%$
test_mod_wrap_and_backward[eager] 13.4587ms 11.3453ms 88.1419 Ops/s 89.8715 Ops/s $\color{#d91a1a}-1.92\%$
test_mod_wrap_and_backward[compile] 12.6384ms 10.7640ms 92.9025 Ops/s 91.8278 Ops/s $\color{#35bf28}+1.17\%$
test_mod_wrap_and_backward[compile-overhead] 12.3171ms 10.7737ms 92.8184 Ops/s 91.7907 Ops/s $\color{#35bf28}+1.12\%$
test_seq_add[eager] 0.2135ms 91.8749μs 10.8844 KOps/s 10.5742 KOps/s $\color{#35bf28}+2.93\%$
test_seq_add[compile] 0.3087ms 59.2228μs 16.8854 KOps/s 17.0663 KOps/s $\color{#d91a1a}-1.06\%$
test_seq_add[compile-overhead] 0.1119ms 56.6774μs 17.6437 KOps/s 17.1668 KOps/s $\color{#35bf28}+2.78\%$
test_seq_wrap[eager] 0.7140ms 0.3922ms 2.5499 KOps/s 2.5482 KOps/s $\color{#35bf28}+0.07\%$
test_seq_wrap[compile] 0.4091ms 0.2211ms 4.5238 KOps/s 4.4813 KOps/s $\color{#35bf28}+0.95\%$
test_seq_wrap[compile-overhead] 0.7652ms 0.2241ms 4.4631 KOps/s 4.4687 KOps/s $\color{#d91a1a}-0.13\%$
test_func_call_runtime[False-eager] 1.1828ms 0.5546ms 1.8032 KOps/s 1.8117 KOps/s $\color{#d91a1a}-0.47\%$
test_func_call_runtime[False-compile] 0.8060ms 0.4303ms 2.3242 KOps/s 2.4005 KOps/s $\color{#d91a1a}-3.18\%$
test_func_call_runtime[False-compile-overhead] 0.5823ms 0.4307ms 2.3218 KOps/s 2.3804 KOps/s $\color{#d91a1a}-2.46\%$
test_func_call_runtime[True-eager] 1.2490ms 0.7639ms 1.3091 KOps/s 1.3327 KOps/s $\color{#d91a1a}-1.77\%$
test_func_call_runtime[True-compile] 0.8568ms 0.4710ms 2.1232 KOps/s 2.1896 KOps/s $\color{#d91a1a}-3.03\%$
test_func_call_runtime[True-compile-overhead] 0.9894ms 0.4700ms 2.1278 KOps/s 2.1927 KOps/s $\color{#d91a1a}-2.96\%$
test_func_call_cm_runtime[False-eager] 0.9872ms 0.5552ms 1.8012 KOps/s 1.8737 KOps/s $\color{#d91a1a}-3.87\%$
test_func_call_cm_runtime[False-compile] 0.5349ms 0.4264ms 2.3451 KOps/s 2.4025 KOps/s $\color{#d91a1a}-2.39\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5249ms 0.4244ms 2.3564 KOps/s 2.3857 KOps/s $\color{#d91a1a}-1.23\%$
test_func_call_cm_runtime[True-eager] 1.1289ms 0.9194ms 1.0877 KOps/s 1.1253 KOps/s $\color{#d91a1a}-3.35\%$
test_func_call_cm_runtime[True-compile] 0.6174ms 0.4917ms 2.0339 KOps/s 2.0519 KOps/s $\color{#d91a1a}-0.88\%$
test_func_call_cm_runtime[True-compile-overhead] 1.2408ms 0.5050ms 1.9800 KOps/s 2.0285 KOps/s $\color{#d91a1a}-2.39\%$
test_vmap_func_call_cm_runtime[eager] 2.3608ms 1.9250ms 519.4894 Ops/s 514.3537 Ops/s $\color{#35bf28}+1.00\%$
test_vmap_func_call_cm_runtime[compile] 1.0841ms 0.5146ms 1.9433 KOps/s 1.9634 KOps/s $\color{#d91a1a}-1.02\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.9386ms 0.5207ms 1.9205 KOps/s 1.9470 KOps/s $\color{#d91a1a}-1.36\%$
test_distributed 0.3528ms 0.1301ms 7.6864 KOps/s 7.9096 KOps/s $\color{#d91a1a}-2.82\%$
test_tdmodule 0.1292ms 18.8109μs 53.1608 KOps/s 54.3624 KOps/s $\color{#d91a1a}-2.21\%$
test_tdmodule_dispatch 69.9000μs 37.8946μs 26.3890 KOps/s 27.1553 KOps/s $\color{#d91a1a}-2.82\%$
test_tdseq 43.9620μs 22.0067μs 45.4407 KOps/s 47.1451 KOps/s $\color{#d91a1a}-3.62\%$
test_tdseq_dispatch 78.9170μs 43.6641μs 22.9021 KOps/s 23.9203 KOps/s $\color{#d91a1a}-4.26\%$
test_instantiation_functorch 2.3553ms 1.5420ms 648.4967 Ops/s 671.5561 Ops/s $\color{#d91a1a}-3.43\%$
test_exec_functorch 0.3241ms 0.1807ms 5.5349 KOps/s 5.6503 KOps/s $\color{#d91a1a}-2.04\%$
test_exec_functional_call 0.3776ms 0.1749ms 5.7171 KOps/s 5.8677 KOps/s $\color{#d91a1a}-2.57\%$
test_exec_td_decorator 0.5443ms 0.2390ms 4.1845 KOps/s 4.3227 KOps/s $\color{#d91a1a}-3.20\%$
test_vmap_mlp_speed_decorator[True-True] 0.8604ms 0.6648ms 1.5043 KOps/s 1.5151 KOps/s $\color{#d91a1a}-0.71\%$
test_vmap_mlp_speed_decorator[True-False] 1.4865ms 0.6815ms 1.4673 KOps/s 1.5476 KOps/s $\textbf{\color{#d91a1a}-5.19\%}$
test_vmap_mlp_speed_decorator[False-True] 0.7885ms 0.5435ms 1.8400 KOps/s 1.8764 KOps/s $\color{#d91a1a}-1.94\%$
test_vmap_mlp_speed_decorator[False-False] 0.9429ms 0.5437ms 1.8391 KOps/s 1.8883 KOps/s $\color{#d91a1a}-2.60\%$
test_to_module_speed[True] 1.9313ms 1.3816ms 723.7976 Ops/s 728.2294 Ops/s $\color{#d91a1a}-0.61\%$
test_to_module_speed[False] 2.3072ms 1.3789ms 725.2033 Ops/s 748.7661 Ops/s $\color{#d91a1a}-3.15\%$
test_tc_init 94.9570μs 49.3110μs 20.2795 KOps/s 21.4068 KOps/s $\textbf{\color{#d91a1a}-5.27\%}$
test_tc_init_nested 0.2019ms 98.8200μs 10.1194 KOps/s 10.7995 KOps/s $\textbf{\color{#d91a1a}-6.30\%}$
test_tc_first_layer_tensor 20.5280μs 1.5658μs 638.6400 KOps/s 674.1653 KOps/s $\textbf{\color{#d91a1a}-5.27\%}$
test_tc_first_layer_nontensor 23.1430μs 4.7552μs 210.2953 KOps/s 211.4929 KOps/s $\color{#d91a1a}-0.57\%$
test_tc_second_layer_tensor 31.4280μs 2.8450μs 351.4901 KOps/s 365.8675 KOps/s $\color{#d91a1a}-3.93\%$
test_tc_second_layer_nontensor 40.2150μs 6.0847μs 164.3477 KOps/s 163.9763 KOps/s $\color{#35bf28}+0.23\%$
test_unbind 0.2244s 13.4579ms 74.3056 Ops/s 80.8105 Ops/s $\textbf{\color{#d91a1a}-8.05\%}$
test_full_like 8.8134ms 7.8154ms 127.9529 Ops/s 141.8460 Ops/s $\textbf{\color{#d91a1a}-9.79\%}$
test_zeros_like 4.1868ms 3.0080ms 332.4438 Ops/s 364.5264 Ops/s $\textbf{\color{#d91a1a}-8.80\%}$
test_ones_like 4.0811ms 3.5213ms 283.9878 Ops/s 323.5869 Ops/s $\textbf{\color{#d91a1a}-12.24\%}$
test_clone 6.1733ms 5.5212ms 181.1205 Ops/s 182.0028 Ops/s $\color{#d91a1a}-0.48\%$
test_squeeze 65.8030μs 13.0837μs 76.4310 KOps/s 78.1954 KOps/s $\color{#d91a1a}-2.26\%$
test_unsqueeze 0.2059ms 94.4143μs 10.5916 KOps/s 10.6146 KOps/s $\color{#d91a1a}-0.22\%$
test_split 0.4893ms 0.2005ms 4.9884 KOps/s 5.1880 KOps/s $\color{#d91a1a}-3.85\%$
test_permute 0.3850ms 0.2227ms 4.4909 KOps/s 4.5270 KOps/s $\color{#d91a1a}-0.80\%$
test_stack 31.1231ms 26.3454ms 37.9573 Ops/s 39.1565 Ops/s $\color{#d91a1a}-3.06\%$
test_cat 31.7582ms 25.9747ms 38.4991 Ops/s 37.4143 Ops/s $\color{#35bf28}+2.90\%$
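
The report does not say how its columns are derived. Reading the rows, Ops looks like the reciprocal of Mean, and Change looks like the relative difference between Ops and Ops on Repo HEAD. The bold rows also match the summary counts (7 improved, 16 worsened) if a change has to reach about 5% to be counted. The sketch below reproduces that reading for three rows copied from the table. The function names and the 5% threshold are assumptions inferred from the numbers, not taken from the benchmark tooling.

```python
"""Sketch of how the benchmark columns appear to relate.

The helper names and the 5% threshold are assumptions inferred from
the table, not the actual code of the benchmark bot.
"""

# Assumed threshold (in percent) at which a change is highlighted and counted.
THRESHOLD_PCT = 5.0


def ops_from_mean(mean_seconds: float) -> float:
    """Throughput in operations per second for a mean time per call."""
    return 1.0 / mean_seconds


def relative_change_pct(ops: float, ops_head: float) -> float:
    """Throughput change of this PR relative to repo HEAD, in percent."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = THRESHOLD_PCT) -> str:
    """Label a change as improved, worsened, or unchanged (within threshold)."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "unchanged"


if __name__ == "__main__":
    # (name, Ops, Ops on Repo HEAD), both in KOps/s, copied from the CPU table.
    rows = [
        ("test_plain_set_nested", 39.5604, 36.2971),
        ("test_split_td", 16.5730, 17.4408),
        ("test_compile_add_self_flat[tensordict-compile]", 7.9499, 8.3687),
    ]
    for name, ops, ops_head in rows:
        change = relative_change_pct(ops, ops_head)
        print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under this reading, most of the small red and green entries fall inside the threshold and are not counted in the summary line.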

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 222. Improved: $\large\color{#35bf28}30$. Worsened: $\large\color{#d91a1a}9$.

Detailed results
Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_plain_set_nested 34.1120μs 16.3559μs 61.1402 KOps/s 56.9670 KOps/s $\textbf{\color{#35bf28}+7.33\%}$
test_plain_set_stack_nested 45.4920μs 16.4144μs 60.9220 KOps/s 56.1203 KOps/s $\textbf{\color{#35bf28}+8.56\%}$
test_plain_set_nested_inplace 45.8320μs 17.5377μs 57.0201 KOps/s 52.2424 KOps/s $\textbf{\color{#35bf28}+9.15\%}$
test_plain_set_stack_nested_inplace 51.0520μs 17.5583μs 56.9532 KOps/s 52.1962 KOps/s $\textbf{\color{#35bf28}+9.11\%}$
test_items 28.1610μs 2.8939μs 345.5599 KOps/s 339.5853 KOps/s $\color{#35bf28}+1.76\%$
test_items_nested 0.3776ms 0.3393ms 2.9477 KOps/s 2.9190 KOps/s $\color{#35bf28}+0.98\%$
test_items_nested_locked 0.3751ms 0.3412ms 2.9308 KOps/s 2.9209 KOps/s $\color{#35bf28}+0.34\%$
test_items_nested_leaf 92.9540μs 64.3164μs 15.5481 KOps/s 15.6053 KOps/s $\color{#d91a1a}-0.37\%$
test_items_stack_nested 0.3847ms 0.3422ms 2.9223 KOps/s 2.9146 KOps/s $\color{#35bf28}+0.26\%$
test_items_stack_nested_leaf 92.2140μs 67.6533μs 14.7813 KOps/s 15.1308 KOps/s $\color{#d91a1a}-2.31\%$
test_items_stack_nested_locked 0.3843ms 0.3449ms 2.8997 KOps/s 2.8809 KOps/s $\color{#35bf28}+0.65\%$
test_keys 29.2510μs 3.4648μs 288.6185 KOps/s 287.5643 KOps/s $\color{#35bf28}+0.37\%$
test_keys_nested 0.2455ms 71.9638μs 13.8959 KOps/s 13.9297 KOps/s $\color{#d91a1a}-0.24\%$
test_keys_nested_locked 0.7104ms 77.6068μs 12.8855 KOps/s 12.8679 KOps/s $\color{#35bf28}+0.14\%$
test_keys_nested_leaf 92.5140μs 62.1584μs 16.0879 KOps/s 15.9261 KOps/s $\color{#35bf28}+1.02\%$
test_keys_stack_nested 0.1124ms 72.9481μs 13.7084 KOps/s 13.7180 KOps/s $\color{#d91a1a}-0.07\%$
test_keys_stack_nested_leaf 0.1020ms 64.2188μs 15.5718 KOps/s 15.5271 KOps/s $\color{#35bf28}+0.29\%$
test_keys_stack_nested_locked 0.1305ms 77.6926μs 12.8712 KOps/s 12.7175 KOps/s $\color{#35bf28}+1.21\%$
test_values 5.4387μs 0.8513μs 1.1747 MOps/s 1.1749 MOps/s $\color{#d91a1a}-0.02\%$
test_values_nested 85.1540μs 50.1209μs 19.9518 KOps/s 20.0560 KOps/s $\color{#d91a1a}-0.52\%$
test_values_nested_locked 84.2540μs 51.6620μs 19.3566 KOps/s 19.3931 KOps/s $\color{#d91a1a}-0.19\%$
test_values_nested_leaf 80.2230μs 43.4079μs 23.0373 KOps/s 23.1727 KOps/s $\color{#d91a1a}-0.58\%$
test_values_stack_nested 90.8640μs 51.3826μs 19.4618 KOps/s 19.5647 KOps/s $\color{#d91a1a}-0.53\%$
test_values_stack_nested_leaf 76.6740μs 43.9010μs 22.7785 KOps/s 22.4047 KOps/s $\color{#35bf28}+1.67\%$
test_values_stack_nested_locked 89.7440μs 52.3210μs 19.1128 KOps/s 19.1090 KOps/s $\color{#35bf28}+0.02\%$
test_membership 1.7676μs 0.5316μs 1.8810 MOps/s 1.8716 MOps/s $\color{#35bf28}+0.50\%$
test_membership_nested 32.3420μs 2.0310μs 492.3575 KOps/s 521.9606 KOps/s $\textbf{\color{#d91a1a}-5.67\%}$
test_membership_nested_leaf 20.7160μs 1.9782μs 505.5062 KOps/s 523.5041 KOps/s $\color{#d91a1a}-3.44\%$
test_membership_stacked_nested 48.2220μs 2.0123μs 496.9341 KOps/s 517.2289 KOps/s $\color{#d91a1a}-3.92\%$
test_membership_stacked_nested_leaf 32.2310μs 2.0165μs 495.9178 KOps/s 509.2876 KOps/s $\color{#d91a1a}-2.63\%$
test_membership_nested_last 36.8820μs 3.0614μs 326.6452 KOps/s 331.0646 KOps/s $\color{#d91a1a}-1.33\%$
test_membership_nested_leaf_last 51.2420μs 3.0316μs 329.8538 KOps/s 329.2018 KOps/s $\color{#35bf28}+0.20\%$
test_membership_stacked_nested_last 29.1210μs 4.3158μs 231.7074 KOps/s 281.8751 KOps/s $\textbf{\color{#d91a1a}-17.80\%}$
test_membership_stacked_nested_leaf_last 36.6210μs 4.3461μs 230.0888 KOps/s 285.3937 KOps/s $\textbf{\color{#d91a1a}-19.38\%}$
test_nested_getleaf 67.5030μs 5.9964μs 166.7679 KOps/s 167.0660 KOps/s $\color{#d91a1a}-0.18\%$
test_nested_get 39.7120μs 5.7080μs 175.1912 KOps/s 175.4234 KOps/s $\color{#d91a1a}-0.13\%$
test_stacked_getleaf 37.7010μs 6.0278μs 165.8968 KOps/s 166.3138 KOps/s $\color{#d91a1a}-0.25\%$
test_stacked_get 31.0220μs 5.7352μs 174.3612 KOps/s 175.0660 KOps/s $\color{#d91a1a}-0.40\%$
test_nested_getitemleaf 39.2020μs 6.1399μs 162.8700 KOps/s 166.1194 KOps/s $\color{#d91a1a}-1.96\%$
test_nested_getitem 30.1910μs 5.7835μs 172.9052 KOps/s 173.6457 KOps/s $\color{#d91a1a}-0.43\%$
test_stacked_getitemleaf 27.1020μs 6.1027μs 163.8607 KOps/s 163.9374 KOps/s $\color{#d91a1a}-0.05\%$
test_stacked_getitem 31.2920μs 5.7629μs 173.5232 KOps/s 172.7621 KOps/s $\color{#35bf28}+0.44\%$
test_lock_nested 0.8497ms 0.4345ms 2.3013 KOps/s 2.3283 KOps/s $\color{#d91a1a}-1.16\%$
test_lock_stack_nested 0.4414ms 0.3896ms 2.5667 KOps/s 2.5113 KOps/s $\color{#35bf28}+2.20\%$
test_unlock_nested 0.8251ms 0.3719ms 2.6892 KOps/s 2.7052 KOps/s $\color{#d91a1a}-0.59\%$
test_unlock_stack_nested 0.3604ms 0.3284ms 3.0453 KOps/s 2.9844 KOps/s $\color{#35bf28}+2.04\%$
test_flatten_speed 0.1187ms 78.6786μs 12.7099 KOps/s 12.5378 KOps/s $\color{#35bf28}+1.37\%$
test_unflatten_speed 0.3663ms 0.3221ms 3.1048 KOps/s 3.1016 KOps/s $\color{#35bf28}+0.10\%$
test_common_ops 1.7922ms 1.2467ms 802.1026 Ops/s 770.5488 Ops/s $\color{#35bf28}+4.09\%$
test_creation 22.4910μs 1.5035μs 665.1055 KOps/s 666.6443 KOps/s $\color{#d91a1a}-0.23\%$
test_creation_empty 38.8020μs 14.9142μs 67.0501 KOps/s 55.4549 KOps/s $\textbf{\color{#35bf28}+20.91\%}$
test_creation_nested_1 53.8220μs 16.6288μs 60.1366 KOps/s 50.3835 KOps/s $\textbf{\color{#35bf28}+19.36\%}$
test_creation_nested_2 47.7420μs 19.3445μs 51.6943 KOps/s 44.9573 KOps/s $\textbf{\color{#35bf28}+14.99\%}$
test_clone 63.4130μs 29.1266μs 34.3329 KOps/s 34.1029 KOps/s $\color{#35bf28}+0.67\%$
test_getitem[int] 1.2815ms 16.6025μs 60.2321 KOps/s 59.5450 KOps/s $\color{#35bf28}+1.15\%$
test_getitem[slice_int] 0.1360ms 30.1560μs 33.1609 KOps/s 33.5631 KOps/s $\color{#d91a1a}-1.20\%$
test_getitem[range] 0.3372ms 0.1177ms 8.4964 KOps/s 8.6731 KOps/s $\color{#d91a1a}-2.04\%$
test_getitem[tuple] 0.1310ms 26.2633μs 38.0760 KOps/s 38.5319 KOps/s $\color{#d91a1a}-1.18\%$
test_getitem[list] 0.2121ms 0.1053ms 9.4933 KOps/s 9.4946 KOps/s $\color{#d91a1a}-0.01\%$
test_setitem_dim[int] 0.1257ms 45.2111μs 22.1185 KOps/s 22.0500 KOps/s $\color{#35bf28}+0.31\%$
test_setitem_dim[slice_int] 0.1202ms 71.3952μs 14.0065 KOps/s 14.6102 KOps/s $\color{#d91a1a}-4.13\%$
test_setitem_dim[range] 0.1831ms 0.1312ms 7.6207 KOps/s 7.6484 KOps/s $\color{#d91a1a}-0.36\%$
test_setitem_dim[tuple] 96.6540μs 61.9140μs 16.1514 KOps/s 16.2960 KOps/s $\color{#d91a1a}-0.89\%$
test_setitem 84.1740μs 41.6906μs 23.9862 KOps/s 23.3036 KOps/s $\color{#35bf28}+2.93\%$
test_set 83.1630μs 42.1660μs 23.7158 KOps/s 23.5232 KOps/s $\color{#35bf28}+0.82\%$
test_set_shared 0.3434ms 54.8672μs 18.2258 KOps/s 18.6235 KOps/s $\color{#d91a1a}-2.14\%$
test_update 0.1056ms 50.0645μs 19.9742 KOps/s 18.9613 KOps/s $\textbf{\color{#35bf28}+5.34\%}$
test_update_nested 0.1099ms 58.5212μs 17.0878 KOps/s 16.6811 KOps/s $\color{#35bf28}+2.44\%$
test_update__nested 0.5767ms 70.5351μs 14.1773 KOps/s 15.0213 KOps/s $\textbf{\color{#d91a1a}-5.62\%}$
test_set_nested 0.1016ms 43.4695μs 23.0046 KOps/s 22.6190 KOps/s $\color{#35bf28}+1.70\%$
test_set_nested_new 0.1017ms 48.5728μs 20.5877 KOps/s 20.9433 KOps/s $\color{#d91a1a}-1.70\%$
test_select 0.1233ms 62.4751μs 16.0064 KOps/s 16.3805 KOps/s $\color{#d91a1a}-2.28\%$
test_select_nested 83.2840μs 41.6862μs 23.9888 KOps/s 23.8263 KOps/s $\color{#35bf28}+0.68\%$
test_exclude_nested 93.3640μs 57.9254μs 17.2636 KOps/s 16.8931 KOps/s $\color{#35bf28}+2.19\%$
test_empty[True] 0.2881ms 0.2527ms 3.9571 KOps/s 3.9473 KOps/s $\color{#35bf28}+0.25\%$
test_empty[False] 3.7322μs 0.7571μs 1.3209 MOps/s 1.2786 MOps/s $\color{#35bf28}+3.31\%$
test_to 57.0620μs 26.8996μs 37.1753 KOps/s 37.7573 KOps/s $\color{#d91a1a}-1.54\%$
test_to_nonblocking 56.9730μs 25.1770μs 39.7188 KOps/s 39.8040 KOps/s $\color{#d91a1a}-0.21\%$
test_unbind_speed 0.3213ms 0.2841ms 3.5201 KOps/s 3.5744 KOps/s $\color{#d91a1a}-1.52\%$
test_unbind_speed_stack0 0.3334ms 0.2759ms 3.6251 KOps/s 3.6007 KOps/s $\color{#35bf28}+0.68\%$
test_unbind_speed_stack1 92.0338ms 0.7039ms 1.4207 KOps/s 1.3952 KOps/s $\color{#35bf28}+1.83\%$
test_split 95.4925ms 2.2943ms 435.8554 Ops/s 438.5948 Ops/s $\color{#d91a1a}-0.62\%$
test_chunk 95.0610ms 2.2931ms 436.0972 Ops/s 436.6340 Ops/s $\color{#d91a1a}-0.12\%$
test_to[False] 3.4521ms 3.3947ms 294.5789 Ops/s 297.3516 Ops/s $\color{#d91a1a}-0.93\%$
test_to[True] 4.8450ms 4.5109ms 221.6838 Ops/s 225.6850 Ops/s $\color{#d91a1a}-1.77\%$
test_to_njt[False] 0.3353s 0.2533s 3.9486 Ops/s 3.9538 Ops/s $\color{#d91a1a}-0.13\%$
test_to_njt[True] 0.2615s 0.2610s 3.8314 Ops/s 3.8412 Ops/s $\color{#d91a1a}-0.25\%$
test_creation[device0] 0.3361ms 0.1325ms 7.5458 KOps/s 7.7339 KOps/s $\color{#d91a1a}-2.43\%$
test_creation_from_tensor 0.3924ms 0.1307ms 7.6502 KOps/s 7.6618 KOps/s $\color{#d91a1a}-0.15\%$
test_add_one[memmap_tensor0] 0.2299ms 9.2964μs 107.5681 KOps/s 108.1190 KOps/s $\color{#d91a1a}-0.51\%$
test_contiguous[memmap_tensor0] 36.0210μs 2.2014μs 454.2660 KOps/s 450.8318 KOps/s $\color{#35bf28}+0.76\%$
test_stack[memmap_tensor0] 34.8320μs 7.2462μs 138.0039 KOps/s 141.9388 KOps/s $\color{#d91a1a}-2.77\%$
test_memmaptd_index 93.6809ms 0.4968ms 2.0128 KOps/s 2.2316 KOps/s $\textbf{\color{#d91a1a}-9.80\%}$
test_memmaptd_index_astensor 0.7855ms 0.5080ms 1.9684 KOps/s 1.9393 KOps/s $\color{#35bf28}+1.50\%$
test_memmaptd_index_op 1.4116ms 1.0270ms 973.7307 Ops/s 923.5931 Ops/s $\textbf{\color{#35bf28}+5.43\%}$
test_serialize_model 0.1315s 0.1302s 7.6799 Ops/s 6.9422 Ops/s $\textbf{\color{#35bf28}+10.63\%}$
test_serialize_model_pickle 1.3510s 1.2136s 0.8240 Ops/s 0.8197 Ops/s $\color{#35bf28}+0.52\%$
test_serialize_weights 0.1305s 0.1296s 7.7185 Ops/s 7.6875 Ops/s $\color{#35bf28}+0.40\%$
test_serialize_weights_returnearly 0.2446s 63.7378ms 15.6893 Ops/s 17.7574 Ops/s $\textbf{\color{#d91a1a}-11.65\%}$
test_serialize_weights_pickle 1.3698s 1.1902s 0.8402 Ops/s 0.8354 Ops/s $\color{#35bf28}+0.57\%$
test_reshape_pytree 80.7430μs 36.6821μs 27.2612 KOps/s 27.4976 KOps/s $\color{#d91a1a}-0.86\%$
test_reshape_td 70.5930μs 42.7904μs 23.3697 KOps/s 24.0160 KOps/s $\color{#d91a1a}-2.69\%$
test_view_pytree 64.2220μs 37.0852μs 26.9649 KOps/s 27.3129 KOps/s $\color{#d91a1a}-1.27\%$
test_view_td 87.0240μs 48.0722μs 20.8020 KOps/s 21.1385 KOps/s $\color{#d91a1a}-1.59\%$
test_unbind_pytree 69.7230μs 34.7446μs 28.7814 KOps/s 28.3163 KOps/s $\color{#35bf28}+1.64\%$
test_unbind_td 0.4789ms 43.0176μs 23.2463 KOps/s 23.7101 KOps/s $\color{#d91a1a}-1.96\%$
test_split_pytree 0.4912ms 46.1375μs 21.6744 KOps/s 21.2061 KOps/s $\color{#35bf28}+2.21\%$
test_split_td 0.1539ms 58.5878μs 17.0684 KOps/s 17.5074 KOps/s $\color{#d91a1a}-2.51\%$
test_add_pytree 89.1430μs 58.5658μs 17.0748 KOps/s 17.3764 KOps/s $\color{#d91a1a}-1.74\%$
test_add_td 0.1451ms 93.3320μs 10.7144 KOps/s 10.2704 KOps/s $\color{#35bf28}+4.32\%$
test_compile_add_one_nested[tensordict-compile] 0.2274ms 0.1640ms 6.0959 KOps/s 6.1550 KOps/s $\color{#d91a1a}-0.96\%$
test_compile_add_one_nested[tensordict-eager] 0.3358ms 0.1605ms 6.2289 KOps/s 6.1955 KOps/s $\color{#35bf28}+0.54\%$
test_compile_add_one_nested[pytree-compile] 0.2213ms 0.1546ms 6.4697 KOps/s 6.3328 KOps/s $\color{#35bf28}+2.16\%$
test_compile_add_one_nested[pytree-eager] 0.2498ms 0.1884ms 5.3079 KOps/s 5.3478 KOps/s $\color{#d91a1a}-0.75\%$
test_compile_copy_nested[tensordict-compile] 52.9130μs 24.5729μs 40.6952 KOps/s 44.8239 KOps/s $\textbf{\color{#d91a1a}-9.21\%}$
test_compile_copy_nested[tensordict-eager] 0.1153ms 48.7931μs 20.4947 KOps/s 20.5891 KOps/s $\color{#d91a1a}-0.46\%$
test_compile_copy_nested[pytree-compile] 0.1135ms 65.1860μs 15.3407 KOps/s 15.1029 KOps/s $\color{#35bf28}+1.57\%$
test_compile_copy_nested[pytree-eager] 94.5240μs 49.8299μs 20.0683 KOps/s 20.0359 KOps/s $\color{#35bf28}+0.16\%$
test_compile_add_one_flat[tensordict-compile] 0.3783ms 0.3189ms 3.1360 KOps/s 3.1446 KOps/s $\color{#d91a1a}-0.27\%$
test_compile_add_one_flat[tensordict-eager] 0.3252ms 0.2310ms 4.3287 KOps/s 4.2130 KOps/s $\color{#35bf28}+2.75\%$
test_compile_add_one_flat[tensorclass-compile] 0.1976ms 0.1302ms 7.6805 KOps/s 7.7046 KOps/s $\color{#d91a1a}-0.31\%$
test_compile_add_one_flat[tensorclass-eager] 0.1250ms 65.7620μs 15.2063 KOps/s 15.1863 KOps/s $\color{#35bf28}+0.13\%$
test_compile_add_one_flat[pytree-compile] 0.4489ms 0.3302ms 3.0284 KOps/s 3.0704 KOps/s $\color{#d91a1a}-1.37\%$
test_compile_add_one_flat[pytree-eager] 0.7199ms 0.6356ms 1.5734 KOps/s 1.5736 KOps/s $\color{#d91a1a}-0.01\%$
test_compile_add_self_flat[tensordict-eager] 0.4096ms 0.2816ms 3.5517 KOps/s 3.4734 KOps/s $\color{#35bf28}+2.25\%$
test_compile_add_self_flat[tensordict-compile] 0.4394ms 0.3217ms 3.1083 KOps/s 3.1166 KOps/s $\color{#d91a1a}-0.27\%$
test_compile_add_self_flat[tensorclass-eager] 0.1746ms 80.0369μs 12.4942 KOps/s 12.8613 KOps/s $\color{#d91a1a}-2.85\%$
test_compile_add_self_flat[tensorclass-compile] 0.1951ms 0.1307ms 7.6502 KOps/s 7.6634 KOps/s $\color{#d91a1a}-0.17\%$
test_compile_add_self_flat[pytree-eager] 0.6246ms 0.5309ms 1.8836 KOps/s 1.8519 KOps/s $\color{#35bf28}+1.71\%$
test_compile_add_self_flat[pytree-compile] 0.3750ms 0.3247ms 3.0802 KOps/s 3.0595 KOps/s $\color{#35bf28}+0.68\%$
test_compile_copy_flat[tensordict-compile] 55.3620μs 20.9704μs 47.6862 KOps/s 48.6995 KOps/s $\color{#d91a1a}-2.08\%$
test_compile_copy_flat[tensordict-eager] 81.0530μs 39.5255μs 25.3001 KOps/s 25.5187 KOps/s $\color{#d91a1a}-0.86\%$
test_compile_copy_flat[pytree-compile] 0.1198ms 71.1412μs 14.0566 KOps/s 14.1813 KOps/s $\color{#d91a1a}-0.88\%$
test_compile_copy_flat[pytree-eager] 0.1160ms 52.2543μs 19.1372 KOps/s 19.5139 KOps/s $\color{#d91a1a}-1.93\%$
test_compile_assign_and_add[tensordict-compile] 2.3805ms 0.8307ms 1.2038 KOps/s 1.1271 KOps/s $\textbf{\color{#35bf28}+6.80\%}$
test_compile_assign_and_add[tensordict-eager] 3.4183ms 3.2328ms 309.3250 Ops/s 314.6502 Ops/s $\color{#d91a1a}-1.69\%$
test_compile_assign_and_add[pytree-compile] 2.3988ms 0.8346ms 1.1982 KOps/s 1.1023 KOps/s $\textbf{\color{#35bf28}+8.70\%}$
test_compile_assign_and_add[pytree-eager] 3.3170ms 3.2557ms 307.1495 Ops/s 306.8132 Ops/s $\color{#35bf28}+0.11\%$
test_compile_indexing[tensor-tensordict-compile] 0.1660ms 0.1204ms 8.3067 KOps/s 8.2859 KOps/s $\color{#35bf28}+0.25\%$
test_compile_indexing[tensor-tensordict-eager] 0.1972ms 63.5608μs 15.7330 KOps/s 15.7558 KOps/s $\color{#d91a1a}-0.14\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1517ms 0.1140ms 8.7701 KOps/s 8.6266 KOps/s $\color{#35bf28}+1.66\%$
test_compile_indexing[tensor-tensorclass-eager] 88.8840μs 43.9176μs 22.7699 KOps/s 22.9074 KOps/s $\color{#d91a1a}-0.60\%$
test_compile_indexing[tensor-pytree-compile] 0.1932ms 0.1207ms 8.2848 KOps/s 8.5908 KOps/s $\color{#d91a1a}-3.56\%$
test_compile_indexing[tensor-pytree-eager] 91.1240μs 46.3182μs 21.5898 KOps/s 21.6283 KOps/s $\color{#d91a1a}-0.18\%$
test_compile_indexing[slice-tensordict-compile] 0.1957ms 0.1556ms 6.4274 KOps/s 6.7310 KOps/s $\color{#d91a1a}-4.51\%$
test_compile_indexing[slice-tensordict-eager] 0.1588ms 26.6910μs 37.4659 KOps/s 37.6721 KOps/s $\color{#d91a1a}-0.55\%$
test_compile_indexing[slice-tensorclass-compile] 0.1880ms 0.1420ms 7.0416 KOps/s 7.0079 KOps/s $\color{#35bf28}+0.48\%$
test_compile_indexing[slice-tensorclass-eager] 60.2430μs 21.5214μs 46.4653 KOps/s 44.2621 KOps/s $\color{#35bf28}+4.98\%$
test_compile_indexing[slice-pytree-compile] 0.1903ms 0.1434ms 6.9723 KOps/s 6.6746 KOps/s $\color{#35bf28}+4.46\%$
test_compile_indexing[slice-pytree-eager] 65.5830μs 21.2598μs 47.0372 KOps/s 45.4718 KOps/s $\color{#35bf28}+3.44\%$
test_compile_indexing[int-tensordict-compile] 0.2966ms 0.1552ms 6.4429 KOps/s 6.5260 KOps/s $\color{#d91a1a}-1.27\%$
test_compile_indexing[int-tensordict-eager] 0.4375ms 27.0482μs 36.9710 KOps/s 37.1093 KOps/s $\color{#d91a1a}-0.37\%$
test_compile_indexing[int-tensorclass-compile] 0.1914ms 0.1431ms 6.9862 KOps/s 6.7529 KOps/s $\color{#35bf28}+3.45\%$
test_compile_indexing[int-tensorclass-eager] 62.6730μs 21.4222μs 46.6806 KOps/s 44.4347 KOps/s $\textbf{\color{#35bf28}+5.05\%}$
test_compile_indexing[int-pytree-compile] 0.1835ms 0.1434ms 6.9732 KOps/s 6.9588 KOps/s $\color{#35bf28}+0.21\%$
test_compile_indexing[int-pytree-eager] 72.4330μs 21.5375μs 46.4305 KOps/s 46.2241 KOps/s $\color{#35bf28}+0.45\%$
test_mod_add[eager] 88.4340μs 31.4618μs 31.7845 KOps/s 30.0647 KOps/s $\textbf{\color{#35bf28}+5.72\%}$
test_mod_add[compile] 0.1335ms 82.5132μs 12.1193 KOps/s 12.0709 KOps/s $\color{#35bf28}+0.40\%$
test_mod_add[compile-overhead] 0.3269ms 0.1562ms 6.4004 KOps/s 5.9363 KOps/s $\textbf{\color{#35bf28}+7.82\%}$
test_mod_wrap[eager] 0.3265ms 0.2442ms 4.0952 KOps/s 3.9770 KOps/s $\color{#35bf28}+2.97\%$
test_mod_wrap[compile] 0.3900ms 0.3054ms 3.2747 KOps/s 3.2936 KOps/s $\color{#d91a1a}-0.57\%$
test_mod_wrap[compile-overhead] 7.9493ms 4.2199ms 236.9713 Ops/s 244.7850 Ops/s $\color{#d91a1a}-3.19\%$
test_mod_wrap_and_backward[eager] 1.5737ms 1.3585ms 736.1255 Ops/s 720.9859 Ops/s $\color{#35bf28}+2.10\%$
test_mod_wrap_and_backward[compile] 1.5935ms 1.3462ms 742.8240 Ops/s 736.8129 Ops/s $\color{#35bf28}+0.82\%$
test_mod_wrap_and_backward[compile-overhead] 1.5015ms 1.0145ms 985.7039 Ops/s 1.0623 KOps/s $\textbf{\color{#d91a1a}-7.21\%}$
test_seq_add[eager] 0.1627ms 97.4735μs 10.2592 KOps/s 9.5419 KOps/s $\textbf{\color{#35bf28}+7.52\%}$
test_seq_add[compile] 0.1638ms 91.5847μs 10.9189 KOps/s 10.3438 KOps/s $\textbf{\color{#35bf28}+5.56\%}$
test_seq_add[compile-overhead] 0.1687ms 0.1247ms 8.0188 KOps/s 7.9056 KOps/s $\color{#35bf28}+1.43\%$
test_seq_wrap[eager] 0.4589ms 0.3794ms 2.6354 KOps/s 2.4676 KOps/s $\textbf{\color{#35bf28}+6.80\%}$
test_seq_wrap[compile] 0.4907ms 0.3176ms 3.1484 KOps/s 2.9408 KOps/s $\textbf{\color{#35bf28}+7.06\%}$
test_seq_wrap[compile-overhead] 0.2858ms 0.2217ms 4.5105 KOps/s 4.4843 KOps/s $\color{#35bf28}+0.59\%$
test_func_call_runtime[False-eager] 0.8834ms 0.7452ms 1.3419 KOps/s 1.3030 KOps/s $\color{#35bf28}+2.99\%$
test_func_call_runtime[False-compile] 0.9031ms 0.8002ms 1.2497 KOps/s 1.1840 KOps/s $\textbf{\color{#35bf28}+5.55\%}$
test_func_call_runtime[False-compile-overhead] 0.4148ms 0.3603ms 2.7755 KOps/s 2.7347 KOps/s $\color{#35bf28}+1.49\%$
test_func_call_runtime[True-eager] 0.9702ms 0.9010ms 1.1098 KOps/s 1.0823 KOps/s $\color{#35bf28}+2.55\%$
test_func_call_runtime[True-compile] 1.0687ms 0.8267ms 1.2096 KOps/s 1.2040 KOps/s $\color{#35bf28}+0.47\%$
test_func_call_runtime[True-compile-overhead] 0.4345ms 0.3817ms 2.6199 KOps/s 2.6098 KOps/s $\color{#35bf28}+0.39\%$
test_func_call_cm_runtime[False-eager] 0.8236ms 0.7414ms 1.3487 KOps/s 1.3213 KOps/s $\color{#35bf28}+2.07\%$
test_func_call_cm_runtime[False-compile] 0.8609ms 0.8042ms 1.2434 KOps/s 1.2231 KOps/s $\color{#35bf28}+1.66\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4423ms 0.3634ms 2.7516 KOps/s 2.7463 KOps/s $\color{#35bf28}+0.20\%$
test_func_call_cm_runtime[True-eager] 1.1802ms 1.0193ms 981.1021 Ops/s 963.7470 Ops/s $\color{#35bf28}+1.80\%$
test_func_call_cm_runtime[True-compile] 0.9229ms 0.8507ms 1.1756 KOps/s 1.1021 KOps/s $\textbf{\color{#35bf28}+6.67\%}$
test_func_call_cm_runtime[True-compile-overhead] 0.4655ms 0.4078ms 2.4521 KOps/s 2.4009 KOps/s $\color{#35bf28}+2.13\%$
test_vmap_func_call_cm_runtime[eager] 2.5998ms 2.1262ms 470.3218 Ops/s 462.7400 Ops/s $\color{#35bf28}+1.64\%$
test_vmap_func_call_cm_runtime[compile] 0.9326ms 0.8626ms 1.1592 KOps/s 1.1376 KOps/s $\color{#35bf28}+1.90\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4933ms 0.4118ms 2.4284 KOps/s 2.4172 KOps/s $\color{#35bf28}+0.46\%$
test_distributed 5.0958ms 0.1616ms 6.1869 KOps/s 8.8128 KOps/s $\textbf{\color{#d91a1a}-29.80\%}$
test_tdmodule 0.2209ms 14.3060μs 69.9008 KOps/s 63.5952 KOps/s $\textbf{\color{#35bf28}+9.92\%}$
test_tdmodule_dispatch 49.7320μs 27.6947μs 36.1080 KOps/s 32.0809 KOps/s $\textbf{\color{#35bf28}+12.55\%}$
test_tdseq 35.0920μs 15.4424μs 64.7567 KOps/s 57.8639 KOps/s $\textbf{\color{#35bf28}+11.91\%}$
test_tdseq_dispatch 51.4820μs 30.7313μs 32.5401 KOps/s 28.7451 KOps/s $\textbf{\color{#35bf28}+13.20\%}$
test_instantiation_functorch 2.5389ms 1.8553ms 539.0100 Ops/s 537.9172 Ops/s $\color{#35bf28}+0.20\%$
test_exec_functorch 0.3030ms 0.2097ms 4.7680 KOps/s 4.5729 KOps/s $\color{#35bf28}+4.27\%$
test_exec_functional_call 0.3130ms 0.2102ms 4.7583 KOps/s 4.4692 KOps/s $\textbf{\color{#35bf28}+6.47\%}$
test_exec_td_decorator 0.4233ms 0.2602ms 3.8427 KOps/s 3.5694 KOps/s $\textbf{\color{#35bf28}+7.66\%}$
test_vmap_mlp_speed_decorator[True-True] 0.9167ms 0.6903ms 1.4486 KOps/s 1.3820 KOps/s $\color{#35bf28}+4.82\%$
test_vmap_mlp_speed_decorator[True-False] 0.8100ms 0.6880ms 1.4534 KOps/s 1.3852 KOps/s $\color{#35bf28}+4.92\%$
test_vmap_mlp_speed_decorator[False-True] 0.7211ms 0.6107ms 1.6374 KOps/s 1.5886 KOps/s $\color{#35bf28}+3.07\%$
test_vmap_mlp_speed_decorator[False-False] 0.7050ms 0.6101ms 1.6390 KOps/s 1.5878 KOps/s $\color{#35bf28}+3.22\%$
test_vmap_transformer_speed_decorator[True-True] 20.0666ms 19.8876ms 50.2825 Ops/s 50.0729 Ops/s $\color{#35bf28}+0.42\%$
test_vmap_transformer_speed_decorator[True-False] 19.9628ms 19.8778ms 50.3074 Ops/s 49.8945 Ops/s $\color{#35bf28}+0.83\%$
test_vmap_transformer_speed_decorator[False-True] 19.7920ms 19.7165ms 50.7190 Ops/s 50.4128 Ops/s $\color{#35bf28}+0.61\%$
test_vmap_transformer_speed_decorator[False-False] 19.7872ms 19.7260ms 50.6946 Ops/s 50.4192 Ops/s $\color{#35bf28}+0.55\%$
test_to_module_speed[True] 1.3274ms 0.9829ms 1.0174 KOps/s 1.0031 KOps/s $\color{#35bf28}+1.43\%$
test_to_module_speed[False] 1.3870ms 0.9655ms 1.0357 KOps/s 1.0251 KOps/s $\color{#35bf28}+1.03\%$
test_tc_init 63.3720μs 34.9585μs 28.6054 KOps/s 26.7331 KOps/s $\textbf{\color{#35bf28}+7.00\%}$
test_tc_init_nested 0.1105ms 71.2361μs 14.0378 KOps/s 13.1764 KOps/s $\textbf{\color{#35bf28}+6.54\%}$
test_tc_first_layer_tensor 3.9074μs 0.6887μs 1.4520 MOps/s 1.4294 MOps/s $\color{#35bf28}+1.58\%$
test_tc_first_layer_nontensor 20.1810μs 2.3294μs 429.2982 KOps/s 434.1352 KOps/s $\color{#d91a1a}-1.11\%$
test_tc_second_layer_tensor 7.3753μs 1.4262μs 701.1818 KOps/s 705.8370 KOps/s $\color{#d91a1a}-0.66\%$
test_tc_second_layer_nontensor 24.1510μs 3.0714μs 325.5875 KOps/s 329.5853 KOps/s $\color{#d91a1a}-1.21\%$
test_unbind 0.1914s 9.6175ms 103.9776 Ops/s 91.6734 Ops/s $\textbf{\color{#35bf28}+13.42\%}$
test_full_like 0.6575ms 0.5725ms 1.7466 KOps/s 1.7478 KOps/s $\color{#d91a1a}-0.07\%$
test_zeros_like 0.2564ms 0.1979ms 5.0518 KOps/s 5.0519 KOps/s $-0.00\%$
test_ones_like 0.2423ms 0.1977ms 5.0570 KOps/s 5.0564 KOps/s $\color{#35bf28}+0.01\%$
test_clone 0.4494ms 0.4146ms 2.4117 KOps/s 2.4120 KOps/s $\color{#d91a1a}-0.01\%$
test_squeeze 39.9720μs 9.8462μs 101.5620 KOps/s 101.6073 KOps/s $\color{#d91a1a}-0.04\%$
test_unsqueeze 0.2361ms 74.1137μs 13.4928 KOps/s 13.3719 KOps/s $\color{#35bf28}+0.90\%$
test_split 0.4213ms 0.1651ms 6.0553 KOps/s 6.0893 KOps/s $\color{#d91a1a}-0.56\%$
test_permute 0.2218ms 0.1746ms 5.7281 KOps/s 5.6147 KOps/s $\color{#35bf28}+2.02\%$
test_stack 1.2660ms 0.8693ms 1.1504 KOps/s 1.1751 KOps/s $\color{#d91a1a}-2.10\%$
test_cat 1.2542ms 1.2314ms 812.0897 Ops/s 812.2780 Ops/s $\color{#d91a1a}-0.02\%$
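The Change column appears to be the relative difference between the Ops value on this branch and the Ops value on Repo HEAD, after converting both to the same unit. The sketch below reproduces three rows of the table under that assumption. The helper names `to_ops` and `relative_change` are hypothetical and not part of the benchmark tooling.

```python
# Assumed unit multipliers for the throughput columns in the table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def to_ops(value: float, unit: str) -> float:
    """Convert a throughput reading to plain operations per second."""
    return value * UNIT_SCALE[unit]


def relative_change(ops_new: float, ops_head: float) -> float:
    """Percent change of the branch throughput relative to Repo HEAD."""
    return (ops_new / ops_head - 1.0) * 100.0


# (name, branch reading, HEAD reading), copied from the table above.
rows = [
    ("test_distributed", (6.1869, "KOps/s"), (8.8128, "KOps/s")),
    ("test_unbind", (103.9776, "Ops/s"), (91.6734, "Ops/s")),
    # This row mixes units, so normalization matters.
    ("test_mod_wrap_and_backward[compile-overhead]",
     (985.7039, "Ops/s"), (1.0623, "KOps/s")),
]

for name, new, head in rows:
    change = relative_change(to_ops(*new), to_ops(*head))
    print(f"{name}: {change:+.2f}%")
```

A negative value means the branch is slower than Repo HEAD for that benchmark. Single-run differences of a few percent on shared CI hardware may well be noise.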

@vmoens vmoens merged commit 5bf16b1 into gh/vmoens/33/base Oct 24, 2024
49 of 50 checks passed
@vmoens vmoens deleted the gh/vmoens/33/head branch October 24, 2024 20:51