[BugFix] Fix pre 2.1 _apply compatibility #1050

Merged 1 commit on Oct 21, 2024


vmoens
Contributor

@vmoens vmoens commented Oct 21, 2024

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 21, 2024
ghstack-source-id: b8e890e36e0b15dda039e74004c5ab63af16435b
Pull Request resolved: #1050
@facebook-github-bot facebook-github-bot added the CLA Signed label Oct 21, 2024
@vmoens vmoens merged commit 6502bd1 into gh/vmoens/31/base Oct 21, 2024
34 of 38 checks passed
@vmoens vmoens deleted the gh/vmoens/31/head branch October 21, 2024 09:57

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 216. Improved: $\large\color{#35bf28}22$. Worsened: $\large\color{#d91a1a}8$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 77.5840μs 26.3916μs 37.8909 KOps/s 38.7557 KOps/s $\color{#d91a1a}-2.23\%$
test_plain_set_stack_nested 78.7160μs 26.0852μs 38.3359 KOps/s 38.3186 KOps/s $\color{#35bf28}+0.04\%$
test_plain_set_nested_inplace 69.0880μs 28.5124μs 35.0724 KOps/s 35.2814 KOps/s $\color{#d91a1a}-0.59\%$
test_plain_set_stack_nested_inplace 66.0430μs 28.1398μs 35.5369 KOps/s 35.5028 KOps/s $\color{#35bf28}+0.10\%$
test_items 30.0960μs 4.1667μs 239.9975 KOps/s 237.6622 KOps/s $\color{#35bf28}+0.98\%$
test_items_nested 0.6993ms 0.3821ms 2.6170 KOps/s 2.6174 KOps/s $\color{#d91a1a}-0.02\%$
test_items_nested_locked 0.4904ms 0.3821ms 2.6171 KOps/s 2.6218 KOps/s $\color{#d91a1a}-0.18\%$
test_items_nested_leaf 0.1459ms 81.7195μs 12.2370 KOps/s 12.2215 KOps/s $\color{#35bf28}+0.13\%$
test_items_stack_nested 0.8334ms 0.3889ms 2.5710 KOps/s 2.6126 KOps/s $\color{#d91a1a}-1.59\%$
test_items_stack_nested_leaf 0.1530ms 86.7382μs 11.5290 KOps/s 11.8979 KOps/s $\color{#d91a1a}-3.10\%$
test_items_stack_nested_locked 0.7063ms 0.3886ms 2.5731 KOps/s 2.6216 KOps/s $\color{#d91a1a}-1.85\%$
test_keys 34.4340μs 3.5275μs 283.4867 KOps/s 280.3109 KOps/s $\color{#35bf28}+1.13\%$
test_keys_nested 0.2539ms 0.1362ms 7.3420 KOps/s 7.3458 KOps/s $\color{#d91a1a}-0.05\%$
test_keys_nested_locked 0.7212ms 0.1412ms 7.0814 KOps/s 7.0536 KOps/s $\color{#35bf28}+0.39\%$
test_keys_nested_leaf 0.1875ms 0.1193ms 8.3796 KOps/s 8.3789 KOps/s $+0.01\%$
test_keys_stack_nested 0.2748ms 0.1352ms 7.3954 KOps/s 7.4208 KOps/s $\color{#d91a1a}-0.34\%$
test_keys_stack_nested_leaf 0.1999ms 0.1166ms 8.5739 KOps/s 8.5114 KOps/s $\color{#35bf28}+0.73\%$
test_keys_stack_nested_locked 0.2373ms 0.1390ms 7.1928 KOps/s 7.1241 KOps/s $\color{#35bf28}+0.96\%$
test_values 8.2354μs 1.0319μs 969.0949 KOps/s 975.6954 KOps/s $\color{#d91a1a}-0.68\%$
test_values_nested 0.1518ms 93.7226μs 10.6698 KOps/s 10.4743 KOps/s $\color{#35bf28}+1.87\%$
test_values_nested_locked 0.1619ms 93.8279μs 10.6578 KOps/s 10.5511 KOps/s $\color{#35bf28}+1.01\%$
test_values_nested_leaf 0.1337ms 79.2327μs 12.6211 KOps/s 12.4470 KOps/s $\color{#35bf28}+1.40\%$
test_values_stack_nested 0.1397ms 93.1119μs 10.7398 KOps/s 10.6045 KOps/s $\color{#35bf28}+1.28\%$
test_values_stack_nested_leaf 0.1409ms 78.3255μs 12.7672 KOps/s 12.6518 KOps/s $\color{#35bf28}+0.91\%$
test_values_stack_nested_locked 0.1564ms 93.5164μs 10.6933 KOps/s 10.5772 KOps/s $\color{#35bf28}+1.10\%$
test_membership 15.5990μs 0.8910μs 1.1224 MOps/s 1.3167 MOps/s $\textbf{\color{#d91a1a}-14.76\%}$
test_membership_nested 30.5470μs 2.7544μs 363.0511 KOps/s 362.5608 KOps/s $\color{#35bf28}+0.14\%$
test_membership_nested_leaf 45.0610μs 2.7097μs 369.0460 KOps/s 367.3683 KOps/s $\color{#35bf28}+0.46\%$
test_membership_stacked_nested 29.9060μs 2.7598μs 362.3425 KOps/s 363.4984 KOps/s $\color{#d91a1a}-0.32\%$
test_membership_stacked_nested_leaf 28.3630μs 2.7329μs 365.9143 KOps/s 365.4333 KOps/s $\color{#35bf28}+0.13\%$
test_membership_nested_last 24.7160μs 4.1928μs 238.5044 KOps/s 241.8028 KOps/s $\color{#d91a1a}-1.36\%$
test_membership_nested_leaf_last 29.6350μs 4.2515μs 235.2121 KOps/s 240.5666 KOps/s $\color{#d91a1a}-2.23\%$
test_membership_stacked_nested_last 24.1750μs 7.4039μs 135.0649 KOps/s 127.6710 KOps/s $\textbf{\color{#35bf28}+5.79\%}$
test_membership_stacked_nested_leaf_last 44.4820μs 7.3259μs 136.5028 KOps/s 128.9982 KOps/s $\textbf{\color{#35bf28}+5.82\%}$
test_nested_getleaf 36.9590μs 10.8248μs 92.3804 KOps/s 97.6655 KOps/s $\textbf{\color{#d91a1a}-5.41\%}$
test_nested_get 43.1100μs 10.3910μs 96.2373 KOps/s 99.1244 KOps/s $\color{#d91a1a}-2.91\%$
test_stacked_getleaf 44.0620μs 10.8175μs 92.4428 KOps/s 95.3487 KOps/s $\color{#d91a1a}-3.05\%$
test_stacked_get 34.3840μs 10.2283μs 97.7675 KOps/s 100.7556 KOps/s $\color{#d91a1a}-2.97\%$
test_nested_getitemleaf 37.1290μs 11.4764μs 87.1351 KOps/s 86.5177 KOps/s $\color{#35bf28}+0.71\%$
test_nested_getitem 36.8390μs 10.6036μs 94.3074 KOps/s 94.2008 KOps/s $\color{#35bf28}+0.11\%$
test_stacked_getitemleaf 52.0270μs 11.2761μs 88.6831 KOps/s 91.1326 KOps/s $\color{#d91a1a}-2.69\%$
test_stacked_getitem 37.7600μs 10.6462μs 93.9298 KOps/s 97.0018 KOps/s $\color{#d91a1a}-3.17\%$
test_lock_nested 0.9365ms 0.5109ms 1.9573 KOps/s 1.9009 KOps/s $\color{#35bf28}+2.97\%$
test_lock_stack_nested 0.6960ms 0.4680ms 2.1367 KOps/s 2.0765 KOps/s $\color{#35bf28}+2.90\%$
test_unlock_nested 0.9598ms 0.4278ms 2.3375 KOps/s 2.2636 KOps/s $\color{#35bf28}+3.27\%$
test_unlock_stack_nested 1.7840ms 0.3875ms 2.5808 KOps/s 2.5556 KOps/s $\color{#35bf28}+0.99\%$
test_flatten_speed 0.1789ms 0.1007ms 9.9334 KOps/s 9.9195 KOps/s $\color{#35bf28}+0.14\%$
test_unflatten_speed 0.6192ms 0.5192ms 1.9260 KOps/s 1.9347 KOps/s $\color{#d91a1a}-0.45\%$
test_common_ops 2.2410ms 1.1989ms 834.0984 Ops/s 801.6882 Ops/s $\color{#35bf28}+4.04\%$
test_creation 32.0690μs 2.0732μs 482.3401 KOps/s 467.9167 KOps/s $\color{#35bf28}+3.08\%$
test_creation_empty 42.7900μs 19.9318μs 50.1710 KOps/s 49.6689 KOps/s $\color{#35bf28}+1.01\%$
test_creation_nested_1 74.6390μs 23.2022μs 43.0994 KOps/s 42.2648 KOps/s $\color{#35bf28}+1.97\%$
test_creation_nested_2 70.7020μs 27.8187μs 35.9471 KOps/s 36.1656 KOps/s $\color{#d91a1a}-0.60\%$
test_clone 0.1078ms 17.0165μs 58.7666 KOps/s 56.7636 KOps/s $\color{#35bf28}+3.53\%$
test_getitem[int] 1.1361ms 16.5737μs 60.3365 KOps/s 58.5084 KOps/s $\color{#35bf28}+3.12\%$
test_getitem[slice_int] 0.1603ms 31.4820μs 31.7642 KOps/s 31.1923 KOps/s $\color{#35bf28}+1.83\%$
test_getitem[range] 0.3329ms 58.6351μs 17.0546 KOps/s 16.4876 KOps/s $\color{#35bf28}+3.44\%$
test_getitem[tuple] 0.1423ms 26.3939μs 37.8876 KOps/s 38.2674 KOps/s $\color{#d91a1a}-0.99\%$
test_getitem[list] 0.2916ms 54.0099μs 18.5151 KOps/s 18.4103 KOps/s $\color{#35bf28}+0.57\%$
test_setitem_dim[int] 93.0030μs 36.4071μs 27.4672 KOps/s 27.7660 KOps/s $\color{#d91a1a}-1.08\%$
test_setitem_dim[slice_int] 0.1059ms 63.5950μs 15.7245 KOps/s 15.5580 KOps/s $\color{#35bf28}+1.07\%$
test_setitem_dim[range] 0.1956ms 87.1821μs 11.4702 KOps/s 11.2621 KOps/s $\color{#35bf28}+1.85\%$
test_setitem_dim[tuple] 89.3970μs 51.4446μs 19.4384 KOps/s 19.0261 KOps/s $\color{#35bf28}+2.17\%$
test_setitem 88.2340μs 32.0909μs 31.1615 KOps/s 29.7367 KOps/s $\color{#35bf28}+4.79\%$
test_set 0.1234ms 31.2136μs 32.0374 KOps/s 31.3220 KOps/s $\color{#35bf28}+2.28\%$
test_set_shared 3.8221ms 0.2219ms 4.5073 KOps/s 4.4448 KOps/s $\color{#35bf28}+1.41\%$
test_update 0.5960ms 41.8200μs 23.9120 KOps/s 24.4547 KOps/s $\color{#d91a1a}-2.22\%$
test_update_nested 0.1481ms 52.3585μs 19.0991 KOps/s 18.7854 KOps/s $\color{#35bf28}+1.67\%$
test_update__nested 0.8794ms 46.9911μs 21.2806 KOps/s 22.0659 KOps/s $\color{#d91a1a}-3.56\%$
test_set_nested 0.1670ms 35.8489μs 27.8949 KOps/s 27.8150 KOps/s $\color{#35bf28}+0.29\%$
test_set_nested_new 0.1612ms 39.9067μs 25.0584 KOps/s 24.5624 KOps/s $\color{#35bf28}+2.02\%$
test_select 0.1875ms 58.4557μs 17.1070 KOps/s 17.1480 KOps/s $\color{#d91a1a}-0.24\%$
test_select_nested 0.1170ms 60.4725μs 16.5364 KOps/s 16.3965 KOps/s $\color{#35bf28}+0.85\%$
test_exclude_nested 0.2098ms 77.9411μs 12.8302 KOps/s 13.0815 KOps/s $\color{#d91a1a}-1.92\%$
test_empty[True] 0.4497ms 0.3600ms 2.7780 KOps/s 2.7587 KOps/s $\color{#35bf28}+0.70\%$
test_empty[False] 7.0505μs 1.2475μs 801.6159 KOps/s 796.7102 KOps/s $\color{#35bf28}+0.62\%$
test_unbind_speed 0.4999ms 0.3048ms 3.2807 KOps/s 3.1449 KOps/s $\color{#35bf28}+4.32\%$
test_unbind_speed_stack0 0.4061ms 0.2951ms 3.3888 KOps/s 3.3203 KOps/s $\color{#35bf28}+2.06\%$
test_unbind_speed_stack1 0.1056s 0.7391ms 1.3529 KOps/s 1.3325 KOps/s $\color{#35bf28}+1.53\%$
test_split 0.1069s 2.2104ms 452.3985 Ops/s 443.3335 Ops/s $\color{#35bf28}+2.04\%$
test_chunk 0.1116s 2.2370ms 447.0277 Ops/s 491.4703 Ops/s $\textbf{\color{#d91a1a}-9.04\%}$
test_creation[device0] 4.6359ms 0.1205ms 8.2964 KOps/s 8.1499 KOps/s $\color{#35bf28}+1.80\%$
test_creation_from_tensor 0.2635ms 0.1167ms 8.5675 KOps/s 8.4290 KOps/s $\color{#35bf28}+1.64\%$
test_add_one[memmap_tensor0] 0.1650ms 7.3178μs 136.6530 KOps/s 135.0392 KOps/s $\color{#35bf28}+1.20\%$
test_contiguous[memmap_tensor0] 29.0140μs 1.8835μs 530.9246 KOps/s 509.8759 KOps/s $\color{#35bf28}+4.13\%$
test_stack[memmap_tensor0] 37.9800μs 5.4109μs 184.8113 KOps/s 174.9543 KOps/s $\textbf{\color{#35bf28}+5.63\%}$
test_memmaptd_index 1.1267ms 0.4085ms 2.4481 KOps/s 2.4267 KOps/s $\color{#35bf28}+0.88\%$
test_memmaptd_index_astensor 0.7808ms 0.5122ms 1.9524 KOps/s 1.9336 KOps/s $\color{#35bf28}+0.97\%$
test_memmaptd_index_op 2.1860ms 1.1085ms 902.0916 Ops/s 881.1731 Ops/s $\color{#35bf28}+2.37\%$
test_serialize_model 0.2401s 0.1412s 7.0828 Ops/s 8.2100 Ops/s $\textbf{\color{#d91a1a}-13.73\%}$
test_serialize_model_pickle 0.4398s 0.3927s 2.5464 Ops/s 2.5679 Ops/s $\color{#d91a1a}-0.84\%$
test_serialize_weights 0.1342s 0.1171s 8.5376 Ops/s 8.3062 Ops/s $\color{#35bf28}+2.79\%$
test_serialize_weights_returnearly 0.1800s 0.1623s 6.1620 Ops/s 5.5043 Ops/s $\textbf{\color{#35bf28}+11.95\%}$
test_serialize_weights_pickle 0.4796s 0.4194s 2.3842 Ops/s 2.5526 Ops/s $\textbf{\color{#d91a1a}-6.60\%}$
test_serialize_weights_filesystem 0.2459s 0.1589s 6.2921 Ops/s 6.9637 Ops/s $\textbf{\color{#d91a1a}-9.64\%}$
test_serialize_model_filesystem 0.1691s 0.1564s 6.3951 Ops/s 6.5295 Ops/s $\color{#d91a1a}-2.06\%$
test_reshape_pytree 89.2860μs 39.9662μs 25.0211 KOps/s 25.2482 KOps/s $\color{#d91a1a}-0.90\%$
test_reshape_td 0.1176ms 48.6244μs 20.5658 KOps/s 21.0093 KOps/s $\color{#d91a1a}-2.11\%$
test_view_pytree 0.1055ms 40.0180μs 24.9888 KOps/s 25.2218 KOps/s $\color{#d91a1a}-0.92\%$
test_view_td 0.1619ms 54.7577μs 18.2623 KOps/s 18.1260 KOps/s $\color{#35bf28}+0.75\%$
test_unbind_pytree 90.0180μs 36.0395μs 27.7473 KOps/s 27.6358 KOps/s $\color{#35bf28}+0.40\%$
test_unbind_td 0.3126ms 44.3903μs 22.5274 KOps/s 21.3021 KOps/s $\textbf{\color{#35bf28}+5.75\%}$
test_split_pytree 95.5980μs 37.8277μs 26.4357 KOps/s 25.4475 KOps/s $\color{#35bf28}+3.88\%$
test_split_td 0.5007ms 58.0357μs 17.2308 KOps/s 16.9433 KOps/s $\color{#35bf28}+1.70\%$
test_add_pytree 0.1131ms 44.5013μs 22.4713 KOps/s 21.4176 KOps/s $\color{#35bf28}+4.92\%$
test_add_td 0.2196ms 90.0622μs 11.1034 KOps/s 11.0196 KOps/s $\color{#35bf28}+0.76\%$
test_compile_add_one_nested[tensordict-compile] 0.1830ms 73.3571μs 13.6320 KOps/s 13.4949 KOps/s $\color{#35bf28}+1.02\%$
test_compile_add_one_nested[tensordict-eager] 0.5645ms 0.2030ms 4.9266 KOps/s 4.7054 KOps/s $\color{#35bf28}+4.70\%$
test_compile_add_one_nested[pytree-compile] 0.1212ms 54.6795μs 18.2884 KOps/s 17.8608 KOps/s $\color{#35bf28}+2.39\%$
test_compile_add_one_nested[pytree-eager] 0.3352ms 0.1467ms 6.8176 KOps/s 6.6058 KOps/s $\color{#35bf28}+3.21\%$
test_compile_copy_nested[tensordict-compile] 71.9140μs 28.0125μs 35.6984 KOps/s 35.8576 KOps/s $\color{#d91a1a}-0.44\%$
test_compile_copy_nested[tensordict-eager] 0.1645ms 76.4080μs 13.0876 KOps/s 12.7515 KOps/s $\color{#35bf28}+2.64\%$
test_compile_copy_nested[pytree-compile] 0.1605ms 78.8349μs 12.6847 KOps/s 12.6010 KOps/s $\color{#35bf28}+0.66\%$
test_compile_copy_nested[pytree-eager] 0.1386ms 67.6841μs 14.7745 KOps/s 14.4778 KOps/s $\color{#35bf28}+2.05\%$
test_compile_add_one_flat[tensordict-compile] 0.1935ms 0.1215ms 8.2299 KOps/s 7.8784 KOps/s $\color{#35bf28}+4.46\%$
test_compile_add_one_flat[tensordict-eager] 0.4425ms 0.2458ms 4.0678 KOps/s 4.0295 KOps/s $\color{#35bf28}+0.95\%$
test_compile_add_one_flat[tensorclass-compile] 0.1357ms 53.9748μs 18.5272 KOps/s 18.2018 KOps/s $\color{#35bf28}+1.79\%$
test_compile_add_one_flat[tensorclass-eager] 0.1594ms 79.6083μs 12.5615 KOps/s 11.9154 KOps/s $\textbf{\color{#35bf28}+5.42\%}$
test_compile_add_one_flat[pytree-compile] 0.1966ms 0.1127ms 8.8753 KOps/s 8.7855 KOps/s $\color{#35bf28}+1.02\%$
test_compile_add_one_flat[pytree-eager] 0.4577ms 0.2994ms 3.3397 KOps/s 3.1917 KOps/s $\color{#35bf28}+4.64\%$
test_compile_add_self_flat[tensordict-eager] 0.5167ms 0.2776ms 3.6027 KOps/s 3.4956 KOps/s $\color{#35bf28}+3.06\%$
test_compile_add_self_flat[tensordict-compile] 0.4652ms 0.1266ms 7.8963 KOps/s 7.8971 KOps/s $-0.01\%$
test_compile_add_self_flat[tensorclass-eager] 0.3495ms 76.2391μs 13.1166 KOps/s 12.8776 KOps/s $\color{#35bf28}+1.86\%$
test_compile_add_self_flat[tensorclass-compile] 0.1175ms 55.0055μs 18.1800 KOps/s 17.7476 KOps/s $\color{#35bf28}+2.44\%$
test_compile_add_self_flat[pytree-eager] 0.4283ms 0.2453ms 4.0769 KOps/s 3.9682 KOps/s $\color{#35bf28}+2.74\%$
test_compile_add_self_flat[pytree-compile] 0.2398ms 0.1121ms 8.9204 KOps/s 8.7526 KOps/s $\color{#35bf28}+1.92\%$
test_compile_copy_flat[tensordict-compile] 64.2300μs 29.5098μs 33.8871 KOps/s 34.2191 KOps/s $\color{#d91a1a}-0.97\%$
test_compile_copy_flat[tensordict-eager] 0.1539ms 77.8164μs 12.8508 KOps/s 12.5572 KOps/s $\color{#35bf28}+2.34\%$
test_compile_copy_flat[pytree-compile] 0.1560ms 82.0124μs 12.1933 KOps/s 12.2951 KOps/s $\color{#d91a1a}-0.83\%$
test_compile_copy_flat[pytree-eager] 0.1661ms 69.2605μs 14.4382 KOps/s 14.2707 KOps/s $\color{#35bf28}+1.17\%$
test_compile_assign_and_add[tensordict-compile] 0.8517ms 0.2248ms 4.4494 KOps/s 4.5262 KOps/s $\color{#d91a1a}-1.70\%$
test_compile_assign_and_add[tensordict-eager] 3.0746ms 1.8321ms 545.8095 Ops/s 539.0533 Ops/s $\color{#35bf28}+1.25\%$
test_compile_assign_and_add[pytree-compile] 0.3539ms 0.2122ms 4.7134 KOps/s 4.5720 KOps/s $\color{#35bf28}+3.09\%$
test_compile_assign_and_add[pytree-eager] 2.4206ms 1.1833ms 845.1107 Ops/s 844.7097 Ops/s $\color{#35bf28}+0.05\%$
test_compile_assign_and_add_stack[compile] 1.4204ms 0.4754ms 2.1033 KOps/s 2.1169 KOps/s $\color{#d91a1a}-0.64\%$
test_compile_assign_and_add_stack[eager] 4.6834ms 4.4617ms 224.1299 Ops/s 223.3681 Ops/s $\color{#35bf28}+0.34\%$
test_compile_indexing[tensor-tensordict-compile] 0.1210ms 42.8384μs 23.3435 KOps/s 21.9133 KOps/s $\textbf{\color{#35bf28}+6.53\%}$
test_compile_indexing[tensor-tensordict-eager] 0.5167ms 50.0880μs 19.9649 KOps/s 18.5475 KOps/s $\textbf{\color{#35bf28}+7.64\%}$
test_compile_indexing[tensor-tensorclass-compile] 0.1152ms 38.2650μs 26.1335 KOps/s 25.7354 KOps/s $\color{#35bf28}+1.55\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1038ms 29.3062μs 34.1225 KOps/s 33.2881 KOps/s $\color{#35bf28}+2.51\%$
test_compile_indexing[tensor-pytree-compile] 0.1127ms 38.6979μs 25.8412 KOps/s 24.8573 KOps/s $\color{#35bf28}+3.96\%$
test_compile_indexing[tensor-pytree-eager] 0.2554ms 29.1379μs 34.3196 KOps/s 33.2333 KOps/s $\color{#35bf28}+3.27\%$
test_compile_indexing[slice-tensordict-compile] 0.1803ms 77.9345μs 12.8313 KOps/s 12.5821 KOps/s $\color{#35bf28}+1.98\%$
test_compile_indexing[slice-tensordict-eager] 0.5601ms 28.5015μs 35.0859 KOps/s 33.0595 KOps/s $\textbf{\color{#35bf28}+6.13\%}$
test_compile_indexing[slice-tensorclass-compile] 0.1883ms 72.6966μs 13.7558 KOps/s 13.5558 KOps/s $\color{#35bf28}+1.48\%$
test_compile_indexing[slice-tensorclass-eager] 0.1164ms 23.4089μs 42.7187 KOps/s 40.2203 KOps/s $\textbf{\color{#35bf28}+6.21\%}$
test_compile_indexing[slice-pytree-compile] 0.1752ms 72.6355μs 13.7674 KOps/s 13.5077 KOps/s $\color{#35bf28}+1.92\%$
test_compile_indexing[slice-pytree-eager] 0.1038ms 23.7885μs 42.0371 KOps/s 40.6671 KOps/s $\color{#35bf28}+3.37\%$
test_compile_indexing[int-tensordict-compile] 0.1422ms 78.1056μs 12.8032 KOps/s 12.5372 KOps/s $\color{#35bf28}+2.12\%$
test_compile_indexing[int-tensordict-eager] 0.7986ms 28.7288μs 34.8082 KOps/s 33.8963 KOps/s $\color{#35bf28}+2.69\%$
test_compile_indexing[int-tensorclass-compile] 0.2239ms 73.5908μs 13.5887 KOps/s 13.6294 KOps/s $\color{#d91a1a}-0.30\%$
test_compile_indexing[int-tensorclass-eager] 73.6470μs 23.5609μs 42.4432 KOps/s 40.5876 KOps/s $\color{#35bf28}+4.57\%$
test_compile_indexing[int-pytree-compile] 0.1552ms 72.1846μs 13.8534 KOps/s 13.5720 KOps/s $\color{#35bf28}+2.07\%$
test_compile_indexing[int-pytree-eager] 96.0790μs 23.6573μs 42.2702 KOps/s 40.8270 KOps/s $\color{#35bf28}+3.53\%$
test_mod_add[eager] 97.9090μs 26.4928μs 37.7461 KOps/s 35.9662 KOps/s $\color{#35bf28}+4.95\%$
test_mod_add[compile] 0.1209ms 44.8243μs 22.3093 KOps/s 20.2440 KOps/s $\textbf{\color{#35bf28}+10.20\%}$
test_mod_add[compile-overhead] 0.1332ms 44.4221μs 22.5113 KOps/s 21.4957 KOps/s $\color{#35bf28}+4.72\%$
test_mod_wrap[eager] 0.3615ms 0.2127ms 4.7024 KOps/s 4.5491 KOps/s $\color{#35bf28}+3.37\%$
test_mod_wrap[compile] 1.9743ms 0.2049ms 4.8803 KOps/s 4.7310 KOps/s $\color{#35bf28}+3.16\%$
test_mod_wrap[compile-overhead] 1.7730ms 0.2014ms 4.9650 KOps/s 4.8081 KOps/s $\color{#35bf28}+3.26\%$
test_mod_wrap_and_backward[eager] 16.7972ms 11.7006ms 85.4654 Ops/s 89.0530 Ops/s $\color{#d91a1a}-4.03\%$
test_mod_wrap_and_backward[compile] 20.2311ms 13.7014ms 72.9854 Ops/s 89.2397 Ops/s $\textbf{\color{#d91a1a}-18.21\%}$
test_mod_wrap_and_backward[compile-overhead] 17.9761ms 13.6045ms 73.5050 Ops/s 88.7415 Ops/s $\textbf{\color{#d91a1a}-17.17\%}$
test_seq_add[eager] 0.5922ms 96.1007μs 10.4058 KOps/s 9.9909 KOps/s $\color{#35bf28}+4.15\%$
test_seq_add[compile] 0.1394ms 57.7384μs 17.3195 KOps/s 16.5579 KOps/s $\color{#35bf28}+4.60\%$
test_seq_add[compile-overhead] 0.1086ms 56.5989μs 17.6682 KOps/s 17.0716 KOps/s $\color{#35bf28}+3.49\%$
test_seq_wrap[eager] 0.6680ms 0.3906ms 2.5605 KOps/s 2.4530 KOps/s $\color{#35bf28}+4.38\%$
test_seq_wrap[compile] 0.4435ms 0.2270ms 4.4059 KOps/s 4.3294 KOps/s $\color{#35bf28}+1.77\%$
test_seq_wrap[compile-overhead] 0.4425ms 0.2231ms 4.4828 KOps/s 4.3021 KOps/s $\color{#35bf28}+4.20\%$
test_func_call_runtime[False-eager] 0.8537ms 0.5350ms 1.8693 KOps/s 1.7835 KOps/s $\color{#35bf28}+4.81\%$
test_func_call_runtime[False-compile] 0.5310ms 0.4159ms 2.4044 KOps/s 2.2782 KOps/s $\textbf{\color{#35bf28}+5.54\%}$
test_func_call_runtime[False-compile-overhead] 0.7359ms 0.4191ms 2.3863 KOps/s 2.2745 KOps/s $\color{#35bf28}+4.92\%$
test_func_call_runtime[True-eager] 1.0616ms 0.7504ms 1.3326 KOps/s 1.2742 KOps/s $\color{#35bf28}+4.58\%$
test_func_call_runtime[True-compile] 0.7395ms 0.4543ms 2.2011 KOps/s 2.0715 KOps/s $\textbf{\color{#35bf28}+6.26\%}$
test_func_call_runtime[True-compile-overhead] 0.6172ms 0.4572ms 2.1872 KOps/s 2.0476 KOps/s $\textbf{\color{#35bf28}+6.82\%}$
test_func_call_cm_runtime[False-eager] 0.7888ms 0.5308ms 1.8838 KOps/s 1.8006 KOps/s $\color{#35bf28}+4.62\%$
test_func_call_cm_runtime[False-compile] 0.5319ms 0.4184ms 2.3901 KOps/s 2.2695 KOps/s $\textbf{\color{#35bf28}+5.31\%}$
test_func_call_cm_runtime[False-compile-overhead] 1.6907ms 0.4326ms 2.3117 KOps/s 2.1926 KOps/s $\textbf{\color{#35bf28}+5.43\%}$
test_func_call_cm_runtime[True-eager] 1.1340ms 0.8940ms 1.1186 KOps/s 1.0624 KOps/s $\textbf{\color{#35bf28}+5.28\%}$
test_func_call_cm_runtime[True-compile] 0.5980ms 0.4889ms 2.0455 KOps/s 1.9633 KOps/s $\color{#35bf28}+4.19\%$
test_func_call_cm_runtime[True-compile-overhead] 1.6735ms 0.5052ms 1.9795 KOps/s 1.9152 KOps/s $\color{#35bf28}+3.36\%$
test_vmap_func_call_cm_runtime[eager] 2.4498ms 1.9109ms 523.3221 Ops/s 494.3446 Ops/s $\textbf{\color{#35bf28}+5.86\%}$
test_vmap_func_call_cm_runtime[compile] 0.9773ms 0.5163ms 1.9367 KOps/s 1.8932 KOps/s $\color{#35bf28}+2.30\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.9575ms 0.5195ms 1.9251 KOps/s 1.8719 KOps/s $\color{#35bf28}+2.84\%$
test_distributed 0.3181ms 0.1278ms 7.8230 KOps/s 7.5268 KOps/s $\color{#35bf28}+3.94\%$
test_tdmodule 35.1550μs 19.2146μs 52.0438 KOps/s 48.1929 KOps/s $\textbf{\color{#35bf28}+7.99\%}$
test_tdmodule_dispatch 76.8930μs 38.3804μs 26.0550 KOps/s 25.5840 KOps/s $\color{#35bf28}+1.84\%$
test_tdseq 42.9400μs 22.0098μs 45.4344 KOps/s 43.9057 KOps/s $\color{#35bf28}+3.48\%$
test_tdseq_dispatch 70.6620μs 43.3976μs 23.0427 KOps/s 22.6081 KOps/s $\color{#35bf28}+1.92\%$
test_instantiation_functorch 2.3889ms 1.5336ms 652.0746 Ops/s 609.0683 Ops/s $\textbf{\color{#35bf28}+7.06\%}$
test_exec_functorch 0.3228ms 0.1789ms 5.5906 KOps/s 5.4295 KOps/s $\color{#35bf28}+2.97\%$
test_exec_functional_call 0.4580ms 0.1730ms 5.7803 KOps/s 5.6048 KOps/s $\color{#35bf28}+3.13\%$
test_exec_td_decorator 0.5281ms 0.2351ms 4.2528 KOps/s 3.9898 KOps/s $\textbf{\color{#35bf28}+6.59\%}$
test_vmap_mlp_speed_decorator[True-True] 0.8668ms 0.6412ms 1.5595 KOps/s 1.5002 KOps/s $\color{#35bf28}+3.95\%$
test_vmap_mlp_speed_decorator[True-False] 2.6390ms 0.6867ms 1.4562 KOps/s 1.5124 KOps/s $\color{#d91a1a}-3.71\%$
test_vmap_mlp_speed_decorator[False-True] 0.8558ms 0.5317ms 1.8806 KOps/s 1.8039 KOps/s $\color{#35bf28}+4.25\%$
test_vmap_mlp_speed_decorator[False-False] 0.9334ms 0.5347ms 1.8702 KOps/s 1.8426 KOps/s $\color{#35bf28}+1.50\%$
test_to_module_speed[True] 2.0415ms 1.3819ms 723.6355 Ops/s 716.8843 Ops/s $\color{#35bf28}+0.94\%$
test_to_module_speed[False] 2.2326ms 1.3469ms 742.4389 Ops/s 741.4892 Ops/s $\color{#35bf28}+0.13\%$
test_tc_init 0.1134ms 47.4213μs 21.0876 KOps/s 20.8142 KOps/s $\color{#35bf28}+1.31\%$
test_tc_init_nested 0.1973ms 95.5853μs 10.4619 KOps/s 10.4907 KOps/s $\color{#d91a1a}-0.28\%$
test_tc_first_layer_tensor 38.7930μs 1.5411μs 648.9008 KOps/s 636.9345 KOps/s $\color{#35bf28}+1.88\%$
test_tc_first_layer_nontensor 0.2950ms 4.7553μs 210.2937 KOps/s 210.9783 KOps/s $\color{#d91a1a}-0.32\%$
test_tc_second_layer_tensor 76.2820μs 2.8363μs 352.5699 KOps/s 356.0722 KOps/s $\color{#d91a1a}-0.98\%$
test_tc_second_layer_nontensor 47.8990μs 6.0571μs 165.0942 KOps/s 164.7735 KOps/s $\color{#35bf28}+0.19\%$
test_unbind 0.2243s 15.4099ms 64.8934 Ops/s 67.2674 Ops/s $\color{#d91a1a}-3.53\%$
test_full_like 8.8937ms 7.7488ms 129.0528 Ops/s 126.3362 Ops/s $\color{#35bf28}+2.15\%$
test_zeros_like 3.4172ms 2.9946ms 333.9318 Ops/s 336.7245 Ops/s $\color{#d91a1a}-0.83\%$
test_ones_like 4.0661ms 3.5438ms 282.1863 Ops/s 279.0628 Ops/s $\color{#35bf28}+1.12\%$
test_clone 6.1422ms 5.4068ms 184.9507 Ops/s 186.4198 Ops/s $\color{#d91a1a}-0.79\%$
test_squeeze 60.2220μs 12.9989μs 76.9294 KOps/s 75.8343 KOps/s $\color{#35bf28}+1.44\%$
test_unsqueeze 0.1855ms 94.0945μs 10.6276 KOps/s 10.2539 KOps/s $\color{#35bf28}+3.64\%$
test_split 0.5255ms 0.1968ms 5.0823 KOps/s 5.0263 KOps/s $\color{#35bf28}+1.11\%$
test_permute 0.3056ms 0.2203ms 4.5390 KOps/s 4.2626 KOps/s $\textbf{\color{#35bf28}+6.49\%}$
test_stack 31.1599ms 25.2849ms 39.5493 Ops/s 38.9395 Ops/s $\color{#35bf28}+1.57\%$
test_cat 29.0767ms 25.0795ms 39.8732 Ops/s 39.4073 Ops/s $\color{#35bf28}+1.18\%$
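
For readers checking the numbers above: the Change column appears to be the relative difference between the Ops column and Ops on Repo HEAD, and Ops appears to be the reciprocal of Mean. The bold entries, and the Improved/Worsened counts in the summary line, are consistent with a cutoff of about 5% in either direction. The sketch below reproduces three rows under those assumptions. The function names and the 5% threshold are illustrative and inferred from the table, not taken from the benchmark tooling itself.

```python
# Illustrative sketch: helper names and the 5% threshold are assumptions
# inferred from the table above, not the benchmark bot's actual code.

def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean run time in seconds."""
    return 1.0 / mean_seconds


def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of this PR's throughput relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a row; changes within the threshold count as noise."""
    if change_pct > threshold_pct:
        return "improved"
    if change_pct < -threshold_pct:
        return "worsened"
    return "unchanged"


# (name, Ops on this PR, Ops on repo HEAD), copied from the CPU table, in ops/s.
rows = [
    ("test_plain_set_nested", 37.8909e3, 38.7557e3),
    ("test_membership", 1.1224e6, 1.3167e6),
    ("test_serialize_weights_returnearly", 6.1620, 5.5043),
]

for name, ops, ops_head in rows:
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

With this reading, most rows in the table fall inside the noise band, and only the bold rows count toward the Improved and Worsened totals.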


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 222. Improved: $\large\color{#35bf28}18$. Worsened: $\large\color{#d91a1a}35$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.1503ms 17.9730μs 55.6391 KOps/s 61.5800 KOps/s $\textbf{\color{#d91a1a}-9.65\%}$
test_plain_set_stack_nested 50.7710μs 18.1741μs 55.0235 KOps/s 61.6911 KOps/s $\textbf{\color{#d91a1a}-10.81\%}$
test_plain_set_nested_inplace 51.7110μs 19.1757μs 52.1494 KOps/s 58.2780 KOps/s $\textbf{\color{#d91a1a}-10.52\%}$
test_plain_set_stack_nested_inplace 48.5810μs 19.2340μs 51.9912 KOps/s 58.6649 KOps/s $\textbf{\color{#d91a1a}-11.38\%}$
test_items 25.7000μs 2.8350μs 352.7375 KOps/s 349.1627 KOps/s $\color{#35bf28}+1.02\%$
test_items_nested 0.3929ms 0.3407ms 2.9353 KOps/s 2.9309 KOps/s $\color{#35bf28}+0.15\%$
test_items_nested_locked 0.3999ms 0.3445ms 2.9027 KOps/s 2.8787 KOps/s $\color{#35bf28}+0.83\%$
test_items_nested_leaf 87.0420μs 62.6636μs 15.9582 KOps/s 15.9637 KOps/s $\color{#d91a1a}-0.03\%$
test_items_stack_nested 0.4156ms 0.3389ms 2.9511 KOps/s 2.8973 KOps/s $\color{#35bf28}+1.86\%$
test_items_stack_nested_leaf 93.3010μs 62.6544μs 15.9606 KOps/s 15.4650 KOps/s $\color{#35bf28}+3.20\%$
test_items_stack_nested_locked 0.4238ms 0.3403ms 2.9388 KOps/s 2.8825 KOps/s $\color{#35bf28}+1.95\%$
test_keys 21.5200μs 3.4410μs 290.6096 KOps/s 293.8050 KOps/s $\color{#d91a1a}-1.09\%$
test_keys_nested 0.1077ms 71.1116μs 14.0624 KOps/s 13.9654 KOps/s $\color{#35bf28}+0.69\%$
test_keys_nested_locked 2.4578ms 78.0112μs 12.8187 KOps/s 12.9883 KOps/s $\color{#d91a1a}-1.31\%$
test_keys_nested_leaf 90.2520μs 60.7159μs 16.4702 KOps/s 16.3866 KOps/s $\color{#35bf28}+0.51\%$
test_keys_stack_nested 0.1208ms 70.6964μs 14.1450 KOps/s 13.8014 KOps/s $\color{#35bf28}+2.49\%$
test_keys_stack_nested_leaf 0.1186ms 61.9138μs 16.1515 KOps/s 15.6434 KOps/s $\color{#35bf28}+3.25\%$
test_keys_stack_nested_locked 0.1200ms 76.9206μs 13.0004 KOps/s 12.7953 KOps/s $\color{#35bf28}+1.60\%$
test_values 5.0367μs 0.8685μs 1.1514 MOps/s 1.1946 MOps/s $\color{#d91a1a}-3.62\%$
test_values_nested 0.1263ms 48.5087μs 20.6149 KOps/s 20.5141 KOps/s $\color{#35bf28}+0.49\%$
test_values_nested_locked 81.9810μs 50.1025μs 19.9591 KOps/s 19.8218 KOps/s $\color{#35bf28}+0.69\%$
test_values_nested_leaf 72.9020μs 42.5334μs 23.5109 KOps/s 23.3050 KOps/s $\color{#35bf28}+0.88\%$
test_values_stack_nested 90.1420μs 48.7247μs 20.5235 KOps/s 19.8782 KOps/s $\color{#35bf28}+3.25\%$
test_values_stack_nested_leaf 78.6210μs 43.0052μs 23.2530 KOps/s 22.7037 KOps/s $\color{#35bf28}+2.42\%$
test_values_stack_nested_locked 81.6210μs 50.5712μs 19.7741 KOps/s 19.1108 KOps/s $\color{#35bf28}+3.47\%$
test_membership 1.6206μs 0.4998μs 2.0006 MOps/s 2.0093 MOps/s $\color{#d91a1a}-0.44\%$
test_membership_nested 13.1450μs 1.9515μs 512.4166 KOps/s 527.5728 KOps/s $\color{#d91a1a}-2.87\%$
test_membership_nested_leaf 9.8737μs 1.9149μs 522.2121 KOps/s 550.2021 KOps/s $\textbf{\color{#d91a1a}-5.09\%}$
test_membership_stacked_nested 45.0810μs 2.0087μs 497.8368 KOps/s 525.6902 KOps/s $\textbf{\color{#d91a1a}-5.30\%}$
test_membership_stacked_nested_leaf 39.6210μs 1.9805μs 504.9264 KOps/s 518.4865 KOps/s $\color{#d91a1a}-2.62\%$
test_membership_nested_last 34.9810μs 2.9814μs 335.4181 KOps/s 340.9250 KOps/s $\color{#d91a1a}-1.62\%$
test_membership_nested_leaf_last 30.4410μs 2.9919μs 334.2413 KOps/s 341.5589 KOps/s $\color{#d91a1a}-2.14\%$
test_membership_stacked_nested_last 28.9110μs 2.9736μs 336.2962 KOps/s 235.9208 KOps/s $\textbf{\color{#35bf28}+42.55\%}$
test_membership_stacked_nested_leaf_last 31.1110μs 2.9730μs 336.3598 KOps/s 236.5179 KOps/s $\textbf{\color{#35bf28}+42.21\%}$
test_nested_getleaf 45.0910μs 5.9825μs 167.1531 KOps/s 163.1880 KOps/s $\color{#35bf28}+2.43\%$
test_nested_get 35.4710μs 5.7467μs 174.0137 KOps/s 173.1708 KOps/s $\color{#35bf28}+0.49\%$
test_stacked_getleaf 42.5710μs 6.0166μs 166.2059 KOps/s 167.1587 KOps/s $\color{#d91a1a}-0.57\%$
test_stacked_get 26.9110μs 5.5903μs 178.8823 KOps/s 178.3491 KOps/s $\color{#35bf28}+0.30\%$
test_nested_getitemleaf 32.2000μs 6.1691μs 162.0993 KOps/s 164.4445 KOps/s $\color{#d91a1a}-1.43\%$
test_nested_getitem 48.1410μs 5.7465μs 174.0187 KOps/s 172.9873 KOps/s $\color{#35bf28}+0.60\%$
test_stacked_getitemleaf 33.9410μs 6.0532μs 165.2029 KOps/s 165.5392 KOps/s $\color{#d91a1a}-0.20\%$
test_stacked_getitem 27.2800μs 5.6456μs 177.1286 KOps/s 176.8532 KOps/s $\color{#35bf28}+0.16\%$
test_lock_nested 7.1255ms 0.4314ms 2.3179 KOps/s 2.2962 KOps/s $\color{#35bf28}+0.95\%$
test_lock_stack_nested 0.4317ms 0.3835ms 2.6077 KOps/s 2.5832 KOps/s $\color{#35bf28}+0.95\%$
test_unlock_nested 0.7640ms 0.3609ms 2.7708 KOps/s 2.7478 KOps/s $\color{#35bf28}+0.84\%$
test_unlock_stack_nested 0.4040ms 0.3222ms 3.1038 KOps/s 3.0860 KOps/s $\color{#35bf28}+0.57\%$
test_flatten_speed 0.1514ms 75.2803μs 13.2837 KOps/s 12.9481 KOps/s $\color{#35bf28}+2.59\%$
test_unflatten_speed 0.3540ms 0.3216ms 3.1091 KOps/s 3.0469 KOps/s $\color{#35bf28}+2.04\%$
test_common_ops 1.6218ms 1.3128ms 761.7215 Ops/s 804.1384 Ops/s $\textbf{\color{#d91a1a}-5.27\%}$
test_creation 29.1500μs 1.4989μs 667.1702 KOps/s 664.8815 KOps/s $\color{#35bf28}+0.34\%$
test_creation_empty 46.1410μs 17.8349μs 56.0698 KOps/s 70.8179 KOps/s $\textbf{\color{#d91a1a}-20.83\%}$
test_creation_nested_1 47.6110μs 19.6395μs 50.9177 KOps/s 62.9274 KOps/s $\textbf{\color{#d91a1a}-19.09\%}$
test_creation_nested_2 50.0410μs 22.3628μs 44.7172 KOps/s 54.1883 KOps/s $\textbf{\color{#d91a1a}-17.48\%}$
test_clone 63.8910μs 29.4594μs 33.9450 KOps/s 34.5259 KOps/s $\color{#d91a1a}-1.68\%$
test_getitem[int] 1.2227ms 15.6282μs 63.9868 KOps/s 62.2209 KOps/s $\color{#35bf28}+2.84\%$
test_getitem[slice_int] 0.1181ms 26.7648μs 37.3625 KOps/s 35.8375 KOps/s $\color{#35bf28}+4.26\%$
test_getitem[range] 0.2330ms 0.1115ms 8.9650 KOps/s 8.9873 KOps/s $\color{#d91a1a}-0.25\%$
test_getitem[tuple] 0.1166ms 23.3335μs 42.8569 KOps/s 42.0709 KOps/s $\color{#35bf28}+1.87\%$
test_getitem[list] 0.2068ms 0.1004ms 9.9600 KOps/s 9.8066 KOps/s $\color{#35bf28}+1.56\%$
test_setitem_dim[int] 83.6020μs 44.7253μs 22.3587 KOps/s 22.1697 KOps/s $\color{#35bf28}+0.85\%$
test_setitem_dim[slice_int] 0.1098ms 68.0095μs 14.7038 KOps/s 14.7162 KOps/s $\color{#d91a1a}-0.08\%$
test_setitem_dim[range] 0.1546ms 0.1276ms 7.8383 KOps/s 7.7748 KOps/s $\color{#35bf28}+0.82\%$
test_setitem_dim[tuple] 0.1364ms 60.7807μs 16.4526 KOps/s 16.3835 KOps/s $\color{#35bf28}+0.42\%$
test_setitem 91.4810μs 44.5262μs 22.4587 KOps/s 23.8263 KOps/s $\textbf{\color{#d91a1a}-5.74\%}$
test_set 83.0220μs 44.9289μs 22.2574 KOps/s 24.6449 KOps/s $\textbf{\color{#d91a1a}-9.69\%}$
test_set_shared 0.3802ms 55.4540μs 18.0330 KOps/s 18.1326 KOps/s $\color{#d91a1a}-0.55\%$
test_update 0.1058ms 57.0298μs 17.5347 KOps/s 20.3211 KOps/s $\textbf{\color{#d91a1a}-13.71\%}$
test_update_nested 99.9220μs 61.9540μs 16.1410 KOps/s 17.5221 KOps/s $\textbf{\color{#d91a1a}-7.88\%}$
test_update__nested 0.4325ms 68.5714μs 14.5833 KOps/s 15.9964 KOps/s $\textbf{\color{#d91a1a}-8.83\%}$
test_set_nested 92.7920μs 47.6485μs 20.9870 KOps/s 22.6379 KOps/s $\textbf{\color{#d91a1a}-7.29\%}$
test_set_nested_new 87.6510μs 51.7895μs 19.3089 KOps/s 21.2643 KOps/s $\textbf{\color{#d91a1a}-9.20\%}$
test_select 0.1249ms 65.4435μs 15.2804 KOps/s 16.6550 KOps/s $\textbf{\color{#d91a1a}-8.25\%}$
test_select_nested 78.9010μs 42.0437μs 23.7848 KOps/s 23.6810 KOps/s $\color{#35bf28}+0.44\%$
test_exclude_nested 0.1020ms 58.0958μs 17.2129 KOps/s 17.2424 KOps/s $\color{#d91a1a}-0.17\%$
test_empty[True] 0.3087ms 0.2542ms 3.9332 KOps/s 3.8073 KOps/s $\color{#35bf28}+3.31\%$
test_empty[False] 3.4971μs 0.7421μs 1.3475 MOps/s 1.3529 MOps/s $\color{#d91a1a}-0.41\%$
test_to 59.8210μs 26.8469μs 37.2483 KOps/s 37.0475 KOps/s $\color{#35bf28}+0.54\%$
test_to_nonblocking 66.8010μs 26.0020μs 38.4586 KOps/s 38.6888 KOps/s $\color{#d91a1a}-0.60\%$
test_unbind_speed 0.3307ms 0.2772ms 3.6077 KOps/s 3.6417 KOps/s $\color{#d91a1a}-0.93\%$
test_unbind_speed_stack0 0.3669ms 0.2705ms 3.6973 KOps/s 3.6113 KOps/s $\color{#35bf28}+2.38\%$
test_unbind_speed_stack1 0.6829ms 0.6426ms 1.5562 KOps/s 1.4274 KOps/s $\textbf{\color{#35bf28}+9.02\%}$
test_split 94.8100ms 2.2246ms 449.5108 Ops/s 457.0846 Ops/s $\color{#d91a1a}-1.66\%$
test_chunk 96.6826ms 2.2352ms 447.3870 Ops/s 455.9984 Ops/s $\color{#d91a1a}-1.89\%$
test_to[False] 3.5964ms 3.4031ms 293.8494 Ops/s 293.5581 Ops/s $\color{#35bf28}+0.10\%$
test_to[True] 4.8552ms 4.4681ms 223.8106 Ops/s 217.4972 Ops/s $\color{#35bf28}+2.90\%$
test_to_njt[False] 0.3322s 0.2514s 3.9780 Ops/s 4.3347 Ops/s $\textbf{\color{#d91a1a}-8.23\%}$
test_to_njt[True] 0.2630s 0.2610s 3.8311 Ops/s 3.5243 Ops/s $\textbf{\color{#35bf28}+8.71\%}$
test_creation[device0] 0.3530ms 0.1298ms 7.7043 KOps/s 7.4928 KOps/s $\color{#35bf28}+2.82\%$
test_creation_from_tensor 0.3904ms 0.1312ms 7.6229 KOps/s 7.3466 KOps/s $\color{#35bf28}+3.76\%$
test_add_one[memmap_tensor0] 0.2366ms 8.6173μs 116.0452 KOps/s 111.2739 KOps/s $\color{#35bf28}+4.29\%$
test_contiguous[memmap_tensor0] 31.8300μs 2.2361μs 447.2083 KOps/s 457.1815 KOps/s $\color{#d91a1a}-2.18\%$
test_stack[memmap_tensor0] 37.9110μs 6.6870μs 149.5431 KOps/s 147.8420 KOps/s $\color{#35bf28}+1.15\%$
test_memmaptd_index 1.0548ms 0.4264ms 2.3454 KOps/s 2.2691 KOps/s $\color{#35bf28}+3.36\%$
test_memmaptd_index_astensor 0.7675ms 0.5033ms 1.9869 KOps/s 1.9516 KOps/s $\color{#35bf28}+1.81\%$
test_memmaptd_index_op 1.4734ms 1.0764ms 929.0421 Ops/s 972.8078 Ops/s $\color{#d91a1a}-4.50\%$
test_serialize_model 0.1319s 0.1306s 7.6561 Ops/s 7.6202 Ops/s $\color{#35bf28}+0.47\%$
test_serialize_model_pickle 1.3610s 1.2213s 0.8188 Ops/s 0.8220 Ops/s $\color{#d91a1a}-0.40\%$
test_serialize_weights 0.1320s 0.1300s 7.6943 Ops/s 7.6875 Ops/s $\color{#35bf28}+0.09\%$
test_serialize_weights_returnearly 0.2042s 56.2776ms 17.7691 Ops/s 16.0888 Ops/s $\textbf{\color{#35bf28}+10.44\%}$
test_serialize_weights_pickle 1.3746s 1.1933s 0.8380 Ops/s 0.8340 Ops/s $\color{#35bf28}+0.48\%$
test_reshape_pytree 73.6420μs 35.6223μs 28.0723 KOps/s 28.0052 KOps/s $\color{#35bf28}+0.24\%$
test_reshape_td 78.0520μs 42.2236μs 23.6835 KOps/s 23.7203 KOps/s $\color{#d91a1a}-0.16\%$
test_view_pytree 73.6610μs 35.9018μs 27.8538 KOps/s 27.4767 KOps/s $\color{#35bf28}+1.37\%$
test_view_td 96.3420μs 47.6631μs 20.9806 KOps/s 20.8894 KOps/s $\color{#35bf28}+0.44\%$
test_unbind_pytree 67.9110μs 33.9423μs 29.4617 KOps/s 28.5650 KOps/s $\color{#35bf28}+3.14\%$
test_unbind_td 0.5239ms 42.1137μs 23.7452 KOps/s 23.8812 KOps/s $\color{#d91a1a}-0.57\%$
test_split_pytree 96.7620μs 45.9305μs 21.7720 KOps/s 21.0517 KOps/s $\color{#35bf28}+3.42\%$
test_split_td 94.8413ms 65.8107μs 15.1951 KOps/s 17.8202 KOps/s $\textbf{\color{#d91a1a}-14.73\%}$
test_add_pytree 0.1342ms 57.1197μs 17.5071 KOps/s 16.4167 KOps/s $\textbf{\color{#35bf28}+6.64\%}$
test_add_td 0.1551ms 0.1013ms 9.8735 KOps/s 9.9645 KOps/s $\color{#d91a1a}-0.91\%$
test_compile_add_one_nested[tensordict-compile] 0.2638ms 0.1620ms 6.1716 KOps/s 6.0723 KOps/s $\color{#35bf28}+1.63\%$
test_compile_add_one_nested[tensordict-eager] 0.2920ms 0.1616ms 6.1898 KOps/s 6.0716 KOps/s $\color{#35bf28}+1.95\%$
test_compile_add_one_nested[pytree-compile] 0.2061ms 0.1560ms 6.4099 KOps/s 6.2666 KOps/s $\color{#35bf28}+2.29\%$
test_compile_add_one_nested[pytree-eager] 0.2636ms 0.1872ms 5.3433 KOps/s 5.3787 KOps/s $\color{#d91a1a}-0.66\%$
test_compile_copy_nested[tensordict-compile] 59.8710μs 21.5123μs 46.4851 KOps/s 45.8410 KOps/s $\color{#35bf28}+1.41\%$
test_compile_copy_nested[tensordict-eager] 87.6420μs 48.8276μs 20.4802 KOps/s 20.6202 KOps/s $\color{#d91a1a}-0.68\%$
test_compile_copy_nested[pytree-compile] 0.3294ms 65.3202μs 15.3092 KOps/s 15.4693 KOps/s $\color{#d91a1a}-1.04\%$
test_compile_copy_nested[pytree-eager] 83.6410μs 49.6979μs 20.1216 KOps/s 20.3732 KOps/s $\color{#d91a1a}-1.24\%$
test_compile_add_one_flat[tensordict-compile] 0.4356ms 0.3207ms 3.1183 KOps/s 3.0987 KOps/s $\color{#35bf28}+0.63\%$
test_compile_add_one_flat[tensordict-eager] 0.3608ms 0.2317ms 4.3165 KOps/s 4.2979 KOps/s $\color{#35bf28}+0.43\%$
test_compile_add_one_flat[tensorclass-compile] 0.1841ms 0.1291ms 7.7449 KOps/s 7.7822 KOps/s $\color{#d91a1a}-0.48\%$
test_compile_add_one_flat[tensorclass-eager] 0.1349ms 65.5465μs 15.2564 KOps/s 15.2674 KOps/s $\color{#d91a1a}-0.07\%$
test_compile_add_one_flat[pytree-compile] 0.3990ms 0.3282ms 3.0469 KOps/s 3.0334 KOps/s $\color{#35bf28}+0.45\%$
test_compile_add_one_flat[pytree-eager] 0.7090ms 0.6273ms 1.5940 KOps/s 1.5069 KOps/s $\textbf{\color{#35bf28}+5.78\%}$
test_compile_add_self_flat[tensordict-eager] 0.4196ms 0.2817ms 3.5501 KOps/s 3.5284 KOps/s $\color{#35bf28}+0.62\%$
test_compile_add_self_flat[tensordict-compile] 0.3716ms 0.3214ms 3.1111 KOps/s 3.0686 KOps/s $\color{#35bf28}+1.39\%$
test_compile_add_self_flat[tensorclass-eager] 0.1665ms 77.4216μs 12.9163 KOps/s 12.7291 KOps/s $\color{#35bf28}+1.47\%$
test_compile_add_self_flat[tensorclass-compile] 0.1695ms 0.1300ms 7.6926 KOps/s 7.5632 KOps/s $\color{#35bf28}+1.71\%$
test_compile_add_self_flat[pytree-eager] 0.9967ms 0.5642ms 1.7724 KOps/s 1.8247 KOps/s $\color{#d91a1a}-2.86\%$
test_compile_add_self_flat[pytree-compile] 0.3849ms 0.3292ms 3.0377 KOps/s 2.9928 KOps/s $\color{#35bf28}+1.50\%$
test_compile_copy_flat[tensordict-compile] 0.4243ms 21.2225μs 47.1199 KOps/s 46.7952 KOps/s $\color{#35bf28}+0.69\%$
test_compile_copy_flat[tensordict-eager] 0.4281ms 37.9860μs 26.3255 KOps/s 25.9782 KOps/s $\color{#35bf28}+1.34\%$
test_compile_copy_flat[pytree-compile] 0.4708ms 69.3514μs 14.4193 KOps/s 14.2829 KOps/s $\color{#35bf28}+0.96\%$
test_compile_copy_flat[pytree-eager] 0.1117ms 50.8071μs 19.6823 KOps/s 19.5096 KOps/s $\color{#35bf28}+0.88\%$
test_compile_assign_and_add[tensordict-compile] 2.3825ms 0.8279ms 1.2079 KOps/s 1.1144 KOps/s $\textbf{\color{#35bf28}+8.38\%}$
test_compile_assign_and_add[tensordict-eager] 3.4788ms 3.2313ms 309.4747 Ops/s 306.9858 Ops/s $\color{#35bf28}+0.81\%$
test_compile_assign_and_add[pytree-compile] 2.3846ms 0.8380ms 1.1933 KOps/s 1.0925 KOps/s $\textbf{\color{#35bf28}+9.23\%}$
test_compile_assign_and_add[pytree-eager] 3.5196ms 3.3348ms 299.8697 Ops/s 311.8177 Ops/s $\color{#d91a1a}-3.83\%$
test_compile_indexing[tensor-tensordict-compile] 0.2203ms 0.1234ms 8.1046 KOps/s 8.3798 KOps/s $\color{#d91a1a}-3.28\%$
test_compile_indexing[tensor-tensordict-eager] 0.1923ms 65.3728μs 15.2969 KOps/s 16.2748 KOps/s $\textbf{\color{#d91a1a}-6.01\%}$
test_compile_indexing[tensor-tensorclass-compile] 0.1675ms 0.1196ms 8.3618 KOps/s 8.8648 KOps/s $\textbf{\color{#d91a1a}-5.67\%}$
test_compile_indexing[tensor-tensorclass-eager] 99.3720μs 47.3983μs 21.0978 KOps/s 22.5881 KOps/s $\textbf{\color{#d91a1a}-6.60\%}$
test_compile_indexing[tensor-pytree-compile] 0.1900ms 0.1216ms 8.2210 KOps/s 8.8688 KOps/s $\textbf{\color{#d91a1a}-7.30\%}$
test_compile_indexing[tensor-pytree-eager] 94.7220μs 47.4147μs 21.0905 KOps/s 22.4876 KOps/s $\textbf{\color{#d91a1a}-6.21\%}$
test_compile_indexing[slice-tensordict-compile] 0.2156ms 0.1495ms 6.6871 KOps/s 6.7660 KOps/s $\color{#d91a1a}-1.17\%$
test_compile_indexing[slice-tensordict-eager] 0.1594ms 27.2017μs 36.7624 KOps/s 39.5702 KOps/s $\textbf{\color{#d91a1a}-7.10\%}$
test_compile_indexing[slice-tensorclass-compile] 0.1797ms 0.1446ms 6.9149 KOps/s 7.0971 KOps/s $\color{#d91a1a}-2.57\%$
test_compile_indexing[slice-tensorclass-eager] 65.1810μs 21.1757μs 47.2239 KOps/s 47.0445 KOps/s $\color{#35bf28}+0.38\%$
test_compile_indexing[slice-pytree-compile] 0.2259ms 0.1413ms 7.0757 KOps/s 6.9874 KOps/s $\color{#35bf28}+1.26\%$
test_compile_indexing[slice-pytree-eager] 61.7110μs 21.0427μs 47.5223 KOps/s 47.8123 KOps/s $\color{#d91a1a}-0.61\%$
test_compile_indexing[int-tensordict-compile] 0.2743ms 0.1471ms 6.7996 KOps/s 6.7020 KOps/s $\color{#35bf28}+1.46\%$
test_compile_indexing[int-tensordict-eager] 0.4629ms 24.9826μs 40.0279 KOps/s 39.5086 KOps/s $\color{#35bf28}+1.31\%$
test_compile_indexing[int-tensorclass-compile] 0.1919ms 0.1413ms 7.0753 KOps/s 7.0316 KOps/s $\color{#35bf28}+0.62\%$
test_compile_indexing[int-tensorclass-eager] 49.6010μs 20.8354μs 47.9952 KOps/s 46.8114 KOps/s $\color{#35bf28}+2.53\%$
test_compile_indexing[int-pytree-compile] 0.1985ms 0.1440ms 6.9463 KOps/s 7.0636 KOps/s $\color{#d91a1a}-1.66\%$
test_compile_indexing[int-pytree-eager] 55.1610μs 22.1290μs 45.1895 KOps/s 47.9773 KOps/s $\textbf{\color{#d91a1a}-5.81\%}$
test_mod_add[eager] 76.4410μs 33.7914μs 29.5933 KOps/s 31.1158 KOps/s $\color{#d91a1a}-4.89\%$
test_mod_add[compile] 0.1289ms 81.5806μs 12.2578 KOps/s 11.5810 KOps/s $\textbf{\color{#35bf28}+5.84\%}$
test_mod_add[compile-overhead] 0.3074ms 0.1523ms 6.5681 KOps/s 6.0220 KOps/s $\textbf{\color{#35bf28}+9.07\%}$
test_mod_wrap[eager] 0.3203ms 0.2424ms 4.1259 KOps/s 3.9971 KOps/s $\color{#35bf28}+3.22\%$
test_mod_wrap[compile] 1.4699ms 0.3004ms 3.3288 KOps/s 3.1402 KOps/s $\textbf{\color{#35bf28}+6.00\%}$
test_mod_wrap[compile-overhead] 7.7688ms 4.0620ms 246.1865 Ops/s 384.8827 Ops/s $\textbf{\color{#d91a1a}-36.04\%}$
test_mod_wrap_and_backward[eager] 1.5783ms 1.3489ms 741.3519 Ops/s 690.9530 Ops/s $\textbf{\color{#35bf28}+7.29\%}$
test_mod_wrap_and_backward[compile] 1.5718ms 1.3306ms 751.5591 Ops/s 680.1060 Ops/s $\textbf{\color{#35bf28}+10.51\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3288ms 0.9015ms 1.1093 KOps/s 975.7998 Ops/s $\textbf{\color{#35bf28}+13.68\%}$
test_seq_add[eager] 0.1629ms 0.1032ms 9.6882 KOps/s 9.6296 KOps/s $\color{#35bf28}+0.61\%$
test_seq_add[compile] 0.1683ms 91.3815μs 10.9431 KOps/s 10.2588 KOps/s $\textbf{\color{#35bf28}+6.67\%}$
test_seq_add[compile-overhead] 0.1658ms 0.1234ms 8.1039 KOps/s 7.6423 KOps/s $\textbf{\color{#35bf28}+6.04\%}$
test_seq_wrap[eager] 0.8451ms 0.3870ms 2.5843 KOps/s 2.5343 KOps/s $\color{#35bf28}+1.97\%$
test_seq_wrap[compile] 0.7219ms 0.3172ms 3.1528 KOps/s 3.0265 KOps/s $\color{#35bf28}+4.17\%$
test_seq_wrap[compile-overhead] 0.2676ms 0.2218ms 4.5084 KOps/s 4.4972 KOps/s $\color{#35bf28}+0.25\%$
test_func_call_runtime[False-eager] 1.1501ms 0.7231ms 1.3829 KOps/s 1.3532 KOps/s $\color{#35bf28}+2.20\%$
test_func_call_runtime[False-compile] 1.3385ms 0.7943ms 1.2590 KOps/s 1.2019 KOps/s $\color{#35bf28}+4.75\%$
test_func_call_runtime[False-compile-overhead] 0.4133ms 0.3613ms 2.7680 KOps/s 2.7729 KOps/s $\color{#d91a1a}-0.18\%$
test_func_call_runtime[True-eager] 0.9581ms 0.8798ms 1.1366 KOps/s 1.1125 KOps/s $\color{#35bf28}+2.16\%$
test_func_call_runtime[True-compile] 0.9944ms 0.8686ms 1.1512 KOps/s 1.1674 KOps/s $\color{#d91a1a}-1.39\%$
test_func_call_runtime[True-compile-overhead] 0.4442ms 0.3848ms 2.5989 KOps/s 2.6040 KOps/s $\color{#d91a1a}-0.20\%$
test_func_call_cm_runtime[False-eager] 0.8539ms 0.7455ms 1.3413 KOps/s 1.3521 KOps/s $\color{#d91a1a}-0.80\%$
test_func_call_cm_runtime[False-compile] 0.9233ms 0.8516ms 1.1742 KOps/s 1.2152 KOps/s $\color{#d91a1a}-3.37\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4640ms 0.3799ms 2.6323 KOps/s 2.7360 KOps/s $\color{#d91a1a}-3.79\%$
test_func_call_cm_runtime[True-eager] 1.3295ms 1.0342ms 966.9527 Ops/s 979.6419 Ops/s $\color{#d91a1a}-1.30\%$
test_func_call_cm_runtime[True-compile] 1.1014ms 0.9085ms 1.1007 KOps/s 1.1512 KOps/s $\color{#d91a1a}-4.39\%$
test_func_call_cm_runtime[True-compile-overhead] 0.4855ms 0.4222ms 2.3685 KOps/s 2.4439 KOps/s $\color{#d91a1a}-3.08\%$
test_vmap_func_call_cm_runtime[eager] 2.8561ms 2.1358ms 468.2194 Ops/s 478.2570 Ops/s $\color{#d91a1a}-2.10\%$
test_vmap_func_call_cm_runtime[compile] 1.0906ms 0.9049ms 1.1051 KOps/s 1.1337 KOps/s $\color{#d91a1a}-2.52\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5016ms 0.4275ms 2.3389 KOps/s 2.4220 KOps/s $\color{#d91a1a}-3.43\%$
test_distributed 0.4825ms 0.1223ms 8.1751 KOps/s 8.8515 KOps/s $\textbf{\color{#d91a1a}-7.64\%}$
test_tdmodule 0.2196ms 16.5304μs 60.4945 KOps/s 70.1039 KOps/s $\textbf{\color{#d91a1a}-13.71\%}$
test_tdmodule_dispatch 52.7910μs 32.8039μs 30.4842 KOps/s 37.6960 KOps/s $\textbf{\color{#d91a1a}-19.13\%}$
test_tdseq 40.3010μs 17.1312μs 58.3730 KOps/s 68.2982 KOps/s $\textbf{\color{#d91a1a}-14.53\%}$
test_tdseq_dispatch 57.4910μs 34.7553μs 28.7726 KOps/s 34.1234 KOps/s $\textbf{\color{#d91a1a}-15.68\%}$
test_instantiation_functorch 2.0847ms 1.8752ms 533.2657 Ops/s 536.1290 Ops/s $\color{#d91a1a}-0.53\%$
test_exec_functorch 0.2625ms 0.2221ms 4.5028 KOps/s 4.6913 KOps/s $\color{#d91a1a}-4.02\%$
test_exec_functional_call 0.3192ms 0.2219ms 4.5057 KOps/s 4.7180 KOps/s $\color{#d91a1a}-4.50\%$
test_exec_td_decorator 0.4530ms 0.2751ms 3.6352 KOps/s 3.7848 KOps/s $\color{#d91a1a}-3.95\%$
test_vmap_mlp_speed_decorator[True-True] 0.8394ms 0.6913ms 1.4466 KOps/s 1.4841 KOps/s $\color{#d91a1a}-2.52\%$
test_vmap_mlp_speed_decorator[True-False] 0.8087ms 0.6968ms 1.4352 KOps/s 1.4842 KOps/s $\color{#d91a1a}-3.30\%$
test_vmap_mlp_speed_decorator[False-True] 0.7453ms 0.6173ms 1.6200 KOps/s 1.6781 KOps/s $\color{#d91a1a}-3.46\%$
test_vmap_mlp_speed_decorator[False-False] 0.7435ms 0.6165ms 1.6221 KOps/s 1.6739 KOps/s $\color{#d91a1a}-3.09\%$
test_vmap_transformer_speed_decorator[True-True] 20.3185ms 19.7209ms 50.7077 Ops/s 51.2775 Ops/s $\color{#d91a1a}-1.11\%$
test_vmap_transformer_speed_decorator[True-False] 20.2912ms 19.6751ms 50.8256 Ops/s 51.2696 Ops/s $\color{#d91a1a}-0.87\%$
test_vmap_transformer_speed_decorator[False-True] 20.2235ms 19.4514ms 51.4102 Ops/s 51.6801 Ops/s $\color{#d91a1a}-0.52\%$
test_vmap_transformer_speed_decorator[False-False] 20.2015ms 19.4112ms 51.5167 Ops/s 51.6791 Ops/s $\color{#d91a1a}-0.31\%$
test_to_module_speed[True] 1.5168ms 1.0007ms 999.2538 Ops/s 1.0137 KOps/s $\color{#d91a1a}-1.43\%$
test_to_module_speed[False] 1.4072ms 0.9705ms 1.0304 KOps/s 1.0370 KOps/s $\color{#d91a1a}-0.64\%$
test_tc_init 69.1610μs 38.7713μs 25.7923 KOps/s 30.0098 KOps/s $\textbf{\color{#d91a1a}-14.05\%}$
test_tc_init_nested 0.1536ms 75.9759μs 13.1621 KOps/s 14.5654 KOps/s $\textbf{\color{#d91a1a}-9.63\%}$
test_tc_first_layer_tensor 12.6060μs 0.6775μs 1.4760 MOps/s 1.4587 MOps/s $\color{#35bf28}+1.19\%$
test_tc_first_layer_nontensor 25.0000μs 2.2146μs 451.5505 KOps/s 445.9372 KOps/s $\color{#35bf28}+1.26\%$
test_tc_second_layer_tensor 6.7675μs 1.3665μs 731.7952 KOps/s 726.1677 KOps/s $\color{#35bf28}+0.77\%$
test_tc_second_layer_nontensor 22.3910μs 2.9128μs 343.3101 KOps/s 341.3912 KOps/s $\color{#35bf28}+0.56\%$
test_unbind 0.1911s 9.5214ms 105.0270 Ops/s 91.6947 Ops/s $\textbf{\color{#35bf28}+14.54\%}$
test_full_like 0.6595ms 0.5733ms 1.7443 KOps/s 1.7480 KOps/s $\color{#d91a1a}-0.21\%$
test_zeros_like 0.2689ms 0.1979ms 5.0522 KOps/s 5.0476 KOps/s $\color{#35bf28}+0.09\%$
test_ones_like 0.2550ms 0.1978ms 5.0565 KOps/s 5.0526 KOps/s $\color{#35bf28}+0.08\%$
test_clone 0.4431ms 0.4145ms 2.4124 KOps/s 2.4098 KOps/s $\color{#35bf28}+0.11\%$
test_squeeze 35.3910μs 9.7714μs 102.3392 KOps/s 101.4433 KOps/s $\color{#35bf28}+0.88\%$
test_unsqueeze 0.2150ms 74.0003μs 13.5135 KOps/s 13.3054 KOps/s $\color{#35bf28}+1.56\%$
test_split 0.4239ms 0.1616ms 6.1867 KOps/s 6.4215 KOps/s $\color{#d91a1a}-3.66\%$
test_permute 0.2338ms 0.1803ms 5.5454 KOps/s 5.6917 KOps/s $\color{#d91a1a}-2.57\%$
test_stack 1.2512ms 0.8517ms 1.1742 KOps/s 1.1775 KOps/s $\color{#d91a1a}-0.28\%$
test_cat 1.2446ms 1.2312ms 812.1919 Ops/s 811.9848 Ops/s $\color{#35bf28}+0.03\%$
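
For readers checking the numbers: the Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column, and rows with a change of roughly 5% or more appear in bold. The sketch below reproduces a few values from the table under that assumption. The helper name `relative_change` is hypothetical and not part of the benchmark tooling. Both throughput values must be in the same unit (for example, convert Ops/s to KOps/s before comparing).

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to repo HEAD.

    Assumption: this mirrors how the Change column is derived; the
    benchmark bot's actual implementation is not shown in this thread.
    """
    return (ops_pr - ops_head) / ops_head * 100.0


# (Ops, Ops on Repo HEAD) pairs copied from the table, both in KOps/s.
rows = {
    "test_setitem": (22.4587, 23.8263),
    "test_update": (17.5347, 20.3211),
    "test_unbind_speed_stack1": (1.5562, 1.4274),
}

for name, (ops_pr, ops_head) in rows.items():
    print(f"{name}: {relative_change(ops_pr, ops_head):+.2f}%")
```

A negative value means the PR branch ran fewer operations per second than HEAD for that benchmark.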
