[Performance] Faster clone #1040

Open: 5 commits to merge into base: gh/vmoens/29/base

vmoens (Contributor) commented Oct 14, 2024

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 14, 2024
ghstack-source-id: 6eecbac5e946a4d93d3d6e148e8c18aaa2501b00
Pull Request resolved: #1040
facebook-github-bot added the CLA Signed label Oct 14, 2024
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 14, 2024
ghstack-source-id: 3fec3b6ac36b07dee77ecf1189f79b6d620532e1
Pull Request resolved: #1040
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 14, 2024
ghstack-source-id: 71833ef65890ab7c068dca2e1ed2fa5363c488ad
Pull Request resolved: #1040

github-actions bot commented Oct 14, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 216. Improved: $\large\color{#35bf28}22$. Worsened: $\large\color{#d91a1a}7$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_plain_set_nested 69.2890μs 25.7869μs 38.7793 KOps/s 38.1941 KOps/s $\color{#35bf28}+1.53\%$
test_plain_set_stack_nested 0.1289ms 26.0733μs 38.3534 KOps/s 38.4622 KOps/s $\color{#d91a1a}-0.28\%$
test_plain_set_nested_inplace 68.5880μs 28.1292μs 35.5502 KOps/s 34.8843 KOps/s $\color{#35bf28}+1.91\%$
test_plain_set_stack_nested_inplace 71.5940μs 28.1707μs 35.4979 KOps/s 35.4986 KOps/s $-0.00\%$
test_items 28.9840μs 4.1711μs 239.7425 KOps/s 238.0274 KOps/s $\color{#35bf28}+0.72\%$
test_items_nested 0.4657ms 0.3849ms 2.5978 KOps/s 2.6160 KOps/s $\color{#d91a1a}-0.69\%$
test_items_nested_locked 0.4689ms 0.3848ms 2.5984 KOps/s 2.6169 KOps/s $\color{#d91a1a}-0.71\%$
test_items_nested_leaf 0.1420ms 80.5680μs 12.4119 KOps/s 12.1863 KOps/s $\color{#35bf28}+1.85\%$
test_items_stack_nested 0.7067ms 0.3953ms 2.5300 KOps/s 2.5880 KOps/s $\color{#d91a1a}-2.24\%$
test_items_stack_nested_leaf 0.2818ms 85.4766μs 11.6991 KOps/s 11.8848 KOps/s $\color{#d91a1a}-1.56\%$
test_items_stack_nested_locked 0.6304ms 0.3856ms 2.5933 KOps/s 2.5832 KOps/s $\color{#35bf28}+0.39\%$
test_keys 19.8570μs 3.5769μs 279.5696 KOps/s 291.5474 KOps/s $\color{#d91a1a}-4.11\%$
test_keys_nested 0.2197ms 0.1321ms 7.5706 KOps/s 7.4204 KOps/s $\color{#35bf28}+2.02\%$
test_keys_nested_locked 0.7327ms 0.1373ms 7.2809 KOps/s 7.1266 KOps/s $\color{#35bf28}+2.17\%$
test_keys_nested_leaf 0.2174ms 0.1160ms 8.6193 KOps/s 8.5042 KOps/s $\color{#35bf28}+1.35\%$
test_keys_stack_nested 0.2245ms 0.1310ms 7.6328 KOps/s 7.6033 KOps/s $\color{#35bf28}+0.39\%$
test_keys_stack_nested_leaf 0.3499ms 0.1145ms 8.7316 KOps/s 8.6760 KOps/s $\color{#35bf28}+0.64\%$
test_keys_stack_nested_locked 0.2644ms 0.1362ms 7.3407 KOps/s 7.2505 KOps/s $\color{#35bf28}+1.24\%$
test_values 10.7280μs 1.0517μs 950.8277 KOps/s 954.4168 KOps/s $\color{#d91a1a}-0.38\%$
test_values_nested 0.1690ms 93.7003μs 10.6723 KOps/s 10.5723 KOps/s $\color{#35bf28}+0.95\%$
test_values_nested_locked 0.1532ms 93.3841μs 10.7085 KOps/s 10.4487 KOps/s $\color{#35bf28}+2.49\%$
test_values_nested_leaf 0.1484ms 77.2129μs 12.9512 KOps/s 12.5887 KOps/s $\color{#35bf28}+2.88\%$
test_values_stack_nested 0.3070ms 94.2557μs 10.6094 KOps/s 10.3395 KOps/s $\color{#35bf28}+2.61\%$
test_values_stack_nested_leaf 0.1719ms 77.8732μs 12.8414 KOps/s 12.6447 KOps/s $\color{#35bf28}+1.56\%$
test_values_stack_nested_locked 0.1680ms 94.2609μs 10.6089 KOps/s 10.4893 KOps/s $\color{#35bf28}+1.14\%$
test_membership 6.2717μs 0.7165μs 1.3958 MOps/s 1.4127 MOps/s $\color{#d91a1a}-1.20\%$
test_membership_nested 41.2570μs 2.7172μs 368.0244 KOps/s 361.0812 KOps/s $\color{#35bf28}+1.92\%$
test_membership_nested_leaf 98.2630μs 2.7446μs 364.3501 KOps/s 358.2951 KOps/s $\color{#35bf28}+1.69\%$
test_membership_stacked_nested 30.9580μs 2.6842μs 372.5510 KOps/s 366.1976 KOps/s $\color{#35bf28}+1.73\%$
test_membership_stacked_nested_leaf 43.7920μs 2.7060μs 369.5501 KOps/s 357.7445 KOps/s $\color{#35bf28}+3.30\%$
test_membership_nested_last 35.0050μs 4.1628μs 240.2215 KOps/s 239.2007 KOps/s $\color{#35bf28}+0.43\%$
test_membership_nested_leaf_last 48.1900μs 4.1683μs 239.9056 KOps/s 238.1114 KOps/s $\color{#35bf28}+0.75\%$
test_membership_stacked_nested_last 31.4690μs 5.0167μs 199.3334 KOps/s 76.2548 KOps/s $\textbf{\color{#35bf28}+161.40\%}$
test_membership_stacked_nested_leaf_last 45.8460μs 5.0605μs 197.6078 KOps/s 75.8753 KOps/s $\textbf{\color{#35bf28}+160.44\%}$
test_nested_getleaf 61.1240μs 10.9011μs 91.7342 KOps/s 93.7342 KOps/s $\color{#d91a1a}-2.13\%$
test_nested_get 54.8830μs 10.5153μs 95.0999 KOps/s 98.1153 KOps/s $\color{#d91a1a}-3.07\%$
test_stacked_getleaf 38.1410μs 10.8283μs 92.3506 KOps/s 93.8886 KOps/s $\color{#d91a1a}-1.64\%$
test_stacked_get 35.7870μs 10.4399μs 95.7860 KOps/s 101.4237 KOps/s $\textbf{\color{#d91a1a}-5.56\%}$
test_nested_getitemleaf 42.0390μs 11.4129μs 87.6202 KOps/s 88.8781 KOps/s $\color{#d91a1a}-1.42\%$
test_nested_getitem 42.7500μs 10.6744μs 93.6817 KOps/s 95.3144 KOps/s $\color{#d91a1a}-1.71\%$
test_stacked_getitemleaf 37.3290μs 11.2793μs 88.6584 KOps/s 89.9029 KOps/s $\color{#d91a1a}-1.38\%$
test_stacked_getitem 52.3780μs 10.6005μs 94.3355 KOps/s 96.3238 KOps/s $\color{#d91a1a}-2.06\%$
test_lock_nested 88.8736ms 0.5960ms 1.6778 KOps/s 1.9946 KOps/s $\textbf{\color{#d91a1a}-15.88\%}$
test_lock_stack_nested 0.6997ms 0.4659ms 2.1463 KOps/s 2.2014 KOps/s $\color{#d91a1a}-2.51\%$
test_unlock_nested 0.1094s 0.5357ms 1.8666 KOps/s 2.3629 KOps/s $\textbf{\color{#d91a1a}-21.00\%}$
test_unlock_stack_nested 0.6426ms 0.3800ms 2.6316 KOps/s 2.6914 KOps/s $\color{#d91a1a}-2.22\%$
test_flatten_speed 0.2159ms 0.1008ms 9.9226 KOps/s 9.8008 KOps/s $\color{#35bf28}+1.24\%$
test_unflatten_speed 0.8974ms 0.5260ms 1.9011 KOps/s 1.8961 KOps/s $\color{#35bf28}+0.27\%$
test_common_ops 4.5001ms 1.1707ms 854.2225 Ops/s 822.3672 Ops/s $\color{#35bf28}+3.87\%$
test_creation 35.7860μs 2.2469μs 445.0538 KOps/s 467.4938 KOps/s $\color{#d91a1a}-4.80\%$
test_creation_empty 83.8170μs 20.7442μs 48.2062 KOps/s 48.4069 KOps/s $\color{#d91a1a}-0.41\%$
test_creation_nested_1 67.4370μs 23.6103μs 42.3544 KOps/s 40.9332 KOps/s $\color{#35bf28}+3.47\%$
test_creation_nested_2 81.9930μs 27.7048μs 36.0948 KOps/s 34.5873 KOps/s $\color{#35bf28}+4.36\%$
test_clone 59.4710μs 16.9931μs 58.8474 KOps/s 56.4298 KOps/s $\color{#35bf28}+4.28\%$
test_getitem[int] 0.9929ms 16.6607μs 60.0215 KOps/s 59.2432 KOps/s $\color{#35bf28}+1.31\%$
test_getitem[slice_int] 0.1655ms 30.3908μs 32.9047 KOps/s 31.2526 KOps/s $\textbf{\color{#35bf28}+5.29\%}$
test_getitem[range] 0.6282ms 57.6739μs 17.3389 KOps/s 17.2275 KOps/s $\color{#35bf28}+0.65\%$
test_getitem[tuple] 0.1495ms 24.8673μs 40.2134 KOps/s 39.8748 KOps/s $\color{#35bf28}+0.85\%$
test_getitem[list] 0.6607ms 52.6619μs 18.9891 KOps/s 18.9487 KOps/s $\color{#35bf28}+0.21\%$
test_setitem_dim[int] 72.0350μs 32.6555μs 30.6227 KOps/s 30.6579 KOps/s $\color{#d91a1a}-0.11\%$
test_setitem_dim[slice_int] 0.1131ms 60.0821μs 16.6439 KOps/s 16.4877 KOps/s $\color{#35bf28}+0.95\%$
test_setitem_dim[range] 0.1579ms 83.3302μs 12.0005 KOps/s 11.7073 KOps/s $\color{#35bf28}+2.50\%$
test_setitem_dim[tuple] 0.1020ms 48.4645μs 20.6337 KOps/s 20.4205 KOps/s $\color{#35bf28}+1.04\%$
test_setitem 0.2896ms 30.6741μs 32.6008 KOps/s 31.0880 KOps/s $\color{#35bf28}+4.87\%$
test_set 90.1580μs 30.0167μs 33.3147 KOps/s 31.1699 KOps/s $\textbf{\color{#35bf28}+6.88\%}$
test_set_shared 3.2885ms 0.2183ms 4.5807 KOps/s 4.5827 KOps/s $\color{#d91a1a}-0.04\%$
test_update 0.1517ms 39.5963μs 25.2549 KOps/s 24.0129 KOps/s $\textbf{\color{#35bf28}+5.17\%}$
test_update_nested 0.1871ms 50.5647μs 19.7766 KOps/s 18.9301 KOps/s $\color{#35bf28}+4.47\%$
test_update__nested 0.7191ms 45.5917μs 21.9338 KOps/s 22.3579 KOps/s $\color{#d91a1a}-1.90\%$
test_set_nested 0.3237ms 33.3098μs 30.0212 KOps/s 29.4487 KOps/s $\color{#35bf28}+1.94\%$
test_set_nested_new 0.3274ms 38.5260μs 25.9565 KOps/s 25.1328 KOps/s $\color{#35bf28}+3.28\%$
test_select 0.1219ms 57.0016μs 17.5434 KOps/s 17.0384 KOps/s $\color{#35bf28}+2.96\%$
test_select_nested 0.1147ms 59.8925μs 16.6966 KOps/s 16.4894 KOps/s $\color{#35bf28}+1.26\%$
test_exclude_nested 0.6498ms 74.8501μs 13.3600 KOps/s 13.0447 KOps/s $\color{#35bf28}+2.42\%$
test_empty[True] 0.5408ms 0.3504ms 2.8542 KOps/s 2.8434 KOps/s $\color{#35bf28}+0.38\%$
test_empty[False] 18.4375μs 1.2484μs 801.0289 KOps/s 827.1624 KOps/s $\color{#d91a1a}-3.16\%$
test_unbind_speed 0.3839ms 0.2970ms 3.3675 KOps/s 3.3754 KOps/s $\color{#d91a1a}-0.23\%$
test_unbind_speed_stack0 0.4519ms 0.2898ms 3.4504 KOps/s 3.4838 KOps/s $\color{#d91a1a}-0.96\%$
test_unbind_speed_stack1 0.1125s 0.9303ms 1.0749 KOps/s 1.3644 KOps/s $\textbf{\color{#d91a1a}-21.22\%}$
test_split 3.2813ms 2.0000ms 500.0072 Ops/s 448.0534 Ops/s $\textbf{\color{#35bf28}+11.60\%}$
test_chunk 0.1025s 2.2056ms 453.3918 Ops/s 445.4885 Ops/s $\color{#35bf28}+1.77\%$
test_creation[device0] 0.2040ms 0.1151ms 8.6885 KOps/s 8.5011 KOps/s $\color{#35bf28}+2.20\%$
test_creation_from_tensor 3.2607ms 0.1164ms 8.5903 KOps/s 8.4540 KOps/s $\color{#35bf28}+1.61\%$
test_add_one[memmap_tensor0] 0.4741ms 7.0735μs 141.3727 KOps/s 129.4769 KOps/s $\textbf{\color{#35bf28}+9.19\%}$
test_contiguous[memmap_tensor0] 24.0540μs 1.8974μs 527.0373 KOps/s 518.9374 KOps/s $\color{#35bf28}+1.56\%$
test_stack[memmap_tensor0] 83.9170μs 5.6418μs 177.2483 KOps/s 174.6839 KOps/s $\color{#35bf28}+1.47\%$
test_memmaptd_index 1.2059ms 0.4135ms 2.4182 KOps/s 2.4245 KOps/s $\color{#d91a1a}-0.26\%$
test_memmaptd_index_astensor 0.9889ms 0.5085ms 1.9664 KOps/s 1.9360 KOps/s $\color{#35bf28}+1.57\%$
test_memmaptd_index_op 1.7647ms 1.0622ms 941.4631 Ops/s 920.8502 Ops/s $\color{#35bf28}+2.24\%$
test_serialize_model 0.1351s 0.1248s 8.0135 Ops/s 8.5637 Ops/s $\textbf{\color{#d91a1a}-6.42\%}$
test_serialize_model_pickle 0.4324s 0.3929s 2.5450 Ops/s 2.5104 Ops/s $\color{#35bf28}+1.38\%$
test_serialize_weights 0.1318s 0.1205s 8.2978 Ops/s 8.0192 Ops/s $\color{#35bf28}+3.47\%$
test_serialize_weights_returnearly 0.2567s 0.1750s 5.7146 Ops/s 6.1465 Ops/s $\textbf{\color{#d91a1a}-7.03\%}$
test_serialize_weights_pickle 0.4554s 0.4055s 2.4661 Ops/s 2.4289 Ops/s $\color{#35bf28}+1.53\%$
test_serialize_weights_filesystem 0.1538s 0.1416s 7.0634 Ops/s 7.1450 Ops/s $\color{#d91a1a}-1.14\%$
test_serialize_model_filesystem 0.1581s 0.1510s 6.6225 Ops/s 6.5821 Ops/s $\color{#35bf28}+0.61\%$
test_reshape_pytree 83.8070μs 38.0874μs 26.2554 KOps/s 25.7606 KOps/s $\color{#35bf28}+1.92\%$
test_reshape_td 91.0600μs 45.1273μs 22.1595 KOps/s 21.0053 KOps/s $\textbf{\color{#35bf28}+5.50\%}$
test_view_pytree 0.1006ms 38.3607μs 26.0683 KOps/s 25.9838 KOps/s $\color{#35bf28}+0.33\%$
test_view_td 0.1060ms 51.8040μs 19.3035 KOps/s 19.0150 KOps/s $\color{#35bf28}+1.52\%$
test_unbind_pytree 75.7920μs 35.4909μs 28.1762 KOps/s 28.0567 KOps/s $\color{#35bf28}+0.43\%$
test_unbind_td 0.3668ms 45.1200μs 22.1631 KOps/s 22.4875 KOps/s $\color{#d91a1a}-1.44\%$
test_split_pytree 85.5400μs 37.2904μs 26.8166 KOps/s 26.6019 KOps/s $\color{#35bf28}+0.81\%$
test_split_td 0.5266ms 57.5794μs 17.3673 KOps/s 17.3779 KOps/s $\color{#d91a1a}-0.06\%$
test_add_pytree 96.0190μs 43.6047μs 22.9333 KOps/s 22.2177 KOps/s $\color{#35bf28}+3.22\%$
test_add_td 0.1695ms 85.8646μs 11.6462 KOps/s 10.8455 KOps/s $\textbf{\color{#35bf28}+7.38\%}$
test_compile_add_one_nested[tensordict-compile] 0.1290ms 58.6062μs 17.0630 KOps/s 17.1953 KOps/s $\color{#d91a1a}-0.77\%$
test_compile_add_one_nested[tensordict-eager] 0.3960ms 0.1980ms 5.0516 KOps/s 5.0462 KOps/s $\color{#35bf28}+0.11\%$
test_compile_add_one_nested[pytree-compile] 0.1286ms 57.1068μs 17.5110 KOps/s 17.6727 KOps/s $\color{#d91a1a}-0.92\%$
test_compile_add_one_nested[pytree-eager] 0.3144ms 0.1379ms 7.2503 KOps/s 7.1254 KOps/s $\color{#35bf28}+1.75\%$
test_compile_copy_nested[tensordict-compile] 71.3940μs 23.2920μs 42.9332 KOps/s 44.2836 KOps/s $\color{#d91a1a}-3.05\%$
test_compile_copy_nested[tensordict-eager] 0.1478ms 74.6866μs 13.3893 KOps/s 13.0550 KOps/s $\color{#35bf28}+2.56\%$
test_compile_copy_nested[pytree-compile] 0.1349ms 75.2887μs 13.2822 KOps/s 13.0446 KOps/s $\color{#35bf28}+1.82\%$
test_compile_copy_nested[pytree-eager] 0.1358ms 68.3377μs 14.6332 KOps/s 14.0820 KOps/s $\color{#35bf28}+3.91\%$
test_compile_add_one_flat[tensordict-compile] 0.3044ms 0.1818ms 5.4993 KOps/s 5.4429 KOps/s $\color{#35bf28}+1.04\%$
test_compile_add_one_flat[tensordict-eager] 1.4672ms 0.2422ms 4.1287 KOps/s 4.0823 KOps/s $\color{#35bf28}+1.14\%$
test_compile_add_one_flat[tensorclass-compile] 0.1117ms 49.3149μs 20.2779 KOps/s 20.8496 KOps/s $\color{#d91a1a}-2.74\%$
test_compile_add_one_flat[tensorclass-eager] 0.4650ms 79.6544μs 12.5542 KOps/s 11.8601 KOps/s $\textbf{\color{#35bf28}+5.85\%}$
test_compile_add_one_flat[pytree-compile] 0.2799ms 0.1744ms 5.7324 KOps/s 5.6395 KOps/s $\color{#35bf28}+1.65\%$
test_compile_add_one_flat[pytree-eager] 0.5176ms 0.2821ms 3.5449 KOps/s 3.4597 KOps/s $\color{#35bf28}+2.46\%$
test_compile_add_self_flat[tensordict-eager] 0.4822ms 0.2753ms 3.6324 KOps/s 3.6002 KOps/s $\color{#35bf28}+0.89\%$
test_compile_add_self_flat[tensordict-compile] 0.5089ms 0.1872ms 5.3427 KOps/s 5.4754 KOps/s $\color{#d91a1a}-2.42\%$
test_compile_add_self_flat[tensorclass-eager] 0.2020ms 75.3398μs 13.2732 KOps/s 13.1895 KOps/s $\color{#35bf28}+0.63\%$
test_compile_add_self_flat[tensorclass-compile] 0.1388ms 48.5738μs 20.5872 KOps/s 20.6688 KOps/s $\color{#d91a1a}-0.39\%$
test_compile_add_self_flat[pytree-eager] 0.3808ms 0.2274ms 4.3983 KOps/s 4.3105 KOps/s $\color{#35bf28}+2.04\%$
test_compile_add_self_flat[pytree-compile] 0.3794ms 0.1715ms 5.8296 KOps/s 5.6198 KOps/s $\color{#35bf28}+3.73\%$
test_compile_copy_flat[tensordict-compile] 0.2446ms 0.1092ms 9.1608 KOps/s 9.0070 KOps/s $\color{#35bf28}+1.71\%$
test_compile_copy_flat[tensordict-eager] 0.1738ms 78.5355μs 12.7331 KOps/s 12.4012 KOps/s $\color{#35bf28}+2.68\%$
test_compile_copy_flat[pytree-compile] 0.1378ms 76.0635μs 13.1469 KOps/s 12.8520 KOps/s $\color{#35bf28}+2.29\%$
test_compile_copy_flat[pytree-eager] 0.1681ms 66.8078μs 14.9683 KOps/s 14.1643 KOps/s $\textbf{\color{#35bf28}+5.68\%}$
test_compile_assign_and_add[tensordict-compile] 0.2714ms 0.1927ms 5.1905 KOps/s 5.2148 KOps/s $\color{#d91a1a}-0.47\%$
test_compile_assign_and_add[tensordict-eager] 2.8873ms 1.6751ms 596.9635 Ops/s 571.9744 Ops/s $\color{#35bf28}+4.37\%$
test_compile_assign_and_add[pytree-compile] 0.2892ms 0.1916ms 5.2182 KOps/s 5.1871 KOps/s $\color{#35bf28}+0.60\%$
test_compile_assign_and_add[pytree-eager] 1.3372ms 1.0770ms 928.4779 Ops/s 904.9282 Ops/s $\color{#35bf28}+2.60\%$
test_compile_assign_and_add_stack[compile] 0.5232ms 0.4143ms 2.4136 KOps/s 2.2727 KOps/s $\textbf{\color{#35bf28}+6.20\%}$
test_compile_assign_and_add_stack[eager] 5.1599ms 4.1068ms 243.4977 Ops/s 234.1584 Ops/s $\color{#35bf28}+3.99\%$
test_compile_indexing[tensor-tensordict-compile] 78.6870μs 33.5657μs 29.7923 KOps/s 28.8566 KOps/s $\color{#35bf28}+3.24\%$
test_compile_indexing[tensor-tensordict-eager] 0.7245ms 48.2340μs 20.7323 KOps/s 20.2371 KOps/s $\color{#35bf28}+2.45\%$
test_compile_indexing[tensor-tensorclass-compile] 84.8280μs 29.2525μs 34.1851 KOps/s 32.9953 KOps/s $\color{#35bf28}+3.61\%$
test_compile_indexing[tensor-tensorclass-eager] 83.6360μs 27.9640μs 35.7603 KOps/s 33.6872 KOps/s $\textbf{\color{#35bf28}+6.15\%}$
test_compile_indexing[tensor-pytree-compile] 87.9840μs 29.3508μs 34.0706 KOps/s 32.9735 KOps/s $\color{#35bf28}+3.33\%$
test_compile_indexing[tensor-pytree-eager] 0.1415ms 28.0931μs 35.5959 KOps/s 34.1323 KOps/s $\color{#35bf28}+4.29\%$
test_compile_indexing[slice-tensordict-compile] 0.1400ms 71.7819μs 13.9311 KOps/s 13.5626 KOps/s $\color{#35bf28}+2.72\%$
test_compile_indexing[slice-tensordict-eager] 0.5580ms 27.1962μs 36.7699 KOps/s 36.1797 KOps/s $\color{#35bf28}+1.63\%$
test_compile_indexing[slice-tensorclass-compile] 0.1233ms 67.9277μs 14.7215 KOps/s 14.6971 KOps/s $\color{#35bf28}+0.17\%$
test_compile_indexing[slice-tensorclass-eager] 66.0930μs 23.1179μs 43.2566 KOps/s 43.1661 KOps/s $\color{#35bf28}+0.21\%$
test_compile_indexing[slice-pytree-compile] 0.1355ms 68.2881μs 14.6438 KOps/s 14.5204 KOps/s $\color{#35bf28}+0.85\%$
test_compile_indexing[slice-pytree-eager] 0.1816ms 23.1755μs 43.1491 KOps/s 42.9198 KOps/s $\color{#35bf28}+0.53\%$
test_compile_indexing[int-tensordict-compile] 0.1372ms 71.9380μs 13.9009 KOps/s 13.7729 KOps/s $\color{#35bf28}+0.93\%$
test_compile_indexing[int-tensordict-eager] 0.7859ms 27.0446μs 36.9759 KOps/s 36.7677 KOps/s $\color{#35bf28}+0.57\%$
test_compile_indexing[int-tensorclass-compile] 0.1920ms 67.9597μs 14.7146 KOps/s 14.6084 KOps/s $\color{#35bf28}+0.73\%$
test_compile_indexing[int-tensorclass-eager] 92.5050μs 22.5085μs 44.4276 KOps/s 43.6118 KOps/s $\color{#35bf28}+1.87\%$
test_compile_indexing[int-pytree-compile] 0.1443ms 67.3176μs 14.8550 KOps/s 14.6330 KOps/s $\color{#35bf28}+1.52\%$
test_compile_indexing[int-pytree-eager] 56.3350μs 22.6959μs 44.0609 KOps/s 43.1405 KOps/s $\color{#35bf28}+2.13\%$
test_mod_add[eager] 70.6820μs 26.7297μs 37.4116 KOps/s 35.9577 KOps/s $\color{#35bf28}+4.04\%$
test_mod_add[compile] 85.1690μs 37.6413μs 26.5665 KOps/s 26.0622 KOps/s $\color{#35bf28}+1.94\%$
test_mod_add[compile-overhead] 96.7810μs 37.6755μs 26.5425 KOps/s 26.6547 KOps/s $\color{#d91a1a}-0.42\%$
test_mod_wrap[eager] 0.3179ms 0.2081ms 4.8062 KOps/s 4.7678 KOps/s $\color{#35bf28}+0.80\%$
test_mod_wrap[compile] 0.3563ms 0.2306ms 4.3368 KOps/s 4.3110 KOps/s $\color{#35bf28}+0.60\%$
test_mod_wrap[compile-overhead] 0.3827ms 0.2296ms 4.3552 KOps/s 4.2691 KOps/s $\color{#35bf28}+2.02\%$
test_mod_wrap_and_backward[eager] 18.3251ms 11.6997ms 85.4722 Ops/s 82.4218 Ops/s $\color{#35bf28}+3.70\%$
test_mod_wrap_and_backward[compile] 13.7414ms 11.6520ms 85.8219 Ops/s 84.7389 Ops/s $\color{#35bf28}+1.28\%$
test_mod_wrap_and_backward[compile-overhead] 13.2989ms 11.8545ms 84.3560 Ops/s 75.0334 Ops/s $\textbf{\color{#35bf28}+12.42\%}$
test_seq_add[eager] 0.2197ms 94.0548μs 10.6321 KOps/s 10.1531 KOps/s $\color{#35bf28}+4.72\%$
test_seq_add[compile] 0.1195ms 63.8330μs 15.6659 KOps/s 15.6793 KOps/s $\color{#d91a1a}-0.09\%$
test_seq_add[compile-overhead] 0.1209ms 63.5418μs 15.7377 KOps/s 15.7536 KOps/s $\color{#d91a1a}-0.10\%$
test_seq_wrap[eager] 0.7301ms 0.3915ms 2.5541 KOps/s 2.4466 KOps/s $\color{#35bf28}+4.40\%$
test_seq_wrap[compile] 0.4909ms 0.2710ms 3.6894 KOps/s 3.6267 KOps/s $\color{#35bf28}+1.73\%$
test_seq_wrap[compile-overhead] 0.3662ms 0.2676ms 3.7369 KOps/s 3.5771 KOps/s $\color{#35bf28}+4.47\%$
test_func_call_runtime[False-eager] 0.6723ms 0.5175ms 1.9322 KOps/s 1.8526 KOps/s $\color{#35bf28}+4.30\%$
test_func_call_runtime[False-compile] 0.6339ms 0.4960ms 2.0159 KOps/s 1.9879 KOps/s $\color{#35bf28}+1.41\%$
test_func_call_runtime[False-compile-overhead] 1.0460ms 0.4986ms 2.0057 KOps/s 1.9705 KOps/s $\color{#35bf28}+1.78\%$
test_func_call_runtime[True-eager] 0.9430ms 0.7296ms 1.3705 KOps/s 1.3189 KOps/s $\color{#35bf28}+3.91\%$
test_func_call_runtime[True-compile] 0.6001ms 0.5039ms 1.9846 KOps/s 1.9034 KOps/s $\color{#35bf28}+4.27\%$
test_func_call_runtime[True-compile-overhead] 0.7738ms 0.5063ms 1.9752 KOps/s 1.9179 KOps/s $\color{#35bf28}+2.99\%$
test_func_call_cm_runtime[False-eager] 0.7254ms 0.5149ms 1.9422 KOps/s 1.9017 KOps/s $\color{#35bf28}+2.13\%$
test_func_call_cm_runtime[False-compile] 0.9302ms 0.4958ms 2.0169 KOps/s 1.9779 KOps/s $\color{#35bf28}+1.97\%$
test_func_call_cm_runtime[False-compile-overhead] 0.9188ms 0.4955ms 2.0182 KOps/s 1.9753 KOps/s $\color{#35bf28}+2.17\%$
test_func_call_cm_runtime[True-eager] 1.2164ms 0.8741ms 1.1441 KOps/s 1.0963 KOps/s $\color{#35bf28}+4.35\%$
test_func_call_cm_runtime[True-compile] 1.1599ms 0.7291ms 1.3715 KOps/s 1.3197 KOps/s $\color{#35bf28}+3.93\%$
test_func_call_cm_runtime[True-compile-overhead] 1.1813ms 0.7312ms 1.3677 KOps/s 1.3255 KOps/s $\color{#35bf28}+3.18\%$
test_vmap_func_call_cm_runtime[eager] 2.3936ms 1.8723ms 534.0915 Ops/s 512.1015 Ops/s $\color{#35bf28}+4.29\%$
test_vmap_func_call_cm_runtime[compile] 3.4336ms 1.9577ms 510.7938 Ops/s 499.6178 Ops/s $\color{#35bf28}+2.24\%$
test_vmap_func_call_cm_runtime[compile-overhead] 3.3404ms 1.9524ms 512.1930 Ops/s 492.1656 Ops/s $\color{#35bf28}+4.07\%$
test_distributed 0.3029ms 0.1266ms 7.9017 KOps/s 7.5913 KOps/s $\color{#35bf28}+4.09\%$
test_tdmodule 98.5340μs 19.2525μs 51.9414 KOps/s 48.6060 KOps/s $\textbf{\color{#35bf28}+6.86\%}$
test_tdmodule_dispatch 65.8230μs 38.4040μs 26.0389 KOps/s 25.1138 KOps/s $\color{#35bf28}+3.68\%$
test_tdseq 49.0020μs 22.3377μs 44.7673 KOps/s 43.3337 KOps/s $\color{#35bf28}+3.31\%$
test_tdseq_dispatch 74.6900μs 44.3183μs 22.5640 KOps/s 21.9582 KOps/s $\color{#35bf28}+2.76\%$
test_instantiation_functorch 2.1361ms 1.5350ms 651.4693 Ops/s 582.6844 Ops/s $\textbf{\color{#35bf28}+11.80\%}$
test_exec_functorch 0.3295ms 0.1824ms 5.4823 KOps/s 5.3086 KOps/s $\color{#35bf28}+3.27\%$
test_exec_functional_call 0.3120ms 0.1716ms 5.8288 KOps/s 5.8008 KOps/s $\color{#35bf28}+0.48\%$
test_exec_td_decorator 0.4782ms 0.2328ms 4.2962 KOps/s 4.2328 KOps/s $\color{#35bf28}+1.50\%$
test_vmap_mlp_speed_decorator[True-True] 1.0297ms 0.6635ms 1.5070 KOps/s 1.4996 KOps/s $\color{#35bf28}+0.49\%$
test_vmap_mlp_speed_decorator[True-False] 0.8457ms 0.6333ms 1.5791 KOps/s 1.5058 KOps/s $\color{#35bf28}+4.87\%$
test_vmap_mlp_speed_decorator[False-True] 0.8092ms 0.5194ms 1.9253 KOps/s 1.8408 KOps/s $\color{#35bf28}+4.59\%$
test_vmap_mlp_speed_decorator[False-False] 0.7022ms 0.5158ms 1.9389 KOps/s 1.7222 KOps/s $\textbf{\color{#35bf28}+12.59\%}$
test_to_module_speed[True] 2.2988ms 1.4231ms 702.6701 Ops/s 685.8409 Ops/s $\color{#35bf28}+2.45\%$
test_to_module_speed[False] 1.7128ms 1.3816ms 723.8200 Ops/s 714.2990 Ops/s $\color{#35bf28}+1.33\%$
test_tc_init 0.1249ms 47.4474μs 21.0760 KOps/s 20.4833 KOps/s $\color{#35bf28}+2.89\%$
test_tc_init_nested 0.2054ms 94.8297μs 10.5452 KOps/s 10.2663 KOps/s $\color{#35bf28}+2.72\%$
test_tc_first_layer_tensor 24.0050μs 1.6030μs 623.8207 KOps/s 647.5391 KOps/s $\color{#d91a1a}-3.66\%$
test_tc_first_layer_nontensor 24.2950μs 4.7150μs 212.0886 KOps/s 201.3532 KOps/s $\textbf{\color{#35bf28}+5.33\%}$
test_tc_second_layer_tensor 28.6940μs 2.9248μs 341.9068 KOps/s 340.9035 KOps/s $\color{#35bf28}+0.29\%$
test_tc_second_layer_nontensor 44.5550μs 5.9719μs 167.4504 KOps/s 162.6014 KOps/s $\color{#35bf28}+2.98\%$
test_unbind 0.5180s 13.6787ms 73.1064 Ops/s 132.9157 Ops/s $\textbf{\color{#d91a1a}-45.00\%}$
test_full_like 13.8388ms 8.6218ms 115.9851 Ops/s 74.1177 Ops/s $\textbf{\color{#35bf28}+56.49\%}$
test_zeros_like 4.8817ms 3.0767ms 325.0257 Ops/s 118.9574 Ops/s $\textbf{\color{#35bf28}+173.23\%}$
test_ones_like 12.6680ms 6.3519ms 157.4337 Ops/s 111.7366 Ops/s $\textbf{\color{#35bf28}+40.90\%}$
test_clone 16.2253ms 9.0088ms 111.0024 Ops/s 90.1455 Ops/s $\textbf{\color{#35bf28}+23.14\%}$
test_squeeze 74.0480μs 12.3087μs 81.2432 KOps/s 81.8361 KOps/s $\color{#d91a1a}-0.72\%$
test_unsqueeze 0.1681ms 91.5289μs 10.9255 KOps/s 11.1825 KOps/s $\color{#d91a1a}-2.30\%$
test_split 0.5531ms 0.1927ms 5.1889 KOps/s 5.2204 KOps/s $\color{#d91a1a}-0.60\%$
test_permute 0.4814ms 0.2144ms 4.6631 KOps/s 4.6142 KOps/s $\color{#35bf28}+1.06\%$
test_stack 35.1199ms 27.2849ms 36.6503 Ops/s 36.4029 Ops/s $\color{#35bf28}+0.68\%$
test_cat 31.0045ms 26.8414ms 37.2559 Ops/s 36.7647 Ops/s $\color{#35bf28}+1.34\%$
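
For readers checking the table by hand: the Change column appears to be the relative difference between the Ops column (this PR) and Ops on Repo HEAD, and Ops appears to be the reciprocal of Mean. The thread does not show the benchmark action's actual formula, so the sketch below is an assumption that happens to reproduce the reported values; the function names are hypothetical.

```python
def relative_change_pct(ops_pr: float, ops_head: float) -> float:
    """Percentage change of the PR throughput relative to repo HEAD (assumed formula)."""
    return (ops_pr - ops_head) / ops_head * 100.0


def ops_from_mean(mean_seconds: float) -> float:
    """Throughput in operations per second, assuming Ops = 1 / Mean."""
    return 1.0 / mean_seconds


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table above.
# Units cancel in the ratio, so KOps/s and Ops/s rows can be mixed.
rows = {
    "test_plain_set_nested": (38.7793, 38.1941),
    "test_membership_stacked_nested_last": (199.3334, 76.2548),
    "test_unbind": (73.1064, 132.9157),
}

for name, (ops_pr, ops_head) in rows.items():
    print(f"{name}: {relative_change_pct(ops_pr, ops_head):+.2f}%")

# Mean of test_plain_set_nested is 25.7869 microseconds.
print(f"{ops_from_mean(25.7869e-6) / 1e3:.4f} KOps/s")
```

Because the change is relative to HEAD, a row such as test_unbind, whose throughput roughly halves, shows about -45%, while a row whose throughput more than doubles shows well over +100%.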

github-actions bot commented Oct 14, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 218. Improved: $\large\color{#35bf28}14$. Worsened: $\large\color{#d91a1a}5$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_plain_set_nested 0.1481ms 18.0352μs 55.4471 KOps/s 56.0120 KOps/s $\color{#d91a1a}-1.01\%$
test_plain_set_stack_nested 38.5210μs 17.9399μs 55.7418 KOps/s 56.2071 KOps/s $\color{#d91a1a}-0.83\%$
test_plain_set_nested_inplace 47.0510μs 19.2175μs 52.0360 KOps/s 52.2696 KOps/s $\color{#d91a1a}-0.45\%$
test_plain_set_stack_nested_inplace 61.6610μs 19.0721μs 52.4326 KOps/s 52.5799 KOps/s $\color{#d91a1a}-0.28\%$
test_items 22.5700μs 2.8824μs 346.9361 KOps/s 345.2369 KOps/s $\color{#35bf28}+0.49\%$
test_items_nested 0.4969ms 0.3374ms 2.9643 KOps/s 2.9884 KOps/s $\color{#d91a1a}-0.81\%$
test_items_nested_locked 0.3675ms 0.3396ms 2.9445 KOps/s 2.9785 KOps/s $\color{#d91a1a}-1.14\%$
test_items_nested_leaf 85.9520μs 62.9998μs 15.8731 KOps/s 16.0185 KOps/s $\color{#d91a1a}-0.91\%$
test_items_stack_nested 0.3721ms 0.3411ms 2.9317 KOps/s 2.9422 KOps/s $\color{#d91a1a}-0.36\%$
test_items_stack_nested_leaf 86.6210μs 63.5325μs 15.7400 KOps/s 15.7040 KOps/s $\color{#35bf28}+0.23\%$
test_items_stack_nested_locked 0.4368ms 0.3415ms 2.9284 KOps/s 2.9613 KOps/s $\color{#d91a1a}-1.11\%$
test_keys 30.8000μs 3.4327μs 291.3153 KOps/s 289.5678 KOps/s $\color{#35bf28}+0.60\%$
test_keys_nested 0.1229ms 71.2072μs 14.0435 KOps/s 14.1398 KOps/s $\color{#d91a1a}-0.68\%$
test_keys_nested_locked 0.7876ms 76.5246μs 13.0677 KOps/s 13.0194 KOps/s $\color{#35bf28}+0.37\%$
test_keys_nested_leaf 97.2320μs 61.6266μs 16.2268 KOps/s 16.3216 KOps/s $\color{#d91a1a}-0.58\%$
test_keys_stack_nested 0.1111ms 71.1836μs 14.0482 KOps/s 14.1016 KOps/s $\color{#d91a1a}-0.38\%$
test_keys_stack_nested_leaf 95.1820μs 62.9837μs 15.8771 KOps/s 15.8853 KOps/s $\color{#d91a1a}-0.05\%$
test_keys_stack_nested_locked 0.1054ms 77.4681μs 12.9085 KOps/s 12.8721 KOps/s $\color{#35bf28}+0.28\%$
test_values 4.8050μs 0.8369μs 1.1949 MOps/s 1.1830 MOps/s $\color{#35bf28}+1.01\%$
test_values_nested 74.6510μs 48.4863μs 20.6244 KOps/s 20.5079 KOps/s $\color{#35bf28}+0.57\%$
test_values_nested_locked 74.0710μs 50.0881μs 19.9648 KOps/s 19.8581 KOps/s $\color{#35bf28}+0.54\%$
test_values_nested_leaf 67.9920μs 42.8792μs 23.3214 KOps/s 23.4114 KOps/s $\color{#d91a1a}-0.38\%$
test_values_stack_nested 85.1720μs 49.6950μs 20.1228 KOps/s 20.1938 KOps/s $\color{#d91a1a}-0.35\%$
test_values_stack_nested_leaf 71.4820μs 43.7713μs 22.8460 KOps/s 23.0268 KOps/s $\color{#d91a1a}-0.79\%$
test_values_stack_nested_locked 80.2810μs 51.8904μs 19.2714 KOps/s 19.5605 KOps/s $\color{#d91a1a}-1.48\%$
test_membership 1.5356μs 0.5066μs 1.9740 MOps/s 1.9716 MOps/s $\color{#35bf28}+0.12\%$
test_membership_nested 13.3500μs 1.8705μs 534.6106 KOps/s 548.0333 KOps/s $\color{#d91a1a}-2.45\%$
test_membership_nested_leaf 14.6405μs 1.8397μs 543.5701 KOps/s 557.6154 KOps/s $\color{#d91a1a}-2.52\%$
test_membership_stacked_nested 33.2710μs 1.8903μs 529.0027 KOps/s 534.2995 KOps/s $\color{#d91a1a}-0.99\%$
test_membership_stacked_nested_leaf 25.0100μs 1.9309μs 517.9049 KOps/s 534.7840 KOps/s $\color{#d91a1a}-3.16\%$
test_membership_nested_last 38.7310μs 2.9437μs 339.7118 KOps/s 343.5265 KOps/s $\color{#d91a1a}-1.11\%$
test_membership_nested_leaf_last 36.2610μs 2.9618μs 337.6271 KOps/s 344.6008 KOps/s $\color{#d91a1a}-2.02\%$
test_membership_stacked_nested_last 26.4400μs 2.9099μs 343.6511 KOps/s 346.2904 KOps/s $\color{#d91a1a}-0.76\%$
test_membership_stacked_nested_leaf_last 33.2110μs 2.9296μs 341.3441 KOps/s 344.7707 KOps/s $\color{#d91a1a}-0.99\%$
test_nested_getleaf 32.7800μs 6.1453μs 162.7272 KOps/s 165.5011 KOps/s $\color{#d91a1a}-1.68\%$
test_nested_get 34.0700μs 5.7354μs 174.3544 KOps/s 174.1047 KOps/s $\color{#35bf28}+0.14\%$
test_stacked_getleaf 32.9110μs 6.0763μs 164.5725 KOps/s 165.3511 KOps/s $\color{#d91a1a}-0.47\%$
test_stacked_get 35.0510μs 5.6410μs 177.2726 KOps/s 176.6379 KOps/s $\color{#35bf28}+0.36\%$
test_nested_getitemleaf 40.2210μs 6.1116μs 163.6242 KOps/s 164.4866 KOps/s $\color{#d91a1a}-0.52\%$
test_nested_getitem 32.1100μs 5.7612μs 173.5752 KOps/s 174.3752 KOps/s $\color{#d91a1a}-0.46\%$
test_stacked_getitemleaf 31.1600μs 6.0795μs 164.4863 KOps/s 164.1013 KOps/s $\color{#35bf28}+0.23\%$
test_stacked_getitem 35.9810μs 5.7058μs 175.2593 KOps/s 177.5636 KOps/s $\color{#d91a1a}-1.30\%$
test_lock_nested 7.6557ms 0.4264ms 2.3453 KOps/s 2.3772 KOps/s $\color{#d91a1a}-1.34\%$
test_lock_stack_nested 0.4219ms 0.3887ms 2.5726 KOps/s 2.5879 KOps/s $\color{#d91a1a}-0.59\%$
test_unlock_nested 0.7798ms 0.3600ms 2.7778 KOps/s 2.8043 KOps/s $\color{#d91a1a}-0.94\%$
test_unlock_stack_nested 0.3621ms 0.3261ms 3.0668 KOps/s 3.0869 KOps/s $\color{#d91a1a}-0.65\%$
test_flatten_speed 0.1564ms 77.7880μs 12.8555 KOps/s 13.0576 KOps/s $\color{#d91a1a}-1.55\%$
test_unflatten_speed 0.4111ms 0.3251ms 3.0762 KOps/s 3.1157 KOps/s $\color{#d91a1a}-1.27\%$
test_common_ops 1.6757ms 1.2923ms 773.8041 Ops/s 770.1550 Ops/s $\color{#35bf28}+0.47\%$
test_creation 29.0010μs 1.4484μs 690.4227 KOps/s 685.5652 KOps/s $\color{#35bf28}+0.71\%$
test_creation_empty 63.8210μs 17.8292μs 56.0879 KOps/s 55.7126 KOps/s $\color{#35bf28}+0.67\%$
test_creation_nested_1 55.2910μs 19.4460μs 51.4245 KOps/s 50.9733 KOps/s $\color{#35bf28}+0.89\%$
test_creation_nested_2 50.2110μs 22.3847μs 44.6733 KOps/s 43.9264 KOps/s $\color{#35bf28}+1.70\%$
test_clone 66.6510μs 28.7896μs 34.7348 KOps/s 34.8615 KOps/s $\color{#d91a1a}-0.36\%$
test_getitem[int] 1.3639ms 15.6425μs 63.9286 KOps/s 66.3906 KOps/s $\color{#d91a1a}-3.71\%$
test_getitem[slice_int] 0.1204ms 26.6492μs 37.5245 KOps/s 38.5328 KOps/s $\color{#d91a1a}-2.62\%$
test_getitem[range] 0.1583ms 0.1082ms 9.2463 KOps/s 9.3098 KOps/s $\color{#d91a1a}-0.68\%$
test_getitem[tuple] 0.1260ms 23.4704μs 42.6068 KOps/s 45.0243 KOps/s $\textbf{\color{#d91a1a}-5.37\%}$
test_getitem[list] 0.1940ms 98.6454μs 10.1373 KOps/s 10.2572 KOps/s $\color{#d91a1a}-1.17\%$
test_setitem_dim[int] 66.5020μs 43.7993μs 22.8314 KOps/s 22.5867 KOps/s $\color{#35bf28}+1.08\%$
test_setitem_dim[slice_int] 94.2420μs 66.2949μs 15.0841 KOps/s 15.0286 KOps/s $\color{#35bf28}+0.37\%$
test_setitem_dim[range] 0.1873ms 0.1265ms 7.9050 KOps/s 7.9499 KOps/s $\color{#d91a1a}-0.56\%$
test_setitem_dim[tuple] 90.5620μs 59.8409μs 16.7110 KOps/s 16.8128 KOps/s $\color{#d91a1a}-0.61\%$
test_setitem 74.2120μs 43.2232μs 23.1357 KOps/s 23.2663 KOps/s $\color{#d91a1a}-0.56\%$
test_set 81.1010μs 41.3635μs 24.1759 KOps/s 23.8252 KOps/s $\color{#35bf28}+1.47\%$
test_set_shared 0.3656ms 53.2160μs 18.7913 KOps/s 18.4497 KOps/s $\color{#35bf28}+1.85\%$
test_update 94.6120μs 52.3516μs 19.1016 KOps/s 19.2711 KOps/s $\color{#d91a1a}-0.88\%$
test_update_nested 94.5610μs 59.4605μs 16.8179 KOps/s 16.8241 KOps/s $\color{#d91a1a}-0.04\%$
test_update__nested 0.1957ms 63.5399μs 15.7381 KOps/s 14.8851 KOps/s $\textbf{\color{#35bf28}+5.73\%}$
test_set_nested 94.4020μs 45.9516μs 21.7621 KOps/s 22.2602 KOps/s $\color{#d91a1a}-2.24\%$
test_set_nested_new 92.0520μs 48.8504μs 20.4706 KOps/s 20.9657 KOps/s $\color{#d91a1a}-2.36\%$
test_select 97.9920μs 61.4597μs 16.2708 KOps/s 16.5933 KOps/s $\color{#d91a1a}-1.94\%$
test_select_nested 69.7210μs 41.5868μs 24.0461 KOps/s 23.9394 KOps/s $\color{#35bf28}+0.45\%$
test_exclude_nested 85.4520μs 57.4668μs 17.4014 KOps/s 17.2400 KOps/s $\color{#35bf28}+0.94\%$
test_empty[True] 0.2880ms 0.2612ms 3.8283 KOps/s 3.8948 KOps/s $\color{#d91a1a}-1.71\%$
test_empty[False] 6.6261μs 0.7383μs 1.3545 MOps/s 1.3479 MOps/s $\color{#35bf28}+0.49\%$
test_to 63.8810μs 26.3096μs 38.0090 KOps/s 38.7987 KOps/s $\color{#d91a1a}-2.04\%$
test_to_nonblocking 53.4410μs 24.9690μs 40.0496 KOps/s 41.0036 KOps/s $\color{#d91a1a}-2.33\%$
test_unbind_speed 1.5734ms 0.2717ms 3.6811 KOps/s 3.6459 KOps/s $\color{#35bf28}+0.97\%$
test_unbind_speed_stack0 0.3191ms 0.2743ms 3.6450 KOps/s 3.6568 KOps/s $\color{#d91a1a}-0.32\%$
test_unbind_speed_stack1 93.2680ms 0.7126ms 1.4032 KOps/s 1.3981 KOps/s $\color{#35bf28}+0.37\%$
test_split 95.0083ms 2.0875ms 479.0488 Ops/s 469.5180 Ops/s $\color{#35bf28}+2.03\%$
test_chunk 95.2293ms 2.1098ms 473.9778 Ops/s 468.0460 Ops/s $\color{#35bf28}+1.27\%$
test_creation[device0] 0.3066ms 0.1264ms 7.9087 KOps/s 7.9615 KOps/s $\color{#d91a1a}-0.66\%$
test_creation_from_tensor 0.3511ms 0.1276ms 7.8385 KOps/s 7.8619 KOps/s $\color{#d91a1a}-0.30\%$
test_add_one[memmap_tensor0] 0.1474ms 8.8321μs 113.2240 KOps/s 112.8768 KOps/s $\color{#35bf28}+0.31\%$
test_contiguous[memmap_tensor0] 29.6900μs 2.0792μs 480.9534 KOps/s 446.1647 KOps/s $\textbf{\color{#35bf28}+7.80\%}$
test_stack[memmap_tensor0] 30.9610μs 6.3821μs 156.6885 KOps/s 154.5334 KOps/s $\color{#35bf28}+1.39\%$
test_memmaptd_index 1.1309ms 0.4248ms 2.3543 KOps/s 2.4536 KOps/s $\color{#d91a1a}-4.05\%$
test_memmaptd_index_astensor 0.7840ms 0.4979ms 2.0084 KOps/s 2.0661 KOps/s $\color{#d91a1a}-2.79\%$
test_memmaptd_index_op 1.4813ms 1.0732ms 931.8274 Ops/s 945.6589 Ops/s $\color{#d91a1a}-1.46\%$
test_serialize_model 0.1305s 0.1297s 7.7129 Ops/s 7.6844 Ops/s $\color{#35bf28}+0.37\%$
test_serialize_model_pickle 1.3774s 1.2186s 0.8206 Ops/s 0.8230 Ops/s $\color{#d91a1a}-0.29\%$
test_serialize_weights 0.2232s 0.1425s 7.0177 Ops/s 6.9837 Ops/s $\color{#35bf28}+0.49\%$
test_serialize_weights_returnearly 0.2209s 55.9398ms 17.8764 Ops/s 17.7111 Ops/s $\color{#35bf28}+0.93\%$
test_serialize_weights_pickle 1.3898s 1.2213s 0.8188 Ops/s 0.8213 Ops/s $\color{#d91a1a}-0.30\%$
test_reshape_pytree 71.0610μs 34.9648μs 28.6002 KOps/s 29.0496 KOps/s $\color{#d91a1a}-1.55\%$
test_reshape_td 71.3610μs 40.5373μs 24.6686 KOps/s 24.5762 KOps/s $\color{#35bf28}+0.38\%$
test_view_pytree 63.5410μs 34.7931μs 28.7414 KOps/s 29.7911 KOps/s $\color{#d91a1a}-3.52\%$
test_view_td 84.0510μs 46.6653μs 21.4292 KOps/s 22.5641 KOps/s $\textbf{\color{#d91a1a}-5.03\%}$
test_unbind_pytree 59.5210μs 33.6475μs 29.7199 KOps/s 29.0042 KOps/s $\color{#35bf28}+2.47\%$
test_unbind_td 0.6643ms 42.6029μs 23.4726 KOps/s 22.9168 KOps/s $\color{#35bf28}+2.43\%$
test_split_pytree 75.4220μs 45.2372μs 22.1057 KOps/s 21.5993 KOps/s $\color{#35bf28}+2.34\%$
test_split_td 0.1748ms 53.7848μs 18.5926 KOps/s 17.6705 KOps/s $\textbf{\color{#35bf28}+5.22\%}$
test_add_pytree 87.0920μs 54.9620μs 18.1944 KOps/s 16.6733 KOps/s $\textbf{\color{#35bf28}+9.12\%}$
test_add_td 0.2310ms 95.8459μs 10.4334 KOps/s 9.8267 KOps/s $\textbf{\color{#35bf28}+6.17\%}$
test_compile_add_one_nested[tensordict-compile] 0.3203ms 0.1576ms 6.3437 KOps/s 6.1657 KOps/s $\color{#35bf28}+2.89\%$
test_compile_add_one_nested[tensordict-eager] 0.2903ms 0.1603ms 6.2384 KOps/s 6.3181 KOps/s $\color{#d91a1a}-1.26\%$
test_compile_add_one_nested[pytree-compile] 0.2143ms 0.1560ms 6.4091 KOps/s 6.3472 KOps/s $\color{#35bf28}+0.98\%$
test_compile_add_one_nested[pytree-eager] 0.2804ms 0.1831ms 5.4605 KOps/s 5.4447 KOps/s $\color{#35bf28}+0.29\%$
test_compile_copy_nested[tensordict-compile] 56.1010μs 21.1438μs 47.2951 KOps/s 45.7675 KOps/s $\color{#35bf28}+3.34\%$
test_compile_copy_nested[tensordict-eager] 0.1056ms 47.0824μs 21.2394 KOps/s 21.2115 KOps/s $\color{#35bf28}+0.13\%$
test_compile_copy_nested[pytree-compile] 0.3112ms 64.3962μs 15.5289 KOps/s 15.4803 KOps/s $\color{#35bf28}+0.31\%$
test_compile_copy_nested[pytree-eager] 92.3210μs 49.8423μs 20.0633 KOps/s 20.1295 KOps/s $\color{#d91a1a}-0.33\%$
test_compile_add_one_flat[tensordict-compile] 0.3645ms 0.3107ms 3.2187 KOps/s 3.1819 KOps/s $\color{#35bf28}+1.16\%$
test_compile_add_one_flat[tensordict-eager] 0.3132ms 0.2294ms 4.3584 KOps/s 4.3423 KOps/s $\color{#35bf28}+0.37\%$
test_compile_add_one_flat[tensorclass-compile] 0.1942ms 0.1256ms 7.9615 KOps/s 7.8559 KOps/s $\color{#35bf28}+1.34\%$
test_compile_add_one_flat[tensorclass-eager] 0.1262ms 64.1341μs 15.5923 KOps/s 15.5553 KOps/s $\color{#35bf28}+0.24\%$
test_compile_add_one_flat[pytree-compile] 0.3670ms 0.3205ms 3.1201 KOps/s 3.1406 KOps/s $\color{#d91a1a}-0.65\%$
test_compile_add_one_flat[pytree-eager] 0.7032ms 0.6115ms 1.6354 KOps/s 1.6192 KOps/s $\color{#35bf28}+1.00\%$
test_compile_add_self_flat[tensordict-eager] 0.4067ms 0.2796ms 3.5765 KOps/s 3.5700 KOps/s $\color{#35bf28}+0.18\%$
test_compile_add_self_flat[tensordict-compile] 0.4662ms 0.3149ms 3.1760 KOps/s 3.1863 KOps/s $\color{#d91a1a}-0.32\%$
test_compile_add_self_flat[tensorclass-eager] 0.1650ms 81.2969μs 12.3006 KOps/s 12.5918 KOps/s $\color{#d91a1a}-2.31\%$
test_compile_add_self_flat[tensorclass-compile] 0.1945ms 0.1312ms 7.6191 KOps/s 7.5804 KOps/s $\color{#35bf28}+0.51\%$
test_compile_add_self_flat[pytree-eager] 0.6565ms 0.5365ms 1.8639 KOps/s 1.8975 KOps/s $\color{#d91a1a}-1.77\%$
test_compile_add_self_flat[pytree-compile] 0.3847ms 0.3189ms 3.1353 KOps/s 3.1103 KOps/s $\color{#35bf28}+0.81\%$
test_compile_copy_flat[tensordict-compile] 55.6210μs 19.1763μs 52.1478 KOps/s 52.9529 KOps/s $\color{#d91a1a}-1.52\%$
test_compile_copy_flat[tensordict-eager] 86.9120μs 38.5554μs 25.9367 KOps/s 26.0905 KOps/s $\color{#d91a1a}-0.59\%$
test_compile_copy_flat[pytree-compile] 0.1147ms 70.0492μs 14.2757 KOps/s 14.3784 KOps/s $\color{#d91a1a}-0.71\%$
test_compile_copy_flat[pytree-eager] 90.5920μs 51.3442μs 19.4764 KOps/s 19.4955 KOps/s $\color{#d91a1a}-0.10\%$
test_compile_assign_and_add[tensordict-compile] 2.3974ms 0.8367ms 1.1952 KOps/s 1.1339 KOps/s $\textbf{\color{#35bf28}+5.41\%}$
test_compile_assign_and_add[tensordict-eager] 3.6554ms 3.2978ms 303.2296 Ops/s 312.6941 Ops/s $\color{#d91a1a}-3.03\%$
test_compile_assign_and_add[pytree-compile] 2.4501ms 0.8409ms 1.1892 KOps/s 1.1199 KOps/s $\textbf{\color{#35bf28}+6.19\%}$
test_compile_assign_and_add[pytree-eager] 3.4766ms 3.2177ms 310.7825 Ops/s 315.8217 Ops/s $\color{#d91a1a}-1.60\%$
test_compile_indexing[tensor-tensordict-compile] 0.2900ms 0.1201ms 8.3288 KOps/s 8.0653 KOps/s $\color{#35bf28}+3.27\%$
test_compile_indexing[tensor-tensordict-eager] 0.2254ms 60.6529μs 16.4873 KOps/s 15.3506 KOps/s $\textbf{\color{#35bf28}+7.40\%}$
test_compile_indexing[tensor-tensorclass-compile] 0.1707ms 0.1196ms 8.3620 KOps/s 8.3832 KOps/s $\color{#d91a1a}-0.25\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2107ms 44.4288μs 22.5079 KOps/s 22.0900 KOps/s $\color{#35bf28}+1.89\%$
test_compile_indexing[tensor-pytree-compile] 0.2484ms 0.1207ms 8.2873 KOps/s 8.3561 KOps/s $\color{#d91a1a}-0.82\%$
test_compile_indexing[tensor-pytree-eager] 0.1800ms 45.1731μs 22.1370 KOps/s 22.0677 KOps/s $\color{#35bf28}+0.31\%$
test_compile_indexing[slice-tensordict-compile] 0.2086ms 0.1493ms 6.6990 KOps/s 6.6385 KOps/s $\color{#35bf28}+0.91\%$
test_compile_indexing[slice-tensordict-eager] 0.1568ms 24.2689μs 41.2050 KOps/s 40.9120 KOps/s $\color{#35bf28}+0.72\%$
test_compile_indexing[slice-tensorclass-compile] 0.2004ms 0.1401ms 7.1398 KOps/s 7.2378 KOps/s $\color{#d91a1a}-1.35\%$
test_compile_indexing[slice-tensorclass-eager] 60.8410μs 21.3058μs 46.9356 KOps/s 51.0325 KOps/s $\textbf{\color{#d91a1a}-8.03\%}$
test_compile_indexing[slice-pytree-compile] 0.1992ms 0.1454ms 6.8767 KOps/s 7.2018 KOps/s $\color{#d91a1a}-4.51\%$
test_compile_indexing[slice-pytree-eager] 60.3110μs 20.8463μs 47.9702 KOps/s 51.7000 KOps/s $\textbf{\color{#d91a1a}-7.21\%}$
test_compile_indexing[int-tensordict-compile] 0.2397ms 0.1449ms 6.9022 KOps/s 6.9607 KOps/s $\color{#d91a1a}-0.84\%$
test_compile_indexing[int-tensordict-eager] 0.4889ms 23.5764μs 42.4152 KOps/s 41.2442 KOps/s $\color{#35bf28}+2.84\%$
test_compile_indexing[int-tensorclass-compile] 0.2065ms 0.1394ms 7.1719 KOps/s 7.1801 KOps/s $\color{#d91a1a}-0.12\%$
test_compile_indexing[int-tensorclass-eager] 65.8410μs 19.7345μs 50.6726 KOps/s 49.7924 KOps/s $\color{#35bf28}+1.77\%$
test_compile_indexing[int-pytree-compile] 0.1934ms 0.1394ms 7.1736 KOps/s 7.1890 KOps/s $\color{#d91a1a}-0.21\%$
test_compile_indexing[int-pytree-eager] 98.4820μs 19.9576μs 50.1062 KOps/s 50.6246 KOps/s $\color{#d91a1a}-1.02\%$
test_mod_add[eager] 74.5510μs 33.1648μs 30.1525 KOps/s 29.6786 KOps/s $\color{#35bf28}+1.60\%$
test_mod_add[compile] 0.1943ms 82.3029μs 12.1502 KOps/s 12.6656 KOps/s $\color{#d91a1a}-4.07\%$
test_mod_add[compile-overhead] 0.2974ms 0.1493ms 6.6962 KOps/s 6.0980 KOps/s $\textbf{\color{#35bf28}+9.81\%}$
test_mod_wrap[eager] 0.3222ms 0.2458ms 4.0689 KOps/s 3.9105 KOps/s $\color{#35bf28}+4.05\%$
test_mod_wrap[compile] 1.3769ms 0.2910ms 3.4366 KOps/s 3.3769 KOps/s $\color{#35bf28}+1.77\%$
test_mod_wrap[compile-overhead] 7.6847ms 4.1189ms 242.7832 Ops/s 247.2398 Ops/s $\color{#d91a1a}-1.80\%$
test_mod_wrap_and_backward[eager] 1.4687ms 1.3339ms 749.6822 Ops/s 692.7018 Ops/s $\textbf{\color{#35bf28}+8.23\%}$
test_mod_wrap_and_backward[compile] 1.5726ms 1.3083ms 764.3560 Ops/s 701.2890 Ops/s $\textbf{\color{#35bf28}+8.99\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3298ms 0.8879ms 1.1262 KOps/s 992.3240 Ops/s $\textbf{\color{#35bf28}+13.49\%}$
test_seq_add[eager] 0.1466ms 97.1126μs 10.2973 KOps/s 9.8690 KOps/s $\color{#35bf28}+4.34\%$
test_seq_add[compile] 0.1326ms 90.6671μs 11.0294 KOps/s 11.1978 KOps/s $\color{#d91a1a}-1.50\%$
test_seq_add[compile-overhead] 0.1655ms 0.1223ms 8.1771 KOps/s 8.1551 KOps/s $\color{#35bf28}+0.27\%$
test_seq_wrap[eager] 0.5148ms 0.3949ms 2.5322 KOps/s 2.5373 KOps/s $\color{#d91a1a}-0.20\%$
test_seq_wrap[compile] 0.3548ms 0.3065ms 3.2631 KOps/s 3.1169 KOps/s $\color{#35bf28}+4.69\%$
test_seq_wrap[compile-overhead] 0.2693ms 0.2221ms 4.5026 KOps/s 4.6016 KOps/s $\color{#d91a1a}-2.15\%$
test_func_call_runtime[False-eager] 0.7841ms 0.7194ms 1.3900 KOps/s 1.2983 KOps/s $\textbf{\color{#35bf28}+7.07\%}$
test_func_call_runtime[False-compile] 0.9784ms 0.7862ms 1.2719 KOps/s 1.2786 KOps/s $\color{#d91a1a}-0.52\%$
test_func_call_runtime[False-compile-overhead] 0.4893ms 0.3556ms 2.8121 KOps/s 2.8095 KOps/s $\color{#35bf28}+0.09\%$
test_func_call_runtime[True-eager] 0.9905ms 0.8845ms 1.1306 KOps/s 1.1066 KOps/s $\color{#35bf28}+2.17\%$
test_func_call_runtime[True-compile] 0.8582ms 0.7973ms 1.2543 KOps/s 1.2349 KOps/s $\color{#35bf28}+1.57\%$
test_func_call_runtime[True-compile-overhead] 0.4438ms 0.3770ms 2.6524 KOps/s 2.6812 KOps/s $\color{#d91a1a}-1.08\%$
test_func_call_cm_runtime[False-eager] 0.9115ms 0.7172ms 1.3943 KOps/s 1.3577 KOps/s $\color{#35bf28}+2.70\%$
test_func_call_cm_runtime[False-compile] 0.8523ms 0.7712ms 1.2967 KOps/s 1.2774 KOps/s $\color{#35bf28}+1.51\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4057ms 0.3574ms 2.7981 KOps/s 2.8008 KOps/s $\color{#d91a1a}-0.09\%$
test_func_call_cm_runtime[True-eager] 1.0688ms 0.9862ms 1.0140 KOps/s 988.8272 Ops/s $\color{#35bf28}+2.54\%$
test_func_call_cm_runtime[True-compile] 0.9036ms 0.8253ms 1.2116 KOps/s 1.2024 KOps/s $\color{#35bf28}+0.77\%$
test_func_call_cm_runtime[True-compile-overhead] 0.4475ms 0.4015ms 2.4905 KOps/s 2.4796 KOps/s $\color{#35bf28}+0.44\%$
test_vmap_func_call_cm_runtime[eager] 2.5650ms 2.1025ms 475.6268 Ops/s 469.5910 Ops/s $\color{#35bf28}+1.29\%$
test_vmap_func_call_cm_runtime[compile] 0.9261ms 0.8369ms 1.1949 KOps/s 1.1813 KOps/s $\color{#35bf28}+1.15\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4584ms 0.4075ms 2.4538 KOps/s 2.4692 KOps/s $\color{#d91a1a}-0.62\%$
test_distributed 2.3598ms 0.1720ms 5.8129 KOps/s 8.7610 KOps/s $\textbf{\color{#d91a1a}-33.65\%}$
test_tdmodule 36.1510μs 15.8888μs 62.9375 KOps/s 60.7466 KOps/s $\color{#35bf28}+3.61\%$
test_tdmodule_dispatch 60.3910μs 31.6683μs 31.5773 KOps/s 31.9786 KOps/s $\color{#d91a1a}-1.25\%$
test_tdseq 49.9900μs 16.8165μs 59.4653 KOps/s 59.1002 KOps/s $\color{#35bf28}+0.62\%$
test_tdseq_dispatch 57.1210μs 33.9812μs 29.4280 KOps/s 29.1847 KOps/s $\color{#35bf28}+0.83\%$
test_instantiation_functorch 1.9780ms 1.8246ms 548.0543 Ops/s 549.0712 Ops/s $\color{#d91a1a}-0.19\%$
test_exec_functorch 0.2646ms 0.2034ms 4.9169 KOps/s 4.7975 KOps/s $\color{#35bf28}+2.49\%$
test_exec_functional_call 0.2757ms 0.2082ms 4.8039 KOps/s 4.8082 KOps/s $\color{#d91a1a}-0.09\%$
test_exec_td_decorator 0.4564ms 0.2630ms 3.8017 KOps/s 3.8531 KOps/s $\color{#d91a1a}-1.34\%$
test_vmap_mlp_speed_decorator[True-True] 0.8333ms 0.6892ms 1.4510 KOps/s 1.4452 KOps/s $\color{#35bf28}+0.40\%$
test_vmap_mlp_speed_decorator[True-False] 0.8160ms 0.6864ms 1.4569 KOps/s 1.4476 KOps/s $\color{#35bf28}+0.65\%$
test_vmap_mlp_speed_decorator[False-True] 0.7385ms 0.6070ms 1.6474 KOps/s 1.6597 KOps/s $\color{#d91a1a}-0.74\%$
test_vmap_mlp_speed_decorator[False-False] 0.7040ms 0.6078ms 1.6454 KOps/s 1.6408 KOps/s $\color{#35bf28}+0.28\%$
test_vmap_transformer_speed_decorator[True-True] 20.6510ms 19.8679ms 50.3324 Ops/s 50.7021 Ops/s $\color{#d91a1a}-0.73\%$
test_vmap_transformer_speed_decorator[True-False] 20.5037ms 19.7164ms 50.7191 Ops/s 50.2595 Ops/s $\color{#35bf28}+0.91\%$
test_vmap_transformer_speed_decorator[False-True] 20.5028ms 19.8369ms 50.4111 Ops/s 51.0525 Ops/s $\color{#d91a1a}-1.26\%$
test_vmap_transformer_speed_decorator[False-False] 20.9611ms 19.7388ms 50.6616 Ops/s 50.8175 Ops/s $\color{#d91a1a}-0.31\%$
test_to_module_speed[True] 1.4442ms 0.9878ms 1.0123 KOps/s 1.0095 KOps/s $\color{#35bf28}+0.28\%$
test_to_module_speed[False] 1.4271ms 0.9685ms 1.0326 KOps/s 1.0460 KOps/s $\color{#d91a1a}-1.29\%$
test_tc_init 76.1620μs 37.6332μs 26.5723 KOps/s 26.1944 KOps/s $\color{#35bf28}+1.44\%$
test_tc_init_nested 0.2014ms 74.0647μs 13.5017 KOps/s 12.9097 KOps/s $\color{#35bf28}+4.59\%$
test_tc_first_layer_tensor 4.9300μs 0.6617μs 1.5112 MOps/s 1.5082 MOps/s $\color{#35bf28}+0.20\%$
test_tc_first_layer_nontensor 42.6310μs 2.1702μs 460.7816 KOps/s 459.7175 KOps/s $\color{#35bf28}+0.23\%$
test_tc_second_layer_tensor 9.6477μs 1.3201μs 757.4973 KOps/s 742.2117 KOps/s $\color{#35bf28}+2.06\%$
test_tc_second_layer_nontensor 24.1110μs 2.8740μs 347.9504 KOps/s 348.5403 KOps/s $\color{#d91a1a}-0.17\%$
test_unbind 0.1941s 9.5224ms 105.0161 Ops/s 92.7657 Ops/s $\textbf{\color{#35bf28}+13.21\%}$
test_full_like 0.6638ms 0.5744ms 1.7411 KOps/s 1.7450 KOps/s $\color{#d91a1a}-0.22\%$
test_zeros_like 0.2628ms 0.1980ms 5.0517 KOps/s 5.0463 KOps/s $\color{#35bf28}+0.11\%$
test_ones_like 0.2332ms 0.1978ms 5.0553 KOps/s 5.0501 KOps/s $\color{#35bf28}+0.10\%$
test_clone 0.4430ms 0.4147ms 2.4113 KOps/s 2.4101 KOps/s $\color{#35bf28}+0.05\%$
test_squeeze 44.7600μs 9.5882μs 104.2947 KOps/s 106.6937 KOps/s $\color{#d91a1a}-2.25\%$
test_unsqueeze 0.2169ms 73.8036μs 13.5495 KOps/s 13.8298 KOps/s $\color{#d91a1a}-2.03\%$
test_split 0.4195ms 0.1513ms 6.6097 KOps/s 6.5958 KOps/s $\color{#35bf28}+0.21\%$
test_permute 0.2893ms 0.1799ms 5.5586 KOps/s 5.6511 KOps/s $\color{#d91a1a}-1.64\%$
test_stack 1.2568ms 0.8294ms 1.2057 KOps/s 1.1538 KOps/s $\color{#35bf28}+4.50\%$
test_cat 1.2624ms 1.2313ms 812.1598 Ops/s 812.0162 Ops/s $\color{#35bf28}+0.02\%$
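
The Change column above appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The sketch below recomputes it for three rows taken from the table. The formula and the `percent_change` helper are assumptions inferred from the reported numbers, not the benchmark bot's actual implementation.

```python
# Hypothetical helper: relative throughput change of the PR versus repo HEAD.
# Assumes Change = (ops_pr - ops_head) / ops_head, expressed in percent.
def percent_change(ops_pr: float, ops_head: float) -> float:
    return (ops_pr - ops_head) / ops_head * 100.0


# (Ops, Ops on Repo HEAD) copied from the table, converted to Ops/s.
rows = {
    "test_distributed": (5.8129e3, 8.7610e3),
    "test_unbind": (105.0161, 92.7657),
    "test_clone": (2.4113e3, 2.4101e3),
}

for name, (ops_pr, ops_head) in rows.items():
    print(f"{name}: {percent_change(ops_pr, ops_head):+.2f}%")
```

Under that assumption the recomputed values match the reported ones for these rows (-33.65%, +13.21%, +0.05%), so a negative Change means lower throughput than repo HEAD.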

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 16, 2024
ghstack-source-id: e710b72f185d8c18284fdd6cd4283c78d12a28f3
Pull Request resolved: #1040
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 16, 2024
ghstack-source-id: 5df15306395f6986e77caed2cfa87b3516a1b134
Pull Request resolved: #1040