[BugFix] Fix parsing integer batch size within export #1004

Open
vmoens wants to merge 3 commits into base: gh/vmoens/18/base

vmoens (Contributor) commented Sep 20, 2024

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Sep 20, 2024
ghstack-source-id: 73e7dd429770e1c383b3b2a1c28dbbf661d65f07
Pull Request resolved: #1004
@facebook-github-bot added the CLA Signed label Sep 20, 2024

github-actions bot commented Sep 20, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 216. Improved: $\large\color{#35bf28}8$. Worsened: $\large\color{#d91a1a}24$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 90.3460μs 25.6517μs 38.9838 KOps/s 42.0286 KOps/s $\textbf{\color{#d91a1a}-7.24\%}$
test_plain_set_stack_nested 81.3820μs 25.6494μs 38.9873 KOps/s 40.9080 KOps/s $\color{#d91a1a}-4.70\%$
test_plain_set_nested_inplace 78.8670μs 28.2397μs 35.4111 KOps/s 37.8495 KOps/s $\textbf{\color{#d91a1a}-6.44\%}$
test_plain_set_stack_nested_inplace 67.5450μs 28.2379μs 35.4134 KOps/s 37.8340 KOps/s $\textbf{\color{#d91a1a}-6.40\%}$
test_items 24.2650μs 4.3290μs 230.9977 KOps/s 240.9439 KOps/s $\color{#d91a1a}-4.13\%$
test_items_nested 0.5422ms 0.3811ms 2.6240 KOps/s 2.6290 KOps/s $\color{#d91a1a}-0.19\%$
test_items_nested_locked 0.5778ms 0.3827ms 2.6129 KOps/s 2.6256 KOps/s $\color{#d91a1a}-0.48\%$
test_items_nested_leaf 0.1465ms 80.0650μs 12.4899 KOps/s 12.4847 KOps/s $\color{#35bf28}+0.04\%$
test_items_stack_nested 0.4875ms 0.3882ms 2.5757 KOps/s 2.6263 KOps/s $\color{#d91a1a}-1.93\%$
test_items_stack_nested_leaf 0.1505ms 85.2998μs 11.7234 KOps/s 12.0747 KOps/s $\color{#d91a1a}-2.91\%$
test_items_stack_nested_locked 0.5917ms 0.3906ms 2.5600 KOps/s 2.5923 KOps/s $\color{#d91a1a}-1.25\%$
test_keys 51.1150μs 3.5094μs 284.9465 KOps/s 286.7259 KOps/s $\color{#d91a1a}-0.62\%$
test_keys_nested 0.2206ms 0.1342ms 7.4539 KOps/s 7.4895 KOps/s $\color{#d91a1a}-0.48\%$
test_keys_nested_locked 0.7522ms 0.1395ms 7.1692 KOps/s 7.2292 KOps/s $\color{#d91a1a}-0.83\%$
test_keys_nested_leaf 0.1981ms 0.1168ms 8.5584 KOps/s 8.6360 KOps/s $\color{#d91a1a}-0.90\%$
test_keys_stack_nested 0.3068ms 0.1338ms 7.4720 KOps/s 7.4816 KOps/s $\color{#d91a1a}-0.13\%$
test_keys_stack_nested_leaf 0.2151ms 0.1163ms 8.5995 KOps/s 8.8809 KOps/s $\color{#d91a1a}-3.17\%$
test_keys_stack_nested_locked 0.2260ms 0.1395ms 7.1677 KOps/s 7.3793 KOps/s $\color{#d91a1a}-2.87\%$
test_values 14.8114μs 1.0402μs 961.3325 KOps/s 920.2927 KOps/s $\color{#35bf28}+4.46\%$
test_values_nested 0.1428ms 92.8479μs 10.7703 KOps/s 10.7425 KOps/s $\color{#35bf28}+0.26\%$
test_values_nested_locked 0.1466ms 92.9801μs 10.7550 KOps/s 10.7332 KOps/s $\color{#35bf28}+0.20\%$
test_values_nested_leaf 0.1367ms 79.1348μs 12.6367 KOps/s 12.7436 KOps/s $\color{#d91a1a}-0.84\%$
test_values_stack_nested 0.1686ms 93.2474μs 10.7242 KOps/s 10.1606 KOps/s $\textbf{\color{#35bf28}+5.55\%}$
test_values_stack_nested_leaf 0.1246ms 78.6717μs 12.7111 KOps/s 13.1709 KOps/s $\color{#d91a1a}-3.49\%$
test_values_stack_nested_locked 0.1513ms 93.1508μs 10.7353 KOps/s 10.7000 KOps/s $\color{#35bf28}+0.33\%$
test_membership 25.2780μs 0.8864μs 1.1282 MOps/s 1.4073 MOps/s $\textbf{\color{#d91a1a}-19.83\%}$
test_membership_nested 29.3550μs 2.8027μs 356.8007 KOps/s 359.7703 KOps/s $\color{#d91a1a}-0.83\%$
test_membership_nested_leaf 25.1870μs 2.7935μs 357.9785 KOps/s 365.9989 KOps/s $\color{#d91a1a}-2.19\%$
test_membership_stacked_nested 22.3420μs 2.7841μs 359.1782 KOps/s 367.9815 KOps/s $\color{#d91a1a}-2.39\%$
test_membership_stacked_nested_leaf 49.9830μs 2.7978μs 357.4186 KOps/s 366.3923 KOps/s $\color{#d91a1a}-2.45\%$
test_membership_nested_last 32.4500μs 4.2370μs 236.0150 KOps/s 241.1513 KOps/s $\color{#d91a1a}-2.13\%$
test_membership_nested_leaf_last 45.8250μs 4.2763μs 233.8492 KOps/s 240.6598 KOps/s $\color{#d91a1a}-2.83\%$
test_membership_stacked_nested_last 31.8490μs 4.9849μs 200.6049 KOps/s 84.5008 KOps/s $\textbf{\color{#35bf28}+137.40\%}$
test_membership_stacked_nested_leaf_last 33.3420μs 5.0441μs 198.2495 KOps/s 84.4283 KOps/s $\textbf{\color{#35bf28}+134.81\%}$
test_nested_getleaf 65.7520μs 10.7630μs 92.9108 KOps/s 96.1892 KOps/s $\color{#d91a1a}-3.41\%$
test_nested_get 43.0290μs 9.9605μs 100.3969 KOps/s 100.6425 KOps/s $\color{#d91a1a}-0.24\%$
test_stacked_getleaf 42.5590μs 10.5680μs 94.6251 KOps/s 95.5818 KOps/s $\color{#d91a1a}-1.00\%$
test_stacked_get 76.7630μs 9.9376μs 100.6282 KOps/s 100.0912 KOps/s $\color{#35bf28}+0.54\%$
test_nested_getitemleaf 43.9220μs 11.0345μs 90.6252 KOps/s 91.3562 KOps/s $\color{#d91a1a}-0.80\%$
test_nested_getitem 39.9950μs 10.2259μs 97.7913 KOps/s 97.2067 KOps/s $\color{#35bf28}+0.60\%$
test_stacked_getitemleaf 53.0590μs 11.0677μs 90.3526 KOps/s 91.2083 KOps/s $\color{#d91a1a}-0.94\%$
test_stacked_getitem 46.2160μs 10.1876μs 98.1587 KOps/s 97.7682 KOps/s $\color{#35bf28}+0.40\%$
test_lock_nested 1.4056ms 0.5140ms 1.9457 KOps/s 1.9695 KOps/s $\color{#d91a1a}-1.21\%$
test_lock_stack_nested 0.8064ms 0.4832ms 2.0696 KOps/s 2.1758 KOps/s $\color{#d91a1a}-4.88\%$
test_unlock_nested 0.1144s 0.5461ms 1.8313 KOps/s 2.3585 KOps/s $\textbf{\color{#d91a1a}-22.36\%}$
test_unlock_stack_nested 0.7236ms 0.3988ms 2.5074 KOps/s 2.6514 KOps/s $\textbf{\color{#d91a1a}-5.43\%}$
test_flatten_speed 0.2139ms 0.1032ms 9.6896 KOps/s 10.0849 KOps/s $\color{#d91a1a}-3.92\%$
test_unflatten_speed 0.6890ms 0.5155ms 1.9400 KOps/s 1.9742 KOps/s $\color{#d91a1a}-1.74\%$
test_common_ops 2.1192ms 1.2013ms 832.4249 Ops/s 868.4444 Ops/s $\color{#d91a1a}-4.15\%$
test_creation 32.9710μs 2.1587μs 463.2498 KOps/s 479.3299 KOps/s $\color{#d91a1a}-3.35\%$
test_creation_empty 68.7080μs 20.9910μs 47.6394 KOps/s 51.5555 KOps/s $\textbf{\color{#d91a1a}-7.60\%}$
test_creation_nested_1 99.9760μs 24.5746μs 40.6924 KOps/s 44.7465 KOps/s $\textbf{\color{#d91a1a}-9.06\%}$
test_creation_nested_2 66.8050μs 28.8482μs 34.6642 KOps/s 37.7520 KOps/s $\textbf{\color{#d91a1a}-8.18\%}$
test_clone 0.1566ms 17.5888μs 56.8543 KOps/s 57.4949 KOps/s $\color{#d91a1a}-1.11\%$
test_getitem[int] 1.0820ms 17.1075μs 58.4537 KOps/s 59.7480 KOps/s $\color{#d91a1a}-2.17\%$
test_getitem[slice_int] 0.1477ms 31.4669μs 31.7794 KOps/s 32.6219 KOps/s $\color{#d91a1a}-2.58\%$
test_getitem[range] 0.2332ms 58.3961μs 17.1244 KOps/s 17.1216 KOps/s $\color{#35bf28}+0.02\%$
test_getitem[tuple] 0.1522ms 25.2077μs 39.6705 KOps/s 39.8945 KOps/s $\color{#d91a1a}-0.56\%$
test_getitem[list] 0.3560ms 53.3721μs 18.7364 KOps/s 18.8335 KOps/s $\color{#d91a1a}-0.52\%$
test_setitem_dim[int] 94.0750μs 34.6007μs 28.9011 KOps/s 29.7894 KOps/s $\color{#d91a1a}-2.98\%$
test_setitem_dim[slice_int] 0.1094ms 61.2428μs 16.3285 KOps/s 15.9856 KOps/s $\color{#35bf28}+2.14\%$
test_setitem_dim[range] 0.1448ms 85.9348μs 11.6367 KOps/s 11.4770 KOps/s $\color{#35bf28}+1.39\%$
test_setitem_dim[tuple] 0.1352ms 51.4119μs 19.4507 KOps/s 19.7706 KOps/s $\color{#d91a1a}-1.62\%$
test_setitem 0.2105ms 32.7452μs 30.5388 KOps/s 32.2678 KOps/s $\textbf{\color{#d91a1a}-5.36\%}$
test_set 0.1795ms 32.1109μs 31.1421 KOps/s 33.8446 KOps/s $\textbf{\color{#d91a1a}-7.99\%}$
test_set_shared 3.5917ms 0.2264ms 4.4161 KOps/s 4.4667 KOps/s $\color{#d91a1a}-1.13\%$
test_update 0.2021ms 41.5079μs 24.0918 KOps/s 25.7393 KOps/s $\textbf{\color{#d91a1a}-6.40\%}$
test_update_nested 0.2004ms 53.4803μs 18.6985 KOps/s 19.7518 KOps/s $\textbf{\color{#d91a1a}-5.33\%}$
test_update__nested 0.9018ms 45.8381μs 21.8159 KOps/s 22.1145 KOps/s $\color{#d91a1a}-1.35\%$
test_set_nested 0.1837ms 34.4815μs 29.0011 KOps/s 30.4805 KOps/s $\color{#d91a1a}-4.85\%$
test_set_nested_new 0.2078ms 39.7404μs 25.1633 KOps/s 26.3058 KOps/s $\color{#d91a1a}-4.34\%$
test_select 0.3030ms 58.3827μs 17.1284 KOps/s 17.7053 KOps/s $\color{#d91a1a}-3.26\%$
test_select_nested 0.1600ms 61.1680μs 16.3484 KOps/s 16.8708 KOps/s $\color{#d91a1a}-3.10\%$
test_exclude_nested 0.1357ms 75.8400μs 13.1856 KOps/s 13.4525 KOps/s $\color{#d91a1a}-1.98\%$
test_empty[True] 1.0163ms 0.3571ms 2.8002 KOps/s 2.7306 KOps/s $\color{#35bf28}+2.55\%$
test_empty[False] 8.5158μs 1.2659μs 789.9477 KOps/s 804.8738 KOps/s $\color{#d91a1a}-1.85\%$
test_unbind_speed 0.4107ms 0.3108ms 3.2170 KOps/s 3.2572 KOps/s $\color{#d91a1a}-1.23\%$
test_unbind_speed_stack0 0.6474ms 0.3038ms 3.2914 KOps/s 3.4075 KOps/s $\color{#d91a1a}-3.41\%$
test_unbind_speed_stack1 0.1218s 0.7693ms 1.2999 KOps/s 1.3612 KOps/s $\color{#d91a1a}-4.50\%$
test_split 0.1195s 2.4983ms 400.2720 Ops/s 452.8289 Ops/s $\textbf{\color{#d91a1a}-11.61\%}$
test_chunk 2.2206ms 2.0208ms 494.8617 Ops/s 450.0858 Ops/s $\textbf{\color{#35bf28}+9.95\%}$
test_creation[device0] 0.2757ms 0.1183ms 8.4503 KOps/s 8.4579 KOps/s $\color{#d91a1a}-0.09\%$
test_creation_from_tensor 3.9548ms 0.1204ms 8.3030 KOps/s 8.5199 KOps/s $\color{#d91a1a}-2.55\%$
test_add_one[memmap_tensor0] 0.4583ms 7.4563μs 134.1143 KOps/s 138.1089 KOps/s $\color{#d91a1a}-2.89\%$
test_contiguous[memmap_tensor0] 17.4730μs 1.9325μs 517.4749 KOps/s 538.7640 KOps/s $\color{#d91a1a}-3.95\%$
test_stack[memmap_tensor0] 0.1012ms 5.7658μs 173.4376 KOps/s 174.4006 KOps/s $\color{#d91a1a}-0.55\%$
test_memmaptd_index 0.1201s 0.5803ms 1.7231 KOps/s 2.4246 KOps/s $\textbf{\color{#d91a1a}-28.93\%}$
test_memmaptd_index_astensor 1.1676ms 0.5138ms 1.9463 KOps/s 1.9479 KOps/s $\color{#d91a1a}-0.08\%$
test_memmaptd_index_op 1.8566ms 1.1258ms 888.2426 Ops/s 929.2522 Ops/s $\color{#d91a1a}-4.41\%$
test_serialize_model 0.1341s 0.1247s 8.0180 Ops/s 8.4068 Ops/s $\color{#d91a1a}-4.62\%$
test_serialize_model_pickle 0.4463s 0.3948s 2.5329 Ops/s 2.4925 Ops/s $\color{#35bf28}+1.62\%$
test_serialize_weights 0.1273s 0.1210s 8.2632 Ops/s 7.3210 Ops/s $\textbf{\color{#35bf28}+12.87\%}$
test_serialize_weights_returnearly 0.1722s 0.1646s 6.0765 Ops/s 6.2442 Ops/s $\color{#d91a1a}-2.68\%$
test_serialize_weights_pickle 0.5519s 0.4274s 2.3398 Ops/s 1.1854 Ops/s $\textbf{\color{#35bf28}+97.39\%}$
test_serialize_weights_filesystem 0.1556s 0.1457s 6.8620 Ops/s 7.1037 Ops/s $\color{#d91a1a}-3.40\%$
test_serialize_model_filesystem 0.1581s 0.1461s 6.8449 Ops/s 6.5542 Ops/s $\color{#35bf28}+4.44\%$
test_reshape_pytree 80.2790μs 38.9142μs 25.6976 KOps/s 25.8098 KOps/s $\color{#d91a1a}-0.43\%$
test_reshape_td 0.1025ms 46.5027μs 21.5041 KOps/s 21.4612 KOps/s $\color{#35bf28}+0.20\%$
test_view_pytree 94.8960μs 38.9343μs 25.6843 KOps/s 25.8903 KOps/s $\color{#d91a1a}-0.80\%$
test_view_td 0.1125ms 51.8703μs 19.2788 KOps/s 19.1849 KOps/s $\color{#35bf28}+0.49\%$
test_unbind_pytree 0.1212ms 36.8564μs 27.1323 KOps/s 27.9528 KOps/s $\color{#d91a1a}-2.94\%$
test_unbind_td 0.4339ms 45.6671μs 21.8976 KOps/s 22.2974 KOps/s $\color{#d91a1a}-1.79\%$
test_split_pytree 99.3340μs 38.1371μs 26.2212 KOps/s 26.0564 KOps/s $\color{#35bf28}+0.63\%$
test_split_td 0.2258ms 57.6999μs 17.3310 KOps/s 17.3825 KOps/s $\color{#d91a1a}-0.30\%$
test_add_pytree 0.1141ms 45.4140μs 22.0196 KOps/s 21.4058 KOps/s $\color{#35bf28}+2.87\%$
test_add_td 0.2285ms 89.7861μs 11.1376 KOps/s 11.0365 KOps/s $\color{#35bf28}+0.92\%$
test_compile_add_one_nested[tensordict-compile] 0.1394ms 73.2231μs 13.6569 KOps/s 13.6497 KOps/s $\color{#35bf28}+0.05\%$
test_compile_add_one_nested[tensordict-eager] 0.4277ms 0.2024ms 4.9412 KOps/s 4.8086 KOps/s $\color{#35bf28}+2.76\%$
test_compile_add_one_nested[pytree-compile] 0.1851ms 54.8569μs 18.2292 KOps/s 18.0265 KOps/s $\color{#35bf28}+1.12\%$
test_compile_add_one_nested[pytree-eager] 0.2610ms 0.1462ms 6.8406 KOps/s 6.7055 KOps/s $\color{#35bf28}+2.02\%$
test_compile_copy_nested[tensordict-compile] 75.4810μs 28.4837μs 35.1078 KOps/s 35.9707 KOps/s $\color{#d91a1a}-2.40\%$
test_compile_copy_nested[tensordict-eager] 0.1460ms 76.9121μs 13.0019 KOps/s 13.0541 KOps/s $\color{#d91a1a}-0.40\%$
test_compile_copy_nested[pytree-compile] 0.1434ms 79.2714μs 12.6149 KOps/s 13.0686 KOps/s $\color{#d91a1a}-3.47\%$
test_compile_copy_nested[pytree-eager] 0.1264ms 67.9471μs 14.7173 KOps/s 15.1727 KOps/s $\color{#d91a1a}-3.00\%$
test_compile_add_one_flat[tensordict-compile] 0.2763ms 0.1252ms 7.9855 KOps/s 7.9830 KOps/s $\color{#35bf28}+0.03\%$
test_compile_add_one_flat[tensordict-eager] 1.8540ms 0.2496ms 4.0071 KOps/s 4.0449 KOps/s $\color{#d91a1a}-0.94\%$
test_compile_add_one_flat[tensorclass-compile] 0.1391ms 55.4630μs 18.0300 KOps/s 18.3146 KOps/s $\color{#d91a1a}-1.55\%$
test_compile_add_one_flat[tensorclass-eager] 0.7200ms 82.0697μs 12.1848 KOps/s 12.4255 KOps/s $\color{#d91a1a}-1.94\%$
test_compile_add_one_flat[pytree-compile] 0.1893ms 0.1128ms 8.8623 KOps/s 8.9123 KOps/s $\color{#d91a1a}-0.56\%$
test_compile_add_one_flat[pytree-eager] 0.4141ms 0.3024ms 3.3069 KOps/s 3.2888 KOps/s $\color{#35bf28}+0.55\%$
test_compile_add_self_flat[tensordict-eager] 0.5239ms 0.2803ms 3.5676 KOps/s 3.5987 KOps/s $\color{#d91a1a}-0.86\%$
test_compile_add_self_flat[tensordict-compile] 0.2641ms 0.1286ms 7.7732 KOps/s 8.1619 KOps/s $\color{#d91a1a}-4.76\%$
test_compile_add_self_flat[tensorclass-eager] 0.1872ms 76.9241μs 12.9998 KOps/s 13.3826 KOps/s $\color{#d91a1a}-2.86\%$
test_compile_add_self_flat[tensorclass-compile] 0.1350ms 56.3059μs 17.7601 KOps/s 18.4762 KOps/s $\color{#d91a1a}-3.88\%$
test_compile_add_self_flat[pytree-eager] 0.4250ms 0.2474ms 4.0423 KOps/s 4.0704 KOps/s $\color{#d91a1a}-0.69\%$
test_compile_add_self_flat[pytree-compile] 0.2067ms 0.1124ms 8.8971 KOps/s 8.8604 KOps/s $\color{#35bf28}+0.41\%$
test_compile_copy_flat[tensordict-compile] 92.0110μs 30.6715μs 32.6036 KOps/s 32.1579 KOps/s $\color{#35bf28}+1.39\%$
test_compile_copy_flat[tensordict-eager] 0.1673ms 79.0016μs 12.6580 KOps/s 13.0809 KOps/s $\color{#d91a1a}-3.23\%$
test_compile_copy_flat[pytree-compile] 0.1839ms 81.5744μs 12.2587 KOps/s 12.7514 KOps/s $\color{#d91a1a}-3.86\%$
test_compile_copy_flat[pytree-eager] 0.1287ms 69.2851μs 14.4331 KOps/s 14.7734 KOps/s $\color{#d91a1a}-2.30\%$
test_compile_assign_and_add[tensordict-compile] 0.3752ms 0.2171ms 4.6068 KOps/s 4.7323 KOps/s $\color{#d91a1a}-2.65\%$
test_compile_assign_and_add[tensordict-eager] 2.2585ms 1.7847ms 560.3131 Ops/s 548.8401 Ops/s $\color{#35bf28}+2.09\%$
test_compile_assign_and_add[pytree-compile] 0.3255ms 0.2161ms 4.6268 KOps/s 4.7404 KOps/s $\color{#d91a1a}-2.40\%$
test_compile_assign_and_add[pytree-eager] 1.4638ms 1.1698ms 854.8741 Ops/s 839.8493 Ops/s $\color{#35bf28}+1.79\%$
test_compile_assign_and_add_stack[compile] 0.7429ms 0.4748ms 2.1061 KOps/s 2.0952 KOps/s $\color{#35bf28}+0.52\%$
test_compile_assign_and_add_stack[eager] 5.2091ms 4.5034ms 222.0526 Ops/s 223.7739 Ops/s $\color{#d91a1a}-0.77\%$
test_compile_indexing[tensor-tensordict-compile] 0.1244ms 44.9175μs 22.2631 KOps/s 22.0339 KOps/s $\color{#35bf28}+1.04\%$
test_compile_indexing[tensor-tensordict-eager] 0.7076ms 51.3017μs 19.4925 KOps/s 19.7090 KOps/s $\color{#d91a1a}-1.10\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1202ms 38.3678μs 26.0635 KOps/s 26.4737 KOps/s $\color{#d91a1a}-1.55\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1117ms 29.7152μs 33.6528 KOps/s 32.4275 KOps/s $\color{#35bf28}+3.78\%$
test_compile_indexing[tensor-pytree-compile] 0.1405ms 40.7210μs 24.5573 KOps/s 25.4876 KOps/s $\color{#d91a1a}-3.65\%$
test_compile_indexing[tensor-pytree-eager] 87.7630μs 29.5026μs 33.8953 KOps/s 32.6765 KOps/s $\color{#35bf28}+3.73\%$
test_compile_indexing[slice-tensordict-compile] 0.1820ms 77.2662μs 12.9423 KOps/s 12.6952 KOps/s $\color{#35bf28}+1.95\%$
test_compile_indexing[slice-tensordict-eager] 0.5888ms 29.1077μs 34.3552 KOps/s 33.8986 KOps/s $\color{#35bf28}+1.35\%$
test_compile_indexing[slice-tensorclass-compile] 0.1364ms 71.2777μs 14.0296 KOps/s 14.0623 KOps/s $\color{#d91a1a}-0.23\%$
test_compile_indexing[slice-tensorclass-eager] 93.9750μs 24.2466μs 41.2429 KOps/s 41.7551 KOps/s $\color{#d91a1a}-1.23\%$
test_compile_indexing[slice-pytree-compile] 0.1663ms 72.3309μs 13.8253 KOps/s 13.9329 KOps/s $\color{#d91a1a}-0.77\%$
test_compile_indexing[slice-pytree-eager] 82.3330μs 23.9355μs 41.7790 KOps/s 41.7304 KOps/s $\color{#35bf28}+0.12\%$
test_compile_indexing[int-tensordict-compile] 0.1601ms 78.4741μs 12.7431 KOps/s 12.6846 KOps/s $\color{#35bf28}+0.46\%$
test_compile_indexing[int-tensordict-eager] 1.0904ms 29.3294μs 34.0955 KOps/s 34.4901 KOps/s $\color{#d91a1a}-1.14\%$
test_compile_indexing[int-tensorclass-compile] 0.1736ms 71.5271μs 13.9807 KOps/s 13.9225 KOps/s $\color{#35bf28}+0.42\%$
test_compile_indexing[int-tensorclass-eager] 78.2750μs 23.9660μs 41.7258 KOps/s 41.3067 KOps/s $\color{#35bf28}+1.01\%$
test_compile_indexing[int-pytree-compile] 0.1811ms 73.0057μs 13.6976 KOps/s 13.9814 KOps/s $\color{#d91a1a}-2.03\%$
test_compile_indexing[int-pytree-eager] 94.5260μs 24.1807μs 41.3553 KOps/s 41.7953 KOps/s $\color{#d91a1a}-1.05\%$
test_mod_add[eager] 0.1048ms 27.8822μs 35.8652 KOps/s 35.5649 KOps/s $\color{#35bf28}+0.84\%$
test_mod_add[compile] 0.1708ms 45.0059μs 22.2193 KOps/s 21.5018 KOps/s $\color{#35bf28}+3.34\%$
test_mod_add[compile-overhead] 0.1552ms 44.6396μs 22.4017 KOps/s 22.1260 KOps/s $\color{#35bf28}+1.25\%$
test_mod_wrap[eager] 0.4412ms 0.2235ms 4.4749 KOps/s 4.4651 KOps/s $\color{#35bf28}+0.22\%$
test_mod_wrap[compile] 2.0367ms 0.2094ms 4.7746 KOps/s 4.7724 KOps/s $\color{#35bf28}+0.05\%$
test_mod_wrap[compile-overhead] 2.0054ms 0.2104ms 4.7530 KOps/s 4.7762 KOps/s $\color{#d91a1a}-0.49\%$
test_mod_wrap_and_backward[eager] 13.3795ms 11.1765ms 89.4731 Ops/s 89.6904 Ops/s $\color{#d91a1a}-0.24\%$
test_mod_wrap_and_backward[compile] 13.6151ms 11.1170ms 89.9524 Ops/s 88.2365 Ops/s $\color{#35bf28}+1.94\%$
test_mod_wrap_and_backward[compile-overhead] 12.8713ms 11.0729ms 90.3104 Ops/s 88.5381 Ops/s $\color{#35bf28}+2.00\%$
test_seq_add[eager] 0.2313ms 95.9548μs 10.4216 KOps/s 10.2137 KOps/s $\color{#35bf28}+2.04\%$
test_seq_add[compile] 0.1332ms 59.9248μs 16.6876 KOps/s 16.5399 KOps/s $\color{#35bf28}+0.89\%$
test_seq_add[compile-overhead] 0.1609ms 58.0194μs 17.2356 KOps/s 16.8262 KOps/s $\color{#35bf28}+2.43\%$
test_seq_wrap[eager] 0.7534ms 0.4034ms 2.4788 KOps/s 2.4508 KOps/s $\color{#35bf28}+1.14\%$
test_seq_wrap[compile] 0.3665ms 0.2298ms 4.3519 KOps/s 4.3287 KOps/s $\color{#35bf28}+0.54\%$
test_seq_wrap[compile-overhead] 0.5373ms 0.2314ms 4.3220 KOps/s 4.3711 KOps/s $\color{#d91a1a}-1.12\%$
test_func_call_runtime[False-eager] 0.8547ms 0.5591ms 1.7886 KOps/s 1.7673 KOps/s $\color{#35bf28}+1.21\%$
test_func_call_runtime[False-compile] 0.6200ms 0.4387ms 2.2796 KOps/s 2.2828 KOps/s $\color{#d91a1a}-0.14\%$
test_func_call_runtime[False-compile-overhead] 0.6120ms 0.4378ms 2.2842 KOps/s 2.2927 KOps/s $\color{#d91a1a}-0.37\%$
test_func_call_runtime[True-eager] 1.2822ms 0.7763ms 1.2881 KOps/s 1.2709 KOps/s $\color{#35bf28}+1.35\%$
test_func_call_runtime[True-compile] 0.6252ms 0.4777ms 2.0933 KOps/s 2.0986 KOps/s $\color{#d91a1a}-0.25\%$
test_func_call_runtime[True-compile-overhead] 0.6233ms 0.4796ms 2.0851 KOps/s 2.1020 KOps/s $\color{#d91a1a}-0.81\%$
test_func_call_cm_runtime[False-eager] 0.7936ms 0.5619ms 1.7797 KOps/s 1.7703 KOps/s $\color{#35bf28}+0.53\%$
test_func_call_cm_runtime[False-compile] 0.6477ms 0.4370ms 2.2885 KOps/s 2.2709 KOps/s $\color{#35bf28}+0.77\%$
test_func_call_cm_runtime[False-compile-overhead] 0.6754ms 0.4380ms 2.2832 KOps/s 2.2917 KOps/s $\color{#d91a1a}-0.37\%$
test_func_call_cm_runtime[True-eager] 1.5028ms 0.9340ms 1.0707 KOps/s 1.0678 KOps/s $\color{#35bf28}+0.27\%$
test_func_call_cm_runtime[True-compile] 0.7260ms 0.5096ms 1.9624 KOps/s 1.9853 KOps/s $\color{#d91a1a}-1.15\%$
test_func_call_cm_runtime[True-compile-overhead] 0.7344ms 0.5054ms 1.9787 KOps/s 1.9760 KOps/s $\color{#35bf28}+0.13\%$
test_vmap_func_call_cm_runtime[eager] 2.7046ms 1.9863ms 503.4567 Ops/s 496.8840 Ops/s $\color{#35bf28}+1.32\%$
test_vmap_func_call_cm_runtime[compile] 0.9268ms 0.5494ms 1.8203 KOps/s 1.8412 KOps/s $\color{#d91a1a}-1.14\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.9215ms 0.5360ms 1.8657 KOps/s 1.8614 KOps/s $\color{#35bf28}+0.23\%$
test_distributed 0.3414ms 0.1298ms 7.7058 KOps/s 7.3921 KOps/s $\color{#35bf28}+4.24\%$
test_tdmodule 54.6320μs 20.0235μs 49.9414 KOps/s 51.4084 KOps/s $\color{#d91a1a}-2.85\%$
test_tdmodule_dispatch 67.4660μs 40.5997μs 24.6307 KOps/s 26.0339 KOps/s $\textbf{\color{#d91a1a}-5.39\%}$
test_tdseq 41.6670μs 22.9679μs 43.5390 KOps/s 41.6432 KOps/s $\color{#35bf28}+4.55\%$
test_tdseq_dispatch 86.9020μs 46.1447μs 21.6710 KOps/s 22.6654 KOps/s $\color{#d91a1a}-4.39\%$
test_instantiation_functorch 2.1346ms 1.5573ms 642.1185 Ops/s 642.0934 Ops/s $+0.00\%$
test_exec_functorch 0.3113ms 0.1827ms 5.4731 KOps/s 5.5143 KOps/s $\color{#d91a1a}-0.75\%$
test_exec_functional_call 0.3095ms 0.1771ms 5.6461 KOps/s 5.6653 KOps/s $\color{#d91a1a}-0.34\%$
test_exec_td_decorator 0.6201ms 0.2412ms 4.1462 KOps/s 4.1012 KOps/s $\color{#35bf28}+1.10\%$
test_vmap_mlp_speed_decorator[True-True] 1.0497ms 0.6763ms 1.4786 KOps/s 1.4599 KOps/s $\color{#35bf28}+1.28\%$
test_vmap_mlp_speed_decorator[True-False] 1.0127ms 0.6649ms 1.5041 KOps/s 1.4965 KOps/s $\color{#35bf28}+0.51\%$
test_vmap_mlp_speed_decorator[False-True] 0.8611ms 0.5476ms 1.8261 KOps/s 1.7914 KOps/s $\color{#35bf28}+1.94\%$
test_vmap_mlp_speed_decorator[False-False] 0.7513ms 0.5484ms 1.8234 KOps/s 1.8146 KOps/s $\color{#35bf28}+0.49\%$
test_to_module_speed[True] 2.2438ms 1.3871ms 720.9304 Ops/s 717.4964 Ops/s $\color{#35bf28}+0.48\%$
test_to_module_speed[False] 2.0127ms 1.3528ms 739.2044 Ops/s 740.3845 Ops/s $\color{#d91a1a}-0.16\%$
test_tc_init 0.1201ms 49.7626μs 20.0954 KOps/s 21.3957 KOps/s $\textbf{\color{#d91a1a}-6.08\%}$
test_tc_init_nested 0.1788ms 97.5372μs 10.2525 KOps/s 11.0621 KOps/s $\textbf{\color{#d91a1a}-7.32\%}$
test_tc_first_layer_tensor 55.3190μs 1.5534μs 643.7690 KOps/s 660.9454 KOps/s $\color{#d91a1a}-2.60\%$
test_tc_first_layer_nontensor 21.4100μs 4.9098μs 203.6759 KOps/s 205.5285 KOps/s $\color{#d91a1a}-0.90\%$
test_tc_second_layer_tensor 37.5300μs 2.9673μs 337.0111 KOps/s 361.0589 KOps/s $\textbf{\color{#d91a1a}-6.66\%}$
test_tc_second_layer_nontensor 40.5250μs 6.2076μs 161.0933 KOps/s 164.8450 KOps/s $\color{#d91a1a}-2.28\%$
test_unbind 0.2898s 16.7845ms 59.5788 Ops/s 71.2530 Ops/s $\textbf{\color{#d91a1a}-16.38\%}$
test_full_like 20.3237ms 13.9758ms 71.5521 Ops/s 107.5348 Ops/s $\textbf{\color{#d91a1a}-33.46\%}$
test_zeros_like 6.3961ms 4.5115ms 221.6538 Ops/s 256.4473 Ops/s $\textbf{\color{#d91a1a}-13.57\%}$
test_ones_like 6.2114ms 4.6735ms 213.9729 Ops/s 140.7847 Ops/s $\textbf{\color{#35bf28}+51.99\%}$
test_clone 10.5636ms 7.0856ms 141.1303 Ops/s 109.9305 Ops/s $\textbf{\color{#35bf28}+28.38\%}$
test_squeeze 83.6460μs 13.1647μs 75.9608 KOps/s 81.1872 KOps/s $\textbf{\color{#d91a1a}-6.44\%}$
test_unsqueeze 0.2880ms 94.8437μs 10.5437 KOps/s 10.5053 KOps/s $\color{#35bf28}+0.37\%$
test_split 0.4181ms 0.1977ms 5.0579 KOps/s 4.9502 KOps/s $\color{#35bf28}+2.18\%$
test_permute 0.4413ms 0.2312ms 4.3243 KOps/s 4.4136 KOps/s $\color{#d91a1a}-2.02\%$
test_stack 41.5252ms 31.3724ms 31.8752 Ops/s 35.8903 Ops/s $\textbf{\color{#d91a1a}-11.19\%}$
test_cat 35.9056ms 29.8498ms 33.5011 Ops/s 34.8183 Ops/s $\color{#d91a1a}-3.78\%$
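
The Change column above appears to be the relative difference between Ops (this PR) and Ops on Repo HEAD. The sketch below reproduces that arithmetic for three rows copied from the CPU table. The formula and the 5% cutoff are assumptions inferred from the numbers, not taken from the benchmark workflow itself. The bolded rows (8 green, 24 red) match the Improved and Worsened counts in the summary, and every bolded row has a change larger than 5% in magnitude, which is what suggests the cutoff. The function names `relative_change` and `classify` are hypothetical.

```python
# Sketch only: the formula and the 5% cutoff are inferred from the table,
# not taken from the benchmark workflow's source.

def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR run relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row; `threshold` is an assumed cutoff, in percent."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "within threshold"


# (Ops, Ops on Repo HEAD) in KOps/s, copied from rows of the CPU table.
rows = {
    "test_plain_set_nested": (38.9838, 42.0286),
    "test_items": (230.9977, 240.9439),
    "test_values_stack_nested": (10.7242, 10.1606),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Rows whose units differ between columns (for example KOps/s against MOps/s) would need converting to a common unit before applying the same formula.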

github-actions bot commented Sep 20, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 228. Improved: $\large\color{#35bf28}22$. Worsened: $\large\color{#d91a1a}5$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.1398ms 14.0510μs 71.1694 KOps/s 71.6220 KOps/s $\color{#d91a1a}-0.63\%$
test_plain_set_stack_nested 40.8610μs 14.0714μs 71.0659 KOps/s 70.5660 KOps/s $\color{#35bf28}+0.71\%$
test_plain_set_nested_inplace 44.4510μs 14.9467μs 66.9044 KOps/s 66.6646 KOps/s $\color{#35bf28}+0.36\%$
test_plain_set_stack_nested_inplace 0.1871ms 14.9939μs 66.6940 KOps/s 66.6082 KOps/s $\color{#35bf28}+0.13\%$
test_items 29.3310μs 2.8550μs 350.2578 KOps/s 347.6205 KOps/s $\color{#35bf28}+0.76\%$
test_items_nested 0.3789ms 0.3287ms 3.0421 KOps/s 3.0848 KOps/s $\color{#d91a1a}-1.38\%$
test_items_nested_locked 0.3891ms 0.3311ms 3.0205 KOps/s 3.0425 KOps/s $\color{#d91a1a}-0.72\%$
test_items_nested_leaf 77.6720μs 55.6077μs 17.9831 KOps/s 17.8809 KOps/s $\color{#35bf28}+0.57\%$
test_items_stack_nested 0.3918ms 0.3327ms 3.0060 KOps/s 3.0146 KOps/s $\color{#d91a1a}-0.28\%$
test_items_stack_nested_leaf 86.0220μs 56.5979μs 17.6685 KOps/s 17.4597 KOps/s $\color{#35bf28}+1.20\%$
test_items_stack_nested_locked 0.3873ms 0.3337ms 2.9963 KOps/s 3.0367 KOps/s $\color{#d91a1a}-1.33\%$
test_keys 37.5410μs 3.4019μs 293.9538 KOps/s 274.8335 KOps/s $\textbf{\color{#35bf28}+6.96\%}$
test_keys_nested 96.9030μs 55.8942μs 17.8910 KOps/s 17.6889 KOps/s $\color{#35bf28}+1.14\%$
test_keys_nested_locked 2.5347ms 62.1416μs 16.0923 KOps/s 16.1348 KOps/s $\color{#d91a1a}-0.26\%$
test_keys_nested_leaf 74.1420μs 46.9139μs 21.3157 KOps/s 21.3000 KOps/s $\color{#35bf28}+0.07\%$
test_keys_stack_nested 84.8720μs 56.7365μs 17.6254 KOps/s 17.6302 KOps/s $\color{#d91a1a}-0.03\%$
test_keys_stack_nested_leaf 74.2420μs 46.9471μs 21.3006 KOps/s 20.5544 KOps/s $\color{#35bf28}+3.63\%$
test_keys_stack_nested_locked 0.1166ms 61.4624μs 16.2701 KOps/s 16.1109 KOps/s $\color{#35bf28}+0.99\%$
test_values 5.4752μs 0.8714μs 1.1476 MOps/s 1.1753 MOps/s $\color{#d91a1a}-2.35\%$
test_values_nested 72.4920μs 40.4113μs 24.7456 KOps/s 24.2922 KOps/s $\color{#35bf28}+1.87\%$
test_values_nested_locked 70.5510μs 42.2952μs 23.6434 KOps/s 23.3334 KOps/s $\color{#35bf28}+1.33\%$
test_values_nested_leaf 67.9020μs 34.9982μs 28.5729 KOps/s 28.0646 KOps/s $\color{#35bf28}+1.81\%$
test_values_stack_nested 78.5910μs 41.3561μs 24.1802 KOps/s 23.8290 KOps/s $\color{#35bf28}+1.47\%$
test_values_stack_nested_leaf 71.7220μs 35.8991μs 27.8559 KOps/s 27.6089 KOps/s $\color{#35bf28}+0.89\%$
test_values_stack_nested_locked 85.0520μs 43.0954μs 23.2044 KOps/s 22.8338 KOps/s $\color{#35bf28}+1.62\%$
test_membership 1.5476μs 0.5040μs 1.9842 MOps/s 1.9828 MOps/s $\color{#35bf28}+0.07\%$
test_membership_nested 19.1605μs 1.9089μs 523.8555 KOps/s 530.4487 KOps/s $\color{#d91a1a}-1.24\%$
test_membership_nested_leaf 13.4055μs 1.8915μs 528.6671 KOps/s 531.3800 KOps/s $\color{#d91a1a}-0.51\%$
test_membership_stacked_nested 29.7810μs 1.9695μs 507.7383 KOps/s 522.3695 KOps/s $\color{#d91a1a}-2.80\%$
test_membership_stacked_nested_leaf 32.6010μs 1.9825μs 504.4088 KOps/s 516.6656 KOps/s $\color{#d91a1a}-2.37\%$
test_membership_nested_last 38.4010μs 2.8505μs 350.8172 KOps/s 351.9531 KOps/s $\color{#d91a1a}-0.32\%$
test_membership_nested_leaf_last 26.1300μs 2.8229μs 354.2396 KOps/s 355.6954 KOps/s $\color{#d91a1a}-0.41\%$
test_membership_stacked_nested_last 29.0310μs 3.1879μs 313.6906 KOps/s 234.5753 KOps/s $\textbf{\color{#35bf28}+33.73\%}$
test_membership_stacked_nested_leaf_last 29.9410μs 3.2153μs 311.0106 KOps/s 237.3361 KOps/s $\textbf{\color{#35bf28}+31.04\%}$
test_nested_getleaf 35.0010μs 6.1846μs 161.6922 KOps/s 161.9744 KOps/s $\color{#d91a1a}-0.17\%$
test_nested_get 27.3600μs 5.7342μs 174.3936 KOps/s 172.6007 KOps/s $\color{#35bf28}+1.04\%$
test_stacked_getleaf 35.0400μs 6.0353μs 165.6916 KOps/s 164.5749 KOps/s $\color{#35bf28}+0.68\%$
test_stacked_get 33.0910μs 5.6195μs 177.9531 KOps/s 174.2211 KOps/s $\color{#35bf28}+2.14\%$
test_nested_getitemleaf 33.8610μs 6.1483μs 162.6457 KOps/s 161.1655 KOps/s $\color{#35bf28}+0.92\%$
test_nested_getitem 33.0800μs 5.7548μs 173.7666 KOps/s 172.0980 KOps/s $\color{#35bf28}+0.97\%$
test_stacked_getitemleaf 37.6710μs 6.0543μs 165.1723 KOps/s 163.3039 KOps/s $\color{#35bf28}+1.14\%$
test_stacked_getitem 33.9910μs 5.7794μs 173.0291 KOps/s 173.8919 KOps/s $\color{#d91a1a}-0.50\%$
test_lock_nested 5.0900ms 0.4207ms 2.3771 KOps/s 2.3530 KOps/s $\color{#35bf28}+1.02\%$
test_lock_stack_nested 0.4354ms 0.3843ms 2.6023 KOps/s 2.6160 KOps/s $\color{#d91a1a}-0.52\%$
test_unlock_nested 0.7607ms 0.3583ms 2.7913 KOps/s 2.7656 KOps/s $\color{#35bf28}+0.93\%$
test_unlock_stack_nested 0.3725ms 0.3240ms 3.0863 KOps/s 3.1061 KOps/s $\color{#d91a1a}-0.64\%$
test_flatten_speed 0.1495ms 69.6921μs 14.3488 KOps/s 14.2443 KOps/s $\color{#35bf28}+0.73\%$
test_unflatten_speed 0.3385ms 0.2808ms 3.5614 KOps/s 3.4071 KOps/s $\color{#35bf28}+4.53\%$
test_common_ops 1.5521ms 1.2773ms 782.9121 Ops/s 731.5360 Ops/s $\textbf{\color{#35bf28}+7.02\%}$
test_creation 33.5810μs 1.4832μs 674.1959 KOps/s 667.4252 KOps/s $\color{#35bf28}+1.01\%$
test_creation_empty 45.7410μs 15.5172μs 64.4445 KOps/s 65.0973 KOps/s $\color{#d91a1a}-1.00\%$
test_creation_nested_1 46.3510μs 17.3372μs 57.6794 KOps/s 57.3238 KOps/s $\color{#35bf28}+0.62\%$
test_creation_nested_2 65.6110μs 19.8274μs 50.4352 KOps/s 49.8204 KOps/s $\color{#35bf28}+1.23\%$
test_clone 59.8920μs 29.6163μs 33.7652 KOps/s 34.1279 KOps/s $\color{#d91a1a}-1.06\%$
test_getitem[int] 1.3547ms 16.2531μs 61.5269 KOps/s 56.8697 KOps/s $\textbf{\color{#35bf28}+8.19\%}$
test_getitem[slice_int] 0.1198ms 27.6207μs 36.2047 KOps/s 32.4668 KOps/s $\textbf{\color{#35bf28}+11.51\%}$
test_getitem[range] 0.2343ms 0.1131ms 8.8418 KOps/s 8.8031 KOps/s $\color{#35bf28}+0.44\%$
test_getitem[tuple] 0.1205ms 23.6230μs 42.3316 KOps/s 40.7698 KOps/s $\color{#35bf28}+3.83\%$
test_getitem[list] 0.2026ms 0.1022ms 9.7876 KOps/s 9.2848 KOps/s $\textbf{\color{#35bf28}+5.42\%}$
test_setitem_dim[int] 70.6020μs 46.3912μs 21.5558 KOps/s 19.4015 KOps/s $\textbf{\color{#35bf28}+11.10\%}$
test_setitem_dim[slice_int] 97.1420μs 69.5090μs 14.3866 KOps/s 14.2352 KOps/s $\color{#35bf28}+1.06\%$
test_setitem_dim[range] 0.1595ms 0.1307ms 7.6490 KOps/s 7.5977 KOps/s $\color{#35bf28}+0.68\%$
test_setitem_dim[tuple] 0.1034ms 63.2289μs 15.8156 KOps/s 15.7832 KOps/s $\color{#35bf28}+0.21\%$
test_setitem 84.7130μs 42.5314μs 23.5120 KOps/s 23.7585 KOps/s $\color{#d91a1a}-1.04\%$
test_set 0.1153ms 41.6547μs 24.0069 KOps/s 24.1831 KOps/s $\color{#d91a1a}-0.73\%$
test_set_shared 0.3733ms 52.0617μs 19.2080 KOps/s 19.3457 KOps/s $\color{#d91a1a}-0.71\%$
test_update 0.3018ms 50.0804μs 19.9679 KOps/s 19.8293 KOps/s $\color{#35bf28}+0.70\%$
test_update_nested 0.1190ms 57.2637μs 17.4631 KOps/s 17.5418 KOps/s $\color{#d91a1a}-0.45\%$
test_update__nested 0.1036ms 60.4565μs 16.5408 KOps/s 16.6118 KOps/s $\color{#d91a1a}-0.43\%$
test_set_nested 0.1019ms 44.0056μs 22.7244 KOps/s 22.6947 KOps/s $\color{#35bf28}+0.13\%$
test_set_nested_new 0.1119ms 47.5709μs 21.0212 KOps/s 21.3297 KOps/s $\color{#d91a1a}-1.45\%$
test_select 0.1103ms 61.1058μs 16.3651 KOps/s 16.2440 KOps/s $\color{#35bf28}+0.75\%$
test_select_nested 82.4820μs 42.0248μs 23.7955 KOps/s 23.5172 KOps/s $\color{#35bf28}+1.18\%$
test_exclude_nested 0.1022ms 58.8460μs 16.9935 KOps/s 16.8691 KOps/s $\color{#35bf28}+0.74\%$
test_empty[True] 0.2960ms 0.2412ms 4.1465 KOps/s 4.0987 KOps/s $\color{#35bf28}+1.17\%$
test_empty[False] 4.1951μs 0.7357μs 1.3593 MOps/s 1.3490 MOps/s $\color{#35bf28}+0.76\%$
test_to 71.3820μs 24.8164μs 40.2960 KOps/s 38.6131 KOps/s $\color{#35bf28}+4.36\%$
test_to_nonblocking 61.9120μs 24.1171μs 41.4643 KOps/s 39.3731 KOps/s $\textbf{\color{#35bf28}+5.31\%}$
test_unbind_speed 0.3135ms 0.2823ms 3.5428 KOps/s 3.5121 KOps/s $\color{#35bf28}+0.87\%$
test_unbind_speed_stack0 0.3618ms 0.2816ms 3.5516 KOps/s 3.5456 KOps/s $\color{#35bf28}+0.17\%$
test_unbind_speed_stack1 93.3092ms 0.7086ms 1.4113 KOps/s 1.5315 KOps/s $\textbf{\color{#d91a1a}-7.85\%}$
test_split 95.4190ms 2.1751ms 459.7553 Ops/s 436.5296 Ops/s $\textbf{\color{#35bf28}+5.32\%}$
test_chunk 95.2672ms 2.1528ms 464.5174 Ops/s 428.9869 Ops/s $\textbf{\color{#35bf28}+8.28\%}$
test_creation[device0] 0.2907ms 0.1267ms 7.8901 KOps/s 7.5570 KOps/s $\color{#35bf28}+4.41\%$
test_creation_from_tensor 0.3594ms 0.1303ms 7.6749 KOps/s 7.4000 KOps/s $\color{#35bf28}+3.71\%$
test_add_one[memmap_tensor0] 0.2198ms 8.9249μs 112.0467 KOps/s 106.3867 KOps/s $\textbf{\color{#35bf28}+5.32\%}$
test_contiguous[memmap_tensor0] 33.0310μs 2.2021μs 454.1068 KOps/s 447.1720 KOps/s $\color{#35bf28}+1.55\%$
test_stack[memmap_tensor0] 51.4410μs 6.8241μs 146.5391 KOps/s 142.9476 KOps/s $\color{#35bf28}+2.51\%$
test_memmaptd_index 1.1631ms 0.4293ms 2.3293 KOps/s 2.2891 KOps/s $\color{#35bf28}+1.76\%$
test_memmaptd_index_astensor 0.7244ms 0.4791ms 2.0874 KOps/s 2.0080 KOps/s $\color{#35bf28}+3.96\%$
test_memmaptd_index_op 1.4160ms 1.0275ms 973.2404 Ops/s 912.6920 Ops/s $\textbf{\color{#35bf28}+6.63\%}$
test_serialize_model 0.1316s 0.1299s 7.6977 Ops/s 7.6824 Ops/s $\color{#35bf28}+0.20\%$
test_serialize_model_pickle 1.3515s 1.2121s 0.8250 Ops/s 0.8228 Ops/s $\color{#35bf28}+0.26\%$
test_serialize_weights 0.2253s 0.1426s 7.0132 Ops/s 7.0324 Ops/s $\color{#d91a1a}-0.27\%$
test_serialize_weights_returnearly 0.2336s 56.9592ms 17.5564 Ops/s 17.6422 Ops/s $\color{#d91a1a}-0.49\%$
test_serialize_weights_pickle 1.3718s 1.2164s 0.8221 Ops/s 0.8217 Ops/s $\color{#35bf28}+0.06\%$
test_reshape_pytree 63.6120μs 35.8947μs 27.8593 KOps/s 27.4861 KOps/s $\color{#35bf28}+1.36\%$
test_reshape_td 74.9420μs 42.1493μs 23.7252 KOps/s 23.3973 KOps/s $\color{#35bf28}+1.40\%$
test_view_pytree 66.3510μs 35.4150μs 28.2367 KOps/s 27.5089 KOps/s $\color{#35bf28}+2.65\%$
test_view_td 85.0620μs 46.0091μs 21.7348 KOps/s 20.8806 KOps/s $\color{#35bf28}+4.09\%$
test_unbind_pytree 63.9920μs 35.0768μs 28.5089 KOps/s 27.9981 KOps/s $\color{#35bf28}+1.82\%$
test_unbind_td 0.5109ms 43.7185μs 22.8736 KOps/s 22.9630 KOps/s $\color{#d91a1a}-0.39\%$
test_split_pytree 0.5287ms 47.0563μs 21.2511 KOps/s 21.3861 KOps/s $\color{#d91a1a}-0.63\%$
test_split_td 0.1476ms 55.9662μs 17.8679 KOps/s 17.5668 KOps/s $\color{#35bf28}+1.71\%$
test_add_pytree 0.1001ms 57.7554μs 17.3144 KOps/s 17.5550 KOps/s $\color{#d91a1a}-1.37\%$
test_add_td 0.1640ms 96.5171μs 10.3609 KOps/s 11.0197 KOps/s $\textbf{\color{#d91a1a}-5.98\%}$
test_compile_add_one_nested[tensordict-compile] 0.4282ms 0.2127ms 4.7012 KOps/s 4.6114 KOps/s $\color{#35bf28}+1.95\%$
test_compile_add_one_nested[tensordict-eager] 0.1979ms 0.1514ms 6.6038 KOps/s 6.6746 KOps/s $\color{#d91a1a}-1.06\%$
test_compile_add_one_nested[pytree-compile] 0.1830ms 0.1453ms 6.8841 KOps/s 6.8742 KOps/s $\color{#35bf28}+0.14\%$
test_compile_add_one_nested[pytree-eager] 0.2527ms 0.1855ms 5.3896 KOps/s 5.4415 KOps/s $\color{#d91a1a}-0.95\%$
test_compile_copy_nested[tensordict-compile] 50.8910μs 21.9777μs 45.5008 KOps/s 43.5984 KOps/s $\color{#35bf28}+4.36\%$
test_compile_copy_nested[tensordict-eager] 90.6420μs 44.2572μs 22.5952 KOps/s 22.5008 KOps/s $\color{#35bf28}+0.42\%$
test_compile_copy_nested[pytree-compile] 0.2377ms 63.1173μs 15.8435 KOps/s 15.6026 KOps/s $\color{#35bf28}+1.54\%$
test_compile_copy_nested[pytree-eager] 86.7320μs 49.0089μs 20.4045 KOps/s 20.4098 KOps/s $\color{#d91a1a}-0.03\%$
test_compile_add_one_flat[tensordict-compile] 0.3861ms 0.3217ms 3.1088 KOps/s 3.1230 KOps/s $\color{#d91a1a}-0.45\%$
test_compile_add_one_flat[tensordict-eager] 0.2824ms 0.2099ms 4.7632 KOps/s 4.7392 KOps/s $\color{#35bf28}+0.51\%$
test_compile_add_one_flat[tensorclass-compile] 0.1843ms 0.1287ms 7.7682 KOps/s 7.6369 KOps/s $\color{#35bf28}+1.72\%$
test_compile_add_one_flat[tensorclass-eager] 0.1101ms 59.8019μs 16.7219 KOps/s 15.7429 KOps/s $\textbf{\color{#35bf28}+6.22\%}$
test_compile_add_one_flat[pytree-compile] 0.3951ms 0.3221ms 3.1045 KOps/s 3.1058 KOps/s $\color{#d91a1a}-0.04\%$
test_compile_add_one_flat[pytree-eager] 0.6945ms 0.6423ms 1.5570 KOps/s 1.6058 KOps/s $\color{#d91a1a}-3.04\%$
test_compile_add_self_flat[tensordict-eager] 0.2947ms 0.2476ms 4.0386 KOps/s 4.0024 KOps/s $\color{#35bf28}+0.90\%$
test_compile_add_self_flat[tensordict-compile] 0.3825ms 0.3248ms 3.0785 KOps/s 3.0800 KOps/s $\color{#d91a1a}-0.05\%$
test_compile_add_self_flat[tensorclass-eager] 0.1159ms 69.4929μs 14.3900 KOps/s 13.7655 KOps/s $\color{#35bf28}+4.54\%$
test_compile_add_self_flat[tensorclass-compile] 0.1734ms 0.1308ms 7.6478 KOps/s 7.4615 KOps/s $\color{#35bf28}+2.50\%$
test_compile_add_self_flat[pytree-eager] 0.6038ms 0.5336ms 1.8741 KOps/s 1.8880 KOps/s $\color{#d91a1a}-0.74\%$
test_compile_add_self_flat[pytree-compile] 0.3989ms 0.3222ms 3.1035 KOps/s 3.1115 KOps/s $\color{#d91a1a}-0.26\%$
test_compile_copy_flat[tensordict-compile] 67.5010μs 18.5081μs 54.0304 KOps/s 55.1974 KOps/s $\color{#d91a1a}-2.11\%$
test_compile_copy_flat[tensordict-eager] 64.4020μs 26.7970μs 37.3176 KOps/s 37.1119 KOps/s $\color{#35bf28}+0.55\%$
test_compile_copy_flat[pytree-compile] 0.1107ms 69.4912μs 14.3903 KOps/s 14.5702 KOps/s $\color{#d91a1a}-1.23\%$
test_compile_copy_flat[pytree-eager] 79.6920μs 51.6724μs 19.3527 KOps/s 19.5388 KOps/s $\color{#d91a1a}-0.95\%$
test_compile_assign_and_add[tensordict-compile] 2.3169ms 0.8121ms 1.2314 KOps/s 1.1100 KOps/s $\textbf{\color{#35bf28}+10.94\%}$
test_compile_assign_and_add[tensordict-eager] 3.4347ms 3.2951ms 303.4788 Ops/s 300.5918 Ops/s $\color{#35bf28}+0.96\%$
test_compile_assign_and_add[pytree-compile] 2.3125ms 0.8151ms 1.2269 KOps/s 1.1244 KOps/s $\textbf{\color{#35bf28}+9.12\%}$
test_compile_assign_and_add[pytree-eager] 3.5630ms 3.3262ms 300.6429 Ops/s 304.4343 Ops/s $\color{#d91a1a}-1.25\%$
test_compile_indexing[tensor-tensordict-compile] 0.1528ms 0.1093ms 9.1467 KOps/s 8.8319 KOps/s $\color{#35bf28}+3.56\%$
test_compile_indexing[tensor-tensordict-eager] 0.1952ms 65.8807μs 15.1790 KOps/s 15.0117 KOps/s $\color{#35bf28}+1.11\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1496ms 0.1034ms 9.6672 KOps/s 9.5485 KOps/s $\color{#35bf28}+1.24\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1467ms 44.2961μs 22.5754 KOps/s 22.2333 KOps/s $\color{#35bf28}+1.54\%$
test_compile_indexing[tensor-pytree-compile] 0.1588ms 0.1086ms 9.2080 KOps/s 9.3104 KOps/s $\color{#d91a1a}-1.10\%$
test_compile_indexing[tensor-pytree-eager] 92.9220μs 44.2527μs 22.5975 KOps/s 22.4984 KOps/s $\color{#35bf28}+0.44\%$
test_compile_indexing[slice-tensordict-compile] 0.1989ms 0.1379ms 7.2541 KOps/s 7.1707 KOps/s $\color{#35bf28}+1.16\%$
test_compile_indexing[slice-tensordict-eager] 0.1634ms 25.4665μs 39.2673 KOps/s 38.1116 KOps/s $\color{#35bf28}+3.03\%$
test_compile_indexing[slice-tensorclass-compile] 0.1672ms 0.1318ms 7.5883 KOps/s 7.3971 KOps/s $\color{#35bf28}+2.59\%$
test_compile_indexing[slice-tensorclass-eager] 56.6620μs 20.3289μs 49.1910 KOps/s 46.8778 KOps/s $\color{#35bf28}+4.93\%$
test_compile_indexing[slice-pytree-compile] 0.1838ms 0.1331ms 7.5104 KOps/s 7.2176 KOps/s $\color{#35bf28}+4.06\%$
test_compile_indexing[slice-pytree-eager] 56.7810μs 20.4175μs 48.9777 KOps/s 47.1847 KOps/s $\color{#35bf28}+3.80\%$
test_compile_indexing[int-tensordict-compile] 0.1812ms 0.1394ms 7.1743 KOps/s 7.1279 KOps/s $\color{#35bf28}+0.65\%$
test_compile_indexing[int-tensordict-eager] 0.4911ms 24.5580μs 40.7199 KOps/s 38.7132 KOps/s $\textbf{\color{#35bf28}+5.18\%}$
test_compile_indexing[int-tensorclass-compile] 0.1966ms 0.1340ms 7.4601 KOps/s 7.3090 KOps/s $\color{#35bf28}+2.07\%$
test_compile_indexing[int-tensorclass-eager] 0.1541ms 22.5983μs 44.2511 KOps/s 46.9272 KOps/s $\textbf{\color{#d91a1a}-5.70\%}$
test_compile_indexing[int-pytree-compile] 0.1854ms 0.1338ms 7.4711 KOps/s 7.4841 KOps/s $\color{#d91a1a}-0.17\%$
test_compile_indexing[int-pytree-eager] 65.9520μs 20.5969μs 48.5509 KOps/s 47.6069 KOps/s $\color{#35bf28}+1.98\%$
test_mod_add[eager] 81.6420μs 32.0422μs 31.2088 KOps/s 30.4546 KOps/s $\color{#35bf28}+2.48\%$
test_mod_add[compile] 0.3827ms 69.8231μs 14.3219 KOps/s 13.9641 KOps/s $\color{#35bf28}+2.56\%$
test_mod_add[compile-overhead] 0.2627ms 0.1364ms 7.3301 KOps/s 7.0108 KOps/s $\color{#35bf28}+4.55\%$
test_mod_wrap[eager] 0.3235ms 0.2443ms 4.0935 KOps/s 4.0007 KOps/s $\color{#35bf28}+2.32\%$
test_mod_wrap[compile] 1.4681ms 0.2998ms 3.3359 KOps/s 3.1661 KOps/s $\textbf{\color{#35bf28}+5.36\%}$
test_mod_wrap[compile-overhead] 7.6595ms 4.0040ms 249.7505 Ops/s 248.9984 Ops/s $\color{#35bf28}+0.30\%$
test_mod_wrap_and_backward[eager] 1.4577ms 1.3667ms 731.7052 Ops/s 687.6753 Ops/s $\textbf{\color{#35bf28}+6.40\%}$
test_mod_wrap_and_backward[compile] 1.5795ms 1.3348ms 749.1638 Ops/s 686.2619 Ops/s $\textbf{\color{#35bf28}+9.17\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3432ms 0.9067ms 1.1029 KOps/s 971.2357 Ops/s $\textbf{\color{#35bf28}+13.56\%}$
test_seq_add[eager] 0.1498ms 97.6527μs 10.2404 KOps/s 10.1878 KOps/s $\color{#35bf28}+0.52\%$
test_seq_add[compile] 0.1477ms 81.0903μs 12.3319 KOps/s 12.1919 KOps/s $\color{#35bf28}+1.15\%$
test_seq_add[compile-overhead] 0.1535ms 0.1148ms 8.7102 KOps/s 8.5528 KOps/s $\color{#35bf28}+1.84\%$
test_seq_wrap[eager] 0.4456ms 0.3875ms 2.5808 KOps/s 2.5402 KOps/s $\color{#35bf28}+1.60\%$
test_seq_wrap[compile] 0.3812ms 0.3176ms 3.1487 KOps/s 3.1004 KOps/s $\color{#35bf28}+1.56\%$
test_seq_wrap[compile-overhead] 0.3023ms 0.2229ms 4.4871 KOps/s 4.4311 KOps/s $\color{#35bf28}+1.26\%$
test_func_call_runtime[False-eager] 0.8167ms 0.7386ms 1.3540 KOps/s 1.3303 KOps/s $\color{#35bf28}+1.78\%$
test_func_call_runtime[False-compile] 0.8794ms 0.7999ms 1.2502 KOps/s 1.2299 KOps/s $\color{#35bf28}+1.65\%$
test_func_call_runtime[False-compile-overhead] 0.4139ms 0.3626ms 2.7579 KOps/s 2.7281 KOps/s $\color{#35bf28}+1.09\%$
test_func_call_runtime[True-eager] 0.9725ms 0.9013ms 1.1095 KOps/s 1.0722 KOps/s $\color{#35bf28}+3.48\%$
test_func_call_runtime[True-compile] 0.9312ms 0.8344ms 1.1985 KOps/s 1.1780 KOps/s $\color{#35bf28}+1.74\%$
test_func_call_runtime[True-compile-overhead] 0.4542ms 0.3984ms 2.5100 KOps/s 2.4984 KOps/s $\color{#35bf28}+0.46\%$
test_func_call_cm_runtime[False-eager] 0.8102ms 0.7407ms 1.3501 KOps/s 1.2517 KOps/s $\textbf{\color{#35bf28}+7.86\%}$
test_func_call_cm_runtime[False-compile] 0.9490ms 0.8051ms 1.2421 KOps/s 1.2227 KOps/s $\color{#35bf28}+1.59\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4387ms 0.3664ms 2.7295 KOps/s 2.7347 KOps/s $\color{#d91a1a}-0.19\%$
test_func_call_cm_runtime[True-eager] 1.1212ms 1.0030ms 996.9759 Ops/s 983.8462 Ops/s $\color{#35bf28}+1.33\%$
test_func_call_cm_runtime[True-compile] 0.9491ms 0.8624ms 1.1595 KOps/s 1.1391 KOps/s $\color{#35bf28}+1.79\%$
test_func_call_cm_runtime[True-compile-overhead] 0.4832ms 0.4234ms 2.3617 KOps/s 2.3428 KOps/s $\color{#35bf28}+0.80\%$
test_vmap_func_call_cm_runtime[eager] 2.5686ms 2.0924ms 477.9122 Ops/s 475.5572 Ops/s $\color{#35bf28}+0.50\%$
test_vmap_func_call_cm_runtime[compile] 0.9772ms 0.8818ms 1.1341 KOps/s 1.1198 KOps/s $\color{#35bf28}+1.28\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4791ms 0.4309ms 2.3205 KOps/s 2.3269 KOps/s $\color{#d91a1a}-0.28\%$
test_distributed 2.2133ms 0.2002ms 4.9944 KOps/s 8.9291 KOps/s $\textbf{\color{#d91a1a}-44.07\%}$
test_tdmodule 80.4520μs 15.0300μs 66.5335 KOps/s 63.5575 KOps/s $\color{#35bf28}+4.68\%$
test_tdmodule_dispatch 57.8110μs 28.7745μs 34.7530 KOps/s 34.6011 KOps/s $\color{#35bf28}+0.44\%$
test_tdseq 42.6210μs 16.0971μs 62.1231 KOps/s 63.1077 KOps/s $\color{#d91a1a}-1.56\%$
test_tdseq_dispatch 56.8020μs 32.5273μs 30.7434 KOps/s 31.2041 KOps/s $\color{#d91a1a}-1.48\%$
test_instantiation_functorch 2.4227ms 1.8886ms 529.5004 Ops/s 522.7627 Ops/s $\color{#35bf28}+1.29\%$
test_instantiation_td 1.7868ms 1.2015ms 832.2859 Ops/s 826.2625 Ops/s $\color{#35bf28}+0.73\%$
test_exec_functorch 0.2819ms 0.2080ms 4.8078 KOps/s 4.6742 KOps/s $\color{#35bf28}+2.86\%$
test_exec_functional_call 0.2703ms 0.2120ms 4.7172 KOps/s 4.6576 KOps/s $\color{#35bf28}+1.28\%$
test_exec_td 0.2799ms 0.2180ms 4.5862 KOps/s 4.5472 KOps/s $\color{#35bf28}+0.86\%$
test_exec_td_decorator 0.6798ms 0.2584ms 3.8697 KOps/s 3.7960 KOps/s $\color{#35bf28}+1.94\%$
test_vmap_mlp_speed[True-True] 0.7645ms 0.6906ms 1.4479 KOps/s 1.4324 KOps/s $\color{#35bf28}+1.09\%$
test_vmap_mlp_speed[True-False] 0.7468ms 0.6868ms 1.4561 KOps/s 1.4434 KOps/s $\color{#35bf28}+0.88\%$
test_vmap_mlp_speed[False-True] 0.7086ms 0.5804ms 1.7230 KOps/s 1.6704 KOps/s $\color{#35bf28}+3.15\%$
test_vmap_mlp_speed[False-False] 0.6687ms 0.6078ms 1.6451 KOps/s 1.7065 KOps/s $\color{#d91a1a}-3.60\%$
test_vmap_mlp_speed_decorator[True-True] 1.4322ms 0.6822ms 1.4659 KOps/s 1.4666 KOps/s $\color{#d91a1a}-0.05\%$
test_vmap_mlp_speed_decorator[True-False] 0.8429ms 0.6807ms 1.4691 KOps/s 1.4720 KOps/s $\color{#d91a1a}-0.19\%$
test_vmap_mlp_speed_decorator[False-True] 0.7100ms 0.6085ms 1.6434 KOps/s 1.6749 KOps/s $\color{#d91a1a}-1.88\%$
test_vmap_mlp_speed_decorator[False-False] 0.7492ms 0.6256ms 1.5985 KOps/s 1.6477 KOps/s $\color{#d91a1a}-2.99\%$
test_vmap_transformer_speed[True-True] 8.8495ms 8.4518ms 118.3179 Ops/s 117.7615 Ops/s $\color{#35bf28}+0.47\%$
test_vmap_transformer_speed[True-False] 8.9342ms 8.4537ms 118.2908 Ops/s 117.7776 Ops/s $\color{#35bf28}+0.44\%$
test_vmap_transformer_speed[False-True] 8.4434ms 8.1908ms 122.0881 Ops/s 120.7464 Ops/s $\color{#35bf28}+1.11\%$
test_vmap_transformer_speed[False-False] 8.3043ms 8.1979ms 121.9827 Ops/s 119.8967 Ops/s $\color{#35bf28}+1.74\%$
test_vmap_transformer_speed_decorator[True-True] 19.8267ms 19.7100ms 50.7356 Ops/s 50.6794 Ops/s $\color{#35bf28}+0.11\%$
test_vmap_transformer_speed_decorator[True-False] 20.7671ms 19.8264ms 50.4379 Ops/s 50.1700 Ops/s $\color{#35bf28}+0.53\%$
test_vmap_transformer_speed_decorator[False-True] 20.7505ms 19.6091ms 50.9968 Ops/s 51.3185 Ops/s $\color{#d91a1a}-0.63\%$
test_vmap_transformer_speed_decorator[False-False] 19.6557ms 19.5184ms 51.2338 Ops/s 51.1055 Ops/s $\color{#35bf28}+0.25\%$
test_to_module_speed[True] 1.2098ms 0.9383ms 1.0657 KOps/s 1.0593 KOps/s $\color{#35bf28}+0.61\%$
test_to_module_speed[False] 1.3441ms 0.9228ms 1.0837 KOps/s 1.0953 KOps/s $\color{#d91a1a}-1.06\%$
test_tc_init 62.3120μs 32.5688μs 30.7042 KOps/s 30.8415 KOps/s $\color{#d91a1a}-0.44\%$
test_tc_init_nested 0.1038ms 66.6339μs 15.0074 KOps/s 15.5366 KOps/s $\color{#d91a1a}-3.41\%$
test_tc_first_layer_tensor 5.3887μs 0.6797μs 1.4713 MOps/s 1.4640 MOps/s $\color{#35bf28}+0.50\%$
test_tc_first_layer_nontensor 33.0610μs 2.2435μs 445.7403 KOps/s 441.3346 KOps/s $\color{#35bf28}+1.00\%$
test_tc_second_layer_tensor 47.2713μs 1.3843μs 722.3918 KOps/s 730.4920 KOps/s $\color{#d91a1a}-1.11\%$
test_tc_second_layer_nontensor 31.7110μs 2.9376μs 340.4139 KOps/s 341.8278 KOps/s $\color{#d91a1a}-0.41\%$
test_unbind 0.1956s 12.2958ms 81.3286 Ops/s 90.4173 Ops/s $\textbf{\color{#d91a1a}-10.05\%}$
test_full_like 0.6570ms 0.5756ms 1.7373 KOps/s 1.7427 KOps/s $\color{#d91a1a}-0.31\%$
test_zeros_like 0.2836ms 0.1980ms 5.0506 KOps/s 5.0494 KOps/s $\color{#35bf28}+0.03\%$
test_ones_like 0.2333ms 0.1979ms 5.0529 KOps/s 5.0547 KOps/s $\color{#d91a1a}-0.03\%$
test_clone 0.4779ms 0.4149ms 2.4102 KOps/s 2.4117 KOps/s $\color{#d91a1a}-0.06\%$
test_squeeze 38.1210μs 9.8297μs 101.7323 KOps/s 99.6491 KOps/s $\color{#35bf28}+2.09\%$
test_unsqueeze 0.2800ms 75.0819μs 13.3188 KOps/s 13.1423 KOps/s $\color{#35bf28}+1.34\%$
test_split 0.2596ms 0.1534ms 6.5206 KOps/s 6.3078 KOps/s $\color{#35bf28}+3.37\%$
test_permute 0.2385ms 0.1743ms 5.7369 KOps/s 5.5181 KOps/s $\color{#35bf28}+3.97\%$
test_stack 1.2546ms 0.8439ms 1.1850 KOps/s 1.1658 KOps/s $\color{#35bf28}+1.65\%$
test_cat 1.2476ms 1.2314ms 812.0726 Ops/s 811.7995 Ops/s $\color{#35bf28}+0.03\%$

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Sep 20, 2024
ghstack-source-id: 18a5798c5377d3e5b65e7b6c87d59917c474fd64
Pull Request resolved: #1004
@vmoens vmoens changed the title [BugFix] Fix parsing integer batch size in AOT [BugFix] Fix parsing integer batch size within export Sep 20, 2024
x_new, y_new = torch.zeros(5, 100), torch.zeros(5, 100)
export_test = export_mod(x_new, y_new)
eager_test = test(x_new, y_new)
assert eager_test.batch_size == export_test.batch_size
Contributor Author

@vmoens vmoens Sep 20, 2024

@ezyang this test fails when using dynamic shapes: the eager batch size is [5] but the exported one is [].
This happens with both strict=False and strict=True.

The batch size [s0] becomes [] when using dynamic shapes and when the 2nd output shape mismatches the 1st.

We do get a warning, though:

W0920 10:19:28.564000 20340 torch/fx/experimental/symbolic_shapes.py:5136] Ignored guard Eq(s0, 5) == False, this could result in accuracy problems

So, there's something a bit nontrivial going on here. In torch.compile eager, if we produce a fresh TensorDict and that TensorDict holds a list of dynamic ints, then in the residual bytecode we have to construct the TensorDict and also put in the freshly computed dynamic shapes from the FX graph (which now has some int outputs). So building a TensorDict isn't just a matter of putting in the right tensors; you also have to put some ints in. Does this work?

Assuming this does work, export also has to be set up to do the same thing. It wouldn't be surprising if it didn't. In particular, if all export is doing is a pytree unflatten on Tensor leaves, the batch size won't be modified at all. To address this, we need to fix the export bug. But I also saw the comment about TensorDict not being pytree-able, so I am uncertain about the status there.
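
The pytree hypothesis above can be illustrated with a small pure-Python sketch. It is not the tensordict or torch.export implementation: `Leaf`, `Container`, `flatten` and `unflatten` are hypothetical stand-ins, and the only assumption is that the batch size travels in the static pytree context rather than as a leaf.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    """Stand-in for a tensor: only the shape matters here."""
    shape: tuple


@dataclass
class Container:
    """Hypothetical TensorDict-like container with an explicit batch size."""
    data: dict
    batch_size: tuple


def flatten(c):
    # Leaves are the tensors; everything else, batch_size included, is static context.
    keys = sorted(c.data)
    leaves = [c.data[k] for k in keys]
    context = (keys, c.batch_size)
    return leaves, context


def unflatten(leaves, context):
    keys, batch_size = context
    # The batch size is copied from the context, never recomputed from the leaves.
    return Container(dict(zip(keys, leaves)), batch_size)


# "Trace" with batch dimension 5 and keep the context around.
traced = Container({"x": Leaf((5, 100)), "y": Leaf((5, 100))}, batch_size=(5,))
_, context = flatten(traced)

# Later call with batch dimension 2: the leaves are new, the context is not.
rebuilt = unflatten([Leaf((2, 100)), Leaf((2, 100))], context)
print(rebuilt.batch_size)       # (5,)
print(rebuilt.data["x"].shape)  # (2, 100)
```

In this toy model the rebuilt container keeps the batch size seen at trace time even though its leaves have a different leading dimension, which is the kind of mismatch described above.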

If you want a workaround, perhaps batch size can store rank instead of size and lazily compute the size from the tensors if it's not set? Better to fix things though. Just not sure what you expect to work and not work.
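
A minimal sketch of the rank-based workaround suggested above, in pure Python. `LazyBatchContainer` and `Leaf` are hypothetical names used only for illustration; the sketch assumes every leaf exposes a `.shape` and that all leaves share the same leading batch dimensions.

```python
class Leaf:
    """Stand-in for a tensor: only the shape matters here."""

    def __init__(self, *shape):
        self.shape = tuple(shape)


class LazyBatchContainer:
    """Hypothetical container that stores the batch rank, not the batch size."""

    def __init__(self, data, batch_dims):
        self.data = data              # mapping from key to any object with a .shape
        self.batch_dims = batch_dims  # number of leading batch dimensions (a static int)

    @property
    def batch_size(self):
        # Read the leading dimensions from the first leaf on every access.
        first = next(iter(self.data.values()), None)
        if first is None:
            return ()  # no leaf to read from, so the size cannot be recovered
        return tuple(first.shape[: self.batch_dims])


td = LazyBatchContainer({"x": Leaf(5, 100), "y": Leaf(5, 100)}, batch_dims=1)
print(td.batch_size)  # (5,)

# Swapping in leaves with a different leading dimension updates the batch size.
td.data = {"x": Leaf(2, 100), "y": Leaf(2, 100)}
print(td.batch_size)  # (2,)
```

The rank is a plain Python int, so it can live in static context without going stale. The cost is that an empty container has no leaf to read the size from.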

Contributor Author

> Assuming this does work, export also has to be setup to do the same thing as well. It wouldn't be surprising if it didn't. In particular, if all export is doing is a pytree unflatten on Tensor leaves, the batch size won't be modified at all. To address this, we need to fix the export bug. But I also saw the comment about TensorDict not being pytree-able, so I am uncertain about the status there.

TensorDict is pytreeable, but you can deactivate that. This is what the comment is about (don't do it or the test will fail).

Contributor Author

@vmoens vmoens Sep 20, 2024

Here's what works and what doesn't:

    import torch
    from tensordict import TensorDict

    class Test(torch.nn.Module):
        def forward(self, x: torch.Tensor, y: torch.Tensor):
            return TensorDict(
                {
                    "x": x,
                    "y": y,
                },
                batch_size=x.shape[0],  # integer batch size taken from the input
            )

    test = Test()
    x, y = torch.zeros(5, 100), torch.zeros(5, 100)
    # Export with both dimensions of x and y marked as dynamic.
    result = torch.export.export(
        test,
        args=(x, y),
        strict=False,
        dynamic_shapes={
            "x": {0: torch.export.Dim("batch"), 1: torch.export.Dim("time")},
            "y": {0: torch.export.Dim("batch"), 1: torch.export.Dim("time")},
        },
    )
    export_mod = result.module()

    # Same shapes as the ones used for export.
    x_new, y_new = torch.zeros(5, 100), torch.zeros(5, 100)
    export_test = export_mod(x_new, y_new)
    eager_test = test(x_new, y_new)
    assert torch.Size([5]) == eager_test.batch_size == export_test.batch_size  # Works because x and x_new have the same shape

    # Different batch dimension from the one used for export.
    x_new, y_new = torch.zeros(2, 100), torch.zeros(2, 100)
    export_test = export_mod(x_new, y_new)
    eager_test = test(x_new, y_new)
    assert torch.Size([2]) == eager_test.batch_size == export_test.batch_size  # Fails! Now export_test.batch_size is torch.Size([])

So it's weird behaviour: the SymInt just vanishes into thin air in the second case.

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 21, 2024
ghstack-source-id: ffd60b71e6e9424b81eeabee77fb8710589f6cae
Pull Request resolved: #1004
Development

Successfully merging this pull request may close these issues.

[BUG] TensorDict with dynamic, input-dependent batch_size is not torch.export.exportable
3 participants