
[Feature] map_names for composite dists #809

Merged
merged 2 commits into from
Jun 10, 2024


vmoens
Contributor

@vmoens vmoens commented Jun 10, 2024

No description provided.

@facebook-github-bot facebook-github-bot added the CLA Signed label Jun 10, 2024
@vmoens
Contributor Author

vmoens commented Jun 10, 2024

See pytorch/rl#2167

@vmoens vmoens added the enhancement label Jun 10, 2024
@vmoens vmoens merged commit c8df202 into main Jun 10, 2024
21 of 33 checks passed
@vmoens vmoens deleted the named-compositedist-output branch June 10, 2024 14:36

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}27$. Worsened: $\large\color{#d91a1a}7$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 31.8790μs 16.8143μs 59.4730 KOps/s 54.5038 KOps/s $\textbf{\color{#35bf28}+9.12\%}$
test_plain_set_stack_nested 68.7680μs 17.1376μs 58.3511 KOps/s 54.3385 KOps/s $\textbf{\color{#35bf28}+7.38\%}$
test_plain_set_nested_inplace 55.6440μs 19.2848μs 51.8544 KOps/s 48.5122 KOps/s $\textbf{\color{#35bf28}+6.89\%}$
test_plain_set_stack_nested_inplace 77.4050μs 19.2770μs 51.8753 KOps/s 49.0593 KOps/s $\textbf{\color{#35bf28}+5.74\%}$
test_items 27.0510μs 2.5107μs 398.2896 KOps/s 390.5026 KOps/s $\color{#35bf28}+1.99\%$
test_items_nested 1.3040ms 0.2771ms 3.6087 KOps/s 3.6974 KOps/s $\color{#d91a1a}-2.40\%$
test_items_nested_locked 0.4595ms 0.2849ms 3.5105 KOps/s 3.6689 KOps/s $\color{#d91a1a}-4.32\%$
test_items_nested_leaf 0.1374ms 77.2897μs 12.9383 KOps/s 12.6904 KOps/s $\color{#35bf28}+1.95\%$
test_items_stack_nested 0.3980ms 0.2783ms 3.5932 KOps/s 3.7058 KOps/s $\color{#d91a1a}-3.04\%$
test_items_stack_nested_leaf 0.1383ms 78.0929μs 12.8053 KOps/s 12.3716 KOps/s $\color{#35bf28}+3.51\%$
test_items_stack_nested_locked 0.4654ms 0.2767ms 3.6136 KOps/s 3.7140 KOps/s $\color{#d91a1a}-2.70\%$
test_keys 30.6870μs 3.7674μs 265.4377 KOps/s 254.8918 KOps/s $\color{#35bf28}+4.14\%$
test_keys_nested 0.2292ms 0.1369ms 7.3034 KOps/s 6.9295 KOps/s $\textbf{\color{#35bf28}+5.40\%}$
test_keys_nested_locked 0.7355ms 0.1421ms 7.0395 KOps/s 6.7785 KOps/s $\color{#35bf28}+3.85\%$
test_keys_nested_leaf 0.2119ms 0.1170ms 8.5492 KOps/s 8.5187 KOps/s $\color{#35bf28}+0.36\%$
test_keys_stack_nested 0.2300ms 0.1373ms 7.2840 KOps/s 7.0484 KOps/s $\color{#35bf28}+3.34\%$
test_keys_stack_nested_leaf 0.1990ms 0.1160ms 8.6201 KOps/s 8.3646 KOps/s $\color{#35bf28}+3.05\%$
test_keys_stack_nested_locked 0.2528ms 0.1420ms 7.0432 KOps/s 6.8531 KOps/s $\color{#35bf28}+2.77\%$
test_values 11.3788μs 1.1789μs 848.2213 KOps/s 816.7362 KOps/s $\color{#35bf28}+3.85\%$
test_values_nested 0.1149ms 50.9994μs 19.6081 KOps/s 18.8143 KOps/s $\color{#35bf28}+4.22\%$
test_values_nested_locked 0.1069ms 50.6815μs 19.7311 KOps/s 19.2342 KOps/s $\color{#35bf28}+2.58\%$
test_values_nested_leaf 0.1021ms 46.2830μs 21.6062 KOps/s 20.8158 KOps/s $\color{#35bf28}+3.80\%$
test_values_stack_nested 0.1040ms 51.6593μs 19.3576 KOps/s 18.5309 KOps/s $\color{#35bf28}+4.46\%$
test_values_stack_nested_leaf 98.5240μs 45.4683μs 21.9933 KOps/s 20.9040 KOps/s $\textbf{\color{#35bf28}+5.21\%}$
test_values_stack_nested_locked 98.6450μs 51.4971μs 19.4186 KOps/s 18.7215 KOps/s $\color{#35bf28}+3.72\%$
test_membership 22.1920μs 1.3513μs 740.0370 KOps/s 736.8167 KOps/s $\color{#35bf28}+0.44\%$
test_membership_nested 45.3650μs 3.4738μs 287.8708 KOps/s 293.5836 KOps/s $\color{#d91a1a}-1.95\%$
test_membership_nested_leaf 38.9130μs 3.4506μs 289.8044 KOps/s 292.7973 KOps/s $\color{#d91a1a}-1.02\%$
test_membership_stacked_nested 18.1940μs 3.4686μs 288.3022 KOps/s 294.8410 KOps/s $\color{#d91a1a}-2.22\%$
test_membership_stacked_nested_leaf 38.4120μs 3.4627μs 288.7948 KOps/s 291.9745 KOps/s $\color{#d91a1a}-1.09\%$
test_membership_nested_last 33.6830μs 4.1863μs 238.8724 KOps/s 238.0334 KOps/s $\color{#35bf28}+0.35\%$
test_membership_nested_leaf_last 40.3260μs 4.2349μs 236.1308 KOps/s 240.6240 KOps/s $\color{#d91a1a}-1.87\%$
test_membership_stacked_nested_last 45.7850μs 5.2262μs 191.3454 KOps/s 236.2173 KOps/s $\textbf{\color{#d91a1a}-19.00\%}$
test_membership_stacked_nested_leaf_last 48.9210μs 5.3059μs 188.4704 KOps/s 236.9515 KOps/s $\textbf{\color{#d91a1a}-20.46\%}$
test_nested_getleaf 0.1985ms 10.6628μs 93.7841 KOps/s 93.2969 KOps/s $\color{#35bf28}+0.52\%$
test_nested_get 44.7040μs 10.0860μs 99.1476 KOps/s 100.8561 KOps/s $\color{#d91a1a}-1.69\%$
test_stacked_getleaf 54.5820μs 10.4087μs 96.0737 KOps/s 96.1903 KOps/s $\color{#d91a1a}-0.12\%$
test_stacked_get 49.8130μs 9.8976μs 101.0342 KOps/s 100.4825 KOps/s $\color{#35bf28}+0.55\%$
test_nested_getitemleaf 39.9540μs 11.0661μs 90.3659 KOps/s 90.7825 KOps/s $\color{#d91a1a}-0.46\%$
test_nested_getitem 56.1050μs 10.2950μs 97.1345 KOps/s 96.2044 KOps/s $\color{#35bf28}+0.97\%$
test_stacked_getitemleaf 0.1277ms 11.0152μs 90.7836 KOps/s 90.9248 KOps/s $\color{#d91a1a}-0.16\%$
test_stacked_getitem 98.3040μs 10.2170μs 97.8756 KOps/s 98.9146 KOps/s $\color{#d91a1a}-1.05\%$
test_lock_nested 4.3913ms 0.3580ms 2.7935 KOps/s 2.7848 KOps/s $\color{#35bf28}+0.31\%$
test_lock_stack_nested 0.5439ms 0.3137ms 3.1877 KOps/s 3.1417 KOps/s $\color{#35bf28}+1.46\%$
test_unlock_nested 0.8870ms 0.3626ms 2.7579 KOps/s 2.4854 KOps/s $\textbf{\color{#35bf28}+10.96\%}$
test_unlock_stack_nested 0.6547ms 0.3202ms 3.1233 KOps/s 3.0940 KOps/s $\color{#35bf28}+0.95\%$
test_flatten_speed 0.5332ms 95.8344μs 10.4347 KOps/s 10.3504 KOps/s $\color{#35bf28}+0.81\%$
test_unflatten_speed 0.8861ms 0.4113ms 2.4316 KOps/s 2.4065 KOps/s $\color{#35bf28}+1.04\%$
test_common_ops 3.4339ms 0.7129ms 1.4027 KOps/s 1.3414 KOps/s $\color{#35bf28}+4.57\%$
test_creation 54.7420μs 1.8878μs 529.7113 KOps/s 520.4356 KOps/s $\color{#35bf28}+1.78\%$
test_creation_empty 45.9560μs 10.7946μs 92.6387 KOps/s 81.2010 KOps/s $\textbf{\color{#35bf28}+14.09\%}$
test_creation_nested_1 37.2790μs 13.6453μs 73.2852 KOps/s 67.1438 KOps/s $\textbf{\color{#35bf28}+9.15\%}$
test_creation_nested_2 75.2010μs 16.7778μs 59.6027 KOps/s 54.9433 KOps/s $\textbf{\color{#35bf28}+8.48\%}$
test_clone 89.0670μs 13.6967μs 73.0105 KOps/s 73.9217 KOps/s $\color{#d91a1a}-1.23\%$
test_getitem[int] 39.2940μs 11.5677μs 86.4476 KOps/s 87.7734 KOps/s $\color{#d91a1a}-1.51\%$
test_getitem[slice_int] 4.4507ms 22.7964μs 43.8666 KOps/s 43.8757 KOps/s $\color{#d91a1a}-0.02\%$
test_getitem[range] 81.3620μs 59.6294μs 16.7703 KOps/s 17.1147 KOps/s $\color{#d91a1a}-2.01\%$
test_getitem[tuple] 80.3200μs 19.0195μs 52.5778 KOps/s 52.4934 KOps/s $\color{#35bf28}+0.16\%$
test_getitem[list] 0.1244ms 40.5087μs 24.6860 KOps/s 23.8801 KOps/s $\color{#35bf28}+3.37\%$
test_setitem_dim[int] 63.3790μs 34.3079μs 29.1478 KOps/s 26.8431 KOps/s $\textbf{\color{#35bf28}+8.59\%}$
test_setitem_dim[slice_int] 0.1052ms 59.6781μs 16.7566 KOps/s 15.7056 KOps/s $\textbf{\color{#35bf28}+6.69\%}$
test_setitem_dim[range] 0.1177ms 82.4164μs 12.1335 KOps/s 11.2161 KOps/s $\textbf{\color{#35bf28}+8.18\%}$
test_setitem_dim[tuple] 86.7830μs 48.4529μs 20.6386 KOps/s 18.8844 KOps/s $\textbf{\color{#35bf28}+9.29\%}$
test_setitem 61.2550μs 20.4973μs 48.7868 KOps/s 47.0906 KOps/s $\color{#35bf28}+3.60\%$
test_set 80.1210μs 20.1872μs 49.5363 KOps/s 47.3562 KOps/s $\color{#35bf28}+4.60\%$
test_set_shared 1.6975ms 0.1376ms 7.2671 KOps/s 6.9242 KOps/s $\color{#35bf28}+4.95\%$
test_update 0.1074ms 22.6861μs 44.0800 KOps/s 41.7420 KOps/s $\textbf{\color{#35bf28}+5.60\%}$
test_update_nested 99.3890μs 30.7050μs 32.5680 KOps/s 31.1777 KOps/s $\color{#35bf28}+4.46\%$
test_update__nested 98.0130μs 25.7736μs 38.7993 KOps/s 37.6062 KOps/s $\color{#35bf28}+3.17\%$
test_set_nested 97.0320μs 22.1365μs 45.1742 KOps/s 43.9610 KOps/s $\color{#35bf28}+2.76\%$
test_set_nested_new 79.9900μs 26.5534μs 37.6600 KOps/s 37.1647 KOps/s $\color{#35bf28}+1.33\%$
test_select 0.1062ms 41.6610μs 24.0033 KOps/s 23.2255 KOps/s $\color{#35bf28}+3.35\%$
test_select_nested 0.9942ms 61.6164μs 16.2294 KOps/s 16.3196 KOps/s $\color{#d91a1a}-0.55\%$
test_exclude_nested 0.2409ms 0.1213ms 8.2443 KOps/s 8.1900 KOps/s $\color{#35bf28}+0.66\%$
test_empty[True] 0.6337ms 0.4019ms 2.4883 KOps/s 2.4593 KOps/s $\color{#35bf28}+1.18\%$
test_empty[False] 12.1428μs 1.1887μs 841.2842 KOps/s 854.3987 KOps/s $\color{#d91a1a}-1.53\%$
test_unbind_speed 4.3537ms 0.2798ms 3.5745 KOps/s 3.7726 KOps/s $\textbf{\color{#d91a1a}-5.25\%}$
test_unbind_speed_stack0 0.4148ms 0.2518ms 3.9719 KOps/s 3.8854 KOps/s $\color{#35bf28}+2.23\%$
test_unbind_speed_stack1 68.1481ms 0.7247ms 1.3798 KOps/s 1.2599 KOps/s $\textbf{\color{#35bf28}+9.51\%}$
test_split 67.1730ms 1.6300ms 613.4966 Ops/s 618.9558 Ops/s $\color{#d91a1a}-0.88\%$
test_chunk 68.8898ms 1.6262ms 614.9385 Ops/s 662.2592 Ops/s $\textbf{\color{#d91a1a}-7.15\%}$
test_creation[device0] 0.2046ms 83.5820μs 11.9643 KOps/s 11.5201 KOps/s $\color{#35bf28}+3.86\%$
test_creation_from_tensor 0.2383ms 84.5034μs 11.8338 KOps/s 11.4785 KOps/s $\color{#35bf28}+3.10\%$
test_add_one[memmap_tensor0] 66.7050μs 5.5010μs 181.7851 KOps/s 179.6485 KOps/s $\color{#35bf28}+1.19\%$
test_contiguous[memmap_tensor0] 11.7120μs 0.6296μs 1.5883 MOps/s 1.5185 MOps/s $\color{#35bf28}+4.60\%$
test_stack[memmap_tensor0] 14.4580μs 3.6599μs 273.2344 KOps/s 279.2174 KOps/s $\color{#d91a1a}-2.14\%$
test_memmaptd_index 1.0975ms 0.2706ms 3.6955 KOps/s 3.9130 KOps/s $\textbf{\color{#d91a1a}-5.56\%}$
test_memmaptd_index_astensor 1.1360ms 0.3491ms 2.8645 KOps/s 2.9853 KOps/s $\color{#d91a1a}-4.04\%$
test_memmaptd_index_op 0.9943ms 0.6297ms 1.5880 KOps/s 1.4105 KOps/s $\textbf{\color{#35bf28}+12.59\%}$
test_serialize_model 0.1838s 0.1136s 8.7998 Ops/s 8.2573 Ops/s $\textbf{\color{#35bf28}+6.57\%}$
test_serialize_model_pickle 0.4516s 0.3804s 2.6291 Ops/s 2.5736 Ops/s $\color{#35bf28}+2.16\%$
test_serialize_weights 0.1118s 0.1050s 9.5236 Ops/s 8.5132 Ops/s $\textbf{\color{#35bf28}+11.87\%}$
test_serialize_weights_returnearly 0.2002s 0.1401s 7.1369 Ops/s 7.7649 Ops/s $\textbf{\color{#d91a1a}-8.09\%}$
test_serialize_weights_pickle 1.0436s 0.6031s 1.6580 Ops/s 1.3937 Ops/s $\textbf{\color{#35bf28}+18.96\%}$
test_serialize_weights_filesystem 0.1031s 94.3751ms 10.5960 Ops/s 9.3781 Ops/s $\textbf{\color{#35bf28}+12.99\%}$
test_serialize_model_filesystem 0.1087s 97.9012ms 10.2144 Ops/s 9.4168 Ops/s $\textbf{\color{#35bf28}+8.47\%}$
test_reshape_pytree 65.2220μs 25.3078μs 39.5135 KOps/s 38.1383 KOps/s $\color{#35bf28}+3.61\%$
test_reshape_td 86.0410μs 34.6651μs 28.8475 KOps/s 28.9214 KOps/s $\color{#d91a1a}-0.26\%$
test_view_pytree 68.7890μs 25.4230μs 39.3345 KOps/s 38.2858 KOps/s $\color{#35bf28}+2.74\%$
test_view_td 82.5750μs 38.3329μs 26.0873 KOps/s 25.7733 KOps/s $\color{#35bf28}+1.22\%$
test_unbind_pytree 62.9580μs 29.4388μs 33.9688 KOps/s 33.0548 KOps/s $\color{#35bf28}+2.76\%$
test_unbind_td 0.4295ms 38.1801μs 26.1916 KOps/s 26.2425 KOps/s $\color{#d91a1a}-0.19\%$
test_split_pytree 64.7210μs 29.1272μs 34.3321 KOps/s 33.3794 KOps/s $\color{#35bf28}+2.85\%$
test_split_td 0.1240ms 41.5013μs 24.0957 KOps/s 24.3786 KOps/s $\color{#d91a1a}-1.16\%$
test_add_pytree 0.1179ms 35.4673μs 28.1950 KOps/s 28.3758 KOps/s $\color{#d91a1a}-0.64\%$
test_add_td 0.1577ms 56.1692μs 17.8033 KOps/s 17.2186 KOps/s $\color{#35bf28}+3.40\%$
test_distributed 0.2523ms 99.7369μs 10.0264 KOps/s 9.7087 KOps/s $\color{#35bf28}+3.27\%$
test_tdmodule 71.9050μs 17.3345μs 57.6883 KOps/s 55.4900 KOps/s $\color{#35bf28}+3.96\%$
test_tdmodule_dispatch 62.7780μs 34.7805μs 28.7517 KOps/s 27.6584 KOps/s $\color{#35bf28}+3.95\%$
test_tdseq 36.3980μs 20.1896μs 49.5304 KOps/s 47.5855 KOps/s $\color{#35bf28}+4.09\%$
test_tdseq_dispatch 99.0260μs 45.6168μs 21.9217 KOps/s 23.5201 KOps/s $\textbf{\color{#d91a1a}-6.80\%}$
test_instantiation_functorch 1.6018ms 1.2920ms 773.9830 Ops/s 739.4889 Ops/s $\color{#35bf28}+4.66\%$
test_instantiation_td 73.6967ms 1.0792ms 926.5787 Ops/s 941.7299 Ops/s $\color{#d91a1a}-1.61\%$
test_exec_functorch 0.2515ms 0.1598ms 6.2584 KOps/s 6.1696 KOps/s $\color{#35bf28}+1.44\%$
test_exec_functional_call 0.2696ms 0.1468ms 6.8110 KOps/s 6.5915 KOps/s $\color{#35bf28}+3.33\%$
test_exec_td 0.4173ms 0.1501ms 6.6633 KOps/s 6.6702 KOps/s $\color{#d91a1a}-0.10\%$
test_exec_td_decorator 0.3292ms 0.2228ms 4.4883 KOps/s 4.4243 KOps/s $\color{#35bf28}+1.45\%$
test_vmap_mlp_speed[True-True] 0.9465ms 0.4825ms 2.0725 KOps/s 2.0203 KOps/s $\color{#35bf28}+2.59\%$
test_vmap_mlp_speed[True-False] 0.7988ms 0.4818ms 2.0757 KOps/s 2.0408 KOps/s $\color{#35bf28}+1.71\%$
test_vmap_mlp_speed[False-True] 0.6355ms 0.3936ms 2.5409 KOps/s 2.4924 KOps/s $\color{#35bf28}+1.95\%$
test_vmap_mlp_speed[False-False] 0.9035ms 0.3938ms 2.5391 KOps/s 2.4742 KOps/s $\color{#35bf28}+2.62\%$
test_vmap_mlp_speed_decorator[True-True] 1.1999ms 0.5596ms 1.7870 KOps/s 1.7772 KOps/s $\color{#35bf28}+0.55\%$
test_vmap_mlp_speed_decorator[True-False] 0.9312ms 0.5534ms 1.8071 KOps/s 1.7871 KOps/s $\color{#35bf28}+1.12\%$
test_vmap_mlp_speed_decorator[False-True] 0.8706ms 0.4543ms 2.2010 KOps/s 2.1392 KOps/s $\color{#35bf28}+2.89\%$
test_vmap_mlp_speed_decorator[False-False] 0.8832ms 0.4556ms 2.1950 KOps/s 2.1648 KOps/s $\color{#35bf28}+1.39\%$
test_to_module_speed[True] 2.6282ms 1.7163ms 582.6468 Ops/s 578.1126 Ops/s $\color{#35bf28}+0.78\%$
test_to_module_speed[False] 1.9282ms 1.6804ms 595.1134 Ops/s 589.7945 Ops/s $\color{#35bf28}+0.90\%$
test_tc_init 64.4010μs 28.9753μs 34.5121 KOps/s 30.9916 KOps/s $\textbf{\color{#35bf28}+11.36\%}$
test_tc_init_nested 0.1302ms 55.9244μs 17.8813 KOps/s 14.9857 KOps/s $\textbf{\color{#35bf28}+19.32\%}$
test_tc_first_layer_tensor 3.9289μs 0.7053μs 1.4179 MOps/s 1.4104 MOps/s $\color{#35bf28}+0.53\%$
test_tc_first_layer_nontensor 2.1010μs 0.6871μs 1.4554 MOps/s 1.4218 MOps/s $\color{#35bf28}+2.37\%$
test_tc_second_layer_tensor 34.9460μs 1.8659μs 535.9487 KOps/s 531.0771 KOps/s $\color{#35bf28}+0.92\%$
test_tc_second_layer_nontensor 26.3400μs 1.6498μs 606.1465 KOps/s 612.5426 KOps/s $\color{#d91a1a}-1.04\%$
test_unbind 5.3361ms 5.1612ms 193.7517 Ops/s 121.2662 Ops/s $\textbf{\color{#35bf28}+59.77\%}$
test_full_like 18.5613ms 11.3929ms 87.7738 Ops/s 84.0683 Ops/s $\color{#35bf28}+4.41\%$
test_zeros_like 12.3630ms 5.3702ms 186.2143 Ops/s 168.6751 Ops/s $\textbf{\color{#35bf28}+10.40\%}$
test_ones_like 11.5464ms 6.2960ms 158.8319 Ops/s 154.3663 Ops/s $\color{#35bf28}+2.89\%$
test_clone 16.0743ms 7.8525ms 127.3473 Ops/s 122.2659 Ops/s $\color{#35bf28}+4.16\%$
test_squeeze 56.2550μs 13.5702μs 73.6911 KOps/s 67.1102 KOps/s $\textbf{\color{#35bf28}+9.81\%}$
test_unsqueeze 0.1100ms 59.4325μs 16.8258 KOps/s 16.4663 KOps/s $\color{#35bf28}+2.18\%$
test_split 0.2297ms 0.1128ms 8.8651 KOps/s 8.9393 KOps/s $\color{#d91a1a}-0.83\%$
test_permute 0.2755ms 0.1284ms 7.7910 KOps/s 7.7198 KOps/s $\color{#35bf28}+0.92\%$
test_stack 29.8896ms 22.5711ms 44.3044 Ops/s 42.4738 Ops/s $\color{#35bf28}+4.31\%$
test_cat 30.0099ms 22.5345ms 44.3763 Ops/s 42.5330 Ops/s $\color{#35bf28}+4.33\%$
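
For readers checking the numbers, the Change column appears to be the relative difference between the Ops value for this PR and the Ops value on repo HEAD. The rows shown in bold, and the Improved/Worsened totals, seem to line up with a cutoff of about 5% in either direction. Both points are inferred from the table itself, not from the benchmark bot's source, so the sketch below is an assumption-laden illustration: the function names and the `threshold` default are hypothetical.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR build relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the bold rows."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "within noise"


# (Ops, Ops on Repo HEAD) in KOps/s, copied from three rows of the CPU table.
rows = {
    "test_plain_set_nested": (59.4730, 54.5038),
    "test_membership_stacked_nested_last": (191.3454, 236.2173),
    "test_items": (398.2896, 390.5026),
}

for name, (ops_pr, ops_head) in rows.items():
    pct = relative_change(ops_pr, ops_head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Both values in a row must use the same unit (KOps/s with KOps/s, for example) before the ratio is taken.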


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: $\large\color{#35bf28}12$. Worsened: $\large\color{#d91a1a}21$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 34.0520μs 12.3184μs 81.1792 KOps/s 78.0187 KOps/s $\color{#35bf28}+4.05\%$
test_plain_set_stack_nested 28.0520μs 12.4506μs 80.3173 KOps/s 77.0365 KOps/s $\color{#35bf28}+4.26\%$
test_plain_set_nested_inplace 47.6530μs 13.6811μs 73.0937 KOps/s 71.1850 KOps/s $\color{#35bf28}+2.68\%$
test_plain_set_stack_nested_inplace 38.6420μs 13.8223μs 72.3471 KOps/s 70.4414 KOps/s $\color{#35bf28}+2.71\%$
test_items 15.8210μs 4.6759μs 213.8615 KOps/s 213.7449 KOps/s $\color{#35bf28}+0.05\%$
test_items_nested 0.3952ms 0.3412ms 2.9305 KOps/s 2.9049 KOps/s $\color{#35bf28}+0.88\%$
test_items_nested_locked 0.3994ms 0.3440ms 2.9067 KOps/s 2.9430 KOps/s $\color{#d91a1a}-1.24\%$
test_items_nested_leaf 0.1241ms 83.5803μs 11.9645 KOps/s 12.1369 KOps/s $\color{#d91a1a}-1.42\%$
test_items_stack_nested 0.4123ms 0.3432ms 2.9135 KOps/s 2.9398 KOps/s $\color{#d91a1a}-0.89\%$
test_items_stack_nested_leaf 0.1204ms 84.2533μs 11.8690 KOps/s 11.8775 KOps/s $\color{#d91a1a}-0.07\%$
test_items_stack_nested_locked 0.4118ms 0.3452ms 2.8970 KOps/s 2.9148 KOps/s $\color{#d91a1a}-0.61\%$
test_keys 25.6610μs 4.3573μs 229.5012 KOps/s 228.0654 KOps/s $\color{#35bf28}+0.63\%$
test_keys_nested 0.1088ms 66.6790μs 14.9972 KOps/s 14.7689 KOps/s $\color{#35bf28}+1.55\%$
test_keys_nested_locked 0.8042ms 71.3981μs 14.0060 KOps/s 13.6880 KOps/s $\color{#35bf28}+2.32\%$
test_keys_nested_leaf 0.1025ms 57.3176μs 17.4467 KOps/s 17.1631 KOps/s $\color{#35bf28}+1.65\%$
test_keys_stack_nested 95.6750μs 66.4909μs 15.0397 KOps/s 14.7547 KOps/s $\color{#35bf28}+1.93\%$
test_keys_stack_nested_leaf 89.8960μs 57.5910μs 17.3638 KOps/s 17.1471 KOps/s $\color{#35bf28}+1.26\%$
test_keys_stack_nested_locked 0.1022ms 71.8033μs 13.9269 KOps/s 13.8448 KOps/s $\color{#35bf28}+0.59\%$
test_values 10.6140μs 1.8095μs 552.6387 KOps/s 548.4384 KOps/s $\color{#35bf28}+0.77\%$
test_values_nested 60.5740μs 34.8783μs 28.6711 KOps/s 28.4959 KOps/s $\color{#35bf28}+0.62\%$
test_values_nested_locked 63.3840μs 36.4856μs 27.4080 KOps/s 27.0952 KOps/s $\color{#35bf28}+1.15\%$
test_values_nested_leaf 54.3730μs 31.0269μs 32.2301 KOps/s 32.1136 KOps/s $\color{#35bf28}+0.36\%$
test_values_stack_nested 56.3030μs 35.8214μs 27.9163 KOps/s 27.9782 KOps/s $\color{#d91a1a}-0.22\%$
test_values_stack_nested_leaf 59.7240μs 31.9918μs 31.2580 KOps/s 31.3035 KOps/s $\color{#d91a1a}-0.15\%$
test_values_stack_nested_locked 60.5040μs 37.4290μs 26.7172 KOps/s 26.7109 KOps/s $\color{#35bf28}+0.02\%$
test_membership 4.4874μs 0.7273μs 1.3749 MOps/s 1.1384 MOps/s $\textbf{\color{#35bf28}+20.77\%}$
test_membership_nested 17.1110μs 2.5825μs 387.2266 KOps/s 392.6248 KOps/s $\color{#d91a1a}-1.37\%$
test_membership_nested_leaf 17.5610μs 2.5824μs 387.2325 KOps/s 394.3887 KOps/s $\color{#d91a1a}-1.81\%$
test_membership_stacked_nested 20.8410μs 2.6048μs 383.9034 KOps/s 388.2634 KOps/s $\color{#d91a1a}-1.12\%$
test_membership_stacked_nested_leaf 33.2020μs 2.5847μs 386.8993 KOps/s 387.8894 KOps/s $\color{#d91a1a}-0.26\%$
test_membership_nested_last 23.7920μs 3.0900μs 323.6214 KOps/s 324.5826 KOps/s $\color{#d91a1a}-0.30\%$
test_membership_nested_leaf_last 34.0720μs 3.0968μs 322.9163 KOps/s 325.3581 KOps/s $\color{#d91a1a}-0.75\%$
test_membership_stacked_nested_last 20.5910μs 3.1362μs 318.8550 KOps/s 256.6773 KOps/s $\textbf{\color{#35bf28}+24.22\%}$
test_membership_stacked_nested_leaf_last 18.1910μs 3.0731μs 325.4050 KOps/s 257.1147 KOps/s $\textbf{\color{#35bf28}+26.56\%}$
test_nested_getleaf 40.3530μs 8.5848μs 116.4843 KOps/s 119.5785 KOps/s $\color{#d91a1a}-2.59\%$
test_nested_get 34.9220μs 8.0458μs 124.2887 KOps/s 127.2979 KOps/s $\color{#d91a1a}-2.36\%$
test_stacked_getleaf 22.5710μs 8.5830μs 116.5097 KOps/s 118.4745 KOps/s $\color{#d91a1a}-1.66\%$
test_stacked_get 37.3120μs 8.0990μs 123.4720 KOps/s 126.6096 KOps/s $\color{#d91a1a}-2.48\%$
test_nested_getitemleaf 35.1920μs 8.7095μs 114.8167 KOps/s 117.0385 KOps/s $\color{#d91a1a}-1.90\%$
test_nested_getitem 21.9210μs 8.2084μs 121.8271 KOps/s 124.7481 KOps/s $\color{#d91a1a}-2.34\%$
test_stacked_getitemleaf 48.6930μs 8.7127μs 114.7753 KOps/s 115.6728 KOps/s $\color{#d91a1a}-0.78\%$
test_stacked_getitem 44.7330μs 8.2289μs 121.5235 KOps/s 123.4967 KOps/s $\color{#d91a1a}-1.60\%$
test_lock_nested 57.1989ms 0.4181ms 2.3918 KOps/s 2.3952 KOps/s $\color{#d91a1a}-0.14\%$
test_lock_stack_nested 0.3662ms 0.3136ms 3.1885 KOps/s 3.2564 KOps/s $\color{#d91a1a}-2.09\%$
test_unlock_nested 0.7302ms 0.3621ms 2.7619 KOps/s 2.8021 KOps/s $\color{#d91a1a}-1.43\%$
test_unlock_stack_nested 0.3927ms 0.3211ms 3.1146 KOps/s 3.1792 KOps/s $\color{#d91a1a}-2.03\%$
test_flatten_speed 0.1883ms 0.1029ms 9.7165 KOps/s 9.7712 KOps/s $\color{#d91a1a}-0.56\%$
test_unflatten_speed 0.3652ms 0.2933ms 3.4100 KOps/s 3.4766 KOps/s $\color{#d91a1a}-1.92\%$
test_common_ops 1.1389ms 0.5732ms 1.7447 KOps/s 1.7628 KOps/s $\color{#d91a1a}-1.02\%$
test_creation 38.5920μs 1.6642μs 600.9036 KOps/s 598.6802 KOps/s $\color{#35bf28}+0.37\%$
test_creation_empty 28.6320μs 7.6100μs 131.4065 KOps/s 115.3144 KOps/s $\textbf{\color{#35bf28}+13.95\%}$
test_creation_nested_1 29.0110μs 9.4238μs 106.1148 KOps/s 95.7513 KOps/s $\textbf{\color{#35bf28}+10.82\%}$
test_creation_nested_2 46.7230μs 11.6532μs 85.8137 KOps/s 79.8337 KOps/s $\textbf{\color{#35bf28}+7.49\%}$
test_clone 77.4350μs 12.1450μs 82.3386 KOps/s 88.3952 KOps/s $\textbf{\color{#d91a1a}-6.85\%}$
test_getitem[int] 33.3220μs 11.9358μs 83.7812 KOps/s 91.8029 KOps/s $\textbf{\color{#d91a1a}-8.74\%}$
test_getitem[slice_int] 57.8930μs 21.7115μs 46.0584 KOps/s 49.0835 KOps/s $\textbf{\color{#d91a1a}-6.16\%}$
test_getitem[range] 69.0040μs 49.9890μs 20.0044 KOps/s 19.1193 KOps/s $\color{#35bf28}+4.63\%$
test_getitem[tuple] 48.5320μs 19.6746μs 50.8269 KOps/s 54.8287 KOps/s $\textbf{\color{#d91a1a}-7.30\%}$
test_getitem[list] 0.1230ms 35.7794μs 27.9490 KOps/s 29.9067 KOps/s $\textbf{\color{#d91a1a}-6.55\%}$
test_setitem_dim[int] 51.5730μs 28.6500μs 34.9040 KOps/s 33.8125 KOps/s $\color{#35bf28}+3.23\%$
test_setitem_dim[slice_int] 72.4940μs 48.8933μs 20.4527 KOps/s 20.1544 KOps/s $\color{#35bf28}+1.48\%$
test_setitem_dim[range] 0.1091ms 67.4865μs 14.8178 KOps/s 14.7248 KOps/s $\color{#35bf28}+0.63\%$
test_setitem_dim[tuple] 78.1150μs 42.5519μs 23.5007 KOps/s 22.8846 KOps/s $\color{#35bf28}+2.69\%$
test_setitem 45.9220μs 16.4785μs 60.6850 KOps/s 62.8143 KOps/s $\color{#d91a1a}-3.39\%$
test_set 57.2030μs 15.8347μs 63.1524 KOps/s 65.6941 KOps/s $\color{#d91a1a}-3.87\%$
test_set_shared 1.1689ms 98.5446μs 10.1477 KOps/s 10.3743 KOps/s $\color{#d91a1a}-2.18\%$
test_update 89.3860μs 17.6776μs 56.5687 KOps/s 56.8203 KOps/s $\color{#d91a1a}-0.44\%$
test_update_nested 85.0460μs 22.9192μs 43.6316 KOps/s 44.0651 KOps/s $\color{#d91a1a}-0.98\%$
test_update__nested 61.1840μs 22.7622μs 43.9324 KOps/s 45.9820 KOps/s $\color{#d91a1a}-4.46\%$
test_set_nested 74.5040μs 17.1537μs 58.2965 KOps/s 61.2690 KOps/s $\color{#d91a1a}-4.85\%$
test_set_nested_new 71.4140μs 20.0572μs 49.8575 KOps/s 52.2910 KOps/s $\color{#d91a1a}-4.65\%$
test_select 75.1240μs 33.2483μs 30.0767 KOps/s 30.9835 KOps/s $\color{#d91a1a}-2.93\%$
test_select_nested 96.3760μs 56.1446μs 17.8112 KOps/s 18.5414 KOps/s $\color{#d91a1a}-3.94\%$
test_exclude_nested 0.1565ms 0.1129ms 8.8598 KOps/s 9.2051 KOps/s $\color{#d91a1a}-3.75\%$
test_empty[True] 0.4200ms 0.3499ms 2.8576 KOps/s 2.9063 KOps/s $\color{#d91a1a}-1.68\%$
test_empty[False] 3.0972μs 0.9451μs 1.0581 MOps/s 1.0571 MOps/s $\color{#35bf28}+0.10\%$
test_to 0.1039ms 79.9786μs 12.5033 KOps/s 13.3669 KOps/s $\textbf{\color{#d91a1a}-6.46\%}$
test_to_nonblocking 0.1170ms 64.5584μs 15.4898 KOps/s 16.3365 KOps/s $\textbf{\color{#d91a1a}-5.18\%}$
test_unbind_speed 1.6060ms 0.2759ms 3.6247 KOps/s 3.7606 KOps/s $\color{#d91a1a}-3.61\%$
test_unbind_speed_stack0 0.3425ms 0.2754ms 3.6306 KOps/s 3.7215 KOps/s $\color{#d91a1a}-2.44\%$
test_unbind_speed_stack1 74.5231ms 0.8080ms 1.2377 KOps/s 1.2293 KOps/s $\color{#35bf28}+0.68\%$
test_split 74.9811ms 1.7894ms 558.8612 Ops/s 633.3254 Ops/s $\textbf{\color{#d91a1a}-11.76\%}$
test_chunk 74.9102ms 1.7908ms 558.4169 Ops/s 642.6691 Ops/s $\textbf{\color{#d91a1a}-13.11\%}$
test_creation[device0] 0.1336ms 60.3812μs 16.5614 KOps/s 17.1957 KOps/s $\color{#d91a1a}-3.69\%$
test_creation_from_tensor 0.1352ms 55.3706μs 18.0601 KOps/s 18.5590 KOps/s $\color{#d91a1a}-2.69\%$
test_add_one[memmap_tensor0] 94.8060μs 7.1287μs 140.2774 KOps/s 152.9119 KOps/s $\textbf{\color{#d91a1a}-8.26\%}$
test_contiguous[memmap_tensor0] 23.2710μs 0.7279μs 1.3739 MOps/s 1.4350 MOps/s $\color{#d91a1a}-4.26\%$
test_stack[memmap_tensor0] 27.3820μs 5.2427μs 190.7403 KOps/s 217.4964 KOps/s $\textbf{\color{#d91a1a}-12.30\%}$
test_memmaptd_index 1.3202ms 0.3158ms 3.1669 KOps/s 3.4103 KOps/s $\textbf{\color{#d91a1a}-7.14\%}$
test_memmaptd_index_astensor 0.6608ms 0.3880ms 2.5776 KOps/s 2.7778 KOps/s $\textbf{\color{#d91a1a}-7.21\%}$
test_memmaptd_index_op 1.1121ms 0.6771ms 1.4768 KOps/s 1.5345 KOps/s $\color{#d91a1a}-3.76\%$
test_serialize_model 0.1803s 0.1104s 9.0572 Ops/s 9.3825 Ops/s $\color{#d91a1a}-3.47\%$
test_serialize_model_pickle 1.3512s 1.2359s 0.8091 Ops/s 0.8086 Ops/s $\color{#35bf28}+0.06\%$
test_serialize_weights 0.1809s 0.1094s 9.1374 Ops/s 8.8200 Ops/s $\color{#35bf28}+3.60\%$
test_serialize_weights_returnearly 0.2556s 99.7480ms 10.0253 Ops/s 10.2959 Ops/s $\color{#d91a1a}-2.63\%$
test_serialize_weights_pickle 1.3492s 1.2484s 0.8010 Ops/s 0.8012 Ops/s $\color{#d91a1a}-0.03\%$
test_reshape_pytree 49.9230μs 28.1518μs 35.5217 KOps/s 38.5752 KOps/s $\textbf{\color{#d91a1a}-7.92\%}$
test_reshape_td 66.7640μs 33.0578μs 30.2501 KOps/s 31.9068 KOps/s $\textbf{\color{#d91a1a}-5.19\%}$
test_view_pytree 0.1875ms 28.5485μs 35.0281 KOps/s 39.2087 KOps/s $\textbf{\color{#d91a1a}-10.66\%}$
test_view_td 0.1659ms 37.6636μs 26.5508 KOps/s 28.8335 KOps/s $\textbf{\color{#d91a1a}-7.92\%}$
test_unbind_pytree 69.3740μs 32.6484μs 30.6293 KOps/s 31.5587 KOps/s $\color{#d91a1a}-2.94\%$
test_unbind_td 0.4725ms 42.3295μs 23.6242 KOps/s 24.4217 KOps/s $\color{#d91a1a}-3.27\%$
test_split_pytree 75.5340μs 36.6741μs 27.2672 KOps/s 28.6374 KOps/s $\color{#d91a1a}-4.78\%$
test_split_td 0.1078ms 41.1522μs 24.3001 KOps/s 24.0548 KOps/s $\color{#35bf28}+1.02\%$
test_add_pytree 71.4940μs 38.7936μs 25.7774 KOps/s 27.3155 KOps/s $\textbf{\color{#d91a1a}-5.63\%}$
test_add_td 94.1150μs 47.8135μs 20.9146 KOps/s 19.0672 KOps/s $\textbf{\color{#35bf28}+9.69\%}$
test_distributed 2.0137ms 70.4727μs 14.1899 KOps/s 13.6238 KOps/s $\color{#35bf28}+4.16\%$
test_tdmodule 60.5540μs 14.3548μs 69.6631 KOps/s 66.0757 KOps/s $\textbf{\color{#35bf28}+5.43\%}$
test_tdmodule_dispatch 52.4530μs 28.0403μs 35.6630 KOps/s 32.9037 KOps/s $\textbf{\color{#35bf28}+8.39\%}$
test_tdseq 36.6320μs 16.2548μs 61.5203 KOps/s 60.1750 KOps/s $\color{#35bf28}+2.24\%$
test_tdseq_dispatch 59.9530μs 31.4363μs 31.8104 KOps/s 31.4130 KOps/s $\color{#35bf28}+1.26\%$
test_instantiation_functorch 1.6951ms 1.5840ms 631.3041 Ops/s 652.1915 Ops/s $\color{#d91a1a}-3.20\%$
test_instantiation_td 1.5517ms 1.0656ms 938.4011 Ops/s 881.7645 Ops/s $\textbf{\color{#35bf28}+6.42\%}$
test_exec_functorch 0.1901ms 0.1501ms 6.6612 KOps/s 6.9283 KOps/s $\color{#d91a1a}-3.86\%$
test_exec_functional_call 0.1684ms 0.1354ms 7.3852 KOps/s 7.5526 KOps/s $\color{#d91a1a}-2.22\%$
test_exec_td 0.1732ms 0.1363ms 7.3346 KOps/s 7.6291 KOps/s $\color{#d91a1a}-3.86\%$
test_exec_td_decorator 0.6277ms 0.2063ms 4.8475 KOps/s 4.8805 KOps/s $\color{#d91a1a}-0.68\%$
test_vmap_mlp_speed[True-True] 0.7049ms 0.6013ms 1.6632 KOps/s 1.6667 KOps/s $\color{#d91a1a}-0.21\%$
test_vmap_mlp_speed[True-False] 0.7314ms 0.5995ms 1.6680 KOps/s 1.6815 KOps/s $\color{#d91a1a}-0.80\%$
test_vmap_mlp_speed[False-True] 0.5945ms 0.5306ms 1.8848 KOps/s 1.9017 KOps/s $\color{#d91a1a}-0.89\%$
test_vmap_mlp_speed[False-False] 0.6003ms 0.5321ms 1.8793 KOps/s 1.9007 KOps/s $\color{#d91a1a}-1.13\%$
test_vmap_mlp_speed_decorator[True-True] 0.7702ms 0.6683ms 1.4964 KOps/s 1.5073 KOps/s $\color{#d91a1a}-0.72\%$
test_vmap_mlp_speed_decorator[True-False] 0.8823ms 0.6735ms 1.4848 KOps/s 1.5133 KOps/s $\color{#d91a1a}-1.89\%$
test_vmap_mlp_speed_decorator[False-True] 0.7369ms 0.5924ms 1.6880 KOps/s 1.7088 KOps/s $\color{#d91a1a}-1.22\%$
test_vmap_mlp_speed_decorator[False-False] 0.7200ms 0.5893ms 1.6970 KOps/s 1.7101 KOps/s $\color{#d91a1a}-0.77\%$
test_vmap_transformer_speed[True-True] 8.9793ms 8.5621ms 116.7941 Ops/s 126.4600 Ops/s $\textbf{\color{#d91a1a}-7.64\%}$
test_vmap_transformer_speed[True-False] 8.9100ms 8.3268ms 120.0948 Ops/s 126.9000 Ops/s $\textbf{\color{#d91a1a}-5.36\%}$
test_vmap_transformer_speed[False-True] 8.6961ms 8.2315ms 121.4852 Ops/s 128.3681 Ops/s $\textbf{\color{#d91a1a}-5.36\%}$
test_vmap_transformer_speed[False-False] 8.6351ms 8.1730ms 122.3542 Ops/s 127.9720 Ops/s $\color{#d91a1a}-4.39\%$
test_vmap_transformer_speed_decorator[True-True] 20.6357ms 19.9066ms 50.2345 Ops/s 52.2831 Ops/s $\color{#d91a1a}-3.92\%$
test_vmap_transformer_speed_decorator[True-False] 20.6219ms 19.9576ms 50.1062 Ops/s 52.3111 Ops/s $\color{#d91a1a}-4.21\%$
test_vmap_transformer_speed_decorator[False-True] 20.5003ms 19.8854ms 50.2881 Ops/s 52.4217 Ops/s $\color{#d91a1a}-4.07\%$
test_vmap_transformer_speed_decorator[False-False] 20.3093ms 19.8123ms 50.4737 Ops/s 52.6818 Ops/s $\color{#d91a1a}-4.19\%$
test_to_module_speed[True] 3.0952ms 1.5982ms 625.7029 Ops/s 650.6334 Ops/s $\color{#d91a1a}-3.83\%$
test_to_module_speed[False] 2.0586ms 1.5798ms 632.9798 Ops/s 657.7509 Ops/s $\color{#d91a1a}-3.77\%$
test_tc_init 0.1518ms 22.4810μs 44.4820 KOps/s 40.0170 KOps/s $\textbf{\color{#35bf28}+11.16\%}$
test_tc_init_nested 0.1938ms 47.6141μs 21.0022 KOps/s 20.3099 KOps/s $\color{#35bf28}+3.41\%$
test_tc_first_layer_tensor 3.4102μs 0.3637μs 2.7497 MOps/s 2.7998 MOps/s $\color{#d91a1a}-1.79\%$
test_tc_first_layer_nontensor 7.3889μs 0.3874μs 2.5812 MOps/s 2.5822 MOps/s $\color{#d91a1a}-0.04\%$
test_tc_second_layer_tensor 30.5300μs 0.9890μs 1.0112 MOps/s 1.0230 MOps/s $\color{#d91a1a}-1.16\%$
test_tc_second_layer_nontensor 21.9465μs 0.8352μs 1.1973 MOps/s 1.2179 MOps/s $\color{#d91a1a}-1.69\%$
test_unbind 98.2544ms 7.6940ms 129.9715 Ops/s 110.9132 Ops/s $\textbf{\color{#35bf28}+17.18\%}$
test_full_like 13.7675ms 13.1476ms 76.0596 Ops/s 76.6439 Ops/s $\color{#d91a1a}-0.76\%$
test_zeros_like 8.2214ms 7.8228ms 127.8310 Ops/s 126.0532 Ops/s $\color{#35bf28}+1.41\%$
test_ones_like 8.6684ms 7.9025ms 126.5419 Ops/s 126.4027 Ops/s $\color{#35bf28}+0.11\%$
test_clone 9.4636ms 9.1772ms 108.9655 Ops/s 108.3999 Ops/s $\color{#35bf28}+0.52\%$
test_squeeze 60.4330μs 11.0721μs 90.3172 KOps/s 89.8805 KOps/s $\color{#35bf28}+0.49\%$
test_unsqueeze 97.5260μs 55.1563μs 18.1303 KOps/s 18.8013 KOps/s $\color{#d91a1a}-3.57\%$
test_split 0.1437ms 0.1018ms 9.8206 KOps/s 10.0585 KOps/s $\color{#d91a1a}-2.37\%$
test_permute 0.1607ms 0.1172ms 8.5323 KOps/s 8.4481 KOps/s $\color{#35bf28}+1.00\%$
test_stack 26.8676ms 26.6005ms 37.5932 Ops/s 37.6125 Ops/s $\color{#d91a1a}-0.05\%$
test_cat 26.8871ms 26.5434ms 37.6741 Ops/s 37.6704 Ops/s $+0.01\%$
