Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Planning Time #13015

Open
alamb opened this issue Oct 19, 2024 · 1 comment
Open

Improve Planning Time #13015

alamb opened this issue Oct 19, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Oct 19, 2024

Is your feature request related to a problem or challenge?

Follow on to #12950

@askalt added a benchmark for planning a benchmark with several aggregates. You can run it (with flamegraph) like

Some profiling

cargo flamegraph --root --bench sql_planner -- --bench physical_select_aggregates_from_200

As can be seen from the flame graphs, a significant amount of the physical planning time is spent on creating ProjectionMapping. However, there are many other interesting places that can probably be made faster too

flamegraph
Screenshot 2024-10-19 at 6 50 15 AM
Screenshot 2024-10-19 at 6 54 00 AM
Screenshot 2024-10-19 at 6 56 02 AM

Describe the solution you'd like

It would be very nice to make this benchmark faster by optimizing these codepaths

Describe alternatives you've considered

Physical Planning could likely be made faster by optimizing EquivalenceProperty calculations. However, since it is a small part of the overall planning time maybe LogicalPlanning / SQLPlanning are better places to start

Additional context

No response

@goldmedal
Copy link
Contributor

goldmedal commented Oct 19, 2024

I noticed this case has many aggregations (about 200). When planning an aggregation, the planner wraps a projection for it and creates the corresponding Column for the aggregation expressions. I guess it could be a potential bottleneck 🤔
截圖 2024-10-19 晚上11 53 46

See the flamegraph, I highlight two parts:

  • purple part: the planner spends much time building a large hash map for columnizing the expressions.
  • green part: Creating many columns from a schema that invokes many string clones also takes much time.

I did a simple POC #13018 to avoid building the hash map and reduce the clone.

The benchmark result like as below. all-after is optimized. all-before is main branch.

group                                         all-after                              all-before
-----                                         ---------                              ----------
logical_aggregate_with_join                   1.00  1113.2±38.87µs        ? ?/sec    1.01  1125.6±39.87µs        ? ?/sec
logical_plan_tpcds_all                        1.08   225.2±53.43ms        ? ?/sec    1.00    207.8±3.14ms        ? ?/sec
logical_plan_tpch_all                         1.00     17.5±0.50ms        ? ?/sec    1.04     18.1±1.10ms        ? ?/sec
logical_select_all_from_1000                  1.00      9.7±0.24ms        ? ?/sec    1.00      9.7±0.23ms        ? ?/sec
logical_select_one_from_700                   1.00   799.5±17.64µs        ? ?/sec    1.00   797.5±18.42µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00   711.7±18.17µs        ? ?/sec    1.00   714.2±26.44µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00   685.4±15.69µs        ? ?/sec    1.01   695.2±74.00µs        ? ?/sec
physical_plan_tpcds_all                       1.00  1683.8±19.94ms        ? ?/sec    1.02  1709.8±26.84ms        ? ?/sec
physical_plan_tpch_all                        1.00    106.3±2.50ms        ? ?/sec    1.01    107.5±2.07ms        ? ?/sec
physical_plan_tpch_q1                         1.00      3.6±0.11ms        ? ?/sec    1.01      3.6±0.10ms        ? ?/sec
physical_plan_tpch_q10                        1.00      5.1±0.14ms        ? ?/sec    1.01      5.2±0.16ms        ? ?/sec
physical_plan_tpch_q11                        1.00      4.5±0.12ms        ? ?/sec    1.00      4.5±0.16ms        ? ?/sec
physical_plan_tpch_q12                        1.00      3.8±0.11ms        ? ?/sec    1.01      3.8±0.19ms        ? ?/sec
physical_plan_tpch_q13                        1.00      2.6±0.13ms        ? ?/sec    1.00      2.6±0.06ms        ? ?/sec
physical_plan_tpch_q14                        1.00      3.1±0.08ms        ? ?/sec    1.00      3.2±0.09ms        ? ?/sec
physical_plan_tpch_q16                        1.00      4.6±0.17ms        ? ?/sec    1.00      4.6±0.12ms        ? ?/sec
physical_plan_tpch_q17                        1.00      4.2±0.11ms        ? ?/sec    1.00      4.2±0.11ms        ? ?/sec
physical_plan_tpch_q18                        1.00      4.8±0.13ms        ? ?/sec    1.01      4.8±0.15ms        ? ?/sec
physical_plan_tpch_q19                        1.00      7.8±0.23ms        ? ?/sec    1.01      7.9±0.28ms        ? ?/sec
physical_plan_tpch_q2                         1.01      9.1±0.53ms        ? ?/sec    1.00      9.0±0.26ms        ? ?/sec
physical_plan_tpch_q20                        1.00      5.6±0.16ms        ? ?/sec    1.00      5.6±0.17ms        ? ?/sec
physical_plan_tpch_q21                        1.00      7.3±0.24ms        ? ?/sec    1.02      7.5±0.33ms        ? ?/sec
physical_plan_tpch_q22                        1.00      4.0±0.11ms        ? ?/sec    1.04      4.1±0.35ms        ? ?/sec
physical_plan_tpch_q3                         1.00      3.7±0.14ms        ? ?/sec    1.01      3.8±0.11ms        ? ?/sec
physical_plan_tpch_q4                         1.00      2.8±0.07ms        ? ?/sec    1.00      2.8±0.06ms        ? ?/sec
physical_plan_tpch_q5                         1.00      5.3±0.15ms        ? ?/sec    1.00      5.3±0.15ms        ? ?/sec
physical_plan_tpch_q6                         1.00  1838.5±47.53µs        ? ?/sec    1.00  1842.0±33.97µs        ? ?/sec
physical_plan_tpch_q7                         1.00      6.9±0.20ms        ? ?/sec    1.01      7.0±0.21ms        ? ?/sec
physical_plan_tpch_q8                         1.00      8.3±0.22ms        ? ?/sec    1.00      8.3±0.25ms        ? ?/sec
physical_plan_tpch_q9                         1.00      6.4±0.19ms        ? ?/sec    1.00      6.3±0.16ms        ? ?/sec
physical_select_aggregates_from_200           1.00     33.7±0.92ms        ? ?/sec    1.17     39.4±0.86ms        ? ?/sec
physical_select_all_from_1000                 1.00     70.3±1.79ms        ? ?/sec    1.01     70.7±1.77ms        ? ?/sec
physical_select_one_from_700                  1.00      4.2±0.11ms        ? ?/sec    1.00      4.2±0.11ms        ? ?/se

Most queries don't be affected but we can see the case physical_select_aggregates_from_200:

group                                         all-after                              all-before
-----                                         ---------                              ----------
physical_select_aggregates_from_200           1.00     33.7±0.92ms        ? ?/sec    1.17     39.4±0.86ms        ? ?/sec

It reduces about 15% execution time.

I think it could be a point that we can improve 🤔

I am still considering whether it is possible to avoid using columnzied_output_exprs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants