[SNOW-902943]: Add support for pd.NamedAgg in DataFrame and Series.agg #1652
self, allow_duplication=False, axis=axis, is_from_agg=True, **kwargs
)
else:
func = validate_and_try_convert_agg_func_arg_func_to_str(
@sfc-gh-joshi with the modin upstreaming work, I think we should start moving many of these conversions to the backend and leave only basic checking in the frontend.
Yes, though our frontend implementation of aggregate already diverges pretty heavily from the modin version. We might want to use the extension API to overwrite aggregate when we start the migration.
I reviewed most of the frontend/argument parsing logic, and haven't looked at the query compiler changes yet since I left a few questions about column ordering that I'd like to understand first.
Thanks! I left a few more minor comments/suggestions.
…ompiler.py Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
# and if not, reorder them.
if uses_named_aggs:
    correct_ordering = list(agg_kwargs.keys())
    if correct_ordering != new_data_column_pandas_labels:
That seems fine for now, since most of the code is currently written this way. Ideally, I think we should do only the function string conversion in the frontend and pass the result directly to the query compiler (the string conversion should probably move to the backend as well). When we build column_to_agg_func, it should already be in the correct order based on the input; passing the order as part of agg_kwargs seems hacky to me.
@sfc-gh-joshi when we do the modin upstreaming support for this, let's talk about this.
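To make the ordering concern above concrete, here is a minimal pure-Python sketch of how grouping named aggregations by source column can lose the keyword order, and how comparing against list(agg_kwargs.keys()) restores it. The function aggregate_named, the funcs table, and the dict-of-lists data layout are hypothetical and only for illustration; this is not the Snowpark pandas query compiler code.

```python
from collections import defaultdict


def aggregate_named(data, agg_kwargs):
    """Hypothetical helper: data maps column -> list of numbers,
    agg_kwargs maps new_label -> (column, func_name)."""
    funcs = {"min": min, "max": max, "sum": sum}

    # Group the requested aggregations by source column, as a backend that
    # processes one column at a time might do. This loses the kwargs order.
    by_column = defaultdict(list)
    for label, (column, func_name) in agg_kwargs.items():
        by_column[column].append((label, func_name))

    # Results come out in data-column order, not in kwargs order.
    results = {}
    for column in data:
        for label, func_name in by_column.get(column, []):
            results[label] = funcs[func_name](data[column])

    # Check whether the labels are already in the caller's order,
    # and if not, reorder them.
    correct_ordering = list(agg_kwargs.keys())
    if correct_ordering != list(results.keys()):
        results = {label: results[label] for label in correct_ordering}
    return results


data = {"A": [1, 4, 7], "B": [2, 5, 8]}
out = aggregate_named(
    data, {"y": ("B", "min"), "x": ("A", "max"), "z": ("B", "sum")}
)
print(list(out.items()))
```

Building the column-to-function mapping in input order from the start, as suggested in the comment above, would make the final reordering step unnecessary.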
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-902943
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Adds support for named aggregations (pd.NamedAgg) in DataFrame.agg and Series.agg.
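For reference, this is roughly what named aggregation looks like in native pandas, which is the behavior this PR aims to match. It is a small sketch using plain pandas only; the column names and values are made up, and it does not exercise the Snowpark pandas code paths touched here.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8]})

# Each keyword names an output row; pd.NamedAgg pairs a source column
# with the aggregation to apply to it.
result = df.agg(
    x=pd.NamedAgg(column="A", aggfunc="max"),
    y=pd.NamedAgg(column="B", aggfunc="min"),
)
print(result)
```

Cells for column/label pairs that were not requested (for example row x, column B) are filled with NaN.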