-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-1445416, SNOW-1445419: Implement DataFrame/Series.attrs #2386
Conversation
7aff78e
to
674d689
Compare
fb1b0d8
to
e1ac41f
Compare
13aec05
to
4b18f6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Great work adding all the tests!
860cacb
to
9a3a9b8
Compare
@functools.wraps(method) | ||
def wrap(self, *args, **kwargs): # type: ignore | ||
result = method(self, *args, **kwargs) | ||
if isinstance(result, SnowflakeQueryCompiler) and len(self._attrs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this check len(self.attrs)
for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It checks that self._attrs
was not empty, and avoids setting result._attrs
if so, since intermediate calls to query compiler operations may have already set result._attrs = self._attrs
. This lets us potentially avoid creating a new empty dict for result._attrs
if it's already empty (not a very important optimization, but I think it's good practice anyway).
src/snowflake/snowpark/modin/plugin/compiler/snowflake_query_compiler.py
Show resolved
Hide resolved
a5f3248
to
0654156
Compare
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1445416 and SNOW-1445419
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Implements
DataFrame
/Series.attrs
by adding a new query compiler variable_attrs
that is read out by frontend objects. A new annotation on the query compiler,_propagate_attrs_on_methods
, will either copy_attrs
fromself
to the return value, or reset_attrs
on the return value.I initially intended to implement this solely at the frontend layer with the override system (similar to how telemetry is added to all methods), but this created difficulties when preserving
attrs
across in-place operations likedf.columns = [...]
, and could create ambiguity if the frame had a column named"_attrs"
. Implementing propagation at the query compiler level is simpler.This PR also adds a new
test_attrs=True
parameter toeval_snowpark_pandas_result
.eval_snowpark_pandas_result
will set a dummy value ofattrs
on its inputs, and ensure that if the result is a DF/Series, theattrs
field on the result matches that of pandas. Since pandas isn't always consistent about whether it propagates attrs or resets it (for some methods, the behavior depends on the input, and for some methods, it is inconsistent between Series/DF), settingtest_attrs=False
skips this check. When I encountered such inconsistent methods, I elected to have Snowpark pandas always propagateattrs
, since it seems unlikely that users would rely on theattrs
of a result being empty if they did not explicitly set it.