-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-1690923 Encode Intervals #2454
base: ls-SNOW-1491199-merge-phase0-server-side
Are you sure you want to change the base?
SNOW-1690923 Encode Intervals #2454
Conversation
…NOW-1690923-encode-intervals # Conflicts: # tests/ast/data/DataFrame.agg.test # tests/ast/data/DataFrame.collect.test # tests/ast/data/DataFrame.count.test # tests/ast/data/DataFrame.count2.test # tests/ast/data/DataFrame.create_or_replace.test # tests/ast/data/DataFrame.cross_join.lsuffix.test # tests/ast/data/DataFrame.cross_join.rsuffix.test # tests/ast/data/DataFrame.cross_join.suffix.test # tests/ast/data/DataFrame.describe.test # tests/ast/data/DataFrame.flatten.test # tests/ast/data/DataFrame.indexers.test # tests/ast/data/DataFrame.join.inner.column.test # tests/ast/data/DataFrame.join.inner.column_list.test # tests/ast/data/DataFrame.join.inner.column_list_predicate.test # tests/ast/data/DataFrame.join.inner.predicate.test # tests/ast/data/DataFrame.join.inner.predicate_rsuffix.test # tests/ast/data/DataFrame.join.left_outer.column.test # tests/ast/data/DataFrame.join.right_outer.predicate.test # tests/ast/data/DataFrame.natural_join.test # tests/ast/data/DataFrame.pivot.test # tests/ast/data/DataFrame.select_expr.test # tests/ast/data/DataFrame.stat.test # tests/ast/data/DataFrame.to_df.test # tests/ast/data/DataFrame.to_local_iterator.test # tests/ast/data/DataFrame.to_pandas.test # tests/ast/data/DataFrame.to_pandas_batch.test # tests/ast/data/DataFrame.unpivot.test # tests/ast/data/DataFrame.write.test # tests/ast/data/Dataframe.cube.test # tests/ast/data/Dataframe.distinct.test # tests/ast/data/Dataframe.drop_duplicates.test # tests/ast/data/Dataframe.filter.test # tests/ast/data/Dataframe.getitem.test # tests/ast/data/Dataframe.group_by.test # tests/ast/data/Dataframe.group_by_grouping_sets.test # tests/ast/data/Dataframe.join.asof.test # tests/ast/data/Dataframe.rollup.test # tests/ast/data/Dataframe.with_col_fns.test # tests/ast/data/DataframeNaFunctions.test # tests/ast/data/RelationalGroupedDataFrame.agg.test # tests/ast/data/RelationalGroupedDataFrame.test # tests/ast/data/Session.call.test # tests/ast/data/Session.create_dataframe.test # tests/ast/data/Session.flatten.test # tests/ast/data/Session.table_function.test # tests/ast/data/Table.init.test # tests/ast/data/case_when.test # tests/ast/data/col_alias.test # tests/ast/data/col_asc.test # tests/ast/data/col_between.test # tests/ast/data/col_binops.test # tests/ast/data/col_bitops.test # tests/ast/data/col_cast.test # tests/ast/data/col_cast_coll.test # tests/ast/data/col_desc.test # tests/ast/data/col_getitem.test # tests/ast/data/col_in_.test # tests/ast/data/col_literal.test # tests/ast/data/col_null_nan.test # tests/ast/data/col_rbinops.test # tests/ast/data/col_star.test # tests/ast/data/col_string.test # tests/ast/data/col_try_cast.test # tests/ast/data/col_udf.test # tests/ast/data/col_unary_ops.test # tests/ast/data/df_alias.test # tests/ast/data/df_analytics_functions.test # tests/ast/data/df_col.test # tests/ast/data/df_drop.test # tests/ast/data/df_except.test # tests/ast/data/df_first.test # tests/ast/data/df_intersect.test # tests/ast/data/df_limit.test # tests/ast/data/df_random_split.test # tests/ast/data/df_sample.test # tests/ast/data/df_sort.test # tests/ast/data/df_union.test # tests/ast/data/functions.test # tests/ast/data/select.test # tests/ast/data/session.read.test # tests/ast/data/session.sql.test # tests/ast/data/session_generator.test # tests/ast/data/session_range.test # tests/ast/data/session_table_dq_abs_l.test # tests/ast/data/session_table_dq_abs_s.test # tests/ast/data/session_table_dq_rs_l.test # tests/ast/data/session_table_dq_rs_s.test # tests/ast/data/session_table_dq_rt_l.test # tests/ast/data/session_table_dq_rt_s.test # tests/ast/data/session_table_uq_abs_l.test # tests/ast/data/session_table_uq_abs_s.test # tests/ast/data/session_table_uq_rs_l.test # tests/ast/data/session_table_uq_rs_s.test # tests/ast/data/session_table_uq_rt_l.test # tests/ast/data/session_table_uq_rt_s.test # tests/ast/data/session_write_pandas.test # tests/ast/data/shadowed_local_name.test # tests/ast/data/sproc.test # tests/ast/data/udaf.test # tests/ast/data/udtf.test # tests/ast/data/windows.test
) -> None: | ||
from snowflake.snowpark._internal.ast_utils import with_src_position |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to import here to prevent circular imports issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far in the code, we tried to avoid building expressions directly. The reasoning was that Expressions should disappear as they're not a first-class entity in snowpark. Instead we chose an approach where always the caller (i.e. the function that creates Interval) creates the AST and assigns it to the expression if need be. I.e.,
def build_expr_from_snowpark_column_or_python_val(
expr_builder: proto.Expr, value: ColumnOrLiteral
) -> None:
"""Copy from a Column object's AST, or copy a literal value into an AST expression.
Args:
ast (proto.Expr): A previously created Expr() IR entity intance to be filled
value (ColumnOrLiteral): The value from which to populate the provided ast parameter.
Raises:
TypeError: The Expr provided should only be populated from a Snowpark Column with a valid _ast field or a literal value
"""
if isinstance(value, snowflake.snowpark.Column):
build_expr_from_snowpark_column(expr_builder, value)
elif isinstance(value, VALID_PYTHON_TYPES_FOR_LITERAL_VALUE):
build_expr_from_python_val(expr_builder, value)
elif isinstance(value, Expression):
# Expressions must be handled by caller.
pass
else:
raise TypeError(f"{type(value)} is not a valid type for Column or literal AST.")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we prob. can move the AST logic to build_expr_from_snowpark_column_or_python_val and have an Interval
case there. I don't think the original intention of snowpark was to use Interval
directly, but rather the make_interval
function.
I ran the linter which is why there's 100+ file changes - most of these changes are in the test data directory and I think are superficial. I'm not sure how to fix the pyright/precommit type checking test. |
@@ -405,6 +409,14 @@ def __init__( | |||
if nanosecond is not None: | |||
self.values_dict["NANOSECOND"] = nanosecond | |||
|
|||
if self._ast is None and _emit_ast: | |||
expr = proto.Expr() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/home/runner/work/snowpark-python/snowpark-python/src/snowflake/snowpark/_internal/analyzer/expression.py:413:26 - error: "Expr" is not a known member of module "snowflake.snowpark._internal.proto.ast_pb2" (reportGeneralTypeIssues)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you repro locally?
python -m tox -e pyright
It is saying you may need a new version:
"Please install the new version or set PYRIGHT_PYTHON_FORCE_VERSION to latest
"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the same errors locally, I'm not sure what to do about the first error though. There's other parts of the code that use proto.Expr()
as well so I'm not sure why it's erroring here
# source: proto/ast.proto | ||
# Protobuf Python Version: 5.28.2 | ||
# Protobuf Python Version: 4.25.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update to protobuf to 5.28.2 to avoid so many conflicts?
ast = with_src_position(expr.sp_interval) | ||
# Set the AST values based on the values_dict. | ||
for k, v in self.values_dict.items(): | ||
getattr(ast, k.lower()).value = v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's try to avoid indirection via getattr and use the values explicitly. It's much easier to read and reason about :)
tests/ast/data/interval.test
Outdated
df2 = df1.select( | ||
df1["a"] | ||
+ Column( | ||
Interval( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's use make_interval here to test. Not sure we should support the Column(Interval(...))
syntax at all.
tests/ast/data/interval.test
Outdated
|
||
df1 = session.create_dataframe([[datetime.datetime(2010, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=-18000), name="EST")), datetime.datetime(2011, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=-18000), name="EST"))], [datetime.datetime(2012, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=-18000), name="EST")), datetime.datetime(2013, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=-18000), name="EST"))]], schema=["a", "b"]) | ||
|
||
df2 = df1.select(df1["a"] + Interval(quarter=1, month=1, week=2, day=2, hour=2, minute=3, second=3, millisecond=3, microsecond=4, nanosecond=4)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this work in snowpark?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it does - I pulled this from test_interval
:
def test_interval(session):
df1 = session.create_dataframe(
[
[datetime.datetime(2010, 1, 1), datetime.datetime(2011, 1, 1)],
[datetime.datetime(2012, 1, 1), datetime.datetime(2013, 1, 1)],
],
schema=["a", "b"],
)
df2 = df1.select(
df1["a"]
+ Column(
Interval(
quarters=1,
months=1,
weeks=2,
days=2,
hours=2,
minutes=3,
seconds=3,
milliseconds=3,
microseconds=4,
nanoseconds=4,
)
)
)
verify_column_result(
session,
df2,
[
'"(""A"" + INTERVAL \'1 QUARTER,1 MONTH,2 WEEK,2 DAY,2 HOUR,3 MINUTE,3 SECOND,3 MILLISECOND,4 MICROSECOND,4 NANOSECOND\')"',
],
[TimestampType(timezone=TimestampTimeZone.NTZ)],
None,
)
but I see that you added the make_interval
changes - I'll update my tests accordingly
…NOW-1690923-encode-intervals # Conflicts: # src/snowflake/snowpark/_internal/proto/ast_pb2.py # src/snowflake/snowpark/functions.py # tests/ast/data/DataFrame.agg.test # tests/ast/data/DataFrame.collect.test # tests/ast/data/DataFrame.count.test # tests/ast/data/DataFrame.count2.test # tests/ast/data/DataFrame.create_or_replace.test # tests/ast/data/DataFrame.cross_join.lsuffix.test # tests/ast/data/DataFrame.cross_join.rsuffix.test # tests/ast/data/DataFrame.cross_join.suffix.test # tests/ast/data/DataFrame.describe.test # tests/ast/data/DataFrame.flatten.test # tests/ast/data/DataFrame.indexers.test # tests/ast/data/DataFrame.join.inner.column.test # tests/ast/data/DataFrame.join.inner.column_list.test # tests/ast/data/DataFrame.join.inner.column_list_predicate.test # tests/ast/data/DataFrame.join.inner.predicate.test # tests/ast/data/DataFrame.join.inner.predicate_rsuffix.test # tests/ast/data/DataFrame.join.left_outer.column.test # tests/ast/data/DataFrame.join.right_outer.predicate.test # tests/ast/data/DataFrame.natural_join.test # tests/ast/data/DataFrame.pivot.test # tests/ast/data/DataFrame.select_expr.test # tests/ast/data/DataFrame.stat.test # tests/ast/data/DataFrame.to_df.test # tests/ast/data/DataFrame.to_local_iterator.test # tests/ast/data/DataFrame.to_pandas.test # tests/ast/data/DataFrame.to_pandas_batch.test # tests/ast/data/DataFrame.unpivot.test # tests/ast/data/DataFrame.write.test # tests/ast/data/Dataframe.cube.test # tests/ast/data/Dataframe.distinct.test # tests/ast/data/Dataframe.drop_duplicates.test # tests/ast/data/Dataframe.filter.test # tests/ast/data/Dataframe.getitem.test # tests/ast/data/Dataframe.group_by.test # tests/ast/data/Dataframe.group_by_grouping_sets.test # tests/ast/data/Dataframe.join.asof.test # tests/ast/data/Dataframe.rollup.test # tests/ast/data/Dataframe.with_col_fns.test # tests/ast/data/DataframeNaFunctions.test # tests/ast/data/RelationalGroupedDataFrame.agg.test # tests/ast/data/RelationalGroupedDataFrame.test # tests/ast/data/Session.call.test # tests/ast/data/Session.create_dataframe.test # tests/ast/data/Session.flatten.test # tests/ast/data/Session.table_function.test # tests/ast/data/Table.init.test # tests/ast/data/case_when.test # tests/ast/data/col_alias.test # tests/ast/data/col_asc.test # tests/ast/data/col_between.test # tests/ast/data/col_binops.test # tests/ast/data/col_bitops.test # tests/ast/data/col_cast.test # tests/ast/data/col_cast_coll.test # tests/ast/data/col_desc.test # tests/ast/data/col_getitem.test # tests/ast/data/col_in_.test # tests/ast/data/col_literal.test # tests/ast/data/col_null_nan.test # tests/ast/data/col_rbinops.test # tests/ast/data/col_star.test # tests/ast/data/col_string.test # tests/ast/data/col_try_cast.test # tests/ast/data/col_udf.test # tests/ast/data/col_unary_ops.test # tests/ast/data/df_alias.test # tests/ast/data/df_analytics_functions.test # tests/ast/data/df_col.test # tests/ast/data/df_drop.test # tests/ast/data/df_except.test # tests/ast/data/df_first.test # tests/ast/data/df_intersect.test # tests/ast/data/df_limit.test # tests/ast/data/df_random_split.test # tests/ast/data/df_sample.test # tests/ast/data/df_sort.test # tests/ast/data/df_union.test # tests/ast/data/functions.test # tests/ast/data/select.test # tests/ast/data/session.read.test # tests/ast/data/session.sql.test # tests/ast/data/session_generator.test # tests/ast/data/session_range.test # tests/ast/data/session_table_dq_abs_l.test # tests/ast/data/session_table_dq_abs_s.test # tests/ast/data/session_table_dq_rs_l.test # tests/ast/data/session_table_dq_rs_s.test # tests/ast/data/session_table_dq_rt_l.test # tests/ast/data/session_table_dq_rt_s.test # tests/ast/data/session_table_uq_abs_l.test # tests/ast/data/session_table_uq_abs_s.test # tests/ast/data/session_table_uq_rs_l.test # tests/ast/data/session_table_uq_rs_s.test # tests/ast/data/session_table_uq_rt_l.test # tests/ast/data/session_table_uq_rt_s.test # tests/ast/data/session_write_pandas.test # tests/ast/data/shadowed_local_name.test # tests/ast/data/sproc.test # tests/ast/data/udaf.test # tests/ast/data/udtf.test # tests/ast/data/windows.test
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1690923
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Encoding the Interval object in AST. Relevant test is
interval.test
.