SNOW-1690923 Encode Intervals #2454
base: ls-SNOW-1491199-merge-phase0-server-side
**`src/snowflake/snowpark/_internal/analyzer/expression.py`**

```diff
@@ -6,6 +6,7 @@
 import uuid
 from typing import TYPE_CHECKING, AbstractSet, Any, Dict, List, Optional, Tuple

+import snowflake.snowpark._internal.proto.ast_pb2 as proto
 import snowflake.snowpark._internal.utils
 from snowflake.snowpark._internal.analyzer.query_plan_analysis_utils import (
     PlanNodeCategory,
@@ -379,7 +380,10 @@ def __init__(
         millisecond: Optional[int] = None,
         microsecond: Optional[int] = None,
         nanosecond: Optional[int] = None,
+        _emit_ast: bool = True,
     ) -> None:
+        from snowflake.snowpark._internal.ast_utils import with_src_position
+
         super().__init__()
         self.values_dict = {}
         if year is not None:
@@ -405,6 +409,14 @@ def __init__(
         if nanosecond is not None:
             self.values_dict["NANOSECOND"] = nanosecond
+
+        if self._ast is None and _emit_ast:
+            expr = proto.Expr()
+            ast = with_src_position(expr.sp_interval)
+            # Set the AST values based on the values_dict.
+            for k, v in self.values_dict.items():
+                getattr(ast, k.lower()).value = v
+            self._ast = expr

     @property
     def sql(self) -> str:
         return f"""INTERVAL '{",".join(f"{v} {k}" for k, v in self.values_dict.items())}'"""
```

Review comments on this file:

> **On `expr = proto.Expr()`:** pyright fails with:
> `src/snowflake/snowpark/_internal/analyzer/expression.py:413:26 - error: "Expr" is not a known member of module "snowflake.snowpark._internal.proto.ast_pb2" (reportGeneralTypeIssues)`
>
> **Reviewer:** Can you repro locally? `python -m tox -e pyright` — it is saying you may need a new version: […]
>
> **Author:** I get the same errors locally; I'm not sure what to do about the first error, though. There are other parts of the code that use […]

> **On the `getattr` loop:** Let's try to avoid indirection via `getattr` and use the values explicitly. It's much easier to read and reason about :)
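To see what the `sql` property in the diff above actually produces, here is a minimal stand-alone sketch. `IntervalSketch` is a hypothetical stand-in for the real `Interval` expression class; only `values_dict` and the `sql` f-string are taken from the diff, the rest is scaffolding:

```python
class IntervalSketch:
    """Illustrative stand-in for Interval; not the real snowpark class."""

    # Unit names in the order the real constructor checks them.
    _UNITS = (
        "YEAR", "QUARTER", "MONTH", "WEEK", "DAY", "HOUR",
        "MINUTE", "SECOND", "MILLISECOND", "MICROSECOND", "NANOSECOND",
    )

    def __init__(self, **kwargs):
        # Mirror the constructor: record only the units that were passed,
        # keyed by their upper-case SQL unit name, in declaration order.
        self.values_dict = {}
        for unit in self._UNITS:
            value = kwargs.get(unit.lower())
            if value is not None:
                self.values_dict[unit] = value

    @property
    def sql(self) -> str:
        # Same f-string as in the diff: "value UNIT" pairs joined by commas.
        return f"""INTERVAL '{",".join(f"{v} {k}" for k, v in self.values_dict.items())}'"""


print(IntervalSketch(quarter=1, month=1, week=2, day=2, hour=2, minute=3,
                     second=3, millisecond=3, microsecond=4, nanosecond=4).sql)
# → INTERVAL '1 QUARTER,1 MONTH,2 WEEK,2 DAY,2 HOUR,3 MINUTE,3 SECOND,3 MILLISECOND,4 MICROSECOND,4 NANOSECOND'
```

The printed literal matches the expected column name in the existing `test_interval` test quoted later in this review thread.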
*(One file's diff is collapsed: large diffs are not rendered by default.)*
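The reviewer's suggestion to replace the `getattr` loop with explicit assignments can be compared side by side in a self-contained sketch. `SimpleNamespace` is a hypothetical stand-in for the generated protobuf message, whose unit fields each carry a `.value` attribute as implied by `getattr(ast, k.lower()).value = v` in the diff:

```python
from types import SimpleNamespace

UNITS = ("year", "quarter", "month", "week", "day", "hour",
         "minute", "second", "millisecond", "microsecond", "nanosecond")


def make_interval_message():
    # Hypothetical stand-in for the generated proto message: one wrapper
    # object with a .value attribute per interval unit.
    return SimpleNamespace(**{u: SimpleNamespace(value=None) for u in UNITS})


def fill_via_getattr(ast, values_dict):
    # The PR's current approach: dynamic attribute lookup by lower-cased key.
    for k, v in values_dict.items():
        getattr(ast, k.lower()).value = v


def fill_explicitly(ast, values_dict):
    # The reviewer's suggestion: name every field. More verbose, but each
    # assignment is greppable and visible to static type checkers like pyright.
    if "YEAR" in values_dict:
        ast.year.value = values_dict["YEAR"]
    if "QUARTER" in values_dict:
        ast.quarter.value = values_dict["QUARTER"]
    if "MONTH" in values_dict:
        ast.month.value = values_dict["MONTH"]
    if "WEEK" in values_dict:
        ast.week.value = values_dict["WEEK"]
    if "DAY" in values_dict:
        ast.day.value = values_dict["DAY"]
    if "HOUR" in values_dict:
        ast.hour.value = values_dict["HOUR"]
    if "MINUTE" in values_dict:
        ast.minute.value = values_dict["MINUTE"]
    if "SECOND" in values_dict:
        ast.second.value = values_dict["SECOND"]
    if "MILLISECOND" in values_dict:
        ast.millisecond.value = values_dict["MILLISECOND"]
    if "MICROSECOND" in values_dict:
        ast.microsecond.value = values_dict["MICROSECOND"]
    if "NANOSECOND" in values_dict:
        ast.nanosecond.value = values_dict["NANOSECOND"]


# Both fills produce identical messages.
values = {"QUARTER": 1, "MONTH": 2, "DAY": 4}
a, b = make_interval_message(), make_interval_message()
fill_via_getattr(a, values)
fill_explicitly(b, values)
assert a == b
```

The explicit version also sidesteps a failure mode of the `getattr` style: a typo in a `values_dict` key only surfaces at runtime, whereas a typo in `ast.milisecond.value` is caught by a type checker.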
**New expectation-test file** (+81 lines):

```
## TEST CASE

from snowflake.snowpark._internal.analyzer.expression import Interval

df1 = session.create_dataframe(
    [
        [datetime.datetime(2010, 1, 1), datetime.datetime(2011, 1, 1)],
        [datetime.datetime(2012, 1, 1), datetime.datetime(2013, 1, 1)],
    ],
    schema=["a", "b"],
)

df2 = df1.select(
    df1["a"]
    + Column(
        Interval(
            quarter=1,
            month=1,
            week=2,
            day=2,
            hour=2,
            minute=3,
            second=3,
            millisecond=3,
            microsecond=4,
            nanosecond=4,
        )
    )
)

df4 = df1.select(df1["a"] + Column(Interval(1234)))

df5 = df1.select(
    df1["a"]
    + Column(
        Interval(
            quarter=1,
            month=2,
            week=3,
            day=4,
            hour=5,
            minute=6,
            second=7,
        )
    )
)

df6 = df1.select(
    df1["a"]
    + Column(
        Interval(
            year=1,
            month=2,
            week=3,
            day=4,
            hour=5,
            minute=6,
            second=7,
            millisecond=8,
            microsecond=9,
            nanosecond=10,
        )
    )
)

## EXPECTED UNPARSER OUTPUT

df1 = session.create_dataframe([[datetime.datetime(2010, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=-18000), name="EST")), datetime.datetime(2011, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=-18000), name="EST"))], [datetime.datetime(2012, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=-18000), name="EST")), datetime.datetime(2013, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=-18000), name="EST"))]], schema=["a", "b"])

df2 = df1.select(df1["a"] + Interval(quarter=1, month=1, week=2, day=2, hour=2, minute=3, second=3, millisecond=3, microsecond=4, nanosecond=4))

df4 = df1.select(df1["a"] + Interval(year=1234))

df5 = df1.select(df1["a"] + Interval(quarter=1, month=2, week=3, day=4, hour=5, minute=6, second=7))

df6 = df1.select(df1["a"] + Interval(year=1, month=2, week=3, day=4, hour=5, minute=6, second=7, millisecond=8, microsecond=9, nanosecond=10))

## EXPECTED ENCODED AST

CvkCCvYCCuYC8gXiAgq5Agq2AgqYAcoClAEKGhoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERShcEjryAzcIASgBOhoaFlNSQ19QT1NJVElPTl9URVNUX01PREUoXEISCgUKA0VTVBCw8/7///////8BSNoPEjryAzcIASgBOhoaFlNSQ19QT1NJVElPTl9URVNUX01PREUoXEISCgUKA0VTVBCw8/7///////8BSNsPCpgBygKUAQoaGhZTUkNfUE9TSVRJT05fVEVTVF9NT0RFKFwSOvIDNwgBKAE6GhoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERShcQhIKBQoDRVNUELDz/v///////wFI3A8SOvIDNwgBKAE6GhoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERShcQhIKBQoDRVNUELDz/v///////wFI3Q8SCAoGCgFhCgFiGhoaFlNSQ19QT1NJVElPTl9URVNUX01PREUoXBIFCgNkZjEYASICCAEK1gEK0wEKwwH6CL8BCpUBepIBCivCBigKAWESB4ICBAoCCAEaGhoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERShlEkfaCkQKAggCEgIIAhoCCAQiAggDKgIIAzICCAE6AggEQgIIAUoCCANSGhoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERShnWgIIAhoaGhZTUkNfUE9TSVRJT05fVEVTVF9NT0RFKGUSB4ICBAoCCAEaGhoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERShkIAESBQoDZGYyGAIiAggCCrEBCq4BCp4B+giaAQpxem8KK8IGKAoBYRIHggIECgIIARoaGhZTUkNfUE9TSVRJT05fVEVTVF9NT0RFKHYSJNoKIVIaGhZTUkNfUE9TSVRJT05fVEVTVF9NT0RFKHZiAwjSCRoaGhZTUkNfUE9TSVRJT05fVEVTVF9NT0RFKHYSB4ICBAoCCAEaGhoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERSh2IAESBQoDZGY0GAMiAggDCsoBCscBCrcB+gizAQqJAXqGAQorwgYoCgFhEgeCAgQKAggBGhoaFlNSQ19QT1NJVElPTl9URVNUX01PREUoeRI72go4CgIIBBICCAUqAggGMgIIAkICCAFKAggHUhoaFlNSQ19QT1NJVElPTl9URVNUX01PREUoe1oCCAMaGhoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERSh5EgeCAgQKAggBGhoaFlNSQ19QT1NJVElPTl9URVNUX01PREUoeCABEgUKA2RmNRgEIgIIBAraAQrXAQrHAfoIwwEKmAF6lQEKLMIGKQoBYRIHggIECgIIARobGhZTUkNfUE9TSVRJT05fVEVTVF9NT0RFKIgBEkjaCkUKAggEEgIIBRoCCAkiAggIKgIIBjICCAI6AggKSgIIB1IbGhZTUkNfUE9TSVRJT05fVEVTVF9NT0RFKIoBWgIIA2ICCAEaGxoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERSiIARIHggIECgIIARobGhZTUkNfUE9TSVRJT05fVEVTVF9NT0RFKIcBIAESBQoDZGY2GAUiAggFEAEaERIPCg0KBWZpbmFsEAMYCSAUIgQQARgV
```

Review comments on this file:

> **On the `Interval(...)` call in `df2`:** Let's use `make_interval` here to test. Not sure we should support the […]

> **On the expected `df2` unparser output:** Does this work in snowpark?
>
> **Author:** I believe it does - I pulled this from `test_interval`:
>
> ```python
> def test_interval(session):
>     df1 = session.create_dataframe(
>         [
>             [datetime.datetime(2010, 1, 1), datetime.datetime(2011, 1, 1)],
>             [datetime.datetime(2012, 1, 1), datetime.datetime(2013, 1, 1)],
>         ],
>         schema=["a", "b"],
>     )
>     df2 = df1.select(
>         df1["a"]
>         + Column(
>             Interval(
>                 quarters=1,
>                 months=1,
>                 weeks=2,
>                 days=2,
>                 hours=2,
>                 minutes=3,
>                 seconds=3,
>                 milliseconds=3,
>                 microseconds=4,
>                 nanoseconds=4,
>             )
>         )
>     )
>     verify_column_result(
>         session,
>         df2,
>         [
>             '"(""A"" + INTERVAL \'1 QUARTER,1 MONTH,2 WEEK,2 DAY,2 HOUR,3 MINUTE,3 SECOND,3 MILLISECOND,4 MICROSECOND,4 NANOSECOND\')"',
>         ],
>         [TimestampType(timezone=TimestampTimeZone.NTZ)],
>         None,
>     )
> ```
>
> but I see that you added the […]
> **Author** (on the function-local `with_src_position` import): Had to import here to prevent a circular-imports issue.

> **Reviewer:** So far in the code, we have tried to avoid building expressions directly. The reasoning was that Expressions should disappear, as they're not a first-class entity in snowpark. Instead we chose an approach where the caller (i.e. the function that creates `Interval`) always creates the AST and assigns it to the expression if need be. I.e., […]

> **Reviewer:** We can probably move the AST logic to `build_expr_from_snowpark_column_or_python_val` and have an `Interval` case there. I don't think the original intention of snowpark was to use `Interval` directly, but rather the `make_interval` function.
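The caller-side pattern the reviewer describes - the public function builds the AST and attaches it to the expression, so the Expression class itself stays AST-free - can be sketched with plain Python stand-ins. `FakeAst`, `IntervalExpr`, and this `make_interval` are illustrative only, not the real snowpark internals:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeAst:
    # Hypothetical stand-in for the proto.Expr AST message.
    fields: dict = field(default_factory=dict)


@dataclass
class IntervalExpr:
    # The expression carries an AST slot but never populates it itself.
    values_dict: dict
    _ast: Optional[FakeAst] = None


def make_interval(**units) -> IntervalExpr:
    # Caller-side construction: the public entry point both creates the
    # expression and emits its AST, mirroring the reviewer's suggestion to
    # handle Interval in build_expr_from_snowpark_column_or_python_val
    # rather than inside Interval.__init__.
    values = {k.upper(): v for k, v in units.items() if v is not None}
    expr = IntervalExpr(values_dict=values)
    expr._ast = FakeAst(fields=dict(values))
    return expr


iv = make_interval(quarter=1, month=2)
assert iv._ast is not None and iv._ast.fields["QUARTER"] == 1
```

Under this split, `_emit_ast` plumbing and the `with_src_position` import would move out of `expression.py` entirely, which would also dissolve the circular-import problem the author worked around.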