SNOW-1690923 Encode Intervals #2454

Open
wants to merge 9 commits into
base: ls-SNOW-1491199-merge-phase0-server-side
from
12 changes: 12 additions & 0 deletions src/snowflake/snowpark/_internal/analyzer/expression.py
@@ -6,6 +6,7 @@
import uuid
from typing import TYPE_CHECKING, AbstractSet, Any, Dict, List, Optional, Tuple

import snowflake.snowpark._internal.proto.ast_pb2 as proto
import snowflake.snowpark._internal.utils
from snowflake.snowpark._internal.analyzer.query_plan_analysis_utils import (
    PlanNodeCategory,
@@ -379,7 +380,10 @@ def __init__(
        millisecond: Optional[int] = None,
        microsecond: Optional[int] = None,
        nanosecond: Optional[int] = None,
        _emit_ast: bool = True,
    ) -> None:
        from snowflake.snowpark._internal.ast_utils import with_src_position
Contributor Author

I had to import this here to avoid a circular import.
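
For readers unfamiliar with the pattern, here is a minimal sketch of a function-level (deferred) import. `IntervalSketch` is a hypothetical stand-in, and the stdlib `fractions` module only plays the role of `ast_utils`; none of this is the actual Snowpark code.

```python
class IntervalSketch:
    """Hypothetical stand-in for Interval; not the Snowpark class."""

    def __init__(self, year=None):
        # Deferred import: the name is resolved when __init__ runs, not when
        # this module is first loaded, so a module that imports this one back
        # can finish loading first. `fractions` is only a stand-in here.
        from fractions import Fraction

        self.year = None if year is None else Fraction(year)
```

The trade-off is that the import statement runs on every call, although after the first call it is only a cheap lookup in `sys.modules`.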

Contributor

So far in the code, we have tried to avoid building expressions directly. The reasoning is that Expressions should eventually disappear, since they are not a first-class entity in Snowpark. Instead, we chose an approach where the caller (i.e., the function that creates the Interval) always creates the AST and assigns it to the expression if need be. See, for example:

def build_expr_from_snowpark_column_or_python_val(
    expr_builder: proto.Expr, value: ColumnOrLiteral
) -> None:
    """Copy from a Column object's AST, or copy a literal value into an AST expression.

    Args:
        expr_builder (proto.Expr): A previously created Expr() IR entity instance to be filled
        value (ColumnOrLiteral): The value from which to populate the provided expr_builder parameter.

    Raises:
        TypeError: The Expr provided should only be populated from a Snowpark Column with a valid _ast field or a literal value
    """
    if isinstance(value, snowflake.snowpark.Column):
        build_expr_from_snowpark_column(expr_builder, value)
    elif isinstance(value, VALID_PYTHON_TYPES_FOR_LITERAL_VALUE):
        build_expr_from_python_val(expr_builder, value)
    elif isinstance(value, Expression):
        # Expressions must be handled by caller.
        pass
    else:
        raise TypeError(f"{type(value)} is not a valid type for Column or literal AST.")

Contributor

We can probably move the AST logic to build_expr_from_snowpark_column_or_python_val and add an Interval case there. I don't think Snowpark originally intended for Interval to be used directly, but rather through the make_interval function.
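
A rough sketch of what such an Interval case could look like. Everything here is a hypothetical stand-in (`ExprSketch`, `IntervalAstSketch`, `IntervalSketch`, `build_expr_sketch`), reduced to three fields; the real `proto.Expr` and `sp_interval` messages may be shaped differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntervalAstSketch:
    """Hypothetical stand-in for the sp_interval message."""
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None


@dataclass
class ExprSketch:
    """Hypothetical stand-in for proto.Expr."""
    literal: Optional[object] = None
    sp_interval: Optional[IntervalAstSketch] = None


class IntervalSketch:
    """Hypothetical stand-in for Interval: it stores values, never an AST."""

    def __init__(self, year=None, month=None, day=None):
        self.values_dict = {}
        if year is not None:
            self.values_dict["YEAR"] = year
        if month is not None:
            self.values_dict["MONTH"] = month
        if day is not None:
            self.values_dict["DAY"] = day


def build_expr_sketch(expr_builder: ExprSketch, value) -> None:
    """Fill expr_builder from value; the builder, not the expression, owns the AST."""
    if isinstance(value, IntervalSketch):
        # Interval case: copy the stored values into the AST node.
        expr_builder.sp_interval = IntervalAstSketch(
            year=value.values_dict.get("YEAR"),
            month=value.values_dict.get("MONTH"),
            day=value.values_dict.get("DAY"),
        )
    elif isinstance(value, (bool, int, float, str)):
        expr_builder.literal = value
    else:
        raise TypeError(f"{type(value)} is not a valid type for Column or literal AST.")
```

With this shape, the expression class needs neither the `_emit_ast` flag nor the deferred import.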


        super().__init__()
        self.values_dict = {}
        if year is not None:
@@ -405,6 +409,14 @@ def __init__(
        if nanosecond is not None:
            self.values_dict["NANOSECOND"] = nanosecond

        if self._ast is None and _emit_ast:
            expr = proto.Expr()
Collaborator

/home/runner/work/snowpark-python/snowpark-python/src/snowflake/snowpark/_internal/analyzer/expression.py:413:26 - error: "Expr" is not a known member of module "snowflake.snowpark._internal.proto.ast_pb2" (reportGeneralTypeIssues)

Collaborator

Can you repro locally?

python -m tox -e pyright

It is saying you may need a new version:
"Please install the new version or set PYRIGHT_PYTHON_FORCE_VERSION to latest"

Contributor Author

I get the same errors locally. I'm not sure what to do about the first error, though: other parts of the code use proto.Expr() as well, so I'm not sure why it errors here.

            ast = with_src_position(expr.sp_interval)
            # Set the AST values based on the values_dict.
            for k, v in self.values_dict.items():
                getattr(ast, k.lower()).value = v
Contributor

Let's avoid the indirection via getattr and set the values explicitly. It's much easier to read and reason about :)
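
A minimal sketch of the explicit style, assuming each AST field exposes a `.value` attribute as in the loop under review. `new_interval_ast` and `fill_interval_ast` are hypothetical names, and only three of the fields are shown.

```python
from types import SimpleNamespace


def new_interval_ast():
    """Hypothetical stand-in for expr.sp_interval: each field wraps a `.value`."""
    return SimpleNamespace(
        year=SimpleNamespace(value=None),
        month=SimpleNamespace(value=None),
        day=SimpleNamespace(value=None),
    )


def fill_interval_ast(ast, year=None, month=None, day=None):
    # One explicit branch per field: a misspelled field name is visible to
    # readers and type checkers instead of hiding behind getattr().
    if year is not None:
        ast.year.value = year
    if month is not None:
        ast.month.value = month
    if day is not None:
        ast.day.value = day
    return ast
```

It is more lines than the loop, but every field that can be written is spelled out at the call site.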

            self._ast = expr

    @property
    def sql(self) -> str:
        return f"""INTERVAL '{",".join(f"{v} {k}" for k, v in self.values_dict.items())}'"""
876 changes: 439 additions & 437 deletions src/snowflake/snowpark/_internal/proto/ast_pb2.py

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions src/snowflake/snowpark/mock/_analyzer.py
@@ -57,6 +57,7 @@
    Expression,
    FunctionExpression,
    InExpression,
    Interval,
    Like,
    ListAgg,
    Literal,
@@ -464,6 +465,9 @@ def analyze(
                expr.ignore_nulls,
            )

        if isinstance(expr, Interval):
            return str(expr)

        raise SnowparkClientExceptionMessages.PLAN_INVALID_TYPE(
            str(expr)
        )  # pragma: no cover
81 changes: 81 additions & 0 deletions tests/ast/data/interval.test
@@ -0,0 +1,81 @@
## TEST CASE

from snowflake.snowpark._internal.analyzer.expression import Interval

df1 = session.create_dataframe(
    [
        [datetime.datetime(2010, 1, 1), datetime.datetime(2011, 1, 1)],
        [datetime.datetime(2012, 1, 1), datetime.datetime(2013, 1, 1)],
    ],
    schema=["a", "b"],
)

df2 = df1.select(
    df1["a"]
    + Column(
        Interval(
Contributor

Let's use make_interval here to test. I'm not sure we should support the Column(Interval(...)) syntax at all.

            quarter=1,
            month=1,
            week=2,
            day=2,
            hour=2,
            minute=3,
            second=3,
            millisecond=3,
            microsecond=4,
            nanosecond=4,
        )
    )
)

df4 = df1.select(df1["a"] + Column(Interval(1234)))

df5 = df1.select(
    df1["a"]
    + Column(
        Interval(
            quarter=1,
            month=2,
            week=3,
            day=4,
            hour=5,
            minute=6,
            second=7
        )
    )
)

df6 = df1.select(
    df1["a"]
    + Column(
        Interval(
            year=1,
            month=2,
            week=3,
            day=4,
            hour=5,
            minute=6,
            second=7,
            millisecond=8,
            microsecond=9,
            nanosecond=10
        )
    )
)


## EXPECTED UNPARSER OUTPUT

df1 = session.create_dataframe([[datetime.datetime(2010, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=-18000), name="EST")), datetime.datetime(2011, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=-18000), name="EST"))], [datetime.datetime(2012, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=-18000), name="EST")), datetime.datetime(2013, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=-18000), name="EST"))]], schema=["a", "b"])

df2 = df1.select(df1["a"] + Interval(quarter=1, month=1, week=2, day=2, hour=2, minute=3, second=3, millisecond=3, microsecond=4, nanosecond=4))
Contributor

Does this work in Snowpark?

Contributor Author

@sfc-gh-vbudati Oct 15, 2024

I believe it does. I pulled this from test_interval:

def test_interval(session):
    df1 = session.create_dataframe(
        [
            [datetime.datetime(2010, 1, 1), datetime.datetime(2011, 1, 1)],
            [datetime.datetime(2012, 1, 1), datetime.datetime(2013, 1, 1)],
        ],
        schema=["a", "b"],
    )
    df2 = df1.select(
        df1["a"]
        + Column(
            Interval(
                quarters=1,
                months=1,
                weeks=2,
                days=2,
                hours=2,
                minutes=3,
                seconds=3,
                milliseconds=3,
                microseconds=4,
                nanoseconds=4,
            )
        )
    )
    verify_column_result(
        session,
        df2,
        [
            '"(""A"" + INTERVAL \'1 QUARTER,1 MONTH,2 WEEK,2 DAY,2 HOUR,3 MINUTE,3 SECOND,3 MILLISECOND,4 MICROSECOND,4 NANOSECOND\')"',
        ],
        [TimestampType(timezone=TimestampTimeZone.NTZ)],
        None,
    )

But I see that you added the make_interval changes, so I'll update my tests accordingly.
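
For reference, the column name expected by the test above follows from how the `sql` property in this diff joins the stored fields. A minimal sketch that mirrors that property as a standalone function (`interval_sql` is a hypothetical name, not part of Snowpark):

```python
def interval_sql(values_dict):
    """Mirror of the Interval.sql property shown in the diff, as a plain function."""
    # Each entry becomes "<value> <UNIT>"; entries are comma-joined in insertion order.
    parts = ",".join(f"{v} {k}" for k, v in values_dict.items())
    return f"INTERVAL '{parts}'"


print(interval_sql({"QUARTER": 1, "MONTH": 1, "WEEK": 2}))
# prints: INTERVAL '1 QUARTER,1 MONTH,2 WEEK'
```

Because the units come out in dictionary insertion order, the generated string depends on the order in which the constructor populates `values_dict`, not on the order of keyword arguments at the call site.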


df4 = df1.select(df1["a"] + Interval(year=1234))

df5 = df1.select(df1["a"] + Interval(quarter=1, month=2, week=3, day=4, hour=5, minute=6, second=7))

df6 = df1.select(df1["a"] + Interval(year=1, month=2, week=3, day=4, hour=5, minute=6, second=7, millisecond=8, microsecond=9, nanosecond=10))

## EXPECTED ENCODED AST

CvkCCvYCCuYC8gXiAgq5Agq2AgqYAcoClAEKGhoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERShcEjryAzcIASgBOhoaFlNSQ19QT1NJVElPTl9URVNUX01PREUoXEISCgUKA0VTVBCw8/7///////8BSNoPEjryAzcIASgBOhoaFlNSQ19QT1NJVElPTl9URVNUX01PREUoXEISCgUKA0VTVBCw8/7///////8BSNsPCpgBygKUAQoaGhZTUkNfUE9TSVRJT05fVEVTVF9NT0RFKFwSOvIDNwgBKAE6GhoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERShcQhIKBQoDRVNUELDz/v///////wFI3A8SOvIDNwgBKAE6GhoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERShcQhIKBQoDRVNUELDz/v///////wFI3Q8SCAoGCgFhCgFiGhoaFlNSQ19QT1NJVElPTl9URVNUX01PREUoXBIFCgNkZjEYASICCAEK1gEK0wEKwwH6CL8BCpUBepIBCivCBigKAWESB4ICBAoCCAEaGhoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERShlEkfaCkQKAggCEgIIAhoCCAQiAggDKgIIAzICCAE6AggEQgIIAUoCCANSGhoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERShnWgIIAhoaGhZTUkNfUE9TSVRJT05fVEVTVF9NT0RFKGUSB4ICBAoCCAEaGhoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERShkIAESBQoDZGYyGAIiAggCCrEBCq4BCp4B+giaAQpxem8KK8IGKAoBYRIHggIECgIIARoaGhZTUkNfUE9TSVRJT05fVEVTVF9NT0RFKHYSJNoKIVIaGhZTUkNfUE9TSVRJT05fVEVTVF9NT0RFKHZiAwjSCRoaGhZTUkNfUE9TSVRJT05fVEVTVF9NT0RFKHYSB4ICBAoCCAEaGhoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERSh2IAESBQoDZGY0GAMiAggDCsoBCscBCrcB+gizAQqJAXqGAQorwgYoCgFhEgeCAgQKAggBGhoaFlNSQ19QT1NJVElPTl9URVNUX01PREUoeRI72go4CgIIBBICCAUqAggGMgIIAkICCAFKAggHUhoaFlNSQ19QT1NJVElPTl9URVNUX01PREUoe1oCCAMaGhoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERSh5EgeCAgQKAggBGhoaFlNSQ19QT1NJVElPTl9URVNUX01PREUoeCABEgUKA2RmNRgEIgIIBAraAQrXAQrHAfoIwwEKmAF6lQEKLMIGKQoBYRIHggIECgIIARobGhZTUkNfUE9TSVRJT05fVEVTVF9NT0RFKIgBEkjaCkUKAggEEgIIBRoCCAkiAggIKgIIBjICCAI6AggKSgIIB1IbGhZTUkNfUE9TSVRJT05fVEVTVF9NT0RFKIoBWgIIA2ICCAEaGxoWU1JDX1BPU0lUSU9OX1RFU1RfTU9ERSiIARIHggIECgIIARobGhZTUkNfUE9TSVRJT05fVEVTVF9NT0RFKIcBIAESBQoDZGY2GAUiAggFEAEaERIPCg0KBWZpbmFsEAMYCSAUIgQQARgV
4 changes: 0 additions & 4 deletions tests/integ/test_column_names.py
@@ -373,10 +373,6 @@ def test_literal(session, local_testing_mode):
    )


@pytest.mark.skipif(
    "config.getoption('local_testing_mode', default=False)",
    reason="SNOW-1358946: Interval is not supported in Local Testing",
)
def test_interval(session):
    df1 = session.create_dataframe(
        [