
SNOW-1491199 Server-side Snowpark: Update package dependencies #2525

Draft: wants to merge 12 commits into base branch `ls-SNOW-1491199-merge-phase0-server-side`.
67 changes: 67 additions & 0 deletions CHANGELOG.md
@@ -6,10 +6,77 @@

#### New Features

- Added support for 'Service' domain to `session.lineage.trace` API.
**Review comment (Collaborator):** why change this file in this PR?

**Reply (Contributor, Author):** Merging changes to `main` requires updating the changelog. However, it's unclear to me why this change made it here. I'll need to see how the commit chain got applied; this isn't correct here.

- Added support for the `copy_grants` parameter when registering UDxFs and stored procedures (sketched after this list).
- Added support for the following methods in `DataFrameWriter` to support daisy-chaining (also sketched after this list):
- `option`
- `options`
- `partition_by`
- Added support for `snowflake_cortex_summarize`.
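
A minimal sketch of the new `copy_grants` parameter, assuming an existing Snowpark `session`; the parameter is taken here to mirror the `COPY GRANTS` clause of `CREATE OR REPLACE FUNCTION`:

```python
from snowflake.snowpark.types import IntegerType

# Hypothetical UDF; `session` is an existing snowflake.snowpark.Session.
add_one = session.udf.register(
    lambda x: x + 1,
    return_type=IntegerType(),
    input_types=[IntegerType()],
    name="add_one",
    replace=True,
    copy_grants=True,  # keep existing grants when the function is replaced
)
```

And a sketch of daisy-chaining the `DataFrameWriter` methods; the stage path `@my_stage/out` is hypothetical, and `options` is assumed to accept a dict of writer options:

```python
from snowflake.snowpark.functions import col

df = session.create_dataframe(
    [("2024-01-01", 1), ("2024-01-02", 2)], schema=["dt", "v"]
)

# option, options, and partition_by can now be chained on the writer
# before the terminal write call:
df.write.partition_by(col("dt")).option("header", True).options(
    {"compression": "gzip"}
).csv("@my_stage/out")
```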

#### Improvements

- Disabled SQL simplification when a sort is performed after a limit:
  - Previously, `df.sort().limit()` and `df.limit().sort()` generated the same query, with the sort in front of the limit. Now, `df.limit().sort()` generates a query that preserves that order: limit first, then sort (see the sketch below).
  - This improves the performance of the generated query for `df.limit().sort()`, because the limit stops the table scan as soon as the requested number of records is found.
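
A minimal sketch of the difference, assuming an existing DataFrame `df` with a numeric column `v`; `DataFrame.queries` is used to inspect the generated SQL:

```python
from snowflake.snowpark.functions import col

sorted_then_limited = df.sort(col("v")).limit(10)  # ORDER BY ... LIMIT 10
limited_then_sorted = df.limit(10).sort(col("v"))  # LIMIT 10 first, then ORDER BY

# The second form can stop scanning as soon as 10 rows are found.
print(sorted_then_limited.queries["queries"][-1])
print(limited_then_sorted.queries["queries"][-1])
```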

#### Bug Fixes

- Fixed a bug where the automatic cleanup of temporary tables could interfere with the results of async query execution.
- Fixed a bug in the `DataFrame.analytics.time_series_agg` function to handle multiple data points in the same sliding interval.

#### Dependency Updates

- Added a dependency on `protobuf>=5.28` and `tzlocal` at runtime.
- Added a dependency on `protoc-wheel-0` for the development profile.
- Require `snowflake-connector-python>=3.12.0, <4.0.0` (was `>=3.10.0`).

### Snowpark pandas API Updates

#### New Features

- Added support for `np.subtract`, `np.multiply`, `np.divide`, and `np.true_divide` (see the sketch after this list).
- Added support for tracking usages of `__array_ufunc__`.
- Added numpy compatibility support for `np.float_power`, `np.mod`, `np.remainder`, `np.greater`, `np.greater_equal`, `np.less`, `np.less_equal`, `np.not_equal`, and `np.equal`.
- Added numpy compatibility support for `np.log`, `np.log2`, and `np.log10`.
- Added support for `DataFrameGroupBy.bfill`, `SeriesGroupBy.bfill`, `DataFrameGroupBy.ffill`, and `SeriesGroupBy.ffill`.
- Added support for `on` parameter with `Resampler`.
- Added support for timedelta inputs in `value_counts()`.
- Added support for applying the Snowpark Python function `snowflake_cortex_summarize`.
- Added support for `DataFrame.attrs` and `Series.attrs`.
- Added support for `DataFrame.style`.
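
A minimal sketch of the numpy interoperability, assuming an active Snowpark session; the plugin import registers the Snowflake backend for modin:

```python
import numpy as np
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401, registers Snowpark pandas

s = pd.Series([1.0, 4.0, 9.0])

# These calls dispatch through __array_ufunc__ and execute in Snowflake:
print(np.subtract(s, 1))
print(np.multiply(s, 2))
print(np.log10(s))
```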

#### Improvements

- Improved generated SQL query for `head` and `iloc` when the row key is a slice.
- Improved error message when passing an unknown timezone to `tz_convert` and `tz_localize` in `Series`, `DataFrame`, `Series.dt`, and `DatetimeIndex`.
- Improved documentation for `tz_convert` and `tz_localize` in `Series`, `DataFrame`, `Series.dt`, and `DatetimeIndex` to specify the supported timezone formats.
- Added additional kwargs support for `df.apply` and `series.apply` (as well as `map` and `applymap`) when using Snowpark functions. This allows for some position-independent compatibility between `apply` and functions whose first argument is not a pandas object.
- Improved generated SQL query for `iloc` and `iat` when the row key is a scalar.
- Removed all joins in `iterrows`.
- Improved documentation for `Series.map` to reflect the unsupported features.
- Added support for `np.may_share_memory`, which is used internally by many scikit-learn functions. This method always returns `False` when called with a Snowpark pandas object (see the sketch below).
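
A minimal sketch, again assuming an active Snowpark session:

```python
import numpy as np
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401

df = pd.DataFrame({"a": [1, 2, 3]})

# Snowpark pandas data lives in Snowflake rather than in local numpy
# buffers, so may_share_memory always reports False:
assert not np.may_share_memory(df["a"], df["a"])
```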

#### Bug Fixes

- Fixed a bug where `DataFrame` and `Series` `pct_change()` would raise `TypeError` when input contained timedelta columns.
- Fixed a bug where `replace()` would sometimes propagate `Timedelta` types incorrectly. `replace()` on `Timedelta` now raises `NotImplementedError` instead.
- Fixed a bug where `DataFrame` and `Series` `round()` would raise `AssertionError` for `Timedelta` columns. `round()` on `Timedelta` now raises `NotImplementedError` instead.
- Fixed a bug where `reindex` fails when the new index is a Series with non-overlapping types from the original index.
- Fixed a bug where calling `__getitem__` on a DataFrameGroupBy object always returned a DataFrameGroupBy object if `as_index=False`.
- Fixed a bug where inserting timedelta values into an existing column would silently convert the values to integers instead of raising `NotImplementedError`.
- Fixed a bug where `DataFrame.shift()` on axis=0 and axis=1 would fail to propagate timedelta types.
- `DataFrame.abs()`, `DataFrame.__neg__()`, `DataFrame.stack()`, and `DataFrame.unstack()` now raise `NotImplementedError` for timedelta inputs instead of failing to propagate timedelta types.

### Snowpark Local Testing Updates

#### Bug Fixes

- Fixed a bug where `DataFrame.alias` raised `KeyError` for the input column name.
- Fixed a bug where `to_csv` on a Snowflake stage failed when the data contained empty strings.

## 1.23.0 (2024-10-09)

### Snowpark Python API Updates
8 changes: 3 additions & 5 deletions setup.py
```diff
@@ -13,8 +13,6 @@
 MODIN_DEPENDENCY_VERSION = (
     "==0.28.1"  # Snowpark pandas requires modin 0.28.1, which depends on pandas 2.2.1
 )
-# Use HEAD of main branch in connector. This doesn't work with [pandas] extra.
-# CONNECTOR_DEPENDENCY = "snowflake-connector-python @ git+https://github.com/snowflakedb/snowflake-connector-python@main#egg=snowflake-connector-python"
 CONNECTOR_DEPENDENCY_VERSION = ">=3.12.0, <4.0.0"
 CONNECTOR_DEPENDENCY = f"snowflake-connector-python{CONNECTOR_DEPENDENCY_VERSION}"
 INSTALL_REQ_LIST = [
@@ -23,11 +21,11 @@
     CONNECTOR_DEPENDENCY,
     # snowpark directly depends on typing-extension, so we should not remove it even if connector also depends on it.
     "typing-extensions>=4.1.0, <5.0.0",
-    "protobuf>=5.28",
-    "tzlocal",
     "pyyaml",
     "cloudpickle>=1.6.0,<=2.2.1,!=2.1.0,!=2.2.0;python_version<'3.11'",
     "cloudpickle==2.2.1;python_version~='3.11'",  # backend only supports cloudpickle 2.2.1 + python 3.11 at the moment
+    "protobuf>=5.28",  # Snowpark IR
+    "tzlocal",  # Snowpark IR
 ]
 REQUIRED_PYTHON_VERSION = ">=3.8, <3.12"
@@ -46,7 +44,6 @@
 DEVELOPMENT_REQUIREMENTS = [
     "pytest<8.0.0",  # check SNOW-1022240 for more details on the pin here
     "pytest-cov",
-    "wrapt",
     "coverage",
     "sphinx==5.0.2",
     "cachetools",  # used in UDF doctest
@@ -64,6 +61,7 @@
     "graphviz",  # used in plot tests
     "pytest-assume",  # sql counter check
     "decorator",  # sql counter check
+    "protoc-wheel-0",  # Protocol buffer compiler, for Snowpark IR
 ]

 # read the version
```