
Commit

resolve merge conf
Signed-off-by: Labanya Mukhopadhyay <labanya.mukhopadhyay@snowflake.com>
sfc-gh-lmukhopadhyay committed Oct 31, 2024
2 parents 23fc257 + c8926fb commit 4708b92
Showing 15 changed files with 638 additions and 44 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -27,6 +27,9 @@
- Added support for `Index.to_numpy`.
- Added support for `DataFrame.align` and `Series.align` for `axis=0`.
- Added support for `size` in `GroupBy.aggregate`, `DataFrame.aggregate`, and `Series.aggregate`.
- Added support for `pd.read_pickle` (Uses native pandas for processing).
- Added support for `pd.read_html` (Uses native pandas for processing).
- Added support for `pd.read_xml` (Uses native pandas for processing).
- Added support for `DataFrame.align` and `Series.align` for `axis=1` and `axis=None`.
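The new `read_pickle`, `read_html`, and `read_xml` entries delegate to native pandas for processing. A Snowflake session is not available here, so this minimal sketch uses native pandas directly to show the round-trip behavior those readers fall back to:

```python
import os
import tempfile

import pandas as pd

# Write a frame to a pickle file and read it back -- the operation that
# Snowpark pandas' new pd.read_pickle delegates to native pandas.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
path = os.path.join(tempfile.mkdtemp(), "frame.pkl")
df.to_pickle(path)
restored = pd.read_pickle(path)
print(restored.equals(df))
```

In Snowpark pandas the same call is made through `modin.pandas`; the result is then converted into a Snowpark pandas DataFrame.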

### Snowpark Local Testing Updates
@@ -55,6 +58,7 @@
- Disabled SQL simplification when a sort is performed after a limit.
  - Previously, `df.sort().limit()` and `df.limit().sort()` generated the same query, with the sort in front of the limit. Now, `df.limit().sort()` generates a query that preserves the limit-then-sort order.
  - Improved the performance of the generated query for `df.limit().sort()`, because the limit stops table scanning as soon as the required number of records is reached.
- Added a client-side error message for when an invalid stage location is passed to DataFrame read functions.
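The reason the simplification must be disabled is that limit-then-sort and sort-then-limit are not interchangeable. A small native-pandas analogue (using `head` as the limit) makes the difference concrete:

```python
import pandas as pd

df = pd.DataFrame({"x": [3, 1, 2]})

# Sort first, then take two rows: the two smallest values.
sort_then_limit = df.sort_values("x").head(2)   # x == [1, 2]

# Take two rows first, then sort them: whichever rows came first.
limit_then_sort = df.head(2).sort_values("x")   # x == [1, 3]
```

Rewriting one form into the other would silently change which rows are returned, which is exactly what the disabled simplification used to do.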

#### Bug Fixes

3 changes: 3 additions & 0 deletions docs/source/modin/io.rst
@@ -14,6 +14,9 @@ Input/Output
read_json
read_parquet
read_sas
read_pickle
read_html
read_xml

.. rubric:: SQL

6 changes: 6 additions & 0 deletions docs/source/modin/supported/general_supported.rst
@@ -50,6 +50,8 @@ Data manipulations
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``qcut`` | P | | ``N`` if ``labels!=False`` or ``retbins=True``. |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``read_pickle`` | Y | | Uses native pandas for reading. |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``read_csv`` | P | | Reads both local and staged file(s) into a Snowpark|
| | | | pandas DataFrame. Note, the order of rows in the |
| | | | result may differ from the order of rows in the original |
@@ -84,6 +86,10 @@
| | | ``dtype_backend``, and | |
| | | ``storage_options`` are ignored. | |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``read_html`` | Y | | Uses native pandas for reading. |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``read_xml`` | Y | | Uses native pandas for reading. |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``read_parquet`` | P | ``use_nullable_dtypes``, | Supported parameter(s) are: ``columns`` |
| | | ``filesystem``, and ``filters`` | |
| | | will raise an error if used. | |
1 change: 1 addition & 0 deletions docs/source/snowpark/functions.rst
@@ -277,6 +277,7 @@ Functions
sum
sum_distinct
sysdate
system_reference
table_function
tan
tanh
1 change: 1 addition & 0 deletions setup.py
@@ -52,6 +52,7 @@
"pytest-assume", # sql counter check
"decorator", # sql counter check
"protoc-wheel-0", # Protocol buffer compiler, for Snowpark IR
"lxml", # used in read_xml tests
]

# read the version
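The `lxml` test dependency exists because lxml is pandas' default backend for `read_xml`. The sketch below parses a small inline document; it uses `parser="etree"` (the standard-library backend) so it stays self-contained, while the real tests exercise the default lxml path:

```python
from io import StringIO

import pandas as pd

xml = """<data>
  <row><a>1</a><b>x</b></row>
  <row><a>2</a><b>y</b></row>
</data>"""

# parser="etree" uses the standard library; pandas defaults to
# parser="lxml", which is why lxml is added to the test requirements.
df = pd.read_xml(StringIO(xml), parser="etree")
```

The default XPath `./*` selects each `<row>` element, and its children become the columns `a` and `b`.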
8 changes: 8 additions & 0 deletions src/snowflake/snowpark/_internal/analyzer/metadata_utils.py
@@ -77,6 +77,7 @@ def infer_metadata(
)
from snowflake.snowpark._internal.analyzer.snowflake_plan import SnowflakePlan
from snowflake.snowpark._internal.analyzer.unary_plan_node import (
Aggregate,
Filter,
Project,
Sample,
@@ -97,6 +98,13 @@
# When source_plan is a SnowflakeValues, metadata is already defined locally
elif isinstance(source_plan, SnowflakeValues):
attributes = source_plan.output
# When source_plan is Aggregate or Project, we already have quoted_identifiers
elif isinstance(source_plan, Aggregate):
quoted_identifiers = infer_quoted_identifiers_from_expressions(
source_plan.aggregate_expressions, # type: ignore
analyzer,
df_aliased_col_name_to_real_col_name,
)
elif isinstance(source_plan, Project):
quoted_identifiers = infer_quoted_identifiers_from_expressions(
source_plan.project_list, # type: ignore
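The point of the new `Aggregate` branch is that when every output expression of an aggregation is aliased, the client already knows the result's quoted column identifiers and can skip a schema (describe) query. A heavily simplified, hypothetical sketch of that idea — `Alias`, `quote_name`, and `infer_quoted_identifiers` below are illustrative stand-ins, not the real Snowpark internals:

```python
from dataclasses import dataclass
from typing import List, Optional

def quote_name(name: str) -> str:
    # Snowflake-style quoting of an unquoted identifier: upper-case and wrap.
    return '"' + name.upper() + '"'

@dataclass
class Alias:
    child: str  # e.g. the SQL text of an aggregate expression
    name: str   # the user-facing alias

def infer_quoted_identifiers(expressions: List[Alias]) -> Optional[List[str]]:
    # If every output expression carries an alias, the output schema is
    # fully known on the client and no describe query is needed.
    if all(isinstance(e, Alias) for e in expressions):
        return [quote_name(e.name) for e in expressions]
    return None  # fall back to asking the server

exprs = [Alias("SUM(A)", "total"), Alias("COUNT(1)", "cnt")]
print(infer_quoted_identifiers(exprs))  # ['"TOTAL"', '"CNT"']
```

The real `infer_quoted_identifiers_from_expressions` walks full expression trees and alias maps, but the saving is the same: one fewer server round trip per aggregated plan.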
12 changes: 12 additions & 0 deletions src/snowflake/snowpark/dataframe_reader.py
@@ -20,6 +20,7 @@
from snowflake.snowpark._internal.type_utils import ColumnOrName, convert_sf_to_sp_type
from snowflake.snowpark._internal.utils import (
INFER_SCHEMA_FORMAT_TYPES,
SNOWFLAKE_PATH_PREFIXES,
TempObjectType,
get_aliased_option_name,
get_copy_into_table_options,
@@ -59,6 +60,15 @@
}


def _validate_stage_path(path: str) -> str:
stripped_path = path.strip("\"'")
if not any(stripped_path.startswith(prefix) for prefix in SNOWFLAKE_PATH_PREFIXES):
raise ValueError(
f"'{path}' is an invalid Snowflake stage location. DataFrameReader can only read files from stage locations."
)
return path

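A standalone sketch of the validator above. The real `SNOWFLAKE_PATH_PREFIXES` lives in `snowflake.snowpark._internal.utils`; the prefixes assumed here (`@` for named stages, `snow://`) are illustrative:

```python
SNOWFLAKE_PATH_PREFIXES = ["@", "snow://"]  # assumed values for this sketch

def validate_stage_path(path: str) -> str:
    # Strip surrounding quotes before checking, as the function above does.
    stripped_path = path.strip("\"'")
    if not any(stripped_path.startswith(p) for p in SNOWFLAKE_PATH_PREFIXES):
        raise ValueError(
            f"'{path}' is an invalid Snowflake stage location. "
            "DataFrameReader can only read files from stage locations."
        )
    return path

validate_stage_path("@my_stage/data.csv")        # accepted
try:
    validate_stage_path("/local/path/data.csv")  # rejected on the client
except ValueError as exc:
    print(exc)
```

Failing fast on the client gives a clear message instead of an opaque SQL compilation error after the query reaches the server.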

class DataFrameReader:
"""Provides methods to load data in various supported formats from a Snowflake
stage to a :class:`DataFrame`. The paths provided to the DataFrameReader must refer
@@ -411,6 +421,7 @@ def csv(self, path: str) -> DataFrame:
Returns:
a :class:`DataFrame` that is set up to load data from the specified CSV file(s) in a Snowflake stage.
"""
path = _validate_stage_path(path)
self._file_path = path
self._file_type = "CSV"

@@ -706,6 +717,7 @@ def _read_semi_structured_file(self, path: str, format: str) -> DataFrame:

if self._user_schema:
raise ValueError(f"Read {format} does not support user schema")
path = _validate_stage_path(path)
self._file_path = path
self._file_type = format
