[Bug]: Filter in agg context #1176

Open
IsaiasGutierrezCruz opened this issue Oct 14, 2024 · 1 comment
@IsaiasGutierrezCruz
Contributor

IsaiasGutierrezCruz commented Oct 14, 2024

Describe the bug

When a filter expression is used inside an agg context on a pandas DataFrame, the following exception is raised:

ValueError: Length of values (1) does not match length of index (0)

I was also evaluating the use of a when-otherwise expression to emulate the behavior of filtering, but anonymous expressions are not supported in the agg context. Is there a plan to support this kind of expression in that context? c:

Steps or code to reproduce the bug

import narwhals as nw
import pandas as pd


def replicate_error_filter() -> None:
    data = {
        "cat": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
        "cat_2": ["d", "d", "e", "e", "f", "f", "g", "g", "g"],
        "values": [1, 2, 3, 4, 5, 6, 7, 8, 10],
    }
    original_data = pd.DataFrame(data)
    df_nw = nw.from_native(original_data)
    # Filtering inside agg is what triggers the ValueError on pandas:
    # groups "b" and "c" have no rows where cat_2 == "d".
    df_nw = df_nw.group_by("cat").agg(
        nw.col("values").filter(nw.col("cat_2") == "d").max()
    )
    print(df_nw.to_native())


replicate_error_filter()

Expected results

┌─────┬────────┐
│ cat ┆ values │
│ --- ┆ ---    │
│ str ┆ i64    │
╞═════╪════════╡
│ a   ┆ 2      │
│ b   ┆ null   │
│ c   ┆ null   │
└─────┴────────┘
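
For reference, the expected table above corresponds to "drop the rows that fail the predicate, then take the maximum per group", where a group that loses all of its rows yields null. A minimal pure-Python sketch of that semantics follows; the helper name `filtered_group_max` is hypothetical and is not part of narwhals or Polars.

```python
from collections import defaultdict

data = {
    "cat": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "cat_2": ["d", "d", "e", "e", "f", "f", "g", "g", "g"],
    "values": [1, 2, 3, 4, 5, 6, 7, 8, 10],
}


def filtered_group_max(data, key, mask_col, mask_value, value_col):
    """Hypothetical helper: per-group max over rows where mask_col == mask_value."""
    kept = defaultdict(list)
    for k, m, v in zip(data[key], data[mask_col], data[value_col]):
        bucket = kept[k]  # registers the group even if every row is filtered out
        if m == mask_value:
            bucket.append(v)
    # An empty bucket stands in for the "null" rows of the expected table.
    return {k: (max(vs) if vs else None) for k, vs in kept.items()}


expected = filtered_group_max(data, "cat", "cat_2", "d", "values")
print(expected)  # {'a': 2, 'b': None, 'c': None}
```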

Actual results

ValueError: Length of values (1) does not match length of index (0)

Please run narwhals.show_versions() and enter the output below.

System:
    python: 3.12.5 (main, Aug 14 2024, 04:32:18) [Clang 18.1.8 ]
executable: /Users/abelisaiasgutierrezcruz/Documents/Proyects/test_narwhals/.venv/bin/python
   machine: macOS-15.0.1-arm64-arm-64bit

Python dependencies:
     narwhals: 1.9.3
       pandas: 2.2.3
       polars: 1.8.2
         cudf: 
        modin: 
      pyarrow: 17.0.0
        numpy: 2.1.1

Relevant log output

File "/Users/abelisaiasgutierrezcruz/Documents/Proyects/test_narwhals/.venv/lib/python3.12/site-packages/narwhals/_pandas_like/group_by.py", line 246, in func
    results_keys = expr._call(from_dataframe(df))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/abelisaiasgutierrezcruz/Documents/Proyects/test_narwhals/.venv/lib/python3.12/site-packages/narwhals/_expression_parsing.py", line 232, in func
    out.append(plx._create_series_from_scalar(_out, column))  # type: ignore[arg-type]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/abelisaiasgutierrezcruz/Documents/Proyects/test_narwhals/.venv/lib/python3.12/site-packages/narwhals/_pandas_like/namespace.py", line 72, in _create_series_from_scalar
    return PandasLikeSeries._from_iterable(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/abelisaiasgutierrezcruz/Documents/Proyects/test_narwhals/.venv/lib/python3.12/site-packages/narwhals/_pandas_like/series.py", line 152, in _from_iterable
    native_series_from_iterable(
  File "/Users/abelisaiasgutierrezcruz/Documents/Proyects/test_narwhals/.venv/lib/python3.12/site-packages/narwhals/_pandas_like/utils.py", line 185, in native_series_from_iterable
    return implementation.to_native_namespace().Series(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/abelisaiasgutierrezcruz/Documents/Proyects/test_narwhals/.venv/lib/python3.12/site-packages/pandas/core/series.py", line 575, in __init__
    com.require_length_match(data, index)
  File "/Users/abelisaiasgutierrezcruz/Documents/Proyects/test_narwhals/.venv/lib/python3.12/site-packages/pandas/core/common.py", line 573, in require_length_match
    raise ValueError(
ValueError: Length of values (1) does not match length of index (0)
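
The last frames of the traceback can be reproduced with pandas alone. The sketch below pairs one value with an empty index, which is what the error message describes. That narwhals reaches this state by wrapping a scalar aggregate with the index of an empty filtered group is an inference from the `_create_series_from_scalar` frame, not something confirmed in this thread.

```python
import pandas as pd

try:
    # One value (standing in for a scalar aggregate) against an empty index
    # (standing in for a group whose rows were all filtered out).
    pd.Series([float("nan")], index=pd.Index([]))
except ValueError as exc:
    message = str(exc)
else:
    message = None

print(message)
```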

MarcoGorelli added the "bug: incorrect result" label Oct 14, 2024

@MarcoGorelli
Member

thanks @IsaiasGutierrezCruz for the report!

> anonymous expressions are not supported in the agg context. Is there a plan to support this kind of expression in that context?

I think it would be tricky to do in the general case, at least for pandas/pyarrow/dask. But #299 has a suggestion to at least support more cases than we currently do.
