Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: cum_sum expression raises an exception with pandas when 'over' is used #1177

Open
IsaiasGutierrezCruz opened this issue Oct 14, 2024 · 1 comment

Comments

@IsaiasGutierrezCruz
Copy link
Contributor

Describe the bug

The cum_sum expression raises an exception when used with the over expression.

I did a quick review of the code and I believe this issue can be resolved by simply adding the cum_sum expression name to the dictionary that maps pandas method names to polars method names. I would like to open a pull request to solve this c:

Steps or code to reproduce the bug

def replicating_error_cum_sum() -> None:
    data = {
        "cat": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
        'values': [1, 2, 3, 4, 5, 6, 7, 8, None]
    }
    original_data = pd.DataFrame(data)
    df_nw = nw.from_native(original_data)
    df_nw = (
        df_nw
        .with_columns(
            nw.col("values").cum_sum().over("cat")
        )
    )
    print(nw.to_native(df_nw))

Expected results

┌─────┬────────┐
│ cat ┆ values │
│ --- ┆ ---    │
│ str ┆ i64    │
╞═════╪════════╡
│ a   ┆ 1      │
│ a   ┆ 3      │
│ a   ┆ 6      │
│ b   ┆ 4      │
│ b   ┆ 9      │
│ b   ┆ 15     │
│ c   ┆ 7      │
│ c   ┆ 15     │
│ c   ┆ null   │
└─────┴────────┘

Actual results

RuntimeError: Failed to aggregated - does your aggregation function return a scalar?

Please run narwhals.show_version() and enter the output below.

System:
    python: 3.12.5 (main, Aug 14 2024, 04:32:18) [Clang 18.1.8 ]
executable: /Users/abelisaiasgutierrezcruz/Documents/Proyects/test_narwhals/.venv/bin/python
   machine: macOS-15.0.1-arm64-arm-64bit

Python dependencies:
     narwhals: 1.9.3
       pandas: 2.2.3
       polars: 1.8.2
         cudf: 
        modin: 
      pyarrow: 17.0.0
        numpy: 2.1.1

Relevant log output

File "/Users/abelisaiasgutierrezcruz/Documents/Proyects/test_narwhals/.venv/lib/python3.12/site-packages/narwhals/_expression_parsing.py", line 98, in <genexpr>
    for sublist in (evaluate_into_expr(df, into_expr) for into_expr in exprs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/abelisaiasgutierrezcruz/Documents/Proyects/test_narwhals/.venv/lib/python3.12/site-packages/narwhals/_expression_parsing.py", line 63, in evaluate_into_expr
    return expr._call(df)  # type: ignore[arg-type]
           ^^^^^^^^^^^^^^
  File "/Users/abelisaiasgutierrezcruz/Documents/Proyects/test_narwhals/.venv/lib/python3.12/site-packages/narwhals/_pandas_like/expr.py", line 334, in func
    tmp = df.group_by(*keys).agg(self)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/abelisaiasgutierrezcruz/Documents/Proyects/test_narwhals/.venv/lib/python3.12/site-packages/narwhals/_pandas_like/group_by.py", line 76, in agg
    return agg_pandas(
           ^^^^^^^^^^^
  File "/Users/abelisaiasgutierrezcruz/Documents/Proyects/test_narwhals/.venv/lib/python3.12/site-packages/narwhals/_pandas_like/group_by.py", line 184, in agg_pandas
    raise RuntimeError(msg) from exc
RuntimeError: Failed to aggregated - does your aggregation function return a scalar?
@MarcoGorelli
Copy link
Member

thanks for the report!

I did a quick review of the code and I believe this issue can be resolved by simply adding the cum_sum expression name to the dictionary that maps pandas method names to polars method names. I would like to open a pull request to solve this c:

sounds great, thanks! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants