Support window functions with window function arguments #13017

Open
timsaucer opened this issue Oct 19, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@timsaucer
Contributor

Describe the bug

When one window function takes another window function as its argument, the operation fails if both are evaluated in a single step. Breaking the operation into two steps succeeds: in the trivial example below, a select that materializes the inner window function, followed by a second select that applies the outer window function to the resulting column, gives the desired result.

The difficulty is that, with the DataFrame API, it is common to build up a set of library functions that should accept any kind of expression, so a window expression should be valid input to another window function. Without this support, the end user has to track down every place where a library function returns a window function expression and force a select operation on the DataFrame at that point. This is particularly difficult when building libraries that generate long chains of operations.
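To make the burden on library authors concrete, here is a minimal sketch of the check a caller would otherwise need before every select. It uses a toy expression tree, not DataFusion's `Expr`; the names `ToyExpr`, `window`, `contains_window`, and `has_nested_window` are hypothetical and exist only for illustration.

```rust
// Toy model only: these types are NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Column(String),
    Window { func: String, args: Vec<ToyExpr> },
    Alias(Box<ToyExpr>, String),
}

/// Hypothetical convenience constructor for a window function call.
fn window(func: &str, args: Vec<ToyExpr>) -> ToyExpr {
    ToyExpr::Window {
        func: func.to_string(),
        args,
    }
}

/// Returns true if `expr` contains a window function anywhere in its tree.
fn contains_window(expr: &ToyExpr) -> bool {
    match expr {
        ToyExpr::Column(_) => false,
        ToyExpr::Window { .. } => true,
        ToyExpr::Alias(inner, _) => contains_window(inner),
    }
}

/// Returns true if `expr` is a window function (possibly aliased) with a
/// window function somewhere in its arguments. In this toy model, that is
/// the shape that would need an intermediate select.
fn has_nested_window(expr: &ToyExpr) -> bool {
    match expr {
        ToyExpr::Column(_) => false,
        ToyExpr::Alias(inner, _) => has_nested_window(inner),
        ToyExpr::Window { args, .. } => args.iter().any(contains_window),
    }
}
```

With this model, `max(row_number())` is flagged as nested, while `max(col("row_num"))` is not. A library that composes expressions freely would have to run a check like this on every expression it returns, which is the bookkeeping the issue asks to avoid.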

To Reproduce

    #[tokio::test]
    async fn window_over_window() -> Result<()> {
        use datafusion_common::create_array;
        use datafusion_common::record_batch;
        use datafusion_functions_aggregate::min_max::max_udaf;

        let ctx = SessionContext::new();
        let batch = record_batch!(("a", Int32, vec![1, 2, 3]))?;
        // Propagate a registration error instead of discarding it.
        ctx.register_batch("t", batch)?;
        let df = ctx.table("t").await?;

        // Outer window function applied to an already materialized column.
        let max_of_col = Expr::WindowFunction(WindowFunction::new(
            WindowFunctionDefinition::AggregateUDF(max_udaf()),
            vec![col("row_num")],
        ));

        // Outer window function applied directly to another window function.
        let max_of_window = Expr::WindowFunction(WindowFunction::new(
            WindowFunctionDefinition::AggregateUDF(max_udaf()),
            vec![row_number()],
        ));

        // Two steps: compute row_number() in one select, then take the max
        // over the resulting column in a second select.
        let passing_df = df
            .clone()
            .select(vec![row_number().alias("row_num")])?
            .select(vec![max_of_col])?;

        passing_df.show().await?;

        // One step: the nested window expression is the case reported as failing.
        let failing_df = df.select(vec![max_of_window])?;

        failing_df.show().await?;

        Ok(())
    }

Expected behavior

These two approaches should yield identical results.

Additional context

This is a trivial example, but I have an actual use case that this is based upon.

timsaucer added the bug label Oct 19, 2024