Describe the bug
When one window function takes another window function as its input, the two cannot be evaluated in a single step. Splitting the operation into two steps succeeds: in the trivial example below, running a select operation after the first window operation produces the desired result.
The difficulty is that, with the DataFrame API, it is common to build up a set of library functions that should accept any kind of expression, so a window expression should be valid input to another window function. Without this support, the end user has to track down every place where a library function returns a window function expression and force a select operation on the DataFrame there. This is particularly difficult when building libraries that generate large chains of operations.
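To make the burden concrete, here is a minimal sketch of the bookkeeping a library would have to do by hand. It deliberately does not use DataFusion: ToyExpr, hoist_nested_windows, and the __win_N alias scheme are hypothetical names invented for illustration, and the toy tree only models columns and window calls. The function pulls every window expression that appears as an argument of another window expression out into an aliased stage, which would then have to be evaluated by its own earlier select.

```rust
// Toy stand-in for an expression tree. This is NOT DataFusion's `Expr`;
// every name in this block is hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Column(String),
    Window { func: String, args: Vec<ToyExpr> },
}

/// Rewrites `expr` so that no window expression has another window
/// expression as a direct argument. Each nested window is appended to
/// `stages` under a generated alias (innermost first) and replaced by a
/// column reference to that alias.
fn hoist_nested_windows(expr: ToyExpr, stages: &mut Vec<(String, ToyExpr)>) -> ToyExpr {
    match expr {
        // Plain columns need no rewriting.
        col @ ToyExpr::Column(_) => col,
        ToyExpr::Window { func, args } => {
            let args = args
                .into_iter()
                .map(|arg| {
                    // Flatten the argument first so deeper nesting is handled.
                    let arg = hoist_nested_windows(arg, stages);
                    if let ToyExpr::Window { .. } = arg {
                        // A window used as input: move it to an earlier stage.
                        let alias = format!("__win_{}", stages.len());
                        stages.push((alias.clone(), arg));
                        ToyExpr::Column(alias)
                    } else {
                        arg
                    }
                })
                .collect();
            ToyExpr::Window { func, args }
        }
    }
}
```

For a toy max(row_number()), this yields one stage holding row_number under the alias __win_0 and an outer max over the column __win_0, which mirrors the manual two-step select in the reproduction. Every library function that might receive a window expression would need a pass like this, which is the overhead this issue asks to avoid.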
To Reproduce
#[tokio::test]
async fn window_over_window() -> Result<()> {
    use datafusion_common::create_array;
    use datafusion_common::record_batch;
    use datafusion_functions_aggregate::min_max::max_udaf;

    let ctx = SessionContext::new();
    let _ = ctx.register_batch("t", record_batch!(("a", Int32, vec![1, 2, 3]))?);
    let df = ctx.table("t").await?;

    // max() over a column that already holds the row number.
    let max_of_col = Expr::WindowFunction(WindowFunction::new(
        WindowFunctionDefinition::AggregateUDF(max_udaf()),
        vec![col("row_num")],
    ));

    // max() applied directly to the row_number() window expression.
    let max_of_window = Expr::WindowFunction(WindowFunction::new(
        WindowFunctionDefinition::AggregateUDF(max_udaf()),
        vec![row_number()],
    ));

    // Two steps: select row_number() first, then take max() of that column. Succeeds.
    let passing_df = df
        .clone()
        .select(vec![row_number().alias("row_num")])?
        .select(vec![max_of_col])?;
    passing_df.show().await?;

    // One step: a window function over a window function. Fails.
    let failing_df = df.select(vec![max_of_window])?;
    failing_df.show().await?;

    Ok(())
}
Expected behavior
These two approaches should yield identical results.
Additional context
This is a trivial example, but it is based on an actual use case.