Add a hypothesis test for __getitem__ #1098

sjdenny · 2024-09-30T10:45:43Z

Compares polars __getitem__ calls with pandas & pyarrow.

What type of PR is this? (check all applicable)

Related issues

Related issue #
Closes test: add hypothesis test for DataFrame.__getitem__ #1008

Checklist

Code follows style guide (ruff)
~~Tests added~~
~~Documented the changes~~

If you have comments or can explain your changes, please do so below.

Putting this up for initial feedback - there are a bunch of cases which pyarrow doesn't support (documented in the test). Do we want to support these cases, or should we instead tighten the test (on pandas as well) and declare these unsupported for now?

Cases:

pyarrow: doesn't support negative indexes
pyarrow: pairs of slices unsupported, e.g. df[0:1, 'a':'b'] (trivial pairs such as df[:, 'a':'b'] where one slice is : are an exception).
pyarrow: empty edge case, e.g. df[[], "a":] fails.
pyarrow & pandas: df[..., ::step] is unsupported, i.e. a column slice of the form slice(None, None, <something>)
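To make the list above concrete, here is a minimal sketch of how a test could classify a `(rows, cols)` key into those unsupported categories before deciding whether to compare backends. The helper names (`unsupported_reasons`, `_is_trivial_slice`, `_has_negative_index`) are hypothetical and not taken from the PR, and the exact extent of the "empty edge case" is an assumption based only on the `df[[], "a":]` example.

```python
from __future__ import annotations

from typing import Any


def _is_trivial_slice(key: Any) -> bool:
    """True for a bare `:` (slice(None, None, None))."""
    return isinstance(key, slice) and key == slice(None, None, None)


def _has_negative_index(key: Any) -> bool:
    """True if an integer index, slice bound, or list element is negative."""
    if isinstance(key, bool):
        return False
    if isinstance(key, int):
        return key < 0
    if isinstance(key, slice):
        return any(isinstance(v, int) and v < 0 for v in (key.start, key.stop))
    if isinstance(key, (list, tuple)):
        return any(isinstance(v, int) and v < 0 for v in key)
    return False


def unsupported_reasons(rows: Any, cols: Any) -> dict[str, list[str]]:
    """Hypothetical helper: map each backend to the reasons df[rows, cols] is unsupported."""
    reasons: dict[str, list[str]] = {"pyarrow": [], "pandas": []}

    if _has_negative_index(rows) or _has_negative_index(cols):
        reasons["pyarrow"].append("negative index")

    both_slices = isinstance(rows, slice) and isinstance(cols, slice)
    if both_slices and not _is_trivial_slice(rows) and not _is_trivial_slice(cols):
        reasons["pyarrow"].append("pair of non-trivial slices")

    # Assumption: the empty edge case covers any empty row list with a column slice.
    if isinstance(rows, list) and not rows and isinstance(cols, slice):
        reasons["pyarrow"].append("empty row selection with a column slice")

    # df[..., ::step], i.e. slice(None, None, <something>) on the columns.
    if isinstance(cols, slice) and cols.start is None and cols.stop is None and cols.step is not None:
        reasons["pyarrow"].append("stepped column slice")
        reasons["pandas"].append("stepped column slice")

    return reasons
```

A test could then either skip keys with a non-empty reason list for a given backend, or assert that the backend raises for exactly those keys, depending on which side of the question above is chosen.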

MarcoGorelli

this is awesome, thanks @sjdenny !

looks like there's a failure in the "random versions" CI job (specifically, pandas==2.0.3)

if we can address that, then I think we can merge this and then gradually address the rest

sjdenny · 2024-10-02T19:57:25Z

Thanks @MarcoGorelli !

> looks like there's a failure in the "random versions" CI job (specifically, pandas==2.0.3)

Looks like this was a flake instead (at least, I was able to reproduce on pandas==2.2.3 too, by running through more examples). I've bumped the number of samples up to 10k in db69469: this does (did) reliably find the failing case, but the downside is it's a bit of a slow test now. Which end of the tradeoff scale do you want to go for? Related to this, I saw Hypothesis has settings profiles, which look neat for supporting faster settings locally & more thorough sampling in CI.
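As a rough illustration of the settings-profiles idea, the sketch below registers a fast profile for local runs and a thorough one for CI, for example in a `conftest.py`. The profile names `dev` and `ci` and the `HYPOTHESIS_PROFILE` environment variable are assumptions chosen for this example; only `settings.register_profile` and `settings.load_profile` are Hypothesis APIs.

```python
import os

from hypothesis import settings

# Hypothetical profile names: "dev" keeps local runs quick (Hypothesis' default
# is 100 examples), "ci" samples far more thoroughly.
settings.register_profile("dev", max_examples=100)
settings.register_profile("ci", max_examples=10_000, deadline=None)

# HYPOTHESIS_PROFILE is an assumed environment variable name, not a built-in.
settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "dev"))
```

With something like this in place, CI could export `HYPOTHESIS_PROFILE=ci` while local runs fall back to the faster profile, which sidesteps having to pick a single point on the speed versus thoroughness tradeoff.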

sjdenny · 2024-10-03T17:56:01Z

Ah, coverage now. Looking 👀

MarcoGorelli · 2024-10-20T20:10:54Z

awesome stuff, thanks!

there's a couple of really minor things i wanted to address, but i'll try to get this in shortly, really appreciate your contribution here 🙏

sjdenny force-pushed the test/hypothesis-test-getitem branch from 2520ec0 to e31110c Compare September 30, 2024 11:03

sjdenny marked this pull request as ready for review September 30, 2024 11:05

First pass at a hypothesis test for __getitem__.

ddf1cbe

Compares polars with pandas & pyarrow.

sjdenny force-pushed the test/hypothesis-test-getitem branch from e31110c to ddf1cbe Compare September 30, 2024 11:34

MarcoGorelli reviewed Sep 30, 2024

sjdenny added 3 commits October 2, 2024 20:44

Clarify this pyarrow failure case

7c1d1fc

Additional failure example for pandas

7b37e59

Boost number of samples to 10k (from default=100) to reliably catch the flake in 7b37e59

db69469

sjdenny added 2 commits October 8, 2024 21:55

Add pytest.mark.slow

c2f1229

Move the "constructor" check into assume(...), to aid with coverage checks

e993b43
