-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-44232: [Python] Validate __arrow_c_array__ length for scalar construction #44434
base: main
Are you sure you want to change the base?
GH-44232: [Python] Validate __arrow_c_array__ length for scalar construction #44434
Conversation
Hi @lukedowling44. This looks awsome. However, I think you might have accidentally commited your build folder ( |
To clarify, make sure to only commit your tests and scalar.pxi changes. |
3ff3c08
to
9a67c88
Compare
9a67c88
to
4ee815f
Compare
4ee815f
to
45b3697
Compare
@lukedowling44 This looks perfect now and have fixed the previous issue. I would like to ask why most of the statements inside the if-clause is quite similar to the execution statements in the following. |
@Isaac7777-cpu-school You are right, I was repeating code unnecessarily. I have made changes and pushed again. |
@Lukedowling444 Nice, all the dead codes are removed. I think it is pretty much ready for the PR. However, just a minor thing and this may not make sense as much - why do we need another local variable for the carray rather than directly put it in the if condition? However, since this is really just a styling issue, please also consult with the maintainer for this as well? |
@Isaac7777-cpu-school Thanks for the feedback. Cython doesn’t allow cdef statements directly within conditional logic or non-function scopes, so I had to define it before the condition. |
Rationale for this change
In the current implementation of the pyarrow.scalar constructor, when an input has the arrow_c_array interface, there is no check to ensure the length of the array is exactly 1. This can lead to unexpected behavior or invalid data being accepted for scalar construction.
This change introduces a validation step that checks whether the input array has a length of 1 when using the arrow_c_array interface, ensuring that only valid inputs are processed. This improves the reliability of Arrow when handling data from multiple libraries.
An existing workaround involves calling pa.array(other_scalar).slice(0, 1) to achieve the same result, but this change eliminates the need for such a workaround.
Looking to close #44232
What changes are included in this PR?
A validation check was added inside the pyarrow.scalar constructor to ensure that when an input with the arrow_c_array interface is provided, it must be a length-1 array.
The constructor now raises a ValueError if this condition is not met.
Are these changes tested?
Yes, the project builds successfully, and all existing test cases pass without any issues.
Additionally, I have included new test cases to specifically check the behavior of this change in python/scripts/test_scalar.py. These tests ensure that the pyarrow.scalar constructor raises a ValueError when a non-length-1 array is provided and continues to work correctly for valid inputs.
Are there any user-facing changes?
No user-facing changes are introduced, and this validation should not affect any workflows where inputs to pyarrow.scalar are already correct.
GitHub Issue: #44232
pyarrow.scalar
constructor? #44232