Expected behavior of `getitem` #70

forFudan · 2024-07-09T18:20:35Z

forFudan
Jul 9, 2024
Collaborator

This is related to PR #69. Before I continue working on iterators and indices, I think it is important to define the expected behavior of __getitem__, particularly when the number of dimensions of an array decreases. So I would like to open this discussion. It covers:

The current behavior of __getitem__ of Numpy and Numojo.
The gaps and challenges.
The potential solutions.

Please let me know what you think, @MadAlex1997 @shivasankarka.

On decrease of dimensions

A decrease of dimensions may or may not happen when __getitem__ is called on an ndarray. An X-D array can become a Y-D array after __getitem__, where Y <= X.

Whether the dimension decreases or not depends on:

What is passed into __getitem__.
The number of arguments that are passed to __getitem__.

On 0-D array

A scalar is a special case of an ndarray with 0 dimensions.

When do dimensions decrease?

In Numpy, the number of dimensions removed equals the number of integers passed to __getitem__.

For example, A is a 10x10x10 ndarray (3-D). Then,

A[1, 2, 3] leads to a 0-D array (scalar), since there are 3 integers.
A[1, 2] leads to a 1-D array (vector), since there are 2 integers, so the dimension decreases by 2.
A[1] leads to a 2-D array (matrix), since there is 1 integer, so the dimension decreases by 1.
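The integer-counting rule above can be sketched in a few lines of plain Python. This only illustrates the rule and is not how Numpy or Numojo implements it; the helper name `result_ndim` is made up for this example.

```python
def result_ndim(ndim, index):
    """Number of dimensions left after indexing an `ndim`-D array.

    Hypothetical helper: every integer in `index` removes one dimension;
    anything else (for example a slice) keeps its dimension.
    """
    if not isinstance(index, tuple):
        index = (index,)  # A[1] is the same as A[(1,)]
    if len(index) > ndim:
        raise IndexError("too many indices for array")
    n_ints = sum(isinstance(i, int) for i in index)
    return ndim - n_ints


# A is 10x10x10, so ndim == 3
print(result_ndim(3, (1, 2, 3)))  # 0 (scalar)
print(result_ndim(3, (1, 2)))     # 1 (vector)
print(result_ndim(3, 1))          # 2 (matrix)
```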

When do dimensions not decrease?

In Numpy, the number of dimensions does not decrease when a Slice is passed to __getitem__, or when no argument is passed for a certain dimension (this is an implicit slice, and a slice of all items is used).

Take the same example A with 10x10x10 in shape. Then,

A[1:4, 2:5, 3:6] leads to a 3-D array (no decrease in dimension), since there are 3 slices.
A[2:8] leads to a 3-D array (no decrease in dimension), since there is 1 explicit slice and 2 implicit slices.

Mixture of int and slices

When a mixture of integers and slices is passed to __getitem__, the number of integers is the number of dimensions removed. For example,

A[1:4, 2, 2] leads to a 1-D array (vector), since there are 2 integers, so the dimension decreases by 2.

Note that a slice does not reduce the dimensions even when it selects only one item along an axis. For example,

A[1:2, 2:3, 3:4] leads to a 3-D array (no decrease in dimension), since there are 3 slices.
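To make the slice and mixed cases concrete, here is a rough sketch in plain Python that computes the resulting shape for an index made of integers and slices. The function name `result_shape` is hypothetical, and the sketch ignores other index types such as Ellipsis or index arrays.

```python
def result_shape(shape, index):
    """Shape after indexing an array of `shape` with ints and slices.

    Hypothetical helper: an integer removes its axis, a slice keeps it
    (even when the slice selects a single item), and axes without an
    explicit index get an implicit full slice.
    """
    if not isinstance(index, tuple):
        index = (index,)
    if len(index) > len(shape):
        raise IndexError("too many indices for array")
    # Pad with implicit full slices for the trailing axes.
    index = index + (slice(None),) * (len(shape) - len(index))

    out = []
    for n, idx in zip(shape, index):
        if isinstance(idx, slice):
            # slice.indices clips the slice to the axis length n.
            out.append(len(range(*idx.indices(n))))
        elif isinstance(idx, int):
            if not -n <= idx < n:
                raise IndexError("index out of bounds")
            # An integer index drops this axis: nothing is appended.
        else:
            raise TypeError("only ints and slices are handled here")
    return tuple(out)


A_SHAPE = (10, 10, 10)
print(result_shape(A_SHAPE, (slice(1, 4), slice(2, 5), slice(3, 6))))  # (3, 3, 3)
print(result_shape(A_SHAPE, (slice(2, 8),)))                           # (6, 10, 10)
print(result_shape(A_SHAPE, (slice(1, 4), 2, 2)))                      # (3,)
print(result_shape(A_SHAPE, (slice(1, 2), slice(2, 3), slice(3, 4))))  # (1, 1, 1)
```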

Gap between current Numpy and Numojo

Currently, Numojo behaves as follows:

A is a vector. A[0] returns the first scalar.
B is a matrix. B[0] returns the first scalar.
C is a 3-D array. C[0] returns the first scalar.

Numpy, however, behaves like this:

A is a vector. A[0] returns the first scalar.
B is a matrix. B[0] returns the first row (1-D array).
C is a 3-D array. C[0] returns the first 2-D array.

PR #69 tries to align the behavior of Numojo on B and C (matrix and higher-dimensional ndarray). After this PR, Numojo works as:

A is a vector. A[0] returns the first scalar but wrapped as a 1-D array with size 1.
B is a matrix. B[0] returns the first row (1-D array).
C is a 3-D array. C[0] returns the first 2-D array.

Potential difficulties in complete alignment

The difficulty of a complete alignment between Numpy and Numojo is due to the existence of the 0-D array (scalar). When an integer index is passed to an ndarray, Numpy returns either a scalar (0-D array) or an ndarray, depending on the ndim of the input array.

However, in Numojo we cannot achieve this easily, because the dimension of an ndarray is not known at compile time. For example, suppose A is a 10x10 ndarray and B is a vector of 10 items. We cannot easily reproduce Numpy's behavior, where A[0] returns the first row (a 1-D array) while B[0] returns a scalar, because the return type of __getitem__ would have to depend on ndim.

Solutions

There are several solutions:

Deviate from the behavior of Numpy for 0-D array (scalar)

We deviate from the behavior of Numpy: a scalar result is represented not as a 0-D array but as a 1-D array with only one item. Using the previous example, if A is a 10x10 ndarray and B is a vector of 10 items:

A[0] returns the first row (1-D array), with shape [10].
B[0] returns the first scalar as a 1-D array, with shape [1].

This is what has been included in PR #69.

The advantage of this approach is that we match the behavior of Numpy except for the 0-D array (scalar).

The drawback is that a scalar has to be expressed as a 1-D array with size 1.

Allow `ndim = 0`

We treat a scalar as a 0-D array. It will be printed without brackets. (This is similar to how Float64 is a special case of a SIMD vector.)

The advantage of this approach is that it keeps the ndarray fully general, and the rules for decreasing dimensions described above work perfectly.

The drawback is that a 0-D array could be confused with a scalar.
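As a rough illustration of this option, the Python sketch below models an array whose integer indexing always returns another array, down to a 0-D array with shape (). The class `TinyArray` and its `item` method are hypothetical stand-ins written for this example, not the Numojo API.

```python
from math import prod


class TinyArray:
    """Hypothetical row-major array; integer indexing always returns a TinyArray."""

    def __init__(self, data, shape):
        self.data = list(data)
        self.shape = tuple(shape)
        if len(self.data) != prod(self.shape):
            raise ValueError("data size does not match shape")

    @property
    def ndim(self):
        return len(self.shape)

    def __getitem__(self, i):
        if self.ndim == 0:
            raise IndexError("a 0-D array cannot be indexed")
        if not 0 <= i < self.shape[0]:
            raise IndexError("index out of bounds")
        sub_shape = self.shape[1:]
        size = prod(sub_shape)  # prod(()) == 1, so a 0-D result holds one item
        return TinyArray(self.data[i * size:(i + 1) * size], sub_shape)

    def item(self, index=0):
        """Unpack a single stored value as a plain Python scalar."""
        return self.data[index]


A = TinyArray(range(6), (2, 3))
row = A[1]        # 1-D array with shape (3,)
point = A[1][2]   # 0-D array with shape ()
print(row.shape, point.shape, point.item())  # (3,) () 5
```

With this design the return type of indexing never changes; the caller decides when to unpack a 0-D array into a scalar.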

Make `ndim` a parameter known at compile time

This is related to Issue #58.

The advantage of this approach is that we can implement the rules for decreasing dimensions described above perfectly. When ndim is 1, A[0] results in a scalar.

The drawback of this approach is that reshape would be difficult, as mentioned by @MadAlex1997.

But I think that reshape is possible: we construct a new ndarray with the new shape, based on the data buffer of the old ndarray. This can be achieved by reusing the data buffer in memory via an unsafe pointer.
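The idea of reshaping by reusing the data buffer can be sketched in Python, with a shared list standing in for the memory that an unsafe pointer would reference. The class `SharedView` and its methods are hypothetical and assume a contiguous row-major layout.

```python
from math import prod


class SharedView:
    """Hypothetical row-major view over a buffer that is shared, not copied."""

    def __init__(self, buffer, shape):
        self.buffer = buffer  # same object is reused by every reshaped view
        self.shape = tuple(shape)
        if len(buffer) != prod(self.shape):
            raise ValueError("buffer size does not match shape")

    def reshape(self, new_shape):
        # Only the shape metadata changes; the data buffer is reused as is.
        if prod(new_shape) != prod(self.shape):
            raise ValueError("cannot reshape: total size differs")
        return SharedView(self.buffer, new_shape)

    def _offset(self, index):
        if len(index) != len(self.shape):
            raise IndexError("one index per dimension is required")
        offset = 0
        for i, n in zip(index, self.shape):
            if not 0 <= i < n:
                raise IndexError("index out of bounds")
            offset = offset * n + i  # row-major flattening
        return offset

    def get(self, *index):
        return self.buffer[self._offset(index)]

    def set(self, index, value):
        self.buffer[self._offset(index)] = value


a = SharedView(list(range(12)), (3, 4))
b = a.reshape((2, 6))
b.set((1, 0), 60)   # writes flat position 6 of the shared buffer
print(a.get(1, 2))  # 60, because a[1, 2] is also flat position 6
```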

forFudan · 2024-07-19T13:07:13Z

forFudan
Jul 19, 2024
Collaborator Author

Approach 2 (allowing 0-D arrays) has been taken into consideration. See #69.


mmenendezg Jul 23, 2024

Hi @forFudan, I come from Python, and I am still adapting to the Mojo Types system.

You said above that NuMojo treats Scalars as 0-d arrays, but later you said that one of the cons of this approach is that Scalars may be confused with 0-d arrays. Then, what would the 0-d arrays return?

If I have an array var array = NDArray[nj.i32] with shape [1, 10] and I select array[0] I'd expect it to return an Int value. However, as far as I've seen this returns a SIMD[DType.int32, 1]. How would the behavior of __getitem__ work in these scenarios?

forFudan Jul 23, 2024
Collaborator Author

Hi @mmenendezg! Good questions! I would refer to the descriptions in #69. Maybe you can take a look at the examples.

Here is some more elaboration:

Then, what would the 0-d arrays return?

As a principle, in NuMojo, getting items by indices or slices from an NDArray always returns an NDArray. Let's say A is a vector (1-D array). Then in NuMojo, A[0] will return the first item of the vector as a 0-D array.

A 0-D array is a special NDArray that has zero dimensions, i.e., a point.

To transform it into a scalar, e.g., a float or an int, you have to unpack it using .item(0). Example: A[0].item(0).

If I have an array var array = NDArray[nj.i32] with shape [1, 10] and I select array[0] I'd expect it to return an Int value.

Actually, even in numpy, you should not expect an Int value for array[0]. Because array is a 1x10 matrix (2-dimensional array), array[0] will always return the first row, which is a vector (1-dimensional array) with 10 items.

However, as far as I've seen this returns a SIMD[DType.int32, 1]

I cannot reproduce this result. Could you share the code you used?

One more thing to note: in Mojo, a 32-bit integer is actually a SIMD with size 1. That is to say, the following three types are identical:

Int32 type
Scalar[DType.int32] type
SIMD[DType.int32, 1] type

shivasankarka Jul 23, 2024
Collaborator

However, as far as I've seen this returns a SIMD[DType.int32, 1]

@forFudan in the main branch, we still have the __getitem__(idx: Int). I think that's why he is getting a SIMD as output.

@mmenendezg The 0-D arrays currently work in our experimental branch. This feature will be released in v0.2 this week or sooner. Also, as @forFudan mentioned, SIMD is the fundamental numeric data type in Mojo; that is why your output is SIMD[DType.int32, 1], which is equivalent to Int32 and Scalar[DType.int32].

Please let us know if you have any follow up questions. Thank you for trying NuMojo!

mmenendezg Jul 24, 2024

Thanks to both of you! I learned a lot today. I will keep trying NuMojo, it is really interesting what you've achieved in so little time.
