Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[change + remove] Clean and uniform the array generation routines #127

Merged

Conversation

forFudan
Copy link
Collaborator

@forFudan forFudan commented Oct 16, 2024

It is a step towards cleaner and uniformed array generation routines. The core principle is to (1) Uniform the interface for each function by limiting the main argument to only one class and (2) align it with numpy.

(1) array() function only reads in a "data object", e.g., string, PyObject, List. Other reloads are moved to standalone functions.

(2) The argument shape only takes in the type ShapeLike. See the following example:

zeros[f64](Shape(100, 100))  # Support.
zeros[f64](List[Int](100, 100))  # Support.
zeros[f64](VariadicList[Int](100, 100))  # Support.

This is aligned with numpy. In numpy, the shape parameter is wrapped in a tuple (or a single Int).

Note that the following way of creating array is not possible anymore because it is not a numpy style, and will be confused with the potential keyword arguments, e.g., order.

zeros[f64](100, 100, order="C")  # Not support.
zeros[f64](Shape(100, 100), "C")  # Support.

(3) Uniform the calling chains.

  • The full function will fill the memory directly, instead of via __init__. This is a step towards making NDArray a pure container.
  • The zeros and ones functions will call the function full.
  • The overloads with List and VariadicList will call the function that reads in NDArrayShape. The function hits will also reflect this. This is to significantly shrink the lines of code, make it less verbose, and make it more trackable. See figure below.
image

(4) Next step.

Reduce the number of overloads in NDArray.__init__ according to #110.

@forFudan forFudan added the enhancement New feature or request label Oct 16, 2024
@shivasankarka
Copy link
Collaborator

shivasankarka commented Oct 16, 2024

  1. I am not sure if it's a good idea to to remove all instances of List, VariadicList, Variadic arguments input to generate NDArrays. These are still containers in Mojo and we could allow at least some of them (either List or VariadicList) as input for flexibility. Removing all their instances from NuMojo would create a gap between NuMojo and any other workloads in Mojo that uses List, VariadicList etc. I think it should be fine since we are not overloading the NDArray constructor. We should try to keep it close to numpy, but we don't have to be too rigid in that regard and let some flexibility in NuMojo.

  2. If we are using an alias for NDArrayShape and using that as input everywhere, I think It should be in lowercase letters since it's for users and not internal use. Some of my suggestions would be,

  • alias shape/shp = NDArrayShape (But shp seems a bit weird)
  • alias dims/dim = NDArrayShape

@forFudan
Copy link
Collaborator Author

List, VariadicList, Variadic arguments input to generate NDArrays

This is not aligned with numpy and would cause ambiguity, especially List is used for both shape and object. In the following example, NDArray(List[Int](2, 2)), it can either mean an array [2, 2] or an array of shape 2x2.

To avoid ambiguity, it is better to wrap shape in a different container (numpy uses tuple, we use NDArrayShape), and to wrap data in the list type.

Removing all their instances from NuMojo would create a gap between NuMojo and any other workloads in Mojo that uses List, VariadicList etc.

Yes, but we should also remove these list and varadic list in other workload later. The reason is the same as above (avoid abiguity).

We should try to keep it close to numpy, but we don't have to be too rigid in that regard and let some flexibility in NuMojo.

I think at the moment, we'd better keep it close to numpy, since numpy is the best practice for years. Another reason is that we should limit the way of using each function. Overload brings flexibility, but also makes users confused.

I think It should be in lowercase letters since it's for users and not internal use

I am little bit hesitant for this. dim is a common word for variables. Users may tend to overwrite this, e.g., var dim = 2.

@forFudan
Copy link
Collaborator Author

alias shape/shp = NDArrayShape (But shp seems a bit weird)
shp looks weird but it is shorter and is not a common word 😄

@shivasankarka
Copy link
Collaborator

shivasankarka commented Oct 17, 2024

To avoid ambiguity, it is better to wrap shape in a different container (numpy uses tuple, we use NDArrayShape), and to wrap data in the list type.

This is fine.

Yes, but we should also remove these list and varadic list in other workload later. The reason is the same as above (avoid abiguity).

What I meant is that, if we remove any instances of List, VariadicList etc completely from NuMojo, anyone who uses NuMojo as part of their workflow, might have problems interpolating with native Mojo containers like List, VariadicList etc. So either we should allow for List, VariadicList input arguments in NDArray constructors (this pollutes the constructors), or allow for List, VariadicArguments input arguments in NDArrayShape constructors (This seems more simple, but the syntax becomes long), for example,

var arr = NDArray[i16](List[Int](1,2,3)) # NDArray constructors -> This overloads the NDArray constructor, but simple syntax
var arr = NDArray[i16](shape(List[Int](1,2,3))) # NDArrayShape constructor -> simple, but long syntax

@forFudan
Copy link
Collaborator Author

So either we should allow for List, VariadicList input arguments in NDArray constructors (this pollutes the constructors), or allow for List, VariadicArguments input arguments in NDArrayShape constructors (This seems more simple, but the syntax becomes long)

@shivasankarka, Yes, I agree with you. In the case of zeros and ones, there will be no ambiguity. In other words, the first argument, no matter if it is a List or a Shape or a Tuple, they all mean the "shape".

Thus, we actually need something like Iterable. Unluckily, this is not yet implemented in Mojo, so we have to make some overloads here as a temporary solution. In future, we can change it to an Iterable trait.

I will make a change for this.

@forFudan forFudan marked this pull request as draft October 17, 2024 09:52
@shivasankarka
Copy link
Collaborator

shivasankarka commented Oct 18, 2024

@forFudan Should the alias be at the top __init__ or the corresponding files? Does it have any difference? Since we aren't gonna use the alias internally, should it be at the top level? i.e top level __init__ file.

@forFudan
Copy link
Collaborator Author

@forFudan Should the alias be at the top __init__ or the corresponding files? Does it have any difference? Since we aren't gonna use the alias internally, should it be at the top level? i.e top level __init__ file.

@shivasankarka alias is a part of the struct definition. So in most cases it would be better to put them in the same file.

For idx, it is indeed special and is only for users' convenience. I will move it back to the __init__.

…os` and `ones`. The fill will be done in `full` function but not `__init__` of NDArray.
@forFudan forFudan changed the title [change + remove] Clean the array generation routine and wrap values in NDArrayShape. [change + remove] Clean and uniform the array generation routines Oct 21, 2024
@forFudan forFudan marked this pull request as ready for review October 21, 2024 12:15
@shivasankarka shivasankarka merged commit 48ca770 into Mojo-Numerics-and-Algorithms-group:pre-0.4 Oct 21, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants