
Assignment of Expression to Tensor of Incorrect Dimensions - Strange Error - Numpy Unable to Allocate very Large Array #287

Open
stellarpower opened this issue Mar 2, 2023 · 4 comments


stellarpower commented Mar 2, 2023

Version:

name: SomeEnvironment
channels:
  - https://conda.anaconda.org/conda-forge

# Originally created on Ubuntu Jammy:
dependencies:
  - numpy=1.24
  - python=3.9
  - xtensor-python=0.26.1
  - xtensor-blas # version to be confirmed; the machine in question is down.

I spent a good few hours trying to dig out a bug that was leading to a segfault, and then to an obscure message from NumPy about being unable to allocate space for an enormous array.

It turns out that I was simply assigning an expression that should have been a (1D) vector to a 2-dimensional pytensor. This vindicates my fondness for constraining dimensions at compile time! "Let's just keep things simple", they say, and then I waste hours debugging code that could never have run correctly; without safety measures it's not so simple 😅

In any case, here is an MRE that reproduces it at runtime:

void debugEntryPoint() {

    xt::pytensor<std::complex<float>, 2> matrix{
            {3, 0, 0, 0},
            {0, 4, 0, 0},
            {0, 0, 5, 0},
            {0, 0, 0, 6}
    };
    xt::pytensor<std::complex<float>, 1> vector{{0, 0, 0, 1}};

    xt::pytensor<std::complex<float>, 1>   correctResult = xt::linalg::dot(matrix, vector);
    xt::pytensor<std::complex<float>, 2> incorrectResult = xt::linalg::dot(matrix, vector);
    // We never get past the line above

    cout << correctResult << endl;

    // Or equally, assigning from a named expression:
    const auto &view = xt::linalg::dot(matrix, vector);
    xt::pytensor<std::complex<float>, 1>   correctResult2 = view;
    xt::pytensor<std::complex<float>, 2> incorrectResult2 = view;
}

I would expect an error message explaining that the result of the expression cannot be assigned to a tensor of this shape; note that the error occurs at the moment we assign, not when we compute the result.
The problem is that this is the error we actually get:

>>> debugEntryPoint()
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 3.98 PiB for an array with shape (4, 140095169944816) and data type complex64

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: NumPy: unable to create ndarray

Not sure whether this is down to the BLAS or the Python package, but I'm opening it here as it specifically relates to the message from NumPy at the point we perform the copy.

Cheers

tdegeus commented Mar 2, 2023

Most functions do not have compile-time checks for this. Static assertions could indeed be added to many functions, some easier to write than others, but for the moment that is not xtensor's policy. There are run-time assertions, I believe: you can compile with XTENSOR_ENABLE_ASSERT, which should fire a runtime error: https://xtensor.readthedocs.io/en/latest/dev-build-options.html#build
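As a sketch of how that flag is typically enabled (assuming a CMake-based build; `my_target` is a placeholder target name):

```cmake
# Define XTENSOR_ENABLE_ASSERT so xtensor's runtime shape/dimension
# assertions are compiled in (see the dev-build-options page linked above).
target_compile_definitions(my_target PRIVATE XTENSOR_ENABLE_ASSERT)
```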

As background information, it seems that upon construction the 2D return array tries to read the second dimension of vector. Since that is not part of vector's memory, you simply get rubbish.
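That failure mode could be made loud instead of silent. As a hypothetical sketch (plain C++, not xtensor's actual internals): copying an expression's shape into the fixed-rank shape array could refuse to read past the end when the ranks disagree:

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch, not xtensor's real API: copy a dynamically-sized
// expression shape into the fixed-rank shape array a pytensor uses,
// throwing on rank mismatch instead of reading past the end of the
// source and picking up whatever garbage follows in memory.
template <std::size_t Rank>
std::array<std::size_t, Rank> checked_shape(const std::vector<std::size_t>& expr_shape) {
    if (expr_shape.size() != Rank) {
        throw std::runtime_error(
            "rank mismatch: expression has rank " + std::to_string(expr_shape.size())
            + " but the target tensor has rank " + std::to_string(Rank));
    }
    std::array<std::size_t, Rank> out{};
    for (std::size_t i = 0; i < Rank; ++i) {
        out[i] = expr_shape[i];
    }
    return out;
}
```

With such a guard, assigning the rank-1 dot result to a rank-2 pytensor would raise a clear rank-mismatch error rather than reading an uninitialised second extent.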

stellarpower (Author) commented

Yes, sorry, the compile-time checking was just an aside; this example simply illustrates why I like it, and why leaving things to runtime generates frustration.

I am using xtensor and the Python module from conda-forge; I've updated the above with versions.

Just realised whilst describing this to a friend: is this a problem with the shape types? I assume the insane size NumPy wants to allocate is due to junk on the stack. Has it run past the end of the std::array, expecting it to have two elements, when, as a vector expression, size() returns an array with just one? If so, I'd expect whichever side is responsible (Python or BLAS) to check both the dimensional consistency of the shape and that the length of the shape (i.e. the number of dimensions) is suitable.


stellarpower commented Mar 4, 2023

I also noted there's nothing stopping me from writing:

xt::pytensor<int, 1> a(...);
xt::pytensor<int, 2> b(...);
a = b;

Or equally the same with regular xtensors. I'd assumed that, as they're templated on rank, this would be illegal; is everything checked at runtime rather than at compile time, then?


tdegeus commented Mar 4, 2023

Indeed!

Personally, I'm not strictly against adding compile-time assertions. However, it would increase compile time, which I already find somewhat long on many occasions. For me, run-time assertions offer enough safety: I just run once with xtensor assertions enabled and then never again. However, if you are willing to make the case for compile-time assertions and think about the implementation, I will certainly not stop you ;)
