No way to copy a tensor from gpu to cpu to pre allocated array. #1388
Comments
One way to make a "fast" copy without overloading memory is to make the tensor contiguous and add this to the relevant function in TensorAccessor.cs:
if (_tensor.is_contiguous()) {
    // This is very fast and works very well.
    var shps = _tensor.shape;
    long TempCount = 1;
    for (int i = 0; i < shps.Length; i++)
        TempCount *= shps[i]; // In theory, numel is simply the product of the shape dimensions
    unsafe {
        return new Span<T>(_tensor_data_ptr.ToPointer(), Convert.ToInt32(TempCount)).ToArray();
    }
}
I added this in one commit of my Autocast pull request. I am still trying to figure out how to apply the same idea when the tensor is not contiguous, because for this faster copy I always need to make the tensor contiguous first:
torch.Tensor te /*blablabla*/;
float[] values = te.contiguous().data<float>().ToArray();
I noticed that when the tensor is not contiguous, the Numel method is always called, so it is recomputed every time.
Edit: Oh sorry, I misunderstood what you meant. I think CopyTo will work. You mean like this?
float[] data = new float[h*w*3]; // Pre-allocated at the top of the function, for example
// Intensive functions and processing, blablabla
tenGPU.data<float>().CopyTo(data); // `tenGPU` is a torch.Tensor that is allocated on the GPU
I will test this. If that does not work, I will investigate how to do it soon.
Great, thanks. I don't know why I didn't see CopyTo before.
tenGPU.data<float>().CopyTo(data);
But it's not faster. This takes 340 ms for 12,000,000 floats. That is about 140 MB/s (48 MB in 340 ms), which is extremely slow compared with PCIe bandwidth. Why is it so slow?
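For reference, the throughput figure quoted above (12,000,000 floats in 340 ms) can be checked with a few lines of C#. This is only a sketch of the arithmetic; `TransferRate` and `MegabytesPerSecond` are hypothetical helper names, not part of TorchSharp, and 1 MB is taken as 1,000,000 bytes.

```csharp
using System;

public static class TransferRate
{
    // Effective throughput in MB/s, with 1 MB = 1,000,000 bytes (an assumption).
    public static double MegabytesPerSecond(long elementCount, int bytesPerElement, double milliseconds)
    {
        double megabytes = (double)elementCount * bytesPerElement / 1_000_000.0;
        double seconds = milliseconds / 1000.0;
        return megabytes / seconds;
    }
}
```

Calling `TransferRate.MegabytesPerSecond(12_000_000, sizeof(float), 340)` gives roughly 141 MB/s, since 48 MB moved in 0.34 s. That is in line with the estimate in the comment.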
@LukePoga
torch.Tensor tenGPU;
// Blablabla
tenGPU = tenGPU.contiguous();
// After that you can call tenGPU.data<T>().ToArray() or CopyTo.
tenGPU.data<T>().ToArray(); // Or CopyTo
My fast version:
// From my branch of TorchSharp/Utils/TensorAccessor.cs
unsafe {
    return new Span<T>(_tensor_data_ptr.ToPointer(), Convert.ToInt32(TempCount)).ToArray();
}
This does not iterate over the array and assign each value by index. It creates a complete copy. Soon I will make a
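Note that the ToArray() call in the snippet above still allocates a new managed array on every call. As a rough sketch of copying into an existing buffer instead, the example below uses only `System.Runtime.InteropServices.Marshal.Copy` from the .NET base library. The native buffer here merely stands in for a CPU tensor's storage, and `PreallocatedCopySketch`, `CopyInto` and `Demo` are hypothetical names for illustration, not TorchSharp API.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PreallocatedCopySketch
{
    // Copies `count` floats from native memory at `source` into an existing
    // managed array. No new managed array is allocated, so `destination`
    // can be reused across loop iterations.
    public static void CopyInto(IntPtr source, float[] destination, int count)
    {
        if (count > destination.Length)
            throw new ArgumentException("destination is too small", nameof(destination));
        Marshal.Copy(source, destination, 0, count);
    }

    public static float[] Demo()
    {
        const int count = 4;
        float[] input = { 1f, 2f, 3f, 4f };
        float[] reused = new float[count]; // pre-allocated once, further up

        // Native buffer standing in for tensor storage (assumption for illustration).
        IntPtr native = Marshal.AllocHGlobal(count * sizeof(float));
        try
        {
            Marshal.Copy(input, 0, native, count); // fill the native buffer
            CopyInto(native, reused, count);       // bulk copy into the existing array
        }
        finally
        {
            Marshal.FreeHGlobal(native);
        }
        return reused;
    }
}
```

The copy is a single bulk operation rather than a per-element loop, and it avoids `unsafe` code. It only applies to memory that is contiguous and readable from the CPU.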
There doesn't appear to be any way to transfer a result tensor into an existing CPU float array. Below requires new memory allocation.
If this is part of a loop, that is a lot of wasted memory allocation and time! Below is how libraries normally do things, e.g. CUDA.
float[] cpuResult ..... (pre allocated further up)
gpuResult.CopyToHost(cpuResult);
Maybe I missed this CopyTo, because it's kinda essential for any GPU-type library (?!)
Is this project maintained?