In Assignment 4 Section 3 we were asked to register an external function for the convolution layer. However, in the PyTorch library I only find a conv2d function (torch.nn.functional.conv2d) without destination passing. My implementation using this function is as follows:

```python
@tvm.register_func("env.conv2d", override=True)
def torch_conv2d(
    x: tvm.nd.NDArray,
    w: tvm.nd.NDArray,
    b: tvm.nd.NDArray,
    o: tvm.nd.NDArray
):
    x_torch = torch.from_dlpack(x)
    w_torch = torch.from_dlpack(w)
    b_torch = torch.from_dlpack(b)
    o_torch = torch.from_dlpack(o)
    # implementation of conv2d without destination passing
    o_torch = torch.nn.functional.conv2d(x_torch, w_torch, b_torch)
```

However, I got some wrong output that did not pass the test:

```
File .../python3.9/site-packages/numpy/testing/_private/utils.py:844, in assert_array_compare(comparison, x, y, err_msg, verbose, header, precision, equal_nan, equal_inf)
    840     err_msg += '\n' + '\n'.join(remarks)
    841     msg = build_err_msg([ox, oy], err_msg,
    842                         verbose=verbose, header=header,
    843                         names=('x', 'y'), precision=precision)
--> 844     raise AssertionError(msg)
```

In fact, it seems that using a non-destination-passing library function causes some unexpected computation error, and an experiment verified my hypothesis:

```python
@tvm.register_func("env.relu", override=True)
def torch_relu_1(
    x: tvm.nd.NDArray,
    o: tvm.nd.NDArray
):
    x_torch = torch.from_dlpack(x)
    o_torch = torch.from_dlpack(o)
    # implementation using destination passing
    torch.maximum(x_torch, torch.Tensor([0.0]), out=o_torch)
```
@tvm.register_func("env.relu", override=True)
def torch_relu_2(
x: tvm.nd.NDArray,
o: tvm.nd.NDArray
):
x_torch = torch.from_dlpack(x)
o_torch = torch.from_dlpack(o)
# implementation not using destination passing
o_troch = torch.maximum(x_torch, torch.Tensor([0.0])) I use them separately to replace the @tvm.register_func("env.conv2d", override=True)
def torch_conv2d(
x: tvm.nd.NDArray,
w: tvm.nd.NDArray,
b: tvm.nd.NDArray,
o: tvm.nd.NDArray
):
x_torch = torch.from_dlpack(x)
w_torch = torch.from_dlpack(w)
b_torch = torch.from_dlpack(b)
o_torch = torch.from_dlpack(o)
for b in range(4): # batch_size
for k in range(32): # out_channels
for i in range(26): # out_height
for j in range(26): # out_width
o_torch[b, k, i, j] = 0
for di in range(3): # kernel_size
for dj in range(3): # kernel_size
for q in range(1): # in_channels
o_torch[b, k, i, j] += x_torch[b, q, i + di, j + dj] * w_torch[k, q, di, dj]
o_torch[b, k, i, j] += b_torch[k] It works but obviously it is extremely slow. ('works' means I've run it for an hour and no assertion error appears yet, I have no idea how long it will take) I'm wondering if my hypothesis right? If not, what is the real reason behind this problem? And if there is a better way to implement an extern conv2d? |
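For reference, one possible destination-passing variant (not from the original post; the name `torch_conv2d_dps` and the slice assignment are just illustrative, assuming the same `env.conv2d` calling convention as above) would compute the result with conv2d and then write it into `o`'s buffer in place:

```python
import torch
import tvm

# Sketch only (hypothetical name torch_conv2d_dps); assumes the same
# env.conv2d calling convention as the snippets above.
@tvm.register_func("env.conv2d", override=True)
def torch_conv2d_dps(
    x: tvm.nd.NDArray,
    w: tvm.nd.NDArray,
    b: tvm.nd.NDArray,
    o: tvm.nd.NDArray
):
    x_torch = torch.from_dlpack(x)
    w_torch = torch.from_dlpack(w)
    b_torch = torch.from_dlpack(b)
    o_torch = torch.from_dlpack(o)
    # Slice assignment writes the values into the storage that o_torch
    # shares with o via DLPack, instead of rebinding the Python name.
    o_torch[:] = torch.nn.functional.conv2d(x_torch, w_torch, b_torch)
```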
Replies: 1 comment
Update: inspired by hbsun's answer I tried this implementation:

```python
@tvm.register_func("env.conv2d", override=True)
def torch_conv2d(
    x: tvm.nd.NDArray,
    w: tvm.nd.NDArray,
    b: tvm.nd.NDArray,
    o: tvm.nd.NDArray
):
    x_torch = torch.from_dlpack(x)
    w_torch = torch.from_dlpack(w)
    b_torch = torch.from_dlpack(b)
    o_torch = torch.from_dlpack(o)
    # compute into a temporary, then use a destination-passing op to
    # write the result into the memory shared with o
    out_temp = torch.nn.functional.conv2d(x_torch, w_torch, b_torch)
    torch.add(out_temp, 0, out=o_torch)
```

and it works as expected. So it seems that the problem is that `out=o_torch` actually rewrites the memory of `o`, while `o_torch = foo` only makes the identifier `o_torch` refer to `foo`, so the memory behind `o_torch` (and hence `o`) will not change.