In Assignment 4 Section 3 we were asked to register an external function for the convolution layer. However, in the PyTorch library I only find a conv2d function (torch.nn.functional.conv2d) without destination passing. My implementation using this function is as follows:

```python
@tvm.register_func("env.conv2d", override=True)
def torch_conv2d(
    x: tvm.nd.NDArray,
    w: tvm.nd.NDArray,
    b: tvm.nd.NDArray,
    o: tvm.nd.NDArray
):
    x_torch = torch.from_dlpack(x)
    w_torch = torch.from_dlpack(w)
    b_torch = torch.from_dlpack(b)
    o_torch = torch.from_dlpack(o)
    # implementation of conv2d without destination passing
    o_torch = torch.nn.functional.conv2d(x_torch, w_torch, b_torch)
```

However, I got some wrong output that did not pass the test:

```
File .../python3.9/site-packages/numpy/testing/_private/utils.py:844, in assert_array_compare(comparison, x, y, err_msg, verbose, header, precision, equal_nan, equal_inf)
    840     err_msg += '\n' + '\n'.join(remarks)
    841     msg = build_err_msg([ox, oy], err_msg,
    842                         verbose=verbose, header=header,
    843                         names=('x', 'y'), precision=precision)
--> 844     raise AssertionError(msg)
```

In fact, it seems that using a non-destination-passing library function causes some unexpected computation error, and an experiment verified my hypothesis:

```python
@tvm.register_func("env.relu", override=True)
def torch_relu_1(
    x: tvm.nd.NDArray,
    o: tvm.nd.NDArray
):
    x_torch = torch.from_dlpack(x)
    o_torch = torch.from_dlpack(o)
    # implementation using destination passing
    torch.maximum(x_torch, torch.Tensor([0.0]), out=o_torch)
```
@tvm.register_func("env.relu", override=True)
def torch_relu_2(
x: tvm.nd.NDArray,
o: tvm.nd.NDArray
):
x_torch = torch.from_dlpack(x)
o_torch = torch.from_dlpack(o)
# implementation not using destination passing
o_troch = torch.maximum(x_torch, torch.Tensor([0.0])) I use them separately to replace the @tvm.register_func("env.conv2d", override=True)
def torch_conv2d(
x: tvm.nd.NDArray,
w: tvm.nd.NDArray,
b: tvm.nd.NDArray,
o: tvm.nd.NDArray
):
x_torch = torch.from_dlpack(x)
w_torch = torch.from_dlpack(w)
b_torch = torch.from_dlpack(b)
o_torch = torch.from_dlpack(o)
for b in range(4): # batch_size
for k in range(32): # out_channels
for i in range(26): # out_height
for j in range(26): # out_width
o_torch[b, k, i, j] = 0
for di in range(3): # kernel_size
for dj in range(3): # kernel_size
for q in range(1): # in_channels
o_torch[b, k, i, j] += x_torch[b, q, i + di, j + dj] * w_torch[k, q, di, dj]
o_torch[b, k, i, j] += b_torch[k] It works but obviously it is extremely slow. ('works' means I've run it for an hour and no assertion error appears yet, I have no idea how long it will take) I'm wondering if my hypothesis right? If not, what is the real reason behind this problem? And if there is a better way to implement an extern conv2d? |
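For reference, one possible destination-passing variant (not from the original post; the name `torch_conv2d_dps` and the slice assignment are just illustrative, assuming the same `env.conv2d` calling convention as above) would compute the result with conv2d and then write it into `o`'s buffer in place:

```python
import torch
import tvm

# Sketch only (hypothetical name torch_conv2d_dps); assumes the same
# env.conv2d calling convention as the snippets above.
@tvm.register_func("env.conv2d", override=True)
def torch_conv2d_dps(
    x: tvm.nd.NDArray,
    w: tvm.nd.NDArray,
    b: tvm.nd.NDArray,
    o: tvm.nd.NDArray
):
    x_torch = torch.from_dlpack(x)
    w_torch = torch.from_dlpack(w)
    b_torch = torch.from_dlpack(b)
    o_torch = torch.from_dlpack(o)
    # Slice assignment writes the values into the storage that o_torch
    # shares with o via DLPack, instead of rebinding the Python name.
    o_torch[:] = torch.nn.functional.conv2d(x_torch, w_torch, b_torch)
```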
Replies: 1 comment
Update: inspired by hbsun's answer I tried this implementation:

```python
@tvm.register_func("env.conv2d", override=True)
def torch_conv2d(
    x: tvm.nd.NDArray,
    w: tvm.nd.NDArray,
    b: tvm.nd.NDArray,
    o: tvm.nd.NDArray
):
    x_torch = torch.from_dlpack(x)
    w_torch = torch.from_dlpack(w)
    b_torch = torch.from_dlpack(b)
    o_torch = torch.from_dlpack(o)
    # compute into a temporary, then use a destination-passing op to
    # write the result into the memory shared with o
    out_temp = torch.nn.functional.conv2d(x_torch, w_torch, b_torch)
    torch.add(out_temp, 0, out=o_torch)
```

and it works as expected. So it seems that the problem is that `out=o_torch` actually rewrites the memory of `o`, while `o_torch = foo` only makes the identifier `o_torch` refer to `foo`, so the memory behind `o_torch` (and hence `o`) will not change.