
RuntimeError: Trying to backward through the graph a second time #16

Open
mingo-x opened this issue Sep 9, 2021 · 4 comments

mingo-x commented Sep 9, 2021

Hi there, thanks so much for sharing the code - really amazing work!

When I was trying to train the model on my own, I ran into an error at line 383 of train.py ((w_rec_loss * args.lambda_w_rec_loss).backward()):

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

I read the model definition and the implementation looked correct to me, so I don't understand why such an error was thrown. Do you maybe have any idea what could have gone wrong?
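For reference, my understanding is that this error shows up whenever a second backward() call traverses a sub-graph whose saved tensors were already freed by the first call. A minimal standalone example, unrelated to this repository:

    import torch

    x = torch.randn(4, requires_grad=True)
    h = torch.tanh(x)    # tanh saves its output for the backward pass

    h.sum().backward()   # first pass frees the graph's saved tensors
    h.mean().backward()  # second pass re-traverses the tanh node -> RuntimeError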

In the meantime, to unblock myself, I modified the code a bit to run backward() for g_loss and w_rec_loss in one go (g_w_rec_loss in the following example). Does this modification make sense to you? Why did you separate the backward operations in the first place?

        adv_loss, w_rec_loss, stylecode = model(None, "G")
        adv_loss = adv_loss.mean()
        w_rec_loss = w_rec_loss.mean()
        g_loss = adv_loss * args.lambda_adv_loss

        g_optim.zero_grad()
        e_optim.zero_grad()
        # Combine both losses so a single backward() pass suffices.
        g_w_rec_loss = g_loss + w_rec_loss * args.lambda_w_rec_loss
        g_w_rec_loss.backward()
        gather_grad(
            g_module.parameters(), world_size
        )  # Explicitly synchronize Generator parameters. There is a gradient sync bug in G.
        g_optim.step()
        e_optim.step()

Thanks in advance for your help!

blandocs (Collaborator) commented Oct 15, 2021

Hi mingo-x, you can combine the two losses (g_loss, w_rec_loss) together;
I don't think there is a significant difference.

However, you should be aware that in the original version w_rec_loss only affects the encoder, not the generator.
Your modification makes w_rec_loss also affect the generator's update.

Lastly, if you didn't modify any training code, I don't know why the RuntimeError occurs. Please check your torch version.
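For readers following the thread, here is a sketch of a separated two-pass update that preserves this "encoder only" behavior (variable names follow mingo-x's snippet above; this is an illustration, not the repository's exact code):

    # Pass 1: generator step driven by the adversarial loss only.
    g_optim.zero_grad()
    g_loss.backward(retain_graph=True)  # keep saved tensors for pass 2
    g_optim.step()

    # Pass 2: encoder step driven by the reconstruction loss.
    e_optim.zero_grad()
    (w_rec_loss * args.lambda_w_rec_loss).backward()
    e_optim.step()  # only the encoder is stepped, so any generator gradients
                    # accumulated here are discarded at the next zero_grad()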

songquanpeng commented

I also ran into this error. Hi @mingo-x, have you solved it?

songquanpeng commented

PyTorch 1.8 works for me; version 1.10 does not.

vicentowang commented

@songquanpeng backward(retain_graph=True) solved it.
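Concretely, the flag goes on whichever backward() call runs first (a sketch using the names from the snippet above; note it keeps the graph's saved tensors in memory until the second pass):

    g_loss.backward(retain_graph=True)  # keep saved tensors alive for the second pass
    (w_rec_loss * args.lambda_w_rec_loss).backward()  # the call at line 383 now succeeds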
