
RuntimeError: Trying to backward through the graph a second time #16

Open
mingo-x opened this issue Sep 9, 2021 · 4 comments

mingo-x commented Sep 9, 2021

Hi there, thanks so much for sharing the code - really amazing work!

When I was trying to train the model on my own, I ran into an error at line 383 of train.py ((w_rec_loss * args.lambda_w_rec_loss).backward()):

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

I read the model definition and the implementation looked correct to me, so I don't understand why such an error was thrown. Do you maybe have any idea what could have gone wrong?
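For reference, my understanding is that this error shows up whenever a second backward() call traverses a sub-graph whose saved tensors were already freed by the first call. A minimal standalone example, unrelated to this repository:

    import torch

    x = torch.randn(4, requires_grad=True)
    h = torch.tanh(x)    # tanh saves its output for the backward pass

    h.sum().backward()   # first pass frees the graph's saved tensors
    h.mean().backward()  # second pass re-traverses the tanh node -> RuntimeError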

In the meantime, to unblock myself, I modified the code a bit to run backward() for g_loss and w_rec_loss in one go (g_w_rec_loss in the following example). Does this modification make sense to you? Why did you separate the backward operations in the first place?

        adv_loss, w_rec_loss, stylecode = model(None, "G")
        adv_loss = adv_loss.mean()
        w_rec_loss = w_rec_loss.mean()
        g_loss = adv_loss * args.lambda_adv_loss

        g_optim.zero_grad()
        e_optim.zero_grad()
        # Combine both losses so a single backward() pass suffices.
        g_w_rec_loss = g_loss + w_rec_loss * args.lambda_w_rec_loss
        g_w_rec_loss.backward()
        gather_grad(
            g_module.parameters(), world_size
        )  # Explicitly synchronize Generator parameters. There is a gradient sync bug in G.
        g_optim.step()
        e_optim.step()

Thanks in advance for your help!

blandocs (Collaborator) commented Oct 15, 2021

Hi mingo-x, you can combine the two losses (g_loss, w_rec_loss) together;
I don't think there is a significant difference.

However, you should be aware that in the original version w_rec_loss only affects the encoder, not the generator.
Your modification makes w_rec_loss also affect the generator's update.

Lastly, if you didn't modify any training code, I don't know why the RuntimeError occurs. Please check your torch version.
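For readers following the thread, here is a sketch of a separated two-pass update that preserves this "encoder only" behavior (variable names follow mingo-x's snippet above; this is an illustration, not the repository's exact code):

    # Pass 1: generator step driven by the adversarial loss only.
    g_optim.zero_grad()
    g_loss.backward(retain_graph=True)  # keep saved tensors for pass 2
    g_optim.step()

    # Pass 2: encoder step driven by the reconstruction loss.
    e_optim.zero_grad()
    (w_rec_loss * args.lambda_w_rec_loss).backward()
    e_optim.step()  # only the encoder is stepped, so any generator gradients
                    # accumulated here are discarded at the next zero_grad()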

songquanpeng commented

I also ran into this error. Hi @mingo-x, have you solved it?

songquanpeng commented

PyTorch 1.8 works for me; version 1.10 does not.

vicentowang commented

@songquanpeng backward(retain_graph=True) solved it.
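Concretely, the flag goes on whichever backward() call runs first (a sketch using the names from the snippet above; note it keeps the graph's saved tensors in memory until the second pass):

    g_loss.backward(retain_graph=True)  # keep saved tensors alive for the second pass
    (w_rec_loss * args.lambda_w_rec_loss).backward()  # the call at line 383 now succeeds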
