
About Equation 5 in Full Surround Monodepth from Multiple Cameras #218

Open
haoweiz23 opened this issue Mar 4, 2022 · 20 comments
@haoweiz23

Hi, thank you for your work. I am trying to reproduce your pose consistency loss. This loss constrains the poses predicted from the other cameras to be consistent with the front camera's pose after transformation. However, it is hard to understand how one camera's coordinate frame is transformed into another's by Eq. 5. Could you please provide more explanation or detailed code? Thanks.
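
For reference, my current (possibly wrong) reading of Eq. 5 is a change of coordinate frame by conjugation with the camera-to-canonical extrinsic transform; in rough notation (my own, not necessarily the paper's):

```latex
% E_i : extrinsic of camera i,  E_1 : extrinsic of the canonical (front) camera
% X_{i \to 1} = E_1^{-1} E_i                                  % camera i frame -> canonical frame
% \tilde{X}_i = X_{i \to 1} \, \hat{X}_i \, X_{i \to 1}^{-1}  % pose predicted in camera i, expressed in the canonical frame
% The consistency loss then compares \tilde{X}_i against the canonical camera's own prediction \hat{X}_1.
```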

hjxwhy commented Mar 4, 2022

This is the code I implemented; I cannot promise it's right:

```python
def pose_consistency_loss(self, poses, extrinsics):
    """Calculate the pose consistency loss.

    :param poses: list of Pose, each wrapping a [B, 4, 4] predicted transformation
        (these predictions are transformed to the coordinate frame of the canonical camera)
    :param extrinsics: torch.Tensor [B, 4, 4]
        extrinsics for all cameras, used to transform the poses
    :return: weighted pose consistency loss
    """
    rot_loss = 0
    trans_loss = 0
    extrinsics = extrinsics.to(poses[0].item().dtype)
    canonical_extrinsic = extrinsics[0].repeat([extrinsics.shape[0], 1, 1])  # [B, 4, 4]
    canonical_extrinsic = Pose(canonical_extrinsic)
    extrinsics = Pose(extrinsics)

    for pose in poses:
        # Transform from each camera's frame to the canonical (front) camera's frame
        X_i2j = canonical_extrinsic.inverse() @ extrinsics
        # Express every camera's predicted pose in the canonical frame (Eq. 5)
        X_ba = X_i2j @ pose @ X_i2j.inverse()
        # Compare against the canonical camera's own prediction (batch index 0)
        rot_loss += torch.sum((X_ba.mat2vec()[:, :3] - pose.mat2vec()[0, :3]).pow(2))
        trans_loss += torch.sum((X_ba.mat2vec()[:, 3:] - pose.mat2vec()[0, 3:]).pow(2))
    loss = self.rotation_weight * rot_loss + self.translation_weight * trans_loss
    return loss
```

@haoweiz23 (Author)

@hjxwhy Thanks a lot. I believe this is right. By the way, have you implemented and evaluated the spatio-temporal loss in FSM? I cannot achieve the same improvement as Table 3 in the FSM paper (the results even decrease). Maybe there is some problem in my implementation.

I implemented the spatial-wise photometric error loss as below, following Equation 3 in the FSM paper.

```python
def spatial_wise_pe_loss(self, batch, output, return_logs=False, progress=0.0):
    # For each of the 6 DDAD cameras, the indices of its two spatially adjacent cameras
    spatial_contexts_indices = np.array([[1, 2], [0, 3], [0, 4], [1, 5], [2, 5], [3, 4]])
    spatial_contexts_rgb = [batch['rgb_original'][spatial_contexts_indices[:, 0]],
                            batch['rgb_original'][spatial_contexts_indices[:, 1]]]
    poses = torch.Tensor(batch['extrinsics']) if isinstance(batch['extrinsics'], list) \
        else batch['extrinsics']
    intrinsics = torch.Tensor(batch['intrinsics']) if isinstance(batch['intrinsics'], list) \
        else batch['intrinsics']
    spatial_context_intrinsics = [intrinsics[spatial_contexts_indices[:, 0]],
                                  intrinsics[spatial_contexts_indices[:, 1]]]
    spatial_context_masks = [batch['mask'][spatial_contexts_indices[:, 0]],
                             batch['mask'][spatial_contexts_indices[:, 1]]]

    source_poses = Pose(poses)
    reference_poses = [Pose(poses[spatial_contexts_indices[:, 0]]),
                       Pose(poses[spatial_contexts_indices[:, 1]])]

    # Relative pose from each source camera to its two spatial context cameras
    relative_poses = [Pose(torch.bmm(reference_poses[0].inverse().item(), source_poses.item())),
                      Pose(torch.bmm(reference_poses[1].inverse().item(), source_poses.item()))]
    spatial_output = self.self_supervised_loss(
        batch['rgb_original'], spatial_contexts_rgb,
        output['inv_depths'], relative_poses, intrinsics, spatial_context_intrinsics,
        return_logs=return_logs, progress=progress, mask=batch['mask'], ref_mask=spatial_context_masks)
    return spatial_output
```

@VitorGuizilini-TRI (Collaborator)

The implementation looks alright to me. Some things that have helped other people achieve similar results:

  • Starting from a pre-trained model without the spatio-temporal constraints
  • Defining a larger value for the minimum depth of the network, so there is overlap between cameras to begin with (otherwise the temporal network can produce a scale that doesn't have any spatial overlap, and it doesn't leverage those constraints); see the sketch after this list
  • Focal length scaling for the output depth maps (the front camera of DDAD has different intrinsics than the other cameras)
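
To illustrate the minimum-depth point, here is a rough sketch of the usual sigmoid-to-depth conversion (monodepth2-style; the numbers are illustrative, not the exact values used here):

```python
import torch

def disp_to_depth(disp, min_depth=0.5, max_depth=200.0):
    """Convert a sigmoid disparity in [0, 1] into depth in [min_depth, max_depth].

    Raising min_depth keeps the network from predicting very small depths, so the
    views warped into neighbouring cameras have some spatial overlap from the start.
    """
    min_disp = 1.0 / max_depth
    max_disp = 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    depth = 1.0 / scaled_disp
    return scaled_disp, depth
```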

@hurjunhwa

Hi,
What do you mean by focal length scaling? Would you mind providing more details about that?
Instead of training the depth decoder to handle different intrinsics, is it about using a constant to rescale the depth value for the front view?

Thank you!

haoweiz23 (Author) commented Mar 5, 2022

@VitorGuizilini-TRI Thanks a lot! Your suggestions are very helpful. I tried focal length scaling and it works. I am now trying to start from a pre-trained model without the spatio-temporal constraints.
I don't quite understand your second suggestion, though. Why does a larger value for the minimum depth help? Is it because a larger depth produces more overlapping area when projecting between different cameras? If so, do you have a recommended minimum depth?
Thank you again for your timely suggestions.

@hurjunhwa Hi, I implement focal length scaling by scaling the output depth by a constant, i.e., the focal length. This focal length comes from the input intrinsics. Since I do not have the full camera parameters (e.g., dx and dy), I simply take f_x from the intrinsics as the focal length to scale the depth. I tried this trick on DDAD and it works. Hope this is helpful.
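
In code, the scaling looks roughly like the following; the normalization by a reference focal length is my own choice, so treat this as a sketch rather than the paper's exact recipe:

```python
import torch

def focal_scale_depth(depth, intrinsics, ref_fx=None):
    """Rescale each camera's depth by its focal length (fx), optionally relative to a
    reference camera, so cameras with different intrinsics predict on a comparable scale.

    depth: [B, 1, H, W] per-camera depth maps; intrinsics: [B, 3, 3] per-camera K.
    """
    fx = intrinsics[:, 0, 0].view(-1, 1, 1, 1)   # per-camera fx
    if ref_fx is None:
        ref_fx = intrinsics[0, 0, 0]             # e.g. the front (canonical) camera's fx
    return depth * fx / ref_fx
```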

hjxwhy commented Mar 5, 2022

@LionRoarRoar My STC implementation is the same as yours, but the results also degrade. You have tried scaling the depth by the focal length; does that mean each camera's output is multiplied by its focal length, or divided by it? By the way, in my tests, the input images with self-occlusion cause the RMSE to be larger than with the front camera only. Have you faced this problem?

@haoweiz23 (Author)

I scale each camera's output with its corresponding focal length. In my experiments all the other cameras get worse results than the front camera. Having only the RMSE be larger than the front camera seems unreasonable; maybe you have the wrong normalization in the last output layer.

hjxwhy commented Mar 6, 2022

@LionRoarRoar Thanks for your reply. I ran an experiment training the front camera and CAMERA_8 separately, and CAMERA_8 is worse than the front camera on all metrics, so I guess it's caused by the self-occlusion in CAMERA_8's images. But I'm not sure, because the paper doesn't seem to have this problem. Do you plan to run this experiment? Sorry to ask again: does scaling the depth mean multiplying the inverse depth by the focal length?

@haoweiz23 (Author)

@hjxwhy
A1: Maybe your hypothesis is right. I noticed that the self-occlusion shifts slightly between frames, which means it is hard to pre-define an accurate self-occlusion mask. Images from the front camera are clean and without occlusion, so it should get better results than the other cameras.

A2: You should scale the depth map instead of the inverse depth.
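
A small self-contained check of what I mean (my own illustration, not code from the repo):

```python
import torch

# Multiplying the depth by fx is equivalent to dividing the inverse depth by fx,
# so only one of the two should be scaled.
depth = torch.rand(1, 1, 4, 4) + 0.5
fx = 1000.0
scaled_depth = depth * fx
scaled_inv_depth = (1.0 / depth) / fx
assert torch.allclose(1.0 / scaled_inv_depth, scaled_depth)
```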

hjxwhy commented Mar 6, 2022

@LionRoarRoar Thanks, I will try again. If I get some new results I will share them here. Best wishes!

@haoweiz23 (Author)

Updates:
1. I tried the spatial-wise constraint starting from a pre-trained model without the spatio-temporal constraints. It is indeed better than without pre-training, but it is still worse than the baseline model. Besides, I am afraid this trick means the spatial-wise constraint cannot be compared with the baseline fairly.

2. I also tried the spatial-wise loss with a larger min_depth, starting from a pre-trained model without the spatio-temporal constraints, and the performance drops.

abing222 commented Mar 7, 2022

> This is the code I implemented; I cannot promise it's right: (pose consistency loss code quoted above)

Regarding `rot_loss += torch.sum((X_ba.mat2vec()[:, :3] - pose.mat2vec()[0, :3]).pow(2))`: I think the pose here should be supervised with cam1's pose.

abing222 commented Mar 7, 2022

> Updates:
> 1. I tried the spatial-wise constraint starting from a pre-trained model without the spatio-temporal constraints. It is indeed better than without pre-training, but it is still worse than the baseline model. Besides, I am afraid this trick means the spatial-wise constraint cannot be compared with the baseline fairly.
>
> 2. I also tried the spatial-wise loss with a larger min_depth, starting from a pre-trained model without the spatio-temporal constraints, and the performance drops.

Have you reached the accuracy of the paper? I can't reproduce it

@haoweiz23 (Author)

@abing222 No. Only the self-occlusion mask works. STC and the pose consistency loss do not work.

abing222 commented Mar 7, 2022

> @abing222 No. Only the self-occlusion mask works. STC and the pose consistency loss do not work.

In my experiment, with only the self-occlusion mask, Abs Rel did not decrease as much as in the paper.

abing222 commented Mar 7, 2022

> @abing222 No. Only the self-occlusion mask works. STC and the pose consistency loss do not work.

At present, I can obtain the absolute scale through the spatial constraint, though the accuracy decreases slightly. After adding STC, the accuracy increases a little.

@haoweiz23 (Author)

> @abing222 No. Only the self-occlusion mask works. STC and the pose consistency loss do not work.
>
> At present, I can obtain the absolute scale through the spatial constraint, though the accuracy decreases slightly. After adding STC, the accuracy increases a little.

You mean the spatial-wise constraints do not work but STC works? That is interesting. Could you please provide more implementation details about your STC, such as the loss weight and how you warp the spatio-temporal images?

abing222 commented Mar 8, 2022

> @abing222 No. Only the self-occlusion mask works. STC and the pose consistency loss do not work.
>
> At present, I can obtain the absolute scale through the spatial constraint, though the accuracy decreases slightly. After adding STC, the accuracy increases a little.
>
> You mean the spatial-wise constraints do not work but STC works? That is interesting. Could you please provide more implementation details about your STC, such as the loss weight and how you warp the spatio-temporal images?

The spatial-wise constraint is useful and provides absolute scale, but the accuracy decreased. I modified the code on the basis of the monodepth2 repo rather than the packnet repo.

@weiyithu

> @abing222 No. Only the self-occlusion mask works. STC and the pose consistency loss do not work.
>
> At present, I can obtain the absolute scale through the spatial constraint, though the accuracy decreases slightly. After adding STC, the accuracy increases a little.

I also cannot obtain the absolute scale with the spatial photometric loss. Do you use any pretrained model, or change the min_depth parameter in the monodepth2 repo?

@haoweiz23 (Author)

> @abing222 No. Only the self-occlusion mask works. STC and the pose consistency loss do not work.
>
> At present, I can obtain the absolute scale through the spatial constraint, though the accuracy decreases slightly. After adding STC, the accuracy increases a little.
>
> I also cannot obtain the absolute scale with the spatial photometric loss. Do you use any pretrained model, or change the min_depth parameter in the monodepth2 repo?

Hi, weiyi. I am also trying to implement this work. Maybe we can add each other on WeChat for discussion. My WeChat: zhuhaow_
