
Did you use the temporal multi-frame inputs when finetuning? #2

Open
SeaBird-Go opened this issue Oct 9, 2024 · 1 comment
@SeaBird-Go

Hi, thanks for sharing this wonderful work. Since you use multi-frame multi-view inputs during the pretraining stage, I want to know whether you still used the temporal multi-frame inputs during the fine-tuning stage.

If you did not use the temporal multi-frame inputs in the downstream tasks, does that mean you discarded the voxel decoder in the fine-tuning stage and only loaded the pre-trained voxel encoder?

@Doctor-James
Member

Thank you for your interest in our work. Whether we used temporal multi-frame inputs during fine-tuning depended on whether the methods we were comparing against did so. You can roughly understand it as us providing a pre-trained backbone (such as ResNet50), and during fine-tuning, we adopted exactly the same training strategy as the baseline.
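In practice, "providing a pre-trained backbone" usually means loading only the encoder weights from the pretraining checkpoint and discarding the decoder used for the pretraining objective. A minimal sketch of that checkpoint-filtering step, using plain dictionaries and hypothetical key prefixes (`voxel_encoder.`, `voxel_decoder.`) in place of a real state dict — the actual key names depend on the model definition:

```python
def filter_encoder_weights(pretrained_state, prefix="voxel_encoder."):
    """Keep only parameters under the encoder prefix; drop decoder heads.

    The surviving subset can then be loaded non-strictly into the
    downstream model before fine-tuning with the baseline's recipe.
    """
    return {k: v for k, v in pretrained_state.items() if k.startswith(prefix)}


# Hypothetical pretraining checkpoint: encoder weights are reused,
# decoder weights were only needed for the pretraining objective.
checkpoint = {
    "voxel_encoder.conv1.weight": [0.1, 0.2],
    "voxel_encoder.conv2.weight": [0.3],
    "voxel_decoder.deconv1.weight": [0.9],
}
encoder_only = filter_encoder_weights(checkpoint)
print(sorted(encoder_only))
```

With a PyTorch model this subset would typically be applied via `model.load_state_dict(encoder_only, strict=False)`, so the missing decoder keys are simply ignored.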
