I simply loaded the pretrained weights and fine-tuned the model on the same dataset, and the resulting checkpoint generates more repetitive sequences than I expected. This is quite bizarre to me. Is there something wrong with the current training code, or are the released checkpoints just unusually good?
Could you provide the generation results and show how you load the checkpoint?
By the way, if you use the config YAML in config/experiment/lm and continue training from the pretrained weights, the learning rate is large, which may change the pretrained weights substantially and lead to bad performance. If you want to continue training, starting the learning rate from the ending rate of pretraining, i.e., 1e-5, may work better.
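Something along these lines, as a minimal PyTorch sketch (here `build_dplm_150m`, `train_dataloader`, `training_loss`, and the checkpoint path are placeholders for illustration, not the repo's actual names):

```python
import torch
from torch.optim import AdamW

# Placeholders: substitute the actual DPLM-150M model class and training dataloader.
model = build_dplm_150m()                        # hypothetical helper, not a real repo function
state = torch.load("dplm_150m.ckpt", map_location="cpu")
model.load_state_dict(state.get("state_dict", state))

# Start from the ending LR of pretraining (1e-5) rather than the peak LR in
# config/experiment/lm, so the pretrained weights are not perturbed too much.
optimizer = AdamW(model.parameters(), lr=1e-5)

model.train()
for step, batch in enumerate(train_dataloader):  # hypothetical dataloader
    loss = training_loss(model, batch)           # placeholder for the DPLM training loss
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
    optimizer.zero_grad()
```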
I tried setting a smaller LR, even 1e-8, but fine-tuning still gradually degrades the pLDDT. Below is a comparison between the base DPLM-150M and DPLM-150M fine-tuned with LR 1e-8:
| Model | pLDDT |
| --- | --- |
| Base | 69.44743 |
| Fine-tuned | 66.5991 |
If I use an LR of 1e-5 or anything larger than 1e-8, the generation is completely broken... :(
If you want to verify this, you can simply set the LR to 1e-5, load the released ckpt, and fine-tune the model for a couple thousand steps.
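For reference, pLDDT for generated sequences is typically computed by folding them with ESMFold and averaging the per-residue confidence written into the B-factor column of the predicted PDB. A minimal sketch with the fair-esm package (the sequence list below is just a placeholder; replace it with samples from the base / fine-tuned DPLM):

```python
import torch
import esm
import biotite.structure.io as bsio

# Load ESMFold (requires fair-esm with the esmfold extra dependencies installed).
model = esm.pretrained.esmfold_v1()
model = model.eval().cuda()

# Placeholder: replace with sequences sampled from the model under evaluation.
sequences = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKR"]

plddts = []
for seq in sequences:
    with torch.no_grad():
        pdb_str = model.infer_pdb(seq)           # returns a PDB string
    with open("pred.pdb", "w") as f:
        f.write(pdb_str)
    struct = bsio.load_structure("pred.pdb", extra_fields=["b_factor"])
    plddts.append(struct.b_factor.mean())        # mean per-atom pLDDT (0-100 scale)

print(sum(plddts) / len(plddts))                 # average pLDDT over all samples
```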
Also, could you please share the configs for DPLM-150M with us? I remember the paper uses two-stage training; I'd like to know the hyper-parameters and number of training steps for each stage. I would love to reproduce your training.
Hi,
> I simply loaded the pretrained weights and fine-tuned the model on the same dataset, and the resulting checkpoint generates more repetitive sequences than I expected. This is quite bizarre to me. Is there something wrong with the current training code, or are the released checkpoints just unusually good?
cc @zhengzx-nlp @wxy-nlp @leiyu-bytedance @lark