Loss issue during fine-tuning #14

Open
zhanghang-official opened this issue May 13, 2024 · 2 comments

@zhanghang-official

Early in training, the loss suddenly jumps to 0. Lowering the learning rate does not fix the problem.
[screenshot attachment]
The config file is as follows:
model:
  arch: st_llm_hf
  model_type: instructblip_vicuna0
  use_grad_checkpoint: True
  max_txt_len: 256
  end_sym: "###"
  #prompt_path: "prompts/alignment.txt"
  prompt_template: '###Human: {} ###Assistant: '
  llama_model: '/root/qfs/lmm/weights/stllm/pretrained/vicuna-7b-v1.1/'
  ckpt: '/root/qfs/lmm/weights/stllm/pretrained/instruct_blip_vicuna7b_trimmed.pth'
  q_former_model: '/root/qfs/lmm/weights/stllm/pretrained/instruct_blip_vicuna7b_trimmed.pth'
  qformer_text_input: True
  freeze_LLM: False
  video_input: "residual"
  residual_size: 16
  use_mask: True
  mvm_decode: True

datasets:
  caption_体育240402_en:
    num_frames: 64

run:
  task: video_text_it
  bf16: True
  tf32: False
  output_dir: "./output/instructblipbase_stllm_conversation"
  num_train_epochs: 4
  dataloader_num_workers: 2
  per_device_train_batch_size: 2
  per_device_eval_batch_size: 2
  gradient_accumulation_steps: 1
  evaluation_strategy: "no"

  learning_rate: 2e-5

  learning_rate: 1e-10
  weight_decay: 0.

  warmup_ratio: 0.03

  warmup_ratio: 0.3
  lr_scheduler_type: 'cosine'
  logging_steps: 1
  model_max_length: 1024
  save_steps: 3000
  #save_strategy: "epoch"
  save_total_limit: 10
  deepspeed: 'stllm/train/zero2.json'

  deepspeed: 'stllm/train/zero3.json'

  deepspeed: 'stllm/train/zero3_offload.json'
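To pin down the step where the loss collapses, something like the callback below can help. This is a minimal sketch, assuming the run is driven by a Hugging Face `Trainer` (which the `run:` options such as `bf16`, `evaluation_strategy`, and `deepspeed` suggest); the class name `LossCollapseCallback` is hypothetical, not part of ST-LLM.

```python
import math
import torch
from transformers import TrainerCallback

class LossCollapseCallback(TrainerCallback):
    """Flag logging steps where the loss hits 0 or becomes NaN/Inf,
    and report any non-finite trainable parameters at that moment."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is None or "loss" not in logs:
            return
        loss = logs["loss"]
        if loss == 0.0 or math.isnan(loss) or math.isinf(loss):
            print(f"[step {state.global_step}] suspicious loss: {loss}")
            model = kwargs.get("model")
            if model is not None:
                # Note: under ZeRO-3 parameters are sharded, so this simple
                # check only sees the local shards on each rank.
                for name, p in model.named_parameters():
                    if p.requires_grad and not torch.isfinite(p).all():
                        print(f"  non-finite parameter: {name}")

# trainer.add_callback(LossCollapseCallback())  # attach before trainer.train()
```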

@zhanghang-official
Author

The training machine has 8x A100 40G GPUs.

@farewellthree
Collaborator

Hi, you could check whether the initialization of the visual encoder, the Q-Former, or the LLM went wrong.
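A quick way to do that kind of check is sketched below, under the assumption of a standard PyTorch checkpoint; the `"model"` key and the submodule prefixes (`visual_encoder`, `Qformer`, `llama_model`) are assumptions based on typical InstructBLIP-style code, so adjust them to how ST-LLM actually names its modules.

```python
import torch

def check_checkpoint_coverage(model, ckpt_path):
    state = torch.load(ckpt_path, map_location="cpu")
    state = state.get("model", state)  # some checkpoints nest weights under "model" (assumption)
    missing, unexpected = model.load_state_dict(state, strict=False)
    print(f"missing keys (left at random init): {len(missing)}")
    for k in missing[:20]:
        print("  ", k)
    print(f"unexpected keys (in ckpt but not in model): {len(unexpected)}")

    # Rough sanity check per submodule: near-zero or non-finite norms usually
    # mean the weights were never loaded or got corrupted.
    for prefix in ("visual_encoder", "Qformer", "llama_model"):  # assumed names
        norms = [p.float().norm().item()
                 for n, p in model.named_parameters() if n.startswith(prefix)]
        if norms:
            print(prefix, "mean param norm:", sum(norms) / len(norms))
```

If the missing-key list covers an entire submodule, that submodule is training from random initialization, which would explain a loss that quickly degenerates.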
