add unbalanced param_sync example. #126
base: main
LGTM
if __name__ == "__main__":
    chatlearn.init()
    args = chatlearn.get_args()
You can set debug=True, as in https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/tests/test_parameter_sync.py#L37. Also, is the validate function in the parameter_sync file still not supported?
The validate function will be submitted separately to handle validation. This example tests whether the str outputs of the first episode are correct.
done
reward_load_iteration=${REWARD_LOAD_ITERATION} \
reward_load=${REWARD_LOAD} \
tokenizer_model=${TOKENIZER_MODEL} \
num_episode=${num_ppo_episode:-0} \
This can be set to 2.
Same as my reply below.
reward_load=${REWARD_LOAD} \
tokenizer_model=${TOKENIZER_MODEL} \
num_episode=${num_ppo_episode:-0} \
data_path=${DATASET_PATH} \
Set the environment variable validate_param_sync to True.
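This is not an e2e test; it only checks the correctness of a single param sync. An e2e test would have to switch to rlhf or another alignment-style flow. Running multiple episodes here would hit a problem where the ppo_policy forward step does not receive enough arguments.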
The validate_param_sync parameter only triggers the validate function during parameter sync; it has nothing to do with whether the test is e2e.
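To illustrate the point above, here is a minimal sketch of a flag that gates validation at sync time only, independent of how many episodes run. It is not ChatLearn's implementation: `env_flag` and `sync_parameters` are hypothetical helpers, plain lists stand in for tensors, and only the variable name validate_param_sync comes from this thread.

```python
import os


def env_flag(name, default=False, environ=None):
    """Hypothetical helper: read a boolean-like environment variable."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


def sync_parameters(src, dst, validate=False):
    """Hypothetical stand-in for a parameter sync.

    Copies every entry of src into dst. When validate is True, the copy is
    checked right after the sync, which is the only place the flag matters.
    """
    for key, value in src.items():
        dst[key] = list(value)  # copy so dst does not alias src
    if validate:
        mismatched = [k for k in src if dst.get(k) != src[k]]
        if mismatched:
            raise ValueError(f"parameter sync mismatch: {mismatched}")
    return dst


if __name__ == "__main__":
    # The flag is read once and affects only the sync step.
    validate = env_flag("validate_param_sync")
    trainer_params = {"layer0.weight": [0.1, 0.2], "layer0.bias": [0.0]}
    inference_params = sync_parameters(trainer_params, {}, validate=validate)
    print(sorted(inference_params))
```

In this sketch the validation runs inside the sync call itself, so enabling it does not require an e2e loop or more than one episode.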
done.
LGTM
No description provided.