add unbalanced param_sync example. #126

Open
wants to merge 8 commits into main

Conversation

charles9304 (Collaborator):

No description provided.

haolin-nju previously approved these changes Oct 18, 2024

@haolin-nju (Collaborator) left a comment:

LGTM


if __name__ == "__main__":
    chatlearn.init()
    args = chatlearn.get_args()

Collaborator:

You could set debug=True here, the same way it is done in https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/tests/test_parameter_sync.py#L37. Also, is the validate function in the parameter_sync file still not supported?

Collaborator Author:

We will add a separate one to do verification with the validate function; this one is used to test whether the str outputs of the first episode are correct.

Collaborator Author:

done
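
For reference, a minimal sketch of how that suggestion could be applied in this example's entry point; the flag location (runtime_args.debug) is an assumption and should be checked against what test_parameter_sync.py#L37 actually sets:

if __name__ == "__main__":
    chatlearn.init()
    args = chatlearn.get_args()
    # Assumed attribute path for enabling debug mode; mirror test_parameter_sync.py#L37.
    args.runtime_args.debug = True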

reward_load_iteration=${REWARD_LOAD_ITERATION} \
reward_load=${REWARD_LOAD} \
tokenizer_model=${TOKENIZER_MODEL} \
num_episode=${num_ppo_episode:-0} \

Collaborator:

This could be set to 2 here.

Collaborator Author:

Same as below.

reward_load=${REWARD_LOAD} \
tokenizer_model=${TOKENIZER_MODEL} \
num_episode=${num_ppo_episode:-0} \
data_path=${DATASET_PATH} \

Collaborator:

Set the environment variable validate_param_sync to True.

Collaborator Author:

This is not an e2e test; it only checks the correctness of a single param sync. An e2e test would have to switch to the rlhf (or another alignment-format) logic. With multiple episodes, the ppo_policy forward step would run into the problem of having insufficient parameters.

Collaborator:

The validate_param_sync flag only triggers the validate function during parameter sync; it is unrelated to whether the test is e2e.

Collaborator Author:

done.
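
A minimal sketch of the suggested setup, assuming ChatLearn reads the validate_param_sync flag from the environment; setting it in Python before chatlearn.init() is shown here for illustration, and exporting it in the launch script should work just as well:

import os
import chatlearn

# Assumption: this flag only triggers the validate function during parameter sync,
# as noted above; it does not turn the example into an e2e test.
os.environ["validate_param_sync"] = "True"

if __name__ == "__main__":
    chatlearn.init()
    args = chatlearn.get_args()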

haolin-nju previously approved these changes Oct 28, 2024

@haolin-nju (Collaborator) left a comment:

LGTM
