opt:opt ltor masks #1155

Open · wants to merge 1 commit into base: main

Conversation


@Baibaifan Baibaifan commented Sep 24, 2024

Problem:

In Megatron-LM, there is a memory-access bottleneck when `reset_attention_mask` is used to construct long sequences from packed documents. The relevant code is in `_get_ltor_masks_and_position_ids`:
[screenshot: the per-EOD masking loop in `_get_ltor_masks_and_position_ids`]
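For context, the loop looks roughly like the sketch below (a simplified paraphrase of the Megatron-LM code; exact variable names and surrounding details may differ from the current source):

```python
import torch

def apply_eod_resets(attention_mask, position_ids, data, eod_token,
                     reset_attention_mask=True, reset_position_ids=True):
    """Simplified paraphrase of the per-EOD loop in
    _get_ltor_masks_and_position_ids (names may differ from the source).
    Assumes position_ids starts as torch.arange(seq_len) for every sample."""
    micro_batch_size = data.size(0)
    for b in range(micro_batch_size):
        # Positions of EOD tokens in this sample.
        eod_index = position_ids[b, data[b] == eod_token]
        prev_index = 0
        for j in range(eod_index.numel()):
            i = eod_index[j]
            if reset_attention_mask:
                # One indexed write into attention_mask per EOD token:
                # tokens after position i must not attend to tokens at or before i.
                attention_mask[b, 0, (i + 1):, :(i + 1)] = 0
            if reset_position_ids:
                # Restart position ids at each document boundary.
                position_ids[b, (i + 1):] -= (i + 1 - prev_index)
                prev_index = i + 1
    return attention_mask, position_ids
```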

When a sequence of length `seq_len` is packed from multiple short documents, `eod_index` contains many entries. Each entry requires a separate indexed access to `attention_mask`, zeroing out the corresponding region, so the mask is written once per EOD position. In the 32k setting this can mean a very large number of such assignments in the worst case, which makes data loading very slow. As the sequence length increases, the number of positions to assign grows and the cost grows with it, as shown in the figure below.
[figure: data-loading time increasing with sequence length / number of EOD positions]

Solution:

Build the attention mask for all documents in a single tensor operation using `torch.block_diag`, instead of issuing one masked write per EOD position.
[screenshot: the block_diag-based mask construction]
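A minimal sketch of the idea (an illustrative helper, not the PR's actual code): split the packed sequence at EOD tokens, build one lower-triangular causal block per document, and combine them with a single `torch.block_diag` call:

```python
import torch

def ltor_mask_via_block_diag(tokens: torch.Tensor, eod_token: int) -> torch.Tensor:
    """Illustrative sketch: build a [seq_len, seq_len] left-to-right mask with
    attention reset at EOD, using one torch.block_diag call instead of one
    masked write per EOD position."""
    seq_len = tokens.size(0)
    # Document boundaries: one past each EOD token, plus the end of the sequence.
    eod_positions = (tokens == eod_token).nonzero(as_tuple=True)[0]
    boundaries = torch.unique(torch.cat([
        eod_positions + 1,
        torch.tensor([seq_len], device=tokens.device),
    ]))
    # Per-document lengths.
    lengths, prev = [], 0
    for b in boundaries.tolist():
        lengths.append(b - prev)
        prev = b
    # One lower-triangular (causal) block per document, combined in one call.
    blocks = [torch.tril(torch.ones(n, n, dtype=torch.bool, device=tokens.device))
              for n in lengths]
    return torch.block_diag(*blocks)  # True where attention is allowed
```

Because each block is an ordinary causal mask for a single document, tokens never attend across an EOD boundary, and the full mask is materialized in one operation rather than once per EOD position.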
