opt:opt ltor masks #1155

Baibaifan · 2024-09-24T06:51:41Z

Problem:

In Megatron-LM, constructing long sequences with the reset attention mask enabled creates a memory bottleneck. The relevant code is in `_get_ltor_masks_and_position_ids`.

When a sequence of length seq_len is packed from multiple short documents, eod_index contains multiple values. For each value, attention_mask has to be accessed and loaded once, and the corresponding region is set to 0. In the 32k scenario, for example, extreme cases require many such assignments, which makes data loading very slow. As the sequence length increases, the number of positions that must be assigned increases, and so does the time spent. See the figure below.
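To make the cost concrete, here is a minimal sketch of the per-EOD assignment pattern described above. It is a simplified illustration, not the Megatron-LM source: the function name `loop_reset_mask`, the toy token values, and the use of `0` as the EOD token are assumptions made for the example.

```python
import torch

def loop_reset_mask(tokens: torch.Tensor, eod_token: int) -> torch.Tensor:
    """Hypothetical helper: causal mask reset at every EOD, one slice write per EOD."""
    seq_len = tokens.numel()
    # Start from a full lower-triangular (causal) mask: 1 = may attend.
    mask = torch.tril(torch.ones(seq_len, seq_len))
    # Positions of the end-of-document tokens.
    eod_index = torch.nonzero(tokens == eod_token).flatten()
    # One slice assignment per EOD: tokens after the EOD must not
    # attend to anything up to and including that EOD.
    for i in eod_index.tolist():
        mask[i + 1:, :i + 1] = 0
    return mask

# Toy sequence with three documents: [5, 6, 0], [7, 0], [8, 9] (0 is the assumed EOD).
tokens = torch.tensor([5, 6, 0, 7, 0, 8, 9])
mask = loop_reset_mask(tokens, eod_token=0)
```

Each loop iteration writes a rectangular slice of the seq_len x seq_len mask, so a sequence packed from many short documents performs many large writes.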

Solution:

Use block_diag to build the mask with a single tensor access, instead of one access per value in eod_index.
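A minimal sketch of this idea, assuming that block_diag refers to `torch.block_diag` and that each document gets its own lower-triangular block. The function name `block_diag_reset_mask` and the toy inputs are hypothetical; the actual change in this PR may differ in details.

```python
import torch

def block_diag_reset_mask(tokens: torch.Tensor, eod_token: int) -> torch.Tensor:
    """Hypothetical helper: same reset mask, assembled in one block_diag call."""
    seq_len = tokens.numel()
    eod_index = torch.nonzero(tokens == eod_token).flatten()
    # Each document ends just after its EOD token; any trailing tokens
    # without a closing EOD form one last document.
    ends = (eod_index + 1).tolist()
    if not ends or ends[-1] != seq_len:
        ends.append(seq_len)
    starts = [0] + ends[:-1]
    lengths = [end - start for start, end in zip(starts, ends)]
    # One causal (lower-triangular) block per document, placed on the diagonal.
    blocks = [torch.tril(torch.ones(n, n)) for n in lengths]
    return torch.block_diag(*blocks)

# Toy sequence with three documents: [5, 6, 0], [7, 0], [8, 9] (0 is the assumed EOD).
tokens = torch.tensor([5, 6, 0, 7, 0, 8, 9])
mask = block_diag_reset_mask(tokens, eod_token=0)
```

The blocks are small (one per document), and the full mask is assembled once, rather than being rewritten once per EOD position.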

opt:opt ltor masks

ed80199
