pytorch-labs / attention-gym Public

Issues: pytorch-labs/attention-gym

33 Open 21 Closed

Issues list

Test with random cross attention

#67 opened Oct 29, 2024 by ssmmnn11

How to manually check if one position or row has correct masking?

#66 opened Oct 28, 2024 by Leo-T-Zang

Selection of BLOCK_SIZE in create_block_mask

#65 opened Oct 23, 2024 by tsrikris

How to reason about efficiency of different score/mask mod functions

#63 opened Oct 22, 2024 by alex-hh

FlexAttention Output Differs from SDPA

#62 opened Oct 22, 2024 by chayut-t

How to do KV Cache with FlexAttention and BlockMask by slicing?

#60 opened Oct 21, 2024 by Leo-T-Zang

A simple adaption to Jax

#59 opened Oct 21, 2024 by zinccat

What is the best practice to save and load a BlockMask object?

#58 opened Oct 20, 2024 by complexfilter

Optimal ordering with block mask

#56 opened Oct 19, 2024 by francois-rozet

What is the expected gpu memory performance drop wrt flash attention with block masks?

#54 opened Oct 19, 2024 by arilato

FlexAttention results do not match FlashAttention results

#50 opened Oct 7, 2024 by tilmto

Two errors: (1) NameError: ModularIndexing is not defined & (2) LoweringException: AttributeError: 'View' object has no attribute 'get_stride'

#45 opened Sep 23, 2024 by tobiasvanderwerff

Distributed Attention Methods

#44 opened Sep 20, 2024 by tsrikris

CUDA OOM Issue When Using Approx Tanh with softcapping score mod

#43 opened Sep 18, 2024 by kebijuelun

[Feature request] End-to-end transformer example with flex attention

Labels: enhancement

#42 opened Sep 16, 2024 by vladkvit

How to avoid re-compute mask

#34 opened Sep 5, 2024 by NonvolatileMemory

Dynamic shape compilation support for flex attention with block mask

#33 opened Aug 28, 2024 by SamGalanakis

Support varied input sequence lengths with a fixed block mask

Labels: question

#31 opened Aug 27, 2024 by tilmto

It seems that visualize_attention_scores can only visualize either mask-mod-only or score-mod-only

Labels: enhancement, good first issue

#29 opened Aug 23, 2024 by XinDongol

error: 'tt.broadcast' op requires the same encoding for all operands and results for local window attention

#26 opened Aug 19, 2024 by fteufel

Does FlexAttention Support torch.vmap?

#25 opened Aug 17, 2024 by MiladInk

[flex_attention] Softcap perf questions

Labels: question

#22 opened Aug 16, 2024 by meshtag

V100 GPUs supported ?

Labels: question

#21 opened Aug 15, 2024 by boren-ms

Bias gradient support?

#20 opened Aug 15, 2024 by ardagoreci

Writing to a globally scoped tensor from score_mod function

#19 opened Aug 15, 2024 by jeffwillette
