I have a question about a use case of flex attention.
I am wondering if it is possible to write to a globally scoped tensor in the same way that the ALiBi bias example from the FlexAttention intro page (https://pytorch.org/blog/flexattention/) reads from one. That page implements the ALiBi bias as follows:
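(The blog's snippet isn't reproduced verbatim here; the sketch below shows the pattern it describes, with an illustrative slope definition standing in for the blog's `generate_alibi_bias()`.)

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

H = 8  # number of attention heads (illustrative)

# One slope per head; stands in for the blog's generate_alibi_bias().
# A common choice is the geometric sequence 2^(-8*i/H).
alibi_bias = torch.exp2(-8.0 * torch.arange(1, H + 1) / H)

def alibi_score_mod(score, b, h, q_idx, kv_idx):
    # Reads from the globally scoped `alibi_bias` tensor.
    bias = alibi_bias[h] * (kv_idx - q_idx)
    return score + bias

# With q, k, v of shape [B, H, S, D]:
# out = flex_attention(q, k, v, score_mod=alibi_score_mod)
```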
The example initializes a globally scoped tensor and then reads from it within the score_mod function. But say I wanted to retrieve all of the scores from the attention to plot the attention matrix, or to sum the columns of the attention matrix to drive a KV-cache eviction policy, or some other use case that requires writing some function of the scores to a tensor.
Is it possible to accomplish this by writing to a globally scoped tensor in the same way that the ALiBi example can read from one? I tried the following, but it didn't work. Is there a way to accomplish this with flex attention?
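(The original snippet isn't preserved in this issue. The sketch below is an illustrative reconstruction of the kind of in-place write inside score_mod the question describes; names such as `captured_scores` and `capture_score_mod` are hypothetical, and this is the pattern that fails.)

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 1, 8, 128, 64
q = torch.randn(B, H, S, D)
k = torch.randn(B, H, S, D)
v = torch.randn(B, H, S, D)

# Globally scoped buffer meant to collect the raw attention scores.
captured_scores = torch.zeros(B, H, S, S)

def capture_score_mod(score, b, h, q_idx, kv_idx):
    # Side-effecting write to the global tensor from inside score_mod,
    # mirroring how the ALiBi example reads from a global tensor.
    captured_scores[b, h, q_idx, kv_idx] = score
    return score

out = flex_attention(q, k, v, score_mod=capture_score_mod)
```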
The code above fails with the following trace: