F.scaled_dot_product_attention calls into flash or memory-efficient attention depending on a few factors (it should mainly be flash for the torchtitan case, IIUC). Are there other ops that you have in mind?
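For reference, recent PyTorch releases let you pin which SDPA backend gets used via the `torch.nn.attention.sdpa_kernel` context manager. A minimal sketch (shapes are illustrative; the flash backend itself requires CUDA, so this falls back to the default backend on CPU):

```python
import torch
import torch.nn.functional as F

# SDPA layout: (batch, heads, seq, head_dim)
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

if torch.cuda.is_available():
    from torch.nn.attention import sdpa_kernel, SDPBackend
    q, k, v = q.cuda(), k.cuda(), v.cuda()
    # Restrict SDPA to the flash backend (CUDA only, supported archs only).
    with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
else:
    # On CPU, SDPA dispatches to the math backend.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # torch.Size([2, 8, 128, 64])
```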
@awgu It looks like xformers supports Flash Attention v3 starting from 0.0.28 (flash3.FwOp and flash3.BwOp). It could bring extra training efficiency on the Hopper architecture, since FAv3 is not implemented in PyTorch yet.
As I read it from the blog post, this brings a 1.6x-1.8x speedup over FAv2.
I guess it should not be too hard for users to install xformers and replace the F.scaled_dot_product_attention call with the xformers attention call. This should work as long as the xformers attention is torch.compile-compatible, which I recall it is.
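A sketch of what such a swap could look like, assuming xformers' `memory_efficient_attention` as the replacement op. One gotcha: xformers expects `(batch, seq, heads, head_dim)` layout, transposed relative to SDPA's `(batch, heads, seq, head_dim)`, so the wrapper transposes in and out; it also falls back to the PyTorch-native kernel when xformers is unavailable or the tensors are on CPU:

```python
import torch
import torch.nn.functional as F

try:
    import xformers.ops as xops
    HAS_XFORMERS = True
except ImportError:
    HAS_XFORMERS = False

def attention(q, k, v, causal=True):
    """q, k, v: (batch, heads, seq, head_dim), i.e. SDPA's layout."""
    if HAS_XFORMERS and q.is_cuda:
        # xformers expects (batch, seq, heads, head_dim): transpose in/out.
        bias = xops.LowerTriangularMask() if causal else None
        out = xops.memory_efficient_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2),
            attn_bias=bias,
        )
        return out.transpose(1, 2)
    # PyTorch-native path (also the CPU fallback).
    return F.scaled_dot_product_attention(q, k, v, is_causal=causal)

q = k = v = torch.randn(2, 8, 16, 64)
print(attention(q, k, v).shape)  # torch.Size([2, 8, 16, 64])
```

Selecting the FAv3 ops specifically (flash3.FwOp/flash3.BwOp) would additionally need the `op=` argument on Hopper; I have not verified that path myself.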
Since torchtitan is mainly meant to show an example of how to set up this kind of distributed training, I think including xformers attention is less important than showing what is achievable with torch-native code.
Curious why xformers is not used. Is it for simplicity, or is there a performance reason?