Commit

Enable better vectorization for generic convolution
Break the single dependence chain into two parallel sub-chains.
Provides 2-4% performance uplift as measured on modern ARM systems
when using the generic codepath.
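
The diff itself is not shown here, so the following is only a minimal sketch of the general technique the message describes, not the code from this commit. It assumes a dot-product-style inner loop of the kind a convolution typically contains; the function names `dot_single_chain` and `dot_two_chains` are hypothetical. With one accumulator, each addition must wait for the previous one. With two accumulators, the two sums are independent and can be computed in parallel or mapped onto separate vector lanes.

```c
#include <stddef.h>

/* Baseline: a single accumulator, so every add depends on the one before it. */
float dot_single_chain(const float *a, const float *b, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Two accumulators: even and odd elements form two independent sub-chains
 * that are combined only once, after the loop. */
float dot_two_chains(const float *a, const float *b, size_t n) {
    float acc0 = 0.0f;
    float acc1 = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += a[i] * b[i];         /* sub-chain 0 */
        acc1 += a[i + 1] * b[i + 1]; /* sub-chain 1, independent of sub-chain 0 */
    }
    if (i < n) {
        acc0 += a[i] * b[i];         /* leftover element when n is odd */
    }
    return acc0 + acc1;
}
```

Because floating-point addition is not associative, the two versions can differ in the last bits of the result. That is also why compilers generally do not make this transformation on their own unless relaxed floating-point options are enabled, and why writing the split by hand can help.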
heshpdx committed Apr 24, 2024
1 parent 49ab34d commit 1fd5cd8
Showing 1 changed file with 692 additions and 641 deletions.