Unexpected timings lu vs precision #82
Float64's bad performance is a Julia 1.10 bug.

```
julia> speed_test(2048)
Time for Float64 = 0.037588217
Time for Float32 = 0.01992773
Time for Float16 = 1.875628065

julia> speed_test(4096)
Time for Float64 = 0.309555341
Time for Float32 = 0.11584808
Time for Float16 = 14.296447513

julia> speed_test(8192)
Time for Float64 = 2.595502994
Time for Float32 = 1.237905769
Time for Float16 = 113.31325847
```

Julia master:

```
julia> speed_test(2048)
Time for Float64 = 0.183237959
Time for Float32 = 0.019942183
Time for Float16 = 0.278549435

julia> speed_test(4096)
Time for Float64 = 0.864374881
Time for Float32 = 0.104797648
Time for Float16 = 1.54903296

julia> speed_test(8192)
Time for Float64 = 6.382979536
Time for Float32 = 1.499058728
Time for Float16 = 14.409940574
```

I am now rebuilding to see if this was fixed by JuliaLang/julia@bea8c44.
---

The rebuilt Julia gives:

```
julia> speed_test(2048)
Time for Float64 = 0.035810193
Time for Float32 = 0.016416317
Time for Float16 = 0.24953308

julia> speed_test(4096)
Time for Float64 = 0.297429925
Time for Float32 = 0.131990487
Time for Float16 = 1.526133223

julia> speed_test(8192)
Time for Float64 = 3.204072447
Time for Float32 = 1.394537522
Time for Float16 = 13.691441064
```

As for why Float16 is still slow: a secondary issue is that

```
julia> using VectorizationBase

julia> VectorizationBase.pick_vector_width(Float16)
static(16)

julia> VectorizationBase.pick_vector_width(Float32)
static(16)

julia> VectorizationBase.pick_vector_width(Float64)
static(8)
```

You'd probably get better Float16 performance if `pick_vector_width(Float16)` returned `static(32)`.
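A back-of-envelope check of what those widths imply (the widths are the values reported above; reading them as lanes of a fixed-size register is my assumption): multiplying each width by the element size gives bits per vector, and Float16 comes out at half the bits of the other two types.

```julia
# Widths reported by pick_vector_width above; element sizes from sizeof.
# If SIMD registers hold a fixed number of bits, Float16 at width 16 only
# fills half a register -- width 32 would match Float32's 512 bits.
for (T, w) in ((Float64, 8), (Float32, 16), (Float16, 16))
    bits = w * 8 * sizeof(T)
    println(T, ": ", w, " lanes x ", 8 * sizeof(T), " bits = ", bits, " bits")
end
```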
---

I looked at LoopVectorization, but don't understand what "vector_width" means or how to change it. This one is far above my pay grade.
---

Vector width is the width of the SIMD vectors it uses.
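As a concrete (hand-rolled, purely illustrative) picture of what a vector width of 4 means: the loop body handles 4 elements per iteration, which is the pattern the compiler maps onto one SIMD instruction per group of lanes.

```julia
# Illustrative only: a "width 4" reduction processes 4 elements per step,
# with a scalar loop for the leftover tail.
function vsum4(x::Vector{Float32})
    s1 = s2 = s3 = s4 = 0.0f0
    n = length(x) - length(x) % 4
    @inbounds for i in 1:4:n
        s1 += x[i]; s2 += x[i+1]; s3 += x[i+2]; s4 += x[i+3]
    end
    s = s1 + s2 + s3 + s4
    @inbounds for i in n+1:length(x)  # remainder that doesn't fill a vector
        s += x[i]
    end
    return s
end

vsum4(Float32.(1:10))  # 55.0f0
```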
---

I made the change to …

I am in over my head. Is there something simple I can do to fix this? If there's a line or two in a file that I can change, I will do it if you will tell me what the lines and files are.
---

I don't know. The vast majority of the work is knowing the lines, not making the changes. In this case, you could look at the stack trace and try to see why, in

```
ERROR: MethodError: no method matching VectorizationBase.VecUnroll(::Tuple{
  VectorizationBase.Vec{4, Float16},
  VectorizationBase.Vec{4, Float32},
  VectorizationBase.Vec{4, Float32},
  VectorizationBase.Vec{4, Float32}
})
```

we have only one `Vec{4, Float16}` while the other three lanes are `Float32`.
---

Upon further review ... This is on 1.9.2. I looked at the lines in …; I see that … but do not understand what is happening there.

```
julia> A = rand(Float16, 5, 5);

julia> AF = RecursiveFactorization.lu!(A);

julia> B = rand(Float16, 512, 512);

julia> BF = RecursiveFactorization.lu!(B);
ERROR: MethodError: no method matching VectorizationBase.VecUnroll(::Tuple{VectorizationBase.Vec{4, Float16}, VectorizationBase.Vec{4, Float32}, VectorizationBase.Vec{4, Float32}, VectorizationBase.Vec{4, Float32}})
Closest candidates are: …
Stacktrace: …

julia>
```
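For what it's worth, the failure shape can be reproduced with a toy type (this is not the real `VecUnroll` definition, just an analogy): a constructor whose field is a homogeneous `NTuple{N,T}` has no method for a tuple in which one element stayed `Float16` while the rest were promoted to `Float32`.

```julia
# Toy analogue of VecUnroll: the field type NTuple{N,T} forces every
# element of the tuple to have the same type T.
struct ToyUnroll{N,T}
    data::NTuple{N,T}
end

ToyUnroll((1.0f0, 2.0f0, 3.0f0, 4.0f0))  # fine: homogeneous Float32

try
    # One Float16 lane among Float32s: the tuple is no longer an NTuple,
    # so no constructor method matches -- the same MethodError shape as above.
    ToyUnroll((Float16(1), 2.0f0, 3.0f0, 4.0f0))
catch err
    println(err isa MethodError)  # true
end
```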
---

Hi, I'm playing with RecursiveFactorization and am finding that single-precision factorizations are much faster than the cubic scaling would predict, while Float16 is slower than the prediction. I was hoping the Apple silicon and Julia hardware support for Float16 would show up better, though it is still far faster than using OpenBLAS LU in Float16.

I'd think that

    time(double) = 2 x time(single) = 4 x time(half)

would be the case for larger problems, but am not seeing that. Is there an algorithmic reason for this? Should I be setting parameters in the factorization to something other than the default values?
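Plugging the N = 8192 timings from this thread into that prediction makes the gap concrete (numbers copied from the post-fix run reported above):

```julia
# N = 8192 timings from the thread, in seconds.
t64, t32, t16 = 3.204072447, 1.394537522, 13.691441064

# Prediction: t64/t32 ≈ 2 and t16/t32 ≈ 0.5.
println("Float64 / Float32 = ", round(t64 / t32, digits = 2))  # ≈ 2.3 (close to 2)
println("Float16 / Float32 = ", round(t16 / t32, digits = 2))  # ≈ 9.82 (nowhere near 0.5)
```

So double vs. single behaves roughly as the flop count predicts, while half precision is nearly 10x slower than single instead of 2x faster.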
I ran into this while using this package for my own work. This post is my happy story. If you have insight or advice, I'd like that. The results come from an M2 Pro Mac and Julia 1.10-beta 1.
I ran this script (see below) to factor a random matrix of size N at three precisions and got the timings shown above.
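The script itself didn't survive the page extraction; a minimal reconstruction that would produce output in this shape might look like the following (the name `speed_test` matches the transcripts above, but the warm-up and timing details are my assumptions):

```julia
using RecursiveFactorization

function speed_test(N)
    for T in (Float64, Float32, Float16)
        A = rand(T, N, N)
        RecursiveFactorization.lu!(copy(A))  # warm up so compile time is excluded
        t = @elapsed RecursiveFactorization.lu!(A)
        println("Time for $T = $t")
    end
end

speed_test(2048)
```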