-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance enhancements #139
Conversation
Could you please rebase onto experimental? |
Odd, I thought this was based on current experimental, it looks like that to me? |
Any updates here? |
Sorry - nothing much has happened this month - other distractions kicked in! I'll try and get back to this next week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just some minor ideas, nothing dramatic …
CMakeLists.txt
Outdated
app/calcmask.cc | ||
app/utils.cc | ||
) | ||
# special compile options for blend.cc to use SIMD optimiser | ||
set_source_files_properties(app/blend.cc PROPERTIES COMPILE_OPTIONS "-march=native;-O9") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not for everything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure.. I guess that if one module is targetted / optimised for the current CPU, then it may as well all be. Initially I only wanted to mess with / break one specific part of the code. I was also thinking about compiling run-time swappable implementations for different targets (this gets talked about a bit in the GCC manuals when using intrinsics) - that is probably over-engineering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think as we are compiling all 3rd party deps ourselves we may as well compile everything for -march=native. Let's split this into another PR, so we could merge this one for now.
Having run-time swappable implementations is probably overkill for us right now. Let's handle this separately.
Please try to always rebase on the target branch instead of merging things … |
Specifically to
alpha_blend
andconvert_rgb_to_yuyv
as per #138This first draft addresses
alpha_blend
, reducing run time by a factor of 3-4x simply by separating the function and turning on full compiler optimisation for the machine that we are compiling on (-march=native
). This may be a problem if compiling for a different target that doesn't have the same CPU features...