Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

There are several functions in the FilteringFunctions that should not have DSP extended macros. #193

Open
JoleenSun opened this issue Jul 4, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@JoleenSun
Copy link

JoleenSun commented Jul 4, 2024

P extension instruction does not work with floating point types.
But function arm_conv_f32/arm_conv_partial_f32/arm_correlate_f32/arm_correlate_f16 has "#if defined(ARM_MATH_DSP)" ,which impacts on the performance when testing without any extension but the "ARM_MATH_LOOPUNROLL" is defined.

@christophe0606 christophe0606 added the review Under review label Jul 4, 2024
@christophe0606
Copy link
Contributor

@JoleenSun I agree that the test with ARM_MATH_DSP is not useful here and could be removed (What is tested in fact here is that we want a function that may be unrolled or not).

So the problem is : you'd like to have a function that is not impacted by ARM_MATH_LOOP_UNROLL. Because if I just disable the ARM_MATH_DSP cases when ARM_MATH_LOOP_UNROLL is defined, then it selects the C version of the function where the loop is never unrolled (whatever the value of ARM_MATH_LOOP_UNROLL).

Unfortunately, the effect of ARM_MATH_LOOP_UNROLL and the manual unrolling is highly compiler dependent. Most of the times it improves the performances but there are some cases where it does not.

And relying on the compiler to do the automatic unrolling has shown (in our tests) that most of the case it is even less predictable that doing the manual unrolling.

So, we will always have cases where enabling ARM_MATH_LOOP_UNROLL may decrease the performance of a specific function on a specific version of a compiler.

If it is a case where it always decrease the performance (whatever the compiler version) then the manually unrolled version should be removed. But it would require tests with several compiler versions to be really sure.

@JoleenSun
Copy link
Author

JoleenSun commented Jul 5, 2024

@christophe0606 Thanks for your reply.

I think I get what you mean. But I have another question: why can't we remove the "ARM_MATH_DSP "macro but keep "ARM_MATH_LOOP_UNROLL", so that the C version have both a version with loop_unrolling and a version without.

In this case, maybe we only need to test the two C versions both without loop unrolling: based on the current fuction, a version is define the "ARM_MATH_DSP " but disable the "ARM_MATH_LOOP_UNROLL", the other is just disable the "ARM_MATH_DSP "(also disable RISCV_MATH_VECTOR).

@christophe0606
Copy link
Contributor

@JoleenSun I can remove the ARM_MATH_DSP in this case (and I'll tag this issue as enhancement so that this change is done at some point).

Now the problem is that `ARM_MATH_LOOP_UNROLL" applies to several functions. So if you unset it because it gives better performance for this function perhaps it will give worse performance for another function you use.

One should have a better granularity and be able to set this for each function. But I have not yet found a developer friendly way to do it.

@christophe0606 christophe0606 added enhancement New feature or request and removed review Under review labels Jul 5, 2024
@JoleenSun
Copy link
Author

Thanks!
Well, maybe I didn't express myself clearly, sorry for my poor English.

I just mean that the "ARM_MATH_DSP " maybe can be removed for functions with float-type(in which case "ARM_MATH_LOOP_UNROLL" also can be reserved), not desiring the granularity of "ARM_MATH_LOOP_UNROLL" needs to be small enough as can be set for each function separately.

And all I said is just related to functions arm_conv_f32/arm_conv_partial_f32/arm_correlate_f32/arm_correlate_f16.

@christophe0606
Copy link
Contributor

@JoleenSun I think I understand the issue. But i think it is not just restricted to arm_conv_f32/arm_conv_partial_f32/arm_correlate_f32/arm_correlate_f16 since the ARM_MATH_LOOP_UNROLL is everywhere.

But for the float function, I totally agree than ARM_MATH_DSP is useless and will have to be removed.

Anyway, your feedback is taken into account and I have tagged this issue as enhancement.

@JoleenSun
Copy link
Author

Haha, thank you so much.

Actually, the point is that the “ARM_MATH_DSP ” but not "ARM_MATH_LOOP_UNROLL ", as long as the use of "ARM_MATH_LOOP_UNROLL "is not bound to the "ARM_MATH_DSP ", and can be set independently, the current adaptation and support are already perfect(personal opinion ^_^ ).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants