Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modulo and unsigned divisions by constant #73

Open
RamonUnch opened this issue Apr 8, 2022 · 2 comments
Open

Modulo and unsigned divisions by constant #73

RamonUnch opened this issue Apr 8, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@RamonUnch
Copy link
Contributor

RamonUnch commented Apr 8, 2022

FastDoom could benefit from the optimisation of the modulo % operation. Internally WC generates an idiv operation.
The optimization is similar to the divisions by a constant because it is almost the same thing:
ie:

Modulo 10 for signed ints, according to GCC (fastcall convention)
int Mod10(int x)
    mov    eax, 1717986919
    imul   ecx
    mov    eax, edx
    mov    edx, ecx
    sar    edx, 31
    sar    eax, 2
    sub    eax, edx
    lea    edx, [eax+eax*4]
    mov    eax, ecx
    add    edx, edx
    sub    eax, edx
    ret
; Unsigned version:
unsigned ModU10(unsigned eax)
    mov    eax, ecx
    mov    edx, -858993459
    mul    edx
    mov    eax, edx
    shr    eax, 3
    lea    edx, [eax+eax*4]
    mov    eax, ecx
    add    edx, edx
    sub    eax, edx
    ret

There are very few places where the x % constant is used and it would almost not affect much performances.

Also there are some missing optimizations: signed and unsigned divisions should be separated and when you are dividing a value you know to be positive (ie: monster damage), then an unsigned variant of the division should be used:
example for unsigned Div10:

; Again fastcall convention:
unsigned Div10u(unsigned x)
	mov	eax, ecx
	mov	edx, -858993459
	mul	edx
	mov	eax, edx
	shr	eax, 3
	ret

unsigned versions save a few instructions and mul is faster than imul (presumably?)
I simply use GCC with -O2 -march=i386 and -mtune=generic
Those should not be hard to inline with OpenWatcom.
EDIT: I did not check properly all instances of DivXX() clls, maybe you never need unsigned versions...

@viti95 viti95 added the enhancement New feature or request label Apr 8, 2022
@maxxoccupancy
Copy link

For modulo by constants, masking the upper bits accomplishes that for powers of two: 2, 4, 8.... 256, etc. It may be worth it to build a separate lookup table for other values. Multiplication and division on these old, 1970s microprocessors was so slow that some CISCs didn't even bother with the microcode for them, let alone dedicated hardware.

@RamonUnch
Copy link
Contributor Author

for modulo by a power of two WC already makes the optimization IIRC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants