[Patch] Optimization for 'mod'
Original Reporter info from Mantis: CuriousKit @CuriousKit
-
Reporter name: J. Gareth Moreton
Original Reporter info from Mantis: CuriousKit @CuriousKit
- Reporter name: J. Gareth Moreton
Description:
Currently, the Free Pascal Compiler makes very little effort to optimise expressions that use the 'mod' operator, complacent to use the relatively slow DIV operand. The attached patch will optimize operations that use unsigned operands, more specifically with constant denominators. While the output code is larger, the execution speed is over twice as fast and sometimes over four times faster when dealing with 64-bit integers.
The algorithm used is the same as that used for 'div', by calculating a reciprocal constant to multiply the numerator by. The patch takes it one step further by multiplying the result by the original denominator and then subtracting that product from the original numerator to calculate the remainder.
For constants between $80000001 and $FFFFFFFF for 32-bit unsigned denominators, a simple comparison algorithm is used, although it makes use of CMOV in order to avoid jumps - if the operation is not available (it checks "CPUX86_HAS_CMOV in cpu_capabilities[current_settings.cputype]"), then it will fall back to the default 'mod' algorithm that exists already.
NOTE: Only unsigned operations are currently optimised - signed operations are not affected. How you set the result also affects this (e.g. doing something like "WriteLn(X mod 1000)" will assume a signed result even if X is unsigned) - safest way around this is to do "Y := X mod 1000;", for example, where X and Y are of type LongWord or QWord.
WARNING: Attempting to calculate "Y := X mod $FFFFFFFFFFFFFFFF", where X and Y are of type QWord, will raise an EIntOverflow exception at run-time; this may be a separate compiler bug.
Steps to reproduce:
Compile any short program that utilises the 'mod' operator with a constant denominator (the numerator should be a variable so the compiler doesn't optimize the entire expression into a single constant). Observe the compiled code. Then do the same thing with the patch applied. Note more complex but better optimized code.
Additional information:
Find attached two test projects to compare speed metrics, correct output, and compiled assembly with. Also, a snapshot of calculations and the average computation time over 1,000,000 iterations:
LongWord without patch | with patch
995 mod 3 5.283 ns | 1.513 ns
1000000 mod 1000 5.380 ns | 1.681 ns
2147483650 mod 2147483647 6.454 ns | 2.508 ns
QWord without patch | with patch
995 mod 3 9.580 ns | 1.546 ns
1000000 mod 1000 9.759 ns | 2.535 ns
4294967297 mod 2147483647 9.759 ns | 2.169 ns
Currently tested with a range of unsigned 32-bit values and 64-bit values.
Mantis conversion info:
- Mantis ID: 32945
- OS: Windows 7 (64-bit)
- OS Build: Enterprise
- Build: x86_64-win64-win32/win64
- Platform: Win64
- Version: 3.1.1
- Fixed in version: 3.1.1
- Fixed in revision: 37922 (#81b2cf5d)
- Monitored by: » @CuriousKit (J. Gareth Moreton)