[Patch] Speed boosts to linear list case blocks
Original Reporter info from Mantis: CuriousKit @CuriousKit
-
Reporter name: J. Gareth Moreton
Original Reporter info from Mantis: CuriousKit @CuriousKit
- Reporter name: J. Gareth Moreton
Description:
This patch makes a number of minor improvements to the compiler where linear lists are concerned with case blocks. Benchmarks on "tests/bench/bcase.pp" suggest a 5% speed increase (although this improvement is saturated by the jump tree and jump table tests, which are not affected by these changes).
Steps to reproduce:
Compile "tests/bench/bcase.pp" with the trunk compiler, then apply the patch and compile the benchmark test again. Observe reduction in total time and also confirm satisfactory code quality.
(Note that the first six "Domain of 1024" tests utilise linear comparison lists due to the presence of negative values in the enumeration)
Additional information:
The improvements to the linear list:
- The lower bound check and first subtraction are merged into a single subtraction operation, since they always contained the same value.
- Except for the last 4, single-value branches now have a "jl" or "jb" conditional jump to immediately go to the else block after each value (at the last 4, it's faster to just go through the last few comparisons and save the CPU from having to deal with extra branch predictions). For case blocks with a lot of single-value branches, this saves the program from stepping through each one in turn until it reaches a range or the end of the block.
- For linear lists and linear comparison lists, the compiler attempts to insert the last branch block first (it won't if the label has been shortcut), since this allows for the elimination of a jump and alignment hint (enough to cause "tests/bench/bcase.pp" to be 512 bytes smaller). To accommodate for this, the "shortcut" field has been removed from TCaseBlock and replaced with a "Flags" field that can hold more information (see "compiler/nset.pas").
- Some refactoring and optimisations in the genlinearlist routine for x86 so programs compile faster. Main foci are in minimising the number of comparisons done against variables of type TConstExprInt, and also attempting to save the peephole optimiser some work (e.g. directly inserting "test %reg, %reg" when it would otherwise insert "cmp 0, %reg"), since this reduces the number of function calls and increases the chance of reducing the number of optimiser passes on -O3.
- The "last branch first" policy on linear comparison lists is cross-platform, but currently written in a way that requires a platform-specific peephole optimiser to finish the job.
Mantis conversion info:
- Mantis ID: 34859
- OS: Microsoft Windows
- OS Build: 10 Professional
- Build: x86_64-win64
- Platform: x86_64-win64
- Version: 3.3.1
- Monitored by: » @xhajt03 (Tomas Hajny), » AntonK (Anton Kavalenka)
- Target version: 3.3.1