[Patch] Jump optimisations in code generator
Original Reporter info from Mantis: CuriousKit @CuriousKit
- Reporter name: J. Gareth Moreton
Description:
As part of the x86-64 optimiser overhaul, here are some largely cross-platform optimisations to jumps and jump chains. The attached PDF file (which could itself use a revision) explains the individual changes and optimisations; overall, the patch seeks to improve the quality of conditional and unconditional jumps in generated code.
Note that "jump-optimizations.patch" requires "jump-optimizations-condition_in.patch" to work, due to the introduction of the new "condition_in" function that requires platform-specific implementations.
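The description above does not say what "condition_in" computes, so the following is only a guess at the idea. It assumes the function tests whether one branch condition is a subset of another (for example, "equal" implies "less than or equal"). The condition names and the OUTCOMES table are hypothetical, and a real implementation would work on each platform's own condition codes, which would explain why it has to be platform-specific.

```python
# Hypothetical model, not the patch's code: each signed condition is
# represented by the set of comparison outcomes for which it holds.
OUTCOMES = {
    "e":  {"eq"},
    "ne": {"lt", "gt"},
    "l":  {"lt"},
    "le": {"lt", "eq"},
    "g":  {"gt"},
    "ge": {"gt", "eq"},
}


def condition_in(subset, cond):
    """Return True if every outcome that satisfies `subset` also satisfies `cond`."""
    return OUTCOMES[subset] <= OUTCOMES[cond]


print(condition_in("e", "le"))   # "equal" is contained in "less or equal"
print(condition_in("le", "l"))   # "less or equal" is not contained in "less"
```

Under this assumption, a test of this kind would let an optimiser reason about jump chains: if a jump taken on "l" lands on a jump taken on "le" with the flags unchanged, the second jump is certain to be taken as well.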
Steps to reproduce:
Apply patches and compile. Confirm correct functionality of compiled binaries and no regressions in test suite.
(Non-Intel platforms will need extensive testing)
Additional information:
Although unintended, there seems to be a major improvement in compile time under x86_64-win64. This is possibly because label stripping and other optimisations remove entries from the linked list of instructions, which leaves subsequent passes with less to analyse in each procedure.
Time to compile (not link) Lazarus under trunk:
[72.883] 1304078 lines compiled, 72.9 sec
[74.453] 1304078 lines compiled, 74.5 sec
[82.164] 1304078 lines compiled, 82.2 sec
Time to compile (not link) Lazarus with jump optimisations:
[66.648] 1304078 lines compiled, 66.6 sec
[65.609] 1304078 lines compiled, 65.6 sec
[64.695] 1304078 lines compiled, 64.7 sec
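To put a single number on the six runs above, here is a small sketch that averages each set and reports the difference. It uses only the bracketed timings from this report. Three runs per configuration is a small sample, and the third trunk run is noticeably slower than the other two, so treat the result as indicative only.

```python
from statistics import mean

# Wall-clock compile times in seconds, copied from the figures above.
trunk = [72.883, 74.453, 82.164]
patched = [66.648, 65.609, 64.695]

trunk_mean = mean(trunk)
patched_mean = mean(patched)
saving = trunk_mean - patched_mean
saving_pct = 100.0 * saving / trunk_mean

print(f"trunk mean:   {trunk_mean:.2f} s")
print(f"patched mean: {patched_mean:.2f} s")
print(f"saving:       {saving:.2f} s ({saving_pct:.1f}%)")
```

On these figures the mean drops from about 76.5 s to about 65.7 s, a saving of roughly 14%.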
----
The compiled and linked Lazarus binary is also smaller, thanks to the additional stripping of unnecessary alignment hints and the new optimisation opportunities that this exposes:
20,110,336 (trunk)
20,092,416 (jump optimisations)
----
The compilation of Lazarus source file "lazarus\components\codetools\basiccodetools.pas" shows a wide range of improvements and is a good showcase for the many jump optimisations - for example:
Trunk:
...
.Lj2729:
movslq %r8d,%r9
subq %r9,%rdx
leaq 1(%rdx),%r9
cmpl %r9d,%r11d
jge .Lj2732
.p2align 4,,10
.p2align 3
movl %r11d,%r9d
.Lj2732:
# Peephole Optimization: MovTestJxx2MovTestJxx done
movq %rcx,%rdx
testq %rcx,%rcx
...
Jump optimisations:
.Lj2729:
movslq %r8d,%r9
subq %r9,%rdx
leaq 1(%rdx),%r9
cmpl %r9d,%r11d
cmovngel %r11d,%r9d
# Peephole Optimization: MovTestJxx2MovTestJxx done
movq %rcx,%rdx
testq %rcx,%rcx
...
(Note that label .Lj2732 has been removed completely)
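The transformation in the listing above can be mimicked with a toy peephole pass. This is only a sketch of the general idea, not the patch's implementation: instructions are modelled as hypothetical (mnemonic, operands) tuples, the alignment directives are assumed to have been stripped already, and the MOV is assumed to be register-to-register so that a CMOV is legal.

```python
from collections import Counter

# Toy model only: none of these names mirror the compiler's data structures.
INVERSE = {"ge": "nge", "nge": "ge", "e": "ne", "ne": "e",
           "l": "nl", "nl": "l", "g": "ng", "ng": "g"}


def jcc_mov_to_cmov(code):
    """Rewrite 'jCC L; mov a,b; L:' as 'cmov<not CC> a,b', dropping L if unused."""
    # Count how many jumps reference each label before rewriting anything.
    refs = Counter(ops for mnem, ops in code if mnem.startswith("j"))
    out = []
    i = 0
    while i < len(code):
        window = code[i:i + 3]
        if (len(window) == 3
                and window[0][0].startswith("j")
                and window[0][0][1:] in INVERSE          # conditional jump only
                and window[1][0] in ("movl", "movq")
                and window[2] == ("label", window[0][1])):
            cond = INVERSE[window[0][0][1:]]
            size = window[1][0][-1]                      # keep the operand-size suffix
            out.append(("cmov" + cond + size, window[1][1]))
            refs[window[0][1]] -= 1
            if refs[window[0][1]] > 0:                   # label still has other users
                out.append(window[2])
            i += 3
        else:
            out.append(code[i])
            i += 1
    return out


trunk = [
    ("cmpl", "%r9d,%r11d"),
    ("jge", ".Lj2732"),
    ("movl", "%r11d,%r9d"),
    ("label", ".Lj2732"),
    ("movq", "%rcx,%rdx"),
]
for mnem, ops in jcc_mov_to_cmov(trunk):
    print(mnem, ops)
```

The condition is inverted because the original jump skips the MOV when "ge" holds, so the move must happen only when it does not. The label is dropped only once no remaining jump refers to it, which matches the note above that .Lj2732 disappears.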