[Patch] AddMov2LeaAdd and -Os change for SubMov2LeaSub
Original Reporter info from Mantis: CuriousKit @CuriousKit
- Reporter name: J. Gareth Moreton
Description:
This patch adds the logical partner to the SubMov2LeaSub optimisation, namely AddMov2LeaAdd. It also makes SubMov2LeaSub selective under -Os (size over speed) settings: there, the optimisation is applied only if it allows the SUB instruction to be removed. The same rule applies to the ADD instruction in the new optimisation.
Tested on i386-win32 and x86_64-win64, with no regressions in the test suite when run normally and when run with the "-O4 -gl" options.
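For readers unfamiliar with the pattern, here is a rough Python sketch of the rewrite described above. It is not the patch's actual code: the function name, the regular expressions and the string-based instruction format are all hypothetical, and it ignores the -Os rule and every register or flag liveness check the compiler would need.

```python
import re

# Hypothetical, simplified model: instructions are AT&T-syntax strings.
ADDSUB = re.compile(r"^(add|sub)q \$(\d+),(%\w+)$")
MOV = re.compile(r"^movq (%\w+),(%\w+)$")

def add_mov_to_lea_add(instrs):
    """Rewrite 'add/sub $c,%r ; mov %r,%d' as 'lea +/-c(%r),%d ; add/sub $c,%r'."""
    out = list(instrs)
    for i in range(len(out) - 1):
        a = ADDSUB.match(out[i])
        m = MOV.match(out[i + 1])
        # The MOV must copy the register that ADD/SUB just modified
        # into a different register.
        if a and m and m.group(1) == a.group(3) and m.group(2) != a.group(3):
            op, const, reg = a.groups()
            disp = const if op == "add" else "-" + const
            add_instr = out[i]
            # LEA computes the new value directly from the old register...
            out[i] = f"leaq {disp}({reg}),{m.group(2)}"
            # ...and the ADD/SUB moves down one slot, where it may pair
            # with the next MOV on the following iteration.
            out[i + 1] = add_instr
    return out

example = add_mov_to_lea_add([
    "addq $1,%rax",
    "movq %rax,%rcx",
    "movq %rax,%rdx",
    "shrq $3,%rdx",
])
```

Because the ADD is pushed down past each converted MOV, the sketch applies the rewrite repeatedly to a run of MOVs that read the same register.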
Steps to reproduce:
Apply patch and confirm correct compilation.
Additional information:
There is a slight risk of the optimisation causing diminishing returns when applied multiple times in a row. For example, in the System unit under -O4:
.Lj749:
addq $1,%rax
movq %rax,%rcx
# Peephole Optimization: %rcx = %rax; changed to minimise pipeline stall (MovXXX2MovXXX)
movq %rax,%rdx
shrq $3,%rdx
andq $7,%rcx
movzbl (%rbx,%rdx),%r8d
...
Becomes...
.Lj749:
leaq 1(%rax),%rcx
# Peephole Optimization: AddMov2LeaAdd
leaq 1(%rax),%rdx
# Peephole Optimization: AddMov2LeaAdd
addq $1,%rax
shrq $3,%rdx
andq $7,%rcx
movzbl (%rbx,%rdx),%r8d
...
There comes a point where there is no further speed gain because the processor runs out of available AGUs (used for LEA), while ALUs (used for MOV and ADD) are more numerous (Broadwell CPUs, for example, have 4 integer ALUs and 3 AGUs). Such diminishing returns are an edge case though, and it might be possible to mitigate them in the post-peephole stage by, say, changing ADD or MOV instructions to LEA or vice versa in order to balance AGU and ALU usage. In the example given, it is likely that there will still be a gain of 1 cycle, since the ADD instruction can run at the same time as the two LEA instructions, or simultaneously with the SHR and AND instructions.
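The 1-cycle estimate can be checked with a toy dependency-chain model. This is only a sketch under loud assumptions: every instruction has a latency of 1 cycle, execution units are unlimited (so the AGU/ALU limit discussed above is deliberately not modelled), register renaming removes write-after-read hazards, and MOV elimination is ignored. The helper name and the tuple encoding are hypothetical, and the trailing MOVZBL load is left out.

```python
def critical_path(instrs):
    """Longest dependency chain in cycles (assumed: 1-cycle latency, unlimited units)."""
    ready = {}  # register -> cycle at which its latest value becomes available
    longest = 0
    for dest, sources in instrs:
        # An instruction can start once all of its source registers are ready.
        start = max((ready.get(s, 0) for s in sources), default=0)
        ready[dest] = start + 1
        longest = max(longest, ready[dest])
    return longest

# Each entry is (destination register, [source registers]).
before = [
    ("rax", ["rax"]),  # addq $1,%rax
    ("rcx", ["rax"]),  # movq %rax,%rcx
    ("rdx", ["rax"]),  # movq %rax,%rdx
    ("rdx", ["rdx"]),  # shrq $3,%rdx
    ("rcx", ["rcx"]),  # andq $7,%rcx
]
after = [
    ("rcx", ["rax"]),  # leaq 1(%rax),%rcx
    ("rdx", ["rax"]),  # leaq 1(%rax),%rdx
    ("rax", ["rax"]),  # addq $1,%rax
    ("rdx", ["rdx"]),  # shrq $3,%rdx
    ("rcx", ["rcx"]),  # andq $7,%rcx
]

gain = critical_path(before) - critical_path(after)
```

Under these assumptions the chain shortens from 3 cycles to 2, because neither LEA has to wait for the ADD. On real hardware the difference depends on port availability and on whether the CPU eliminates register-to-register moves.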
Mantis conversion info:
- Mantis ID: 38579
- OS: Microsoft Windows
- OS Build: 10 Home
- Build: r48871
- Platform: i386 and x86_64
- Version: 3.3.1
- Fixed in version: 3.3.1
- Fixed in revision: 48989 (#612f0637)