[Patch] SHL-centric peephole optimisations
Original Reporter info from Mantis: CuriousKit @CuriousKit
- Reporter name: J. Gareth Moreton
Description:
The patch attached contains a pair of optimisations centred around the SHL instruction.
----
If a MOVSX, MOVSXD or MOVZX instruction is immediately followed by a SHL instruction on the same register, the extension instruction is removed if the SHL completely overwrites the extended bits, since the extension is then unnecessary, e.g.:
movslq %esi,%rsi
shlq $32,%rsi
becomes just "shlq $32,%rsi".
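
The equivalence can be checked with a small model of the two instructions. This is only a sketch: the helper names (`movslq`, `shlq`, `extension_is_redundant`) are made up for illustration, and the condition "shift count is at least the number of extended bits" is an assumption about how the patch decides, not taken from its source.

```python
MASK64 = (1 << 64) - 1

def movslq(reg):
    """Model of `movslq %esi,%rsi`: sign-extend the low 32 bits to 64 bits."""
    low = reg & 0xFFFFFFFF
    if low & 0x80000000:
        low |= 0xFFFFFFFF00000000
    return low

def shlq(reg, count):
    """Model of `shlq $count,%rsi`: shift left, discarding bits above bit 63."""
    return (reg << count) & MASK64

def extension_is_redundant(src_bits, dst_bits, shift):
    """Assumed condition: every extended bit is shifted out of the register.

    The upper bound reflects that x86 masks the shift count, so a count
    equal to the operand size would not shift at all.
    """
    return dst_bits - src_bits <= shift < dst_bits

# Arbitrary register contents, including garbage in the upper 32 bits.
samples = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0xDEADBEEF12345678]

for value in samples:
    with_extension = shlq(movslq(value), 32)
    without_extension = shlq(value, 32)
    print(hex(value), with_extension == without_extension)
```

With a shift of 31 the two sequences differ for the input 0x80000000, which is why the shift count has to cover all of the extended bits.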
----
If a SHL is followed by an AND (or by a MOV and an AND for 64-bit operands, where the mask must first be loaded into a register), and the FLAGS register isn't used afterwards, the AND instruction is removed if it has no effect on the value of the register, e.g.:
shlq $32,%rsi
movq $-4294967296,%rax
andq %rax,%rsi
becomes just "shlq $32,%rsi". The constant $-4294967296, or 0xFFFFFFFF00000000, does not change the value of %rsi because the SHL instruction guarantees that the lower 32 bits are already zero.
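
The "no effect" test can be written as a single bit operation. The sketch below is illustrative only: `and_is_redundant` is a hypothetical helper, not the routine used in the patch, and it ignores the FLAGS requirement described above, which has to be checked separately.

```python
MASK64 = (1 << 64) - 1

def and_is_redundant(shift, mask, bits=64):
    """True if ANDing with `mask` cannot change a value just shifted left by `shift`.

    After the shift, the low `shift` bits are known to be zero, so the mask
    only matters in the remaining upper bits; if all of those are set, the
    AND leaves the register unchanged.
    """
    full = (1 << bits) - 1
    known_zero = (1 << shift) - 1
    return ((mask & full) | known_zero) == full

# The example from the report: shlq $32 followed by a mask of -4294967296.
mask = -4294967296 & MASK64          # 0xFFFFFFFF00000000 as an unsigned value
shifted = (0x123456789ABCDEF0 << 32) & MASK64

print(and_is_redundant(32, -4294967296))
print((shifted & mask) == shifted)
```

A mask such as 0x7FFFFFFF00000000 fails the test for a shift of 32, because it can still clear bit 63.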
Steps to reproduce:
Apply the patch and confirm that the compiler still builds and generates correct code, with reduced binary size and no loss of performance.
Additional information:
It's hard to determine the effect of false dependencies with the first optimisation. When extending from 32-bit to 64-bit, there is no false dependency because the upper 32 bits are guaranteed to be set to zero beforehand. In other situations, given that the SHL instruction depends on the MOVSX/MOVZX instruction anyway, the performance should be identical.
If a SHL/AND (or SHL/MOV/AND) instruction group cannot be optimised because the mask changes the register value, the mask is nonetheless modified to clear the bits that the SHL instruction sets to zero. For example, when shifting left by 32 and then ANDing with 0x7FFFFFFF000000FF, the mask is changed to 0x7FFFFFFF00000000 because the lower 32 bits will always be zero. The aim of this is to potentially improve other optimisations involving AND instructions.
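
The mask adjustment described above amounts to clearing the low bits of the constant. A minimal sketch, using a hypothetical `narrow_mask` helper that is not part of the patch:

```python
MASK64 = (1 << 64) - 1

def narrow_mask(shift, mask, bits=64):
    """Clear the mask bits that a preceding left shift by `shift` already zeroes."""
    full = (1 << bits) - 1
    known_zero = (1 << shift) - 1
    return mask & (full ^ known_zero)

original = 0x7FFFFFFF000000FF
narrowed = narrow_mask(32, original)
print(hex(narrowed))  # 0x7fffffff00000000

# Both masks give the same result on any value that was just shifted left by 32.
for value in (0, 1, 0xFF, 0xFFFFFFFF, 0xDEADBEEF12345678):
    shifted = (value << 32) & MASK64
    print((shifted & original) == (shifted & narrowed))
```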
Mantis conversion info:
- Mantis ID: 37389
- OS: Microsoft Windows
- OS Build: 10 Professional
- Build: r45802
- Platform: i386 and x86_64
- Version: 3.3.1
- Fixed in version: 3.3.1
- Fixed in revision: 45811 (#09125e83)