[Patch] New 3*MOV -> XCHG optimisation
Original Reporter info from Mantis: CuriousKit @CuriousKit
- Reporter name: J. Gareth Moreton
Description:
This patch adds a new optimisation into OptPass2MOV that looks for the following sequence:
mov %reg1,%reg2
mov %reg3,%reg1
mov %reg2,%reg3
(with %reg2 not used afterwards)
and changes it to:
xchg %reg3,%reg1
There are a few restrictions though - see Additional Information.
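To illustrate the transformation, here is a minimal Python sketch of the pattern match on a toy instruction list. It is not the code from the patch: the tuple representation, the function names fold_mov_triple and run, and the dead_after liveness callback are all hypothetical and exist only for this example. Operands are in AT&T order (source, destination), as in the listing above.

```python
# Toy model only: instructions are (opcode, src, dst) tuples in AT&T order.
# None of these names come from the Free Pascal sources.

def fold_mov_triple(instrs, dead_after):
    """Replace mov a,b / mov c,a / mov b,c with xchg c,a when b is dead.

    dead_after(index, reg) is a caller-supplied (hypothetical) liveness
    query: it returns True if reg is not read after instruction `index`.
    """
    out = []
    i = 0
    while i < len(instrs):
        window = instrs[i:i + 3]
        if len(window) == 3 and all(op == "mov" for op, _, _ in window):
            (_, a, b), (_, c, a2), (_, b2, c2) = window
            distinct = len({a, b, c}) == 3
            if (distinct and a2 == a and b2 == b and c2 == c
                    and dead_after(i + 2, b)):
                # Keep the middle MOV's operands and change only its opcode.
                out.append(("xchg", c, a))
                i += 3
                continue
        out.append(instrs[i])
        i += 1
    return out


def run(instrs, regs):
    """Execute mov/xchg on a dict of register values; return the new state."""
    regs = dict(regs)
    for op, src, dst in instrs:
        if op == "mov":
            regs[dst] = regs[src]
        elif op == "xchg":
            regs[src], regs[dst] = regs[dst], regs[src]
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return regs


# %reg1 = eax, %reg2 = ecx (scratch), %reg3 = edx
before = [("mov", "eax", "ecx"), ("mov", "edx", "eax"), ("mov", "ecx", "edx")]
after = fold_mov_triple(before, dead_after=lambda index, reg: reg == "ecx")

start = {"eax": 1, "ecx": 99, "edx": 2}
state_before = run(before, start)
state_after = run(after, start)
```

In this sketch both sequences leave eax and edx swapped; they differ only in ecx, which is why the rewrite is valid only when %reg2 is not used afterwards.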
Steps to reproduce:
Apply patch and confirm correct compilation and successful regression testing under i386 and x86_64 platforms.
Additional information:
- Optimisation is done as xchg %reg3,%reg1 instead of xchg %reg1,%reg3 because XCHG is commutative, so the optimiser only has to change the middle MOV instruction's opcode and delete the other two, rather than modify operands as well (i.e. the optimiser itself runs slightly faster).
- Optimisation is undertaken in pass 2 so the MOVs can be optimised as deeply as possible first.
- Optimisation is always performed when optimising for size.
- On i8086, when optimising for speed, the optimisation is only performed if the optimiser's target is Pentium M.
- On i386, when optimising for speed, the optimisation is only performed if the optimiser's target is Pentium M or higher.
- On x86_64, when optimising for speed, the optimisation is only performed if the optimiser's target is Core I or higher.
The reason is that these architectures were the first where XCHG's latency was reduced from 3 to 2 cycles, thereby allowing it to be equal in speed to the three MOV operations (the first two MOVs can run simultaneously because there isn't a dependency chain between them).
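The conditions above can be sketched as a single predicate. This is a rough Python illustration, not the patch itself: the Cpu enum, its ordering, and the function may_use_xchg are hypothetical, the i8086 rule is read literally from the list as an exact Pentium M match, and the 1-cycle MOV latency is an assumption made only for the arithmetic at the end.

```python
from enum import IntEnum


class Cpu(IntEnum):
    # Hypothetical ordering for illustration; not the compiler's real CPU list.
    I386 = 0
    PENTIUM = 1
    PENTIUM_M = 2
    CORE_I = 3


def may_use_xchg(arch, optimise_for_size, target):
    """Return True if the 3*MOV -> XCHG rewrite would be applied."""
    if optimise_for_size:
        # One XCHG is smaller than three MOVs, so size builds always take it.
        return True
    if arch == "i8086":
        # Literal reading of the list above: exactly Pentium M.
        return target == Cpu.PENTIUM_M
    if arch == "i386":
        return target >= Cpu.PENTIUM_M
    if arch == "x86_64":
        return target >= Cpu.CORE_I
    return False


# Latency reasoning, assuming each MOV takes 1 cycle: the first two MOVs are
# independent and overlap, and the third must wait for the first.
MOV_LATENCY = 1
mov_chain_latency = max(MOV_LATENCY, MOV_LATENCY) + MOV_LATENCY
XCHG_LATENCY_NEW = 2  # latency on the targets named above, per the description
XCHG_LATENCY_OLD = 3  # latency on earlier targets, per the description
```

Under these assumptions the MOV sequence has a critical path of 2 cycles, so XCHG only breaks even once its latency drops to 2, which matches the per-target cut-offs listed above.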
Mantis conversion info:
- Mantis ID: 36511
- OS: Microsoft Windows
- OS Build: 10 Professional
- Build: r43854
- Platform: i386 and x86_64
- Version: 3.3.1
- Fixed in version: 3.3.1
- Fixed in revision: 43858 (#73c6cab0)
- Monitored by: » Vincent (Vincent Snijders)