[Patch] New 3*MOV -> XCHG optimisation

Original Reporter info from Mantis: CuriousKit @CuriousKit

Reporter name: J. Gareth Moreton

Description:

This patch adds a new optimisation to OptPass2MOV that looks for the following sequence:

mov %reg1,%reg2
mov %reg3,%reg1
mov %reg2,%reg3
(With %reg2 not used afterwards)

and changes it to:

xchg %reg3,%reg1

A few restrictions apply; see Additional information.
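
To illustrate the transformation described above, here is a minimal Python sketch of the pattern match and rewrite. It is not the code from the patch: the tuple representation of instructions, the function names `fold_mov_triple` and `run`, and the `live_after` set (standing in for the compiler's register-usage tracking) are all assumptions made for illustration.

```python
# Hypothetical sketch of the peephole rewrite; not the actual FPC code.
from typing import Dict, List, Set, Tuple

# (opcode, source, destination), using AT&T operand order as in the report.
Instr = Tuple[str, str, str]


def fold_mov_triple(code: List[Instr], i: int, live_after: Set[str]) -> bool:
    """Rewrite code[i:i+3] in place to one XCHG if it matches the pattern.

    live_after holds the registers still needed after the third MOV
    (assumed to come from a separate liveness analysis).
    """
    if i + 3 > len(code):
        return False
    (op1, r1, r2), (op2, r3, r1b), (op3, r2b, r3b) = code[i:i + 3]
    if (op1, op2, op3) != ("mov", "mov", "mov"):
        return False
    # All operands must be registers, and they must line up as
    # mov %reg1,%reg2 / mov %reg3,%reg1 / mov %reg2,%reg3.
    if not all(r.startswith("%") for r in (r1, r2, r3, r1b, r2b, r3b)):
        return False
    if r1b != r1 or r2b != r2 or r3b != r3:
        return False
    # The three registers must be distinct and %reg2 must be dead afterwards.
    if len({r1, r2, r3}) != 3 or r2 in live_after:
        return False
    # Keep the middle instruction's operands and change only its opcode.
    code[i:i + 3] = [("xchg", r3, r1)]
    return True


def run(code: List[Instr], regs: Dict[str, int]) -> Dict[str, int]:
    """Tiny interpreter for mov/xchg over a dict of register values."""
    regs = dict(regs)
    for op, src, dst in code:
        if op == "mov":
            regs[dst] = regs[src]
        elif op == "xchg":
            regs[src], regs[dst] = regs[dst], regs[src]
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return regs
```

Running both the original and the rewritten sequence through `run` with the same starting values should give identical contents for %reg1 and %reg3; only the scratch register %reg2 differs, which is why it must not be used afterwards.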

Steps to reproduce:

Apply patch and confirm correct compilation and successful regression testing under i386 and x86_64 platforms.

Additional information:

- The optimisation emits xchg %reg3,%reg1 rather than xchg %reg1,%reg3. XCHG is commutative, so the result is the same, and this operand order means the optimiser only has to change the middle MOV instruction's opcode and delete the other two MOVs, rather than modify operands as well (i.e. the optimiser itself runs slightly faster).
- Optimisation is undertaken in pass 2 so the MOVs can be optimised as deeply as possible first.
- Optimisation is always performed when optimising for size.
- On i8086, when optimising for speed, the optimisation is only performed if the optimiser's target is Pentium M.
- On i386, when optimising for speed, the optimisation is only performed if the optimiser's target is Pentium M or higher.
- On x86_64, when optimising for speed, the optimisation is only performed if the optimiser's target is Core I or higher.
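
The size and speed conditions in the list above can be summarised as a small decision function. The following Python sketch is only an illustration: the function name `xchg_allowed`, the CPU names and their ordering in `CPU_ORDER` are assumptions and do not reflect the compiler's actual CPU-type enumeration.

```python
# Hypothetical sketch of the conditions listed above.
# CPU_ORDER is an illustrative assumption, not FPC's real CPU-type list.
CPU_ORDER = ["386", "486", "Pentium", "Pentium 4", "Pentium M", "Core I"]


def xchg_allowed(arch: str, optimise_for_size: bool, cpu_target: str) -> bool:
    """Return True if the 3*MOV -> XCHG rewrite should be performed."""
    if optimise_for_size:
        # One instruction is always smaller than three.
        return True
    rank = CPU_ORDER.index(cpu_target)  # raises ValueError for unknown targets
    if arch == "i8086":
        # As written in the list: only when the target is Pentium M.
        return cpu_target == "Pentium M"
    if arch == "i386":
        return rank >= CPU_ORDER.index("Pentium M")
    if arch == "x86_64":
        return rank >= CPU_ORDER.index("Core I")
    raise ValueError(f"unknown architecture: {arch}")
```

For example, under these assumptions `xchg_allowed("i386", False, "Pentium 4")` is false, while `xchg_allowed("i386", True, "Pentium 4")` is true because size optimisation always enables the rewrite.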

The reason is that these architectures were the first where XCHG's latency was reduced from 3 cycles to 2, making it equal in speed to the three MOV operations (the first two MOVs can run simultaneously because there is no dependency chain between them).
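
The latency argument can be checked with a rough dependency-chain model. The Python sketch below is a simplification under stated assumptions: each MOV is assumed to take 1 cycle, only read-after-write dependencies are counted (register renaming is assumed to remove the write-after-read hazard on %reg1), and there is no limit on how many instructions execute at once. The function name `critical_path` and the instruction encoding are hypothetical.

```python
# Hypothetical model of the dependency-chain reasoning; real CPUs are more complex.
from typing import Dict, List, Tuple

# (opcode, registers read, registers written)
Op = Tuple[str, List[str], List[str]]


def critical_path(code: List[Op], latency: Dict[str, int]) -> int:
    """Length in cycles of the longest read-after-write dependency chain."""
    ready: Dict[str, int] = {}  # register -> cycle its latest value is available
    finish = 0
    for op, reads, writes in code:
        # An instruction starts once all of its inputs are available.
        start = max((ready.get(r, 0) for r in reads), default=0)
        done = start + latency[op]
        for w in writes:
            ready[w] = done
        finish = max(finish, done)
    return finish


three_movs: List[Op] = [
    ("mov", ["%reg1"], ["%reg2"]),
    ("mov", ["%reg3"], ["%reg1"]),  # independent of the first MOV
    ("mov", ["%reg2"], ["%reg3"]),  # waits for the first MOV
]
one_xchg: List[Op] = [("xchg", ["%reg3", "%reg1"], ["%reg3", "%reg1"])]
```

In this model the three MOVs form a chain of 2 cycles, which matches an XCHG with a latency of 2 and beats one with a latency of 3.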

Mantis conversion info:

Mantis ID: 36511
OS: Microsoft Windows
OS Build: 10 Professional
Build: r43854
Platform: i386 and x86_64
Version: 3.3.1
Fixed in version: 3.3.1
Fixed in revision: 43858 (#73c6cab0)
Monitored by: » Vincent (Vincent Snijders)
