[Patch] ARM/AArch64 Some short-range LDR/STR optimisations
Original Reporter info from Mantis: CuriousKit @CuriousKit
- Reporter name: J. Gareth Moreton
Description:
The "ldrstr.patch" file provides some short-range optimisations for LDR and STR instructions that remove unnecessary instructions (e.g. storing a register to memory, then loading from the same address into the same register). These optimisations are performed on all ARM platforms. The patch also fixes a minor bug in the RedundantMovProcess routine for AArch64 (that routine's optimisation often fires after the new "load/load -> load/move" optimisation is made).
The "peephole-string.patch" seeks to homogenise the optimisation comments that appear when DEBUG_AOPTCPU is declared, prepending all such messages with the SPeepholeOptimization string constant, much like the x86 implementations.
Steps to reproduce:
Apply patch and confirm correct compilation on all ARM and AArch64 platforms.
Additional information:
The two patches share a hunk (the declaration of SPeepholeOptimization) and a single rejection will occur when they are applied together. This won't cause a bad merge.
I confess that this patch hasn't been fully tested on Arm-32 platforms due to technical reasons - third-party testing would be required.
Some examples of the optimisations under aarch64-linux:
In the Sysutils unit - before:
strh w2,[x0]
ldrh w0,[x0]
ldp x29,x30,[sp], 16
ret
.Le429:
After:
strh w2,[x0]
uxth w0,w2 <-- ldr changed to a uxth instruction based on the postfixes of the str and ldr instructions (minimises read-after-write penalty).
ldp x29,x30,[sp], 16
ret
.Le429:
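
The choice of replacement instruction in the example above can be sketched as a small Python model. This is only an illustration of the idea, not the code in the patch: the function name, the tuple representation of instructions and the decision to emit a "mov" when the registers differ are all assumptions made for the sketch.

```python
# Illustrative model only: every name here is hypothetical and is not taken
# from the FPC sources. An instruction is (mnemonic, register, address).

# Extension that reproduces what each narrow load would have read back.
EXTEND_FOR_LOAD = {"ldrb": "uxtb", "ldrh": "uxth", "ldrsb": "sxtb", "ldrsh": "sxth"}
# Access width in bits; None means "full register width".
STORE_BITS = {"str": None, "strb": 8, "strh": 16}
LOAD_BITS = {"ldr": None, "ldrb": 8, "ldrh": 16, "ldrsb": 8, "ldrsh": 16}


def forward_store_to_load(store, load):
    """Return the instructions that replace `load`, or None if no match.

    An empty list means the load can simply be deleted.
    """
    s_op, s_reg, s_addr = store
    l_op, l_reg, l_addr = load
    if s_op not in STORE_BITS or l_op not in LOAD_BITS:
        return None
    # Assumption: only plain "[base, offset]" addressing is handled, so
    # pre-indexed ("...]!") and post-indexed ("[base], n") forms are skipped.
    if s_addr != l_addr or not s_addr.endswith("]"):
        return None
    # The store and the load must access the same number of bits.
    if STORE_BITS[s_op] != LOAD_BITS[l_op]:
        return None
    if l_op == "ldr":
        if s_reg[0] != l_reg[0]:      # w/x size mismatch: leave it alone
            return None
        if s_reg == l_reg:            # register already holds the value
            return []
        return [("mov", l_reg, s_reg)]  # assumption of this sketch
    # Narrow load: re-create the zero or sign extension from the register.
    return [(EXTEND_FOR_LOAD[l_op], l_reg, s_reg)]


print(forward_store_to_load(("strh", "w2", "[x0]"), ("ldrh", "w0", "[x0]")))
```

A real implementation would also have to rule out volatile accesses and any instruction between the pair that modifies the registers involved; the model only looks at two adjacent instructions.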
----
Also in Sysutils - before:
str x0,[sp, 24]
ldr x0,[sp, 24]
str x0,[sp, 32]
.Lj3450:
After:
stp x0,x0,[sp, 24] <-- the ldr instruction is removed because x0 already contains the value at the address specified (because it was just written there), and then the two str instructions are merged into an stp instruction later on.
.Lj3450:
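
The two steps in this example (dropping the reload, then pairing the stores) can be modelled roughly as follows. This is a hedged Python sketch with hypothetical names, restricted to 64-bit registers and plain "[base, offset]" addresses; the str-to-stp merge it imitates is a separate optimisation and is not part of this patch.

```python
import re

# Illustrative model only; names are hypothetical, not FPC identifiers.
# An instruction is a tuple such as ("str", "x0", "[sp, 24]").
ADDR = re.compile(r"\[(\w+), (-?\d+)\]")


def parse(addr):
    """Split "[base, offset]" into (base, offset), or None if it is another form."""
    m = ADDR.fullmatch(addr)
    return (m.group(1), int(m.group(2))) if m else None


def peephole(instrs):
    # Step 1: drop "ldr r,[a]" directly after "str r,[a]"; r already holds
    # the value that was just written to [a].
    kept = []
    for ins in instrs:
        if kept and ins[0] == "ldr" and kept[-1] == ("str", ins[1], ins[2]):
            continue
        kept.append(ins)

    # Step 2: merge two adjacent 64-bit stores to consecutive slots into stp.
    out = []
    for ins in kept:
        if out and ins[0] == "str" and out[-1][0] == "str":
            prev = out[-1]
            a, b = parse(prev[2]), parse(ins[2])
            if (a and b and a[0] == b[0] and b[1] - a[1] == 8
                    and prev[1][0] == "x" and ins[1][0] == "x"
                    # 64-bit stp takes a multiple of 8 in -512..504
                    and a[1] % 8 == 0 and -512 <= a[1] <= 504):
                out[-1] = ("stp", prev[1], ins[1], prev[2])
                continue
        out.append(ins)
    return out


before = [("str", "x0", "[sp, 24]"),
          ("ldr", "x0", "[sp, 24]"),
          ("str", "x0", "[sp, 32]")]
print(peephole(before))
```

Removing the load is what makes the two stores adjacent, which is why the pairing only becomes possible afterwards.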
----
In the Classes unit - before:
b.ne .Lj947
ldr x0,[sp, 16]
ldr x1,[sp, 16]
ldr x1,[x1, 104]
blr x1
str x0,[sp, 16]
.Lj947:
After:
b.ne .Lj947
ldr x0,[sp, 16]
ldr x1,[x0, 104] <-- Second ldr was changed to "mov x1,x0", which was then optimised by RedundantMovProcess and merged into the 3rd ldr.
blr x1
str x0,[sp, 16]
.Lj947:
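
The chain of rewrites in this example can be sketched as two small passes. The Python below is an illustrative model under stated assumptions, not the FPC implementation: the function names are hypothetical, and the second pass only imitates the one case of RedundantMovProcess described above (a mov whose destination is immediately overwritten by a load that uses it as the base register).

```python
import re

# Illustrative model only; names are hypothetical, not FPC identifiers.
# An instruction is a tuple such as ("ldr", "x1", "[sp, 16]").
ADDR = re.compile(r"\[(\w+)(?:, (-?\d+))?\]")


def base_of(addr):
    """Base register of "[base]" or "[base, offset]", else None."""
    m = ADDR.fullmatch(addr)
    return m.group(1) if m else None


def reg_id(reg):
    """w5 and x5 name the same physical register."""
    return reg[1:] if reg[0] in "wx" else reg


def load_load_to_move(instrs):
    """ldr a,[m]; ldr b,[m]  ->  ldr a,[m]; mov b,a"""
    out = []
    for ins in instrs:
        prev = out[-1] if out else None
        if (prev and prev[0] == "ldr" and ins[0] == "ldr"
                and prev[2] == ins[2]            # same address expression
                and prev[1] != ins[1]            # different destination
                and prev[1][0] == ins[1][0]      # same register width
                and base_of(prev[2]) is not None
                # the first load must not have changed the base register
                and reg_id(base_of(prev[2])) != reg_id(prev[1])):
            out.append(("mov", ins[1], prev[1]))
        else:
            out.append(ins)
    return out


def fold_move_into_load(instrs):
    """mov b,a; ldr b,[b, n]  ->  ldr b,[a, n]"""
    out = []
    for ins in instrs:
        prev = out[-1] if out else None
        if (prev and prev[0] == "mov" and ins[0] == "ldr"
                and ins[1] == prev[1]             # load overwrites mov's result
                and base_of(ins[2]) == prev[1]):  # and uses it only as the base
            new_addr = ins[2].replace("[" + prev[1], "[" + prev[2], 1)
            out[-1] = ("ldr", ins[1], new_addr)
        else:
            out.append(ins)
    return out


before = [("ldr", "x0", "[sp, 16]"),
          ("ldr", "x1", "[sp, 16]"),
          ("ldr", "x1", "[x1, 104]"),
          ("blr", "x1")]
print(fold_move_into_load(load_load_to_move(before)))
```

The fold is safe here only because x1 is overwritten by the very load that consumes it, so the copied value is never needed afterwards.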
----
Longer-range optimisations of this kind are still being researched because references are involved - watch this space!
Mantis conversion info:
- Mantis ID: 38841
- OS: Debian GNU/Linux (Raspberry Pi)
- OS Build: 10
- Build: r49298
- Platform: arm and aarch64
- Version: 3.3.1