[Patch] MOV/LDR/STR/MOV optimisations
Original Reporter info from Mantis: CuriousKit @CuriousKit
- Reporter name: J. Gareth Moreton
Description:
This patch builds on the RedundantMovProcess routine, which is shared by all ARM targets, and attempts to eliminate unnecessary MOV instructions. It does this by replacing registers that appear in references, and as the value register of STR instructions, with registers that hold the same value but have retained it for longer. For example, take the following instruction group:
...
mov r1,r0
str r1,[r1]
ldr r2,[r1]
ldr r1,[r2]
...
The routine now detects that r1 is equal to r0 and substitutes r0 for r1 in the references of the STR and the first LDR instruction, as well as in the value register of the STR:
...
mov r1,r0
str r0,[r0]
ldr r2,[r0]
ldr r1,[r2]
...
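As an illustration, the substitution step can be sketched in Python over a hypothetical toy instruction list (this is not the compiler's Pascal code or its data structures). After a `mov dst,src`, reads of `dst` as the STR value or base register and as the LDR base register are rewritten to `src`, and the scan stops as soon as either register is overwritten or an instruction outside the toy model is reached. The function name `forward_copy` and the tuple layout are assumptions made for the sketch.

```python
from typing import List, Tuple

# Hypothetical toy IR (an assumption, not FPC's taicpu):
#   ("mov", dst, src), ("str", value, base), ("ldr", dst, base)
# References are modelled as a single base register, i.e. [base].
Instr = Tuple[str, str, str]


def forward_copy(code: List[Instr], i: int) -> List[Instr]:
    """After code[i] = mov dst,src, replace reads of dst with src while dst == src holds."""
    op, dst, src = code[i]
    if op != "mov":
        raise ValueError("code[i] must be a MOV")
    out = list(code)
    for j in range(i + 1, len(out)):
        op_j, a, b = out[j]
        if op_j == "str":
            # STR only reads registers: the value register and the base register.
            out[j] = (op_j, src if a == dst else a, src if b == dst else b)
        elif op_j == "ldr":
            # The base register is read before the loaded value is written back.
            out[j] = (op_j, a, src if b == dst else b)
            if a in (dst, src):
                break  # dst or src has been overwritten, so they may no longer be equal
        else:
            break  # stop conservatively at anything this sketch does not model
    return out


code = [("mov", "r1", "r0"), ("str", "r1", "r1"), ("ldr", "r2", "r1"), ("ldr", "r1", "r2")]
for ins in forward_copy(code, 0):
    print(ins)
```

Applied to the first listing, `forward_copy(code, 0)` produces the second one. The LDR base register is rewritten before the destination is checked because the address is read before the load writes its result, so `ldr r1,[r1]` would still have its base replaced.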
Finally, since r1 is no longer read before the second LDR instruction overwrites it, the original MOV can be removed completely:
...
str r0,[r0]
ldr r2,[r0]
ldr r1,[r2]
...
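The removal step amounts to a small liveness check. Below is a minimal Python sketch over a hypothetical toy instruction list (an assumption, not the compiler's own representation): the MOV is dropped only if its destination is overwritten before any instruction reads it, and it is kept conservatively if the end of the list is reached, since the register might still be used afterwards. The helper names `reads`, `writes` and `remove_dead_mov` are made up for the illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical toy IR (an assumption, not FPC's taicpu):
#   ("mov", dst, src), ("str", value, base), ("ldr", dst, base)
Instr = Tuple[str, str, str]


def reads(ins: Instr) -> Tuple[str, ...]:
    op, a, b = ins
    return (a, b) if op == "str" else (b,)  # MOV reads src, LDR reads base


def writes(ins: Instr) -> Optional[str]:
    op, a, _ = ins
    return a if op in ("mov", "ldr") else None  # STR writes memory, not a register


def remove_dead_mov(code: List[Instr], i: int) -> List[Instr]:
    """Drop code[i] (a MOV) if its destination is overwritten before being read."""
    op, dst, _ = code[i]
    if op != "mov":
        raise ValueError("code[i] must be a MOV")
    for ins in code[i + 1:]:
        if dst in reads(ins):
            return list(code)  # dst is still used, so the MOV must stay
        if writes(ins) == dst:
            return code[:i] + code[i + 1:]  # dst dies unread: the MOV is redundant
    return list(code)  # end of the block: dst may be live afterwards, so keep it


after_swap = [("mov", "r1", "r0"), ("str", "r0", "r0"), ("ldr", "r2", "r0"), ("ldr", "r1", "r2")]
print(remove_dead_mov(after_swap, 0))
```

Reads are checked before writes within each instruction, so an instruction such as `ldr r1,[r1]` counts as a use of r1 and keeps the MOV alive. Run on the listing before substitution, the same check keeps the MOV, because the STR still reads r1.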
Although the pipeline of ARM processors is nowhere near as complex as that of Intel processors, later models have multiple arithmetic ports, so even when the instruction count isn't reduced, reading r0 instead of r1 removes the later instructions' dependency on the MOV, which can still provide a minor speed boost.
Steps to reproduce:
Apply the patch and confirm correct compilation and an overall optimisation improvement at -O2 and -O3.
Additional information:
The MOV/LDR optimisation seems to conflict with another optimisation on 32-bit ARM, causing an immediate access violation during the make cycle (on ppc2). The single optimisation that triggers it (changing the base register of a reference) is disabled on 32-bit ARM, but the reason it causes this error has not yet been determined and needs further investigation.
Some registers, especially the stack pointer register, aren't always tracked properly, so there are extra checks for these situations to prevent instructions from being removed incorrectly. For example, if the "(getsupreg(taicpu(p).oper[1]^.reg) <> RS_STACK_POINTER_REG)" condition on line 180 of the patch file is omitted, a single test fails when compiled under -O3 because "packages/bzip2/src/bzip2comn.pp" is optimised incorrectly; this appears to be the only case where it happens. Further refactoring of the peephole optimiser should iron out these problems and allow this routine to rely on fewer special-case checks.
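The kind of guard described above can be sketched as a predicate that refuses to treat a MOV as a candidate when its source is a register whose value may not be tracked reliably. This is a hypothetical Python illustration, not the patch's code: the toy instruction tuples, the `UNTRACKED` set, the function name `mov_is_candidate` and the name `sp` for the stack pointer are all assumptions, and testing the MOV's source operand only loosely mirrors the quoted `oper[1]` condition.

```python
from typing import FrozenSet, Tuple

# Hypothetical toy IR: ("mov", dst, src), ("str", value, base), ("ldr", dst, base)
Instr = Tuple[str, str, str]

# Registers this sketch assumes are not reliably tracked; "sp" stands in for
# the stack pointer. The exact membership is an assumption.
UNTRACKED: FrozenSet[str] = frozenset({"sp"})


def mov_is_candidate(ins: Instr, untracked: FrozenSet[str] = UNTRACKED) -> bool:
    """Return True if ins is a MOV whose source may be forwarded to later instructions."""
    op, _dst, src = ins
    if op != "mov":
        return False
    # Skip MOVs from untracked registers instead of risking an incorrect rewrite.
    return src not in untracked


print(mov_is_candidate(("mov", "r1", "r0")))  # True
print(mov_is_candidate(("mov", "r1", "sp")))  # False
```

Keeping the exclusions in one set makes it easy to shrink the list of special cases as register tracking improves.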
Tested using "-a -O3" (-a for comparing assembler dumps) on arm-linux and aarch64-linux; no regressions noted so far.
Further improvements should be possible later on.
Mantis conversion info:
- Mantis ID: 37638
- OS: Linux (Raspberry Pi OS)
- OS Build: 5.4.51-v8+
- Build: r46676
- Platform: arm and aarch64
- Version: 3.3.1
- Fixed in version: 3.3.1
- Fixed in revision: 47330 (#6ec460c6)