[Patch] ARM - str/str -> stm optimisation (and peephole debug strings)
Original Reporter info from Mantis: CuriousKit @CuriousKit
- Reporter name: J. Gareth Moreton
Description:
The stm-merge patch looks for optimisations on ARM (32-bit, non-Thumb) platforms where sequential str instructions can be merged into a single stm instruction, depending on the registers used and the addresses they store to.
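The merge conditions can be sketched in Python. This is not the compiler's code (FPC implements its optimiser in Pascal). It assumes adjacent str instructions with immediate offsets, no writeback and the same condition code, and `Str` and `try_merge_str` are hypothetical names. The key rule is that stm always stores the lowest-numbered register at the lowest address, so register numbers must increase with the store addresses.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Str:
    """Hypothetical model of 'str rT,[rN, offset]' (immediate offset, no writeback)."""
    rt: int      # register being stored (0-15)
    rn: int      # base register
    offset: int  # byte offset from the base


def try_merge_str(group: Sequence[Str]) -> Optional[str]:
    """Return the stm instruction equivalent to the stores, or None if unsafe."""
    if len(group) < 2:
        return None
    base = group[0].rn
    if any(s.rn != base for s in group):
        return None  # every store must use the same base register
    ordered = sorted(group, key=lambda s: s.offset)
    regs = [s.rt for s in ordered]
    offs = [s.offset for s in ordered]
    if 15 in regs:
        return None  # keep PC out of the register list in this sketch
    # stm writes the lowest-numbered register to the lowest address, so the
    # register numbers must rise strictly as the addresses rise
    if any(a >= b for a, b in zip(regs, regs[1:])):
        return None
    # the stored words must be contiguous
    if any(b - a != 4 for a, b in zip(offs, offs[1:])):
        return None
    low, high = offs[0], offs[-1]
    if low == 0:
        mnemonic = "stm"    # increment after (IA), the default mode
    elif low == 4:
        mnemonic = "stmib"  # increment before
    elif high == 0:
        mnemonic = "stmda"  # decrement after
    elif high == -4:
        mnemonic = "stmdb"  # decrement before
    else:
        return None  # block not reachable from the base register alone
    reglist = ",".join(f"r{r}" for r in regs)
    return f"{mnemonic} r{base},{{{reglist}}}"


# The two stores from the trunk listing: str r12,[r13, 4] and str r3,[r13]
print(try_merge_str([Str(rt=12, rn=13, offset=4), Str(rt=3, rn=13, offset=0)]))
# prints: stm r13,{r3,r12}
```

Applied to the trunk example, the sketch produces the same `stm r13,{r3,r12}` that appears in the patched listing; swapping which register goes to which offset makes it return None.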
The stm merge required a distinct Pass 2 procedure. Previously, Pass 2 contained only the large conditional branch optimisation, which has now been moved into a separate OptPass2Bcc routine.
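The resulting structure can be pictured as a Pass 2 entry point that dispatches each instruction to a per-opcode routine. This Python toy is only an illustration: apart from OptPass2Bcc, which the patch names, the routine names (`opt_pass2_str`, `peephole_opt_pass2`, "OptPass2STR") and the tuple-based instruction model are hypothetical, and "b" stands in for any branch opcode with the condition code ignored.

```python
from typing import Callable, Dict, List, Tuple

# Toy instruction model (an assumption): (opcode, operands...)
Instr = Tuple[str, ...]
Routine = Callable[[List[Instr], int], bool]

trace: List[str] = []  # records which routine ran, for illustration only


def opt_pass2_bcc(code: List[Instr], i: int) -> bool:
    """Stand-in for OptPass2Bcc, the conditional branch optimisation."""
    trace.append("OptPass2Bcc")
    return False  # branch logic omitted in this sketch


def opt_pass2_str(code: List[Instr], i: int) -> bool:
    """Hypothetical stand-in for the new str/str -> stm merge."""
    trace.append("OptPass2STR")
    return False  # merge logic omitted in this sketch


# Pass 2 becomes a dispatcher keyed on the opcode of the current instruction
PASS2_ROUTINES: Dict[str, Routine] = {"b": opt_pass2_bcc, "str": opt_pass2_str}


def peephole_opt_pass2(code: List[Instr], i: int) -> bool:
    """Run the pass-2 routine for code[i], if one is registered."""
    routine = PASS2_ROUTINES.get(code[i][0])
    return routine(code, i) if routine is not None else False


listing = [("str", "r12", "[r13, 4]"), ("mov", "r3", "r2"), ("b", ".L1")]
for idx in range(len(listing)):
    peephole_opt_pass2(listing, idx)
print(trace)  # ['OptPass2STR', 'OptPass2Bcc']
```

Keeping each optimisation in its own routine means adding the str merge does not grow one large conditional block further.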
As on other platforms, the second patch, arm-peephole.patch, is a refactor that standardises the "Peephole Optimization: " prefix on peephole debug messages.
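The point of the refactor is that every debug message shares one prefix instead of spelling it out at each call site. A minimal Python illustration follows; `PEEPHOLE_PREFIX`, `debug_msg` and the "StrStr2Stm" message name are hypothetical and are not FPC identifiers.

```python
from typing import List

# Assumed single constant holding the standardised prefix
PEEPHOLE_PREFIX = "Peephole Optimization: "


def debug_msg(log: List[str], name: str) -> None:
    """Hypothetical helper: every peephole message goes through one prefix."""
    log.append(PEEPHOLE_PREFIX + name)


messages: List[str] = []
debug_msg(messages, "StrStr2Stm")  # optimisation name is illustrative
print(messages[0])  # Peephole Optimization: StrStr2Stm
```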
Steps to reproduce:
Apply both patches and confirm correct compilation.
Additional information:
- stm-merge.patch requires arm-peephole.patch to work.
Example in baseunix.s - trunk:
.section .text.n_baseunix_$$_fpselect$longint$pfdset$pfdset$pfdset$ptimeval$$longint,"ax"
.balign 4
.globl BASEUNIX_$$_FPSELECT$LONGINT$PFDSET$PFDSET$PFDSET$PTIMEVAL$$LONGINT
.type BASEUNIX_$$_FPSELECT$LONGINT$PFDSET$PFDSET$PFDSET$PTIMEVAL$$LONGINT,#function
BASEUNIX_$$_FPSELECT$LONGINT$PFDSET$PFDSET$PFDSET$PTIMEVAL$$LONGINT:
mov r12,r13
stmfd r13!,{r11,r12,r14,r15}
sub r11,r12,4
ldr r12,[r11, 4]
sub r13,r13,56
str r12,[r13, 4]
str r3,[r13]
mov r3,r2
mov r2,r1
mov r1,r0
mov r0,142
bl FPC_SYSCALL5
ldmea r11,{r11,r13,r15}
.Le54:
.size BASEUNIX_$$_FPSELECT$LONGINT$PFDSET$PFDSET$PFDSET$PTIMEVAL$$LONGINT, .Le54 - BASEUNIX_$$_FPSELECT$LONGINT$PFDSET$PFDSET$PFDSET$PTIMEVAL$$LONGINT
Patch:
.section .text.n_baseunix_$$_fpselect$longint$pfdset$pfdset$pfdset$ptimeval$$longint,"ax"
.balign 4
.globl BASEUNIX_$$_FPSELECT$LONGINT$PFDSET$PFDSET$PFDSET$PTIMEVAL$$LONGINT
.type BASEUNIX_$$_FPSELECT$LONGINT$PFDSET$PFDSET$PFDSET$PTIMEVAL$$LONGINT,#function
BASEUNIX_$$_FPSELECT$LONGINT$PFDSET$PFDSET$PFDSET$PTIMEVAL$$LONGINT:
mov r12,r13
stmfd r13!,{r11,r12,r14,r15}
sub r11,r12,4
ldr r12,[r11, 4]
sub r13,r13,56
stm r13,{r3,r12} // <-- Two str instructions successfully merged.
mov r3,r2
mov r2,r1
mov r1,r0
mov r0,142
bl FPC_SYSCALL5
ldmea r11,{r11,r13,r15}
.Le54:
.size BASEUNIX_$$_FPSELECT$LONGINT$PFDSET$PFDSET$PFDSET$PTIMEVAL$$LONGINT, .Le54 - BASEUNIX_$$_FPSELECT$LONGINT$PFDSET$PFDSET$PFDSET$PTIMEVAL$$LONGINT
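To check that the merge above preserves behaviour, a small Python model of the two instruction forms can compare the memory each version writes. This is a simplified sketch, assuming word-sized stores, no writeback and the increment-after (plain stm) addressing mode; the helper names and register values are arbitrary.

```python
from typing import Dict, List

Memory = Dict[int, int]
Registers = Dict[str, int]


def reg_number(name: str) -> int:
    return int(name[1:])  # "r12" -> 12


def run_str(mem: Memory, regs: Registers, rt: str, rn: str, offset: int) -> None:
    """str rt,[rn, offset]: store one word at base + offset, no writeback."""
    mem[regs[rn] + offset] = regs[rt]


def run_stm(mem: Memory, regs: Registers, rn: str, reglist: List[str]) -> None:
    """stm rn,{...} (increment after, no writeback): the lowest-numbered
    register is stored at the base address, the next one 4 bytes higher, etc."""
    for k, r in enumerate(sorted(reglist, key=reg_number)):
        mem[regs[rn] + 4 * k] = regs[r]


regs = {"r3": 0x33, "r12": 0x1212, "r13": 0x8000}  # arbitrary example values

trunk: Memory = {}
run_str(trunk, regs, "r12", "r13", 4)  # str r12,[r13, 4]
run_str(trunk, regs, "r3", "r13", 0)   # str r3,[r13]

patched: Memory = {}
run_stm(patched, regs, "r13", ["r3", "r12"])  # stm r13,{r3,r12}

print(trunk == patched)  # True
```

If the stores were the other way round (r3 at [r13, 4] and r12 at [r13]), the two memory maps would differ, which is why the register order has to follow the address order.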
ldr/ldr -> ldm optimisations are not implemented yet, because additional work is required to detect whether it is safe to overwrite the registers.
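One way to picture the extra checks a load merge might need is the Python sketch below. It is an assumption about what "safe" could mean, not the planned implementation: because the merged ldm performs both loads at a single point, the sketch rejects the merge if a destination is also the base register, if both loads target the same register, or if any instruction between the loads reads or writes the registers involved. `Ldr`, `try_merge_ldr` and the `touched_between` input are hypothetical; in the compiler that information would have to come from register-usage analysis, and memory writes between the loads are not modelled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Ldr:
    """Hypothetical model of 'ldr rT,[rN, offset]' (immediate offset, no writeback)."""
    rt: int
    rn: int
    offset: int


def try_merge_ldr(first: Ldr, second: Ldr,
                  touched_between: Iterable[int] = ()) -> Optional[str]:
    """Return an equivalent 'ldm' (increment-after form only) or None if unsafe.

    touched_between: registers read or written by instructions that sit
    between the two loads (assumed to be supplied by register-usage analysis).
    """
    if first.rn != second.rn:
        return None
    base = first.rn
    dests = {first.rt, second.rt}
    touched = set(touched_between)
    if len(dests) != 2:
        return None  # ldm cannot name the same register twice
    if base in dests:
        return None  # conservative: a load into the base could move the other load
    if dests & touched or base in touched:
        return None  # moving a load past these instructions changes what they see
    lo, hi = sorted((first, second), key=lambda ld: ld.offset)
    if lo.offset != 0 or hi.offset - lo.offset != 4:
        return None
    if lo.rt > hi.rt:
        return None  # ldm loads the lowest-numbered register from the lowest address
    return f"ldm r{base},{{r{lo.rt},r{hi.rt}}}"


print(try_merge_ldr(Ldr(3, 13, 4), Ldr(2, 13, 0)))  # ldm r13,{r2,r3}
```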
Mantis conversion info:
- Mantis ID: 38975
- OS: Debian GNU/Linux (Raspberry Pi)
- OS Build: 10
- Build: r49489
- Platform: arm
- Version: 3.3.1
- Fixed in version: 3.3.1
- Fixed in revision: 49499 (#77666736)