[Patch] JMP -> MOV/RET optimisation
Original Reporter info from Mantis: CuriousKit @CuriousKit
- Reporter name: J. Gareth Moreton
Description:
This patch extends the JMP -> RET optimisation in OptPass2JMP to also cover JMP -> MOV/RET, since the result (e.g. EAX) is often set just before the function exits. If C-Builder support is ever implemented, this optimisation will serve that language well because "return 1;", for example, sets the result and exits the function in a single statement.
An example of this optimisation in action, from lazarus/ide/etfpcmsgparser.pas (~line 7481) - BEFORE:
...
.Lj1187:
testl %edx,%edx
je .Lj1189
subl $1,%edx
je .Lj1190
jmp .Lj1188
.balign 16,0x90
.Lj1189:
movb $1,97(%rcx)
jmp .Lj1188
.balign 16,0x90
.Lj1190:
movb $1,96(%rcx)
.balign 16,0x90
.Lj1188:
movb $1,%al
.Lc635:
ret
.Lc633:
---
AFTER:
...
.Lj1187:
testl %edx,%edx
je .Lj1189
subl $1,%edx
je .Lj1190
movb $1,%al
ret
.balign 16,0x90
.Lj1189:
movb $1,97(%rcx)
movb $1,%al
ret
.balign 16,0x90
.Lj1190:
movb $1,96(%rcx)
movb $1,%al
.Lc635:
ret
.Lc633:
(.Lj1188 becomes a dead label and is subsequently stripped)
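The transformation shown in the listings can be modelled with a small toy pass. This is only a sketch in Python over a list of assembly lines, not the compiler's Pascal implementation: the function names (`tail_at`, `jmp_to_mov_ret`) are hypothetical, and the rule that the alignment directive in front of a stripped label is dropped too is an assumption made to match the AFTER listing.

```python
# Toy model of the JMP -> MOV/RET peephole optimisation (hypothetical names,
# not the real OptPass2JMP code). Lines are plain AT&T-syntax strings.

def is_instruction(line):
    # Labels end with ':' and directives start with '.'; the rest are instructions.
    return not line.endswith(":") and not line.startswith(".")

def tail_at(lines, label):
    """Return ['ret'] or [MOV, 'ret'] if that is what follows `label`, else None."""
    if label + ":" not in lines:
        return None
    start = lines.index(label + ":")
    insns = [l for l in lines[start + 1:] if is_instruction(l)][:2]
    if insns[:1] == ["ret"]:
        return ["ret"]                      # the existing JMP -> RET case
    if len(insns) == 2 and insns[0].startswith("mov") and insns[1] == "ret":
        return insns                        # the new JMP -> MOV/RET case
    return None

def jmp_to_mov_ret(lines):
    out, rewritten = [], set()
    for line in lines:
        op, _, target = line.partition(" ")
        tail = tail_at(lines, target) if op == "jmp" else None
        if tail is None:
            out.append(line)
        else:
            out.extend(tail)                # copy MOV/RET in place of the jump
            rewritten.add(target)
    # A label is dead only if this pass removed its last visible reference.
    referenced = {l.partition(" ")[2] for l in out
                  if is_instruction(l) and l.startswith("j")}
    dead = {t + ":" for t in rewritten - referenced}
    result = []
    for line in out:
        if line in dead:
            # Assumption: the alignment in front of a dead label goes with it.
            if result and result[-1].startswith(".balign"):
                result.pop()
            continue
        result.append(line)
    return result

BEFORE = [
    ".Lj1187:", "testl %edx,%edx", "je .Lj1189", "subl $1,%edx",
    "je .Lj1190", "jmp .Lj1188", ".balign 16,0x90",
    ".Lj1189:", "movb $1,97(%rcx)", "jmp .Lj1188", ".balign 16,0x90",
    ".Lj1190:", "movb $1,96(%rcx)", ".balign 16,0x90",
    ".Lj1188:", "movb $1,%al", ".Lc635:", "ret", ".Lc633:",
]

for line in jmp_to_mov_ret(BEFORE):
    print(line)
```

Applied to the BEFORE listing, the toy pass replaces both `jmp .Lj1188` instructions with `movb $1,%al` / `ret` and strips `.Lj1188`, while labels it never saw referenced (such as `.Lc635`) are left alone.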
Steps to reproduce:
Apply the patch and confirm that code still compiles correctly, with a minor speed boost from the elimination of jumps.
Additional information:
By itself, this optimisation can cause code size to increase, but sometimes the MOV instructions that are added can themselves be optimised further. To accommodate this, OptPass2MOV calls OptPass2JMP if the instruction that follows is JMP, and then calls OptPass1MOV if an optimisation is made. At the same time, a call to "GetNextInstruction" was factored out of the if-conditions in OptPass2MOV, providing a boost in efficiency, since "GetNextInstruction" is a rather expensive call.
Because of the complexity of this optimisation process, it is only performed when optimising for speed, or under -O3.
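The interplay between the passes described above can be sketched as follows. This is a control-flow illustration only: the Python functions are stand-ins named after the routines mentioned in this report, their bodies and signatures are assumptions, and the real Pascal routines do considerably more. The point is that the next instruction is looked up once and reused, and that the pass 1 MOV routine is re-run only when the JMP was actually rewritten.

```python
# Stand-ins for the compiler routines; bodies and signatures are assumptions.
calls = []

def get_next_instruction(code, i):
    """Index of the next real instruction after i (skipping labels and directives)."""
    calls.append("get_next_instruction")
    for j in range(i + 1, len(code)):
        if not code[j].startswith(".") and not code[j].endswith(":"):
            return j
    return None

def opt_pass2_jmp(code, j):
    """Pretend the JMP at index j was expanded into MOV/RET."""
    calls.append("opt_pass2_jmp")
    code[j:j + 1] = ["movb $1,%al", "ret"]
    return True

def opt_pass1_mov(code, i):
    """Placeholder for the pass 1 MOV optimisations; does nothing here."""
    calls.append("opt_pass1_mov")
    return False

def opt_pass2_mov(code, i):
    # Looked up once, then reused by every check that needs it.
    nxt = get_next_instruction(code, i)
    if nxt is None:
        return False
    if code[nxt].startswith("jmp ") and opt_pass2_jmp(code, nxt):
        # New MOVs now follow the current one, so give pass 1 another go.
        opt_pass1_mov(code, i)
        return True
    # ... the other pass 2 MOV checks would reuse `nxt` here ...
    return False

code = ["movb $1,97(%rcx)", "jmp .Lj1188"]
changed = opt_pass2_mov(code, 0)
print(changed, calls, code)
```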
Currently, the JMP -> MOV/RET optimisation cannot be moved to pass 1 to simplify the above process, because doing so causes other optimisations to perform worse (most notably, the Jcc -> CMOVcc optimisation).
As a result, there is potential for future research in this part of the optimisation process, not just into the intermixing of pass 1 and pass 2, but also into the fact that, in the above example, movb $1,%al appears in both branches of a conditional jump. Detecting and optimising this may require more complex data-flow analysis (moving the MOV instruction to before the jump and removing the copies that appear after it, if safe to do so).
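As a rough illustration of the kind of check such an analysis would need, the sketch below looks at the straight-line paths leaving a conditional jump and reports a MOV that could, in principle, be moved in front of it. It is not part of the patch: the function name `common_result_mov` is hypothetical, the paths are supplied by hand rather than discovered, and the register test is a crude textual match that ignores aliasing (e.g. %al versus %eax).

```python
def common_result_mov(paths):
    """Return the MOV shared by every path just before RET, or None.

    Each path is the list of instructions from one side of a conditional
    jump up to and including 'ret'. The MOV qualifies only if it loads an
    immediate into a register and nothing earlier on the path mentions
    that register (textual check only; register aliasing is ignored).
    """
    candidate = None
    for path in paths:
        if len(path) < 2 or path[-1] != "ret":
            return None
        mov = path[-2]
        op, _, operands = mov.partition(" ")
        src, _, dst = operands.rpartition(",")
        if not (op.startswith("mov") and src.startswith("$") and dst.startswith("%")):
            return None
        if any(dst in insn for insn in path[:-2]):
            return None                     # register is touched earlier on the path
        if candidate is None:
            candidate = mov
        elif mov != candidate:
            return None                     # paths disagree on the result
    return candidate

# The two sides of "je .Lj1190" in the AFTER listing.
paths = [
    ["movb $1,%al", "ret"],                          # fall-through
    ["movb $1,96(%rcx)", "movb $1,%al", "ret"],      # branch taken
]
print(common_result_mov(paths))
```

Because MOV does not modify the flags, a MOV found this way could sit between the flag-setting instruction and the conditional jump. A real implementation would also have to prove that the label has no other predecessors that rely on the old register value.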
Mantis conversion info:
- Mantis ID: 36355
- OS: Microsoft Windows
- OS Build: 10 Professional
- Build: r43582
- Platform: i386 and x86_64
- Version: 3.3.1
- Fixed in version: 3.3.1
- Fixed in revision: 43592 (#af107ca8)
- Monitored by: Vincent (Vincent Snijders)