[Patch / Refactor] x86: Merging of Post-Peephole and Reference Optimization stages
Original Reporter info from Mantis: CuriousKit @CuriousKit
- Reporter name: J. Gareth Moreton
Description:
This patch seeks to reduce the number of passes required to compile a Pascal program on x86 platforms by merging the Post-Peephole Optimization stage with the Reference Optimization stage.
The justification is that the Post-Peephole Optimization stage only converts individual instructions into more efficient forms (or removes unnecessary CMP operations) after passes 1 and 2 have done their heavy lifting. Likewise, the Reference Optimization pass only stops on each instruction, checks whether its operands are references and, if they are, tidies them up so that they are more consistent and can be stored more efficiently. From a maintenance standpoint, the Reference Optimization pass is easy to overlook because it is hidden in an overridden "PostPeepHoleOpts" routine that otherwise just calls the inherited version.
By removing this pass and appending its per-instruction code to the end of the "PostPeepHoleOptsCpu" routine (where it is only reached if an optimisation routine returns False), the two passes can be merged efficiently and cleanly, for a time saving of about 10% on large projects.
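The idea of folding the second pass into the first can be illustrated with a small, language-neutral sketch. This is not the compiler's code: `Instr`, `Ref`, `optimise_instruction` and `tidy_reference` are hypothetical stand-ins, and the single rewrite rule and the reference clean-up shown here are only examples chosen to make the sketch runnable.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Ref:
    """Hypothetical memory reference operand: base + index*scale + offset."""
    base: Optional[str] = None
    index: Optional[str] = None
    scale: int = 1
    offset: int = 0


@dataclass
class Instr:
    """Hypothetical instruction: an opcode plus a list of operands."""
    opcode: str
    operands: list = field(default_factory=list)


def optimise_instruction(ins: Instr) -> bool:
    """Stand-in for the per-instruction post-peephole rewrites.

    Returns True if the instruction was rewritten, False otherwise.
    The only rule here (illustrative): mov reg, 0 -> xor reg, reg.
    """
    if ins.opcode == "mov" and len(ins.operands) == 2 and ins.operands[1] == 0:
        ins.opcode = "xor"
        ins.operands = [ins.operands[0], ins.operands[0]]
        return True
    return False


def tidy_reference(ref: Ref) -> None:
    """Stand-in for the reference clean-up (illustrative rule only):
    an unscaled index with no base register becomes the base."""
    if ref.base is None and ref.index is not None and ref.scale == 1:
        ref.base, ref.index = ref.index, None


def tidy_operands(ins: Instr) -> None:
    for op in ins.operands:
        if isinstance(op, Ref):
            tidy_reference(op)


def two_passes(instrs: list) -> None:
    """Before: one walk for post-peephole rewrites, a second for references."""
    for ins in instrs:
        optimise_instruction(ins)
    for ins in instrs:
        tidy_operands(ins)


def merged_pass(instrs: list) -> None:
    """After: a single walk; references are tidied only when no rewrite fired.

    Assumption of this sketch: a rewrite never leaves an untidied reference
    behind, so skipping the clean-up in that case loses nothing.
    """
    for ins in instrs:
        if not optimise_instruction(ins):
            tidy_operands(ins)


def sample_program() -> list:
    return [
        Instr("mov", ["eax", 0]),
        Instr("mov", ["ecx", Ref(index="ebx", offset=8)]),
    ]
```

Under the stated assumption, both variants leave `sample_program()` in the same state, but the merged variant walks the instruction list once instead of twice.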
Steps to reproduce:
Apply the patches, then confirm that the compiler builds and cross-compiles correctly and that binaries built with the patched compiler are identical to those built with the unpatched one.
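One possible way to check the "no change to binaries" condition is to compare checksums of the same program built with the trunk compiler and with the patched compiler. The sketch below is only a suggestion; the file paths in the usage comment are hypothetical placeholders.

```python
import hashlib


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def binaries_match(path_a: str, path_b: str) -> bool:
    """True if both files have identical contents (by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)


# Usage (hypothetical paths):
#   binaries_match("out-trunk/project.exe", "out-patched/project.exe")
```

A byte-for-byte match is a strict test: it only holds if the build is otherwise reproducible (for example, no embedded timestamps that differ between builds).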
Additional information:
Personal timing metrics when compiling Lazarus on i386-win32 and x86_64-win64:
O3 Trunk (win64):
[103.164] 1308396 lines compiled, 103.2 sec
[104.633] 1308396 lines compiled, 104.6 sec
[97.766] 1308396 lines compiled, 97.8 sec
[99.023] 1308396 lines compiled, 99.0 sec
[99.352] 1308396 lines compiled, 99.4 sec
O3 New (win64):
[87.594] 1308396 lines compiled, 87.6 sec
[86.039] 1308396 lines compiled, 86.0 sec
[86.906] 1308396 lines compiled, 86.9 sec
[87.492] 1308396 lines compiled, 87.5 sec
[87.688] 1308396 lines compiled, 87.7 sec
----
O3 Trunk (win32):
[94.837] 1312002 lines compiled, 94.8 sec
[95.554] 1312002 lines compiled, 95.6 sec
[94.838] 1312002 lines compiled, 94.8 sec
[94.806] 1312002 lines compiled, 94.8 sec
[94.025] 1312002 lines compiled, 94.0 sec
O3 New (win32):
[83.923] 1312002 lines compiled, 83.9 sec
[84.648] 1312002 lines compiled, 84.6 sec
[83.618] 1312002 lines compiled, 83.6 sec
[84.051] 1312002 lines compiled, 84.1 sec
[88.813] 1312002 lines compiled, 88.8 sec
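----
To summarise the runs above, the following short script computes the mean compile time per configuration and the relative reduction. It uses only the figures listed in this report; the dictionary layout and the function name are just choices made for this sketch.

```python
from statistics import mean

# Compile times in seconds, copied from the runs listed above.
timings = {
    "win64": {
        "trunk": [103.164, 104.633, 97.766, 99.023, 99.352],
        "new": [87.594, 86.039, 86.906, 87.492, 87.688],
    },
    "win32": {
        "trunk": [94.837, 95.554, 94.838, 94.806, 94.025],
        "new": [83.923, 84.648, 83.618, 84.051, 88.813],
    },
}


def reduction_percent(trunk: list, new: list) -> float:
    """Relative reduction of the mean compile time, in percent."""
    return 100.0 * (mean(trunk) - mean(new)) / mean(trunk)


for target, runs in timings.items():
    print(
        f"{target}: trunk {mean(runs['trunk']):.1f} s, "
        f"new {mean(runs['new']):.1f} s, "
        f"reduction {reduction_percent(runs['trunk'], runs['new']):.1f}%"
    )
```

On these figures the mean reduction comes out at about 13.5% on win64 and about 10.3% on win32. The win32 figure is pulled down slightly by the last "New" run (88.8 sec), which is noticeably slower than the other four.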
Mantis conversion info:
- Mantis ID: 36583
- OS: Microsoft Windows
- OS Build: 10 Professional
- Build: r43920
- Platform: i8086, i386 and x86_64
- Version: 3.3.1
- Fixed in version: 3.3.1
- Fixed in revision: 44001 (#16152cf9)