[Patch / Refactor] TmpUsedRegs object pooling and optimisation
Original Reporter info from Mantis: CuriousKit @CuriousKit
- Reporter name: J. Gareth Moreton
Description:
This patch makes a small speed saving and reduces maintenance costs by applying a paradigm known as 'object pooling' in the peephole optimizer on x86 systems. All references to TmpUsedRegs now point to a common object that is created along with the optimizer instance (which runs in a single thread). This avoids the time wasted on repeatedly creating and destroying instances of TUsedRegs whenever the peephole optimizer looks ahead at register usage.
Some new base functions are introduced, namely two versions of TransferUsedRegs that copy the used registers from one TAllUsedRegs collection to another without creating a new object. They call the constructor as a regular method in order to update the internal state, so no new object is instantiated; comments in the code explain what is going on. One version copies the internal states of all register types, while the second copies only the one specified by a parameter. The second version is also inlined (as is GetUsedRegs) because it collapses into a single line of code. For example, "OptPass1MOV" frequently calls:
TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
If there's concern that this increases the maintenance burden and hence the risk of bugs, or if an optimisation uses more than one type of register, then the first parameter can be removed - for example:
TransferUsedRegs(TmpUsedRegs);
...and the code will still compile successfully with no change in function, although it will run slightly slower.
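The pooling approach described above can be sketched roughly as follows. This is a minimal illustration, not the patch itself: TRegType, TUsedRegsSketch, TAllUsedRegsSketch and TOptimizerSketch are hypothetical, simplified stand-ins for the compiler's own types (rtInt plays the role of R_INTREGISTER), and the real TUsedRegs tracks far more state. The point it shows is that invoking a constructor through an existing instance runs it as an ordinary method, so the pooled object is overwritten rather than reallocated.

```pascal
program PoolSketch;
{$mode objfpc}

type
  { Hypothetical, simplified stand-ins for the compiler's own types. }
  TRegType = (rtInt, rtFPU, rtMM);
  TRegSet  = set of 0..31;

  TUsedRegsSketch = class
    RegType: TRegType;
    Used: TRegSet;
    constructor Create(ARegType: TRegType);
    constructor CreateAsCopy(Source: TUsedRegsSketch);
  end;

  TAllUsedRegsSketch = array[TRegType] of TUsedRegsSketch;

  TOptimizerSketch = class
    UsedRegs, TmpUsedRegs: TAllUsedRegsSketch;
    constructor Create;
    destructor Destroy; override;
    procedure TransferUsedRegs(var Dest: TAllUsedRegsSketch); overload;
    procedure TransferUsedRegs(ARegType: TRegType;
      var Dest: TAllUsedRegsSketch); overload;
  end;

constructor TUsedRegsSketch.Create(ARegType: TRegType);
begin
  RegType := ARegType;
  Used := [];
end;

constructor TUsedRegsSketch.CreateAsCopy(Source: TUsedRegsSketch);
begin
  RegType := Source.RegType;
  Used := Source.Used;
end;

constructor TOptimizerSketch.Create;
var
  rt: TRegType;
begin
  { The pool is allocated once, together with the optimizer. }
  for rt := Low(TRegType) to High(TRegType) do
  begin
    UsedRegs[rt] := TUsedRegsSketch.Create(rt);
    TmpUsedRegs[rt] := TUsedRegsSketch.Create(rt);
  end;
end;

destructor TOptimizerSketch.Destroy;
var
  rt: TRegType;
begin
  { The pooled instances are freed in one place only. }
  for rt := Low(TRegType) to High(TRegType) do
  begin
    UsedRegs[rt].Free;
    TmpUsedRegs[rt].Free;
  end;
  inherited Destroy;
end;

{ Copies every register type. }
procedure TOptimizerSketch.TransferUsedRegs(var Dest: TAllUsedRegsSketch);
var
  rt: TRegType;
begin
  for rt := Low(TRegType) to High(TRegType) do
    TransferUsedRegs(rt, Dest);
end;

{ Copies a single register type. }
procedure TOptimizerSketch.TransferUsedRegs(ARegType: TRegType;
  var Dest: TAllUsedRegsSketch);
begin
  { Constructor invoked through an instance: it runs as a normal method
    and overwrites the existing object's state without allocating. }
  Dest[ARegType].CreateAsCopy(UsedRegs[ARegType]);
end;

var
  Opt: TOptimizerSketch;
  Before: TUsedRegsSketch;
begin
  Opt := TOptimizerSketch.Create;
  try
    Opt.UsedRegs[rtInt].Used := [0, 3];
    Before := Opt.TmpUsedRegs[rtInt];
    Opt.TransferUsedRegs(rtInt, Opt.TmpUsedRegs);
    { Same object as before, but with the copied state. }
    if (Opt.TmpUsedRegs[rtInt] <> Before) or
       (Opt.TmpUsedRegs[rtInt].Used <> [0, 3]) then
      Halt(1);
    WriteLn('state copied into the pooled instance');
  finally
    Opt.Free;
  end;
end.
```

In this sketch the single-type overload does the real work and the all-types overload simply loops over it, which mirrors the trade-off described above: the general form is easier to maintain, the specific form does less copying.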
Steps to reproduce:
Apply patch and confirm correct compilation.
Additional information:
As a side effect, a couple of memory leaks were fixed with this patch, because "OptPass1VMOVAP" and "OptPass1VOP" failed to call ReleaseUsedRegs after calling CopyUsedRegs (which creates instances of TUsedRegs). Maintenance costs are therefore reduced, because there is no longer any need to remember to call ReleaseUsedRegs every time TmpUsedRegs is used; the pooled instances are freed when the optimizer's destructor is called.
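To illustrate the kind of leak described here, the following sketch counts live instances. It is only an approximation under stated assumptions: LiveCount is hypothetical instrumentation, TUsedRegsSketch is a stand-in class, and the CopyUsedRegs and ReleaseUsedRegs routines shown are simplified imitations that share only their names with the compiler's routines.

```pascal
program LeakSketch;
{$mode objfpc}

var
  LiveCount: Integer = 0;   { hypothetical instrumentation }

type
  TUsedRegsSketch = class
    constructor Create;
    destructor Destroy; override;
  end;

constructor TUsedRegsSketch.Create;
begin
  Inc(LiveCount);
end;

destructor TUsedRegsSketch.Destroy;
begin
  Dec(LiveCount);
  inherited Destroy;
end;

{ Old style: every look-ahead allocates, and the caller must release. }
procedure CopyUsedRegs(out Dest: TUsedRegsSketch);
begin
  Dest := TUsedRegsSketch.Create;
end;

procedure ReleaseUsedRegs(var Regs: TUsedRegsSketch);
begin
  Regs.Free;
  Regs := nil;
end;

procedure CorrectPass;
var
  Tmp: TUsedRegsSketch;
begin
  CopyUsedRegs(Tmp);
  { ... look ahead ... }
  ReleaseUsedRegs(Tmp);
end;

procedure LeakyPass;
var
  Tmp: TUsedRegsSketch;
begin
  CopyUsedRegs(Tmp);
  { ... look ahead ..., but the matching ReleaseUsedRegs call is missing }
end;

begin
  CorrectPass;
  if LiveCount <> 0 then Halt(1);
  LeakyPass;
  { One instance is now unreachable and never freed. }
  if LiveCount <> 1 then Halt(1);
  WriteLn('instances leaked: ', LiveCount);
end.
```

With a pooled instance owned by the optimizer there is no per-call allocation to pair with a release, so this class of mistake cannot occur.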
Time savings are marginal at best - no more than a few seconds with very large projects.
On win64, the size of the compiler increases by only 512 bytes, largely due to the inlined methods. The increase is no larger because the version of TransferUsedRegs that omits the register type is never used. There is, however, potential for further savings in both speed and size with future peephole optimisations because, for example, the following pair of instructions frequently appears:
movq %rbx,%rax
movq (%rax),%rax
...which could be collapsed into just:
movq (%rbx),%rax
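As a rough illustration of how such a fold might be detected, the sketch below matches the two-instruction pattern on a miniature instruction model. TOperKind, TOperand, TMov, Oper and TryFold are all hypothetical and bear no relation to the compiler's actual instruction classes. The fold is only valid because the second instruction overwrites the register that held the copy, so the copy is dead afterwards.

```pascal
program PeepholeSketch;
{$mode objfpc}

type
  { Hypothetical miniature instruction model. }
  TOperKind = (okReg, okMemReg);   { %reg or (%reg) }

  TOperand = record
    Kind: TOperKind;
    Reg: string;
  end;

  TMov = record
    Src, Dst: TOperand;
  end;

function Oper(AKind: TOperKind; const AReg: string): TOperand;
begin
  Result.Kind := AKind;
  Result.Reg := AReg;
end;

{ Folds  mov %a,%b ; mov (%b),%b  into  mov (%a),%b.
  Returns False and leaves the instructions alone if the pattern
  does not match. }
function TryFold(const First, Second: TMov; out Folded: TMov): Boolean;
begin
  Result :=
    (First.Src.Kind = okReg) and (First.Dst.Kind = okReg) and
    (Second.Src.Kind = okMemReg) and (Second.Src.Reg = First.Dst.Reg) and
    (Second.Dst.Kind = okReg) and (Second.Dst.Reg = First.Dst.Reg);
  if Result then
  begin
    Folded.Src := Oper(okMemReg, First.Src.Reg);
    Folded.Dst := Second.Dst;
  end;
end;

var
  A, B, F: TMov;
begin
  { movq %rbx,%rax }
  A.Src := Oper(okReg, 'rbx');
  A.Dst := Oper(okReg, 'rax');
  { movq (%rax),%rax }
  B.Src := Oper(okMemReg, 'rax');
  B.Dst := Oper(okReg, 'rax');

  if TryFold(A, B, F) then
    WriteLn('movq (%', F.Src.Reg, '),%', F.Dst.Reg)  { prints movq (%rbx),%rax }
  else
    Halt(1);
end.
```

A real implementation would also have to handle operand sizes, offsets and index registers, which this sketch ignores.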
Mantis conversion info:
- Mantis ID: 34679
- OS: Microsoft Windows
- OS Build: 10 Professional
- Build: x86_64-win64
- Platform: x86_64-win64
- Version: 3.3.1
- Fixed in version: 3.3.1
- Target version: 3.3.1