View Revisions: Issue #34628

Summary 0034628: [Patch / Refactor] x86_64 optimizer overhaul
Revision 2018-12-01 17:05 by J. Gareth Moreton
Description This patch serves to overhaul the optimiser for x86_64 to minimise the number of passes required and to be more intelligent. Preliminary tests show about a 5% speed increase on an -O1 compilation of Lazarus and about a 15% speed increase for -O3. See the attached Metric.txt file showcasing the timings.

To minimise the pass count, the pre-peephole, pass 1 and pass 2 stages have been merged, and jump and MOV optimisations have been overhauled. One of the control cases is that a compilation under -O1 should not produce worse code than the trunk - it turns out though that in many cases, the compiler produces better code even though no new actual optimization combinations have been introduced.

Additionally, for individual passes, the optimizer attempts to mark the end of function prologues so as to not waste time on sequences that won't change.
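
As a rough illustration of the two ideas above (folding several peephole stages into one walk over the instruction list, and skipping the function prologue), here is a minimal Python sketch. It is not the patch's code, which lives in the Free Pascal compiler sources; the tuple-based instruction format, the two rewrite rules and the names `prologue_end` and `optimise` are all assumptions made up for this example.

```python
from typing import List, Tuple

Instr = Tuple[str, ...]  # toy format, e.g. ("mov", dst, src); hypothetical


def prologue_end(code: List[Instr]) -> int:
    """Return the index of the first instruction after a conventional
    frame setup (push rbp / mov rbp, rsp / optional sub rsp, N)."""
    i = 0
    if code[:2] == [("push", "rbp"), ("mov", "rbp", "rsp")]:
        i = 2
        if i < len(code) and code[i][:2] == ("sub", "rsp"):
            i += 1
    return i


def optimise(code: List[Instr]) -> List[Instr]:
    """Single walk that tries every rule at each instruction, instead of
    running one full pass per rule. The prologue is copied through untouched."""
    start = prologue_end(code)
    out = list(code[:start])
    i = start
    while i < len(code):
        ins = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        # MOV rule: "mov r, r" has no effect, so drop it.
        if ins[0] == "mov" and len(ins) == 3 and ins[1] == ins[2]:
            i += 1
            continue
        # Jump rule: a jump to the label that immediately follows is redundant.
        if ins[0] == "jmp" and nxt == ("label", ins[1]):
            i += 1
            continue
        out.append(ins)
        i += 1
    return out


sample = [
    ("push", "rbp"),
    ("mov", "rbp", "rsp"),
    ("sub", "rsp", "32"),
    ("mov", "rax", "rax"),
    ("jmp", ".L1"),
    ("label", ".L1"),
    ("mov", "rax", "rcx"),
    ("ret",),
]
for ins in optimise(sample):
    print(ins)
```

In this toy version each instruction is visited once and both rules are checked on that visit, which is the general shape of a merged pass; a real optimiser would have many more rules and would need to re-examine instructions after a rewrite exposes a new opportunity.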

The code isn't completely clean as I have attempted to separate i386 from the changes, mostly as a control case to show it doesn't affect other platforms. Once testing and implementation is successful for x86_64, I plan to port my changes over to i386.

(NOTE: Linux testing hasn't yet been overly successful due to configuration difficulties)
Revision 2018-12-01 17:04 by J. Gareth Moreton
Description This patch serves to overhaul the optimiser for x86_64 to minimise the number of passes required and to be more intelligent. Preliminary tests show a ~5% speed increase on an -O1 compilation of Lazarus and a ~15% speed increase for -O3. See the attached Metric.txt file showcasing the timings.

To minimise the pass count, the pre-peephole, pass 1 and pass 2 stages have been merged, and jump and MOV optimisations have been overhauled. One of the control cases is that a compilation under -O1 should not produce worse code than the trunk - it turns out though that in many cases, the compiler produces better code even though no new actual optimization combinations have been introduced.

Additionally, for individual passes, the optimizer attempts to mark the end of function prologues so as to not waste time on sequences that won't change.

The code isn't completely clean as I have attempted to separate i386 from the changes, mostly as a control case to show it doesn't affect other platforms. Once testing and implementation is successful for x86_64, I plan to port my changes over to i386.

(NOTE: Linux testing hasn't yet been overly successful due to configuration difficulties)