View Issue Details

IDProjectCategoryView StatusLast Update
0039216FPCCompilerpublic2021-07-20 11:05
ReporterJ. Gareth Moreton Assigned To 
PrioritylowSeveritytweakReproducibilityN/A
Status newResolutionopen 
Platformi386 and x86_64OSMicrosoft Windows 
Product Version3.3.1 
Summary0039216: [Patch / Refactor] x86: DeepMovOpt speed gains
DescriptionThis patch provides some speed gains to the optimisations located around DeepMovOpt in OptPass1MOV:

- A pair of calls to RegLoadedWithNewValue and "not RegUsedAfterInstruction" is replaced with a call to RegEndOfLife. Notably, this reduces the number of calls to RegLoadedWithNewValue from 2 to 1 (RegUsedAfterInstruction calls it as part of its checks).
- RegEndOfLife is now inline as it calls two relatively simple methods in a single statement with minimal parameter manipulation.
- With the setting of TempRegUsed, the RegUsedAfterInstruction and RegReadByInstruction calls have been swapped around as RegReadByInstruction is quicker to execute (meaning RegUsedAfterInstruction doesn't have to be called if it returns True).
- Just before calling DeepMovOpt, "MatchOpType(taicpu(p), top_reg, top_reg)" has been changed to "taicpu(p).oper[0]^.typ = top_reg" because the second operand is already known to be a register at this point.
- CurrentReg is temporarily used to store taicpu(p).oper[0]^.reg to reduce the number of pointer deferences.
- Because RegEndOfLife doesn't rely on a TUsedRegs array, all the nearby updates to TmpUsedRegs have been removed.
- For the "not RegModifiedByInstruction(taicpu(p).oper[0]^.reg, hp1) and not RegModifiedBetween(taicpu(p).oper[0]^.reg, hp1, hp2)" calls, hp1 has been changed to hp3 (which is initially equal to hp1) - when "Continue" is called later in an attempt to do further optimisations after a successful deep optimisation, it reduces the number of instructions to check against, and hp3 gets set to hp2 just before Continue, thus removing the need to check against instructions where we already know the register hasn't been modified.
Steps To ReproduceApply patch and confirm correct compilation with slight speed gain on -O3 and -O4.

(Also confirm slight speed gains on other platforms that use RegEndOfLife)
Additional InformationInterestingly, when building the compiler under -O4, there is a single difference in the RTL assembler dumps in the system unit - before:

.section .text.n_system_$$_defaultenumresourcelanguages$qword$pchar$pchar$enumreslangproc$int64$$longbool,"ax"
    .balign 16,0x90
.globl SYSTEM_$$_DEFAULTENUMRESOURCELANGUAGES$QWORD$PCHAR$PCHAR$ENUMRESLANGPROC$INT64$$LONGBOOL
SYSTEM_$$_DEFAULTENUMRESOURCELANGUAGES$QWORD$PCHAR$PCHAR$ENUMRESLANGPROC$INT64$$LONGBOOL:
.seh_proc SYSTEM_$$_DEFAULTENUMRESOURCELANGUAGES$QWORD$PCHAR$PCHAR$ENUMRESLANGPROC$INT64$$LONGBOOL
    pushq %rbp
.seh_pushreg %rbp
# Peephole Optimization: Mov2Nop 3b done
.seh_endprologue
    xorl %eax,%eax
# Peephole Optimization: %rbp = %rsp; changed to minimise pipeline stall (MovXXX2MovXXX)
    leaq (%rsp),%rsp
    popq %rbp
    ret
.seh_endproc

After:

.section .text.n_system_$$_defaultenumresourcelanguages$qword$pchar$pchar$enumreslangproc$int64$$longbool,"ax"
    .balign 16,0x90
.globl SYSTEM_$$_DEFAULTENUMRESOURCELANGUAGES$QWORD$PCHAR$PCHAR$ENUMRESLANGPROC$INT64$$LONGBOOL
SYSTEM_$$_DEFAULTENUMRESOURCELANGUAGES$QWORD$PCHAR$PCHAR$ENUMRESLANGPROC$INT64$$LONGBOOL:
.seh_proc SYSTEM_$$_DEFAULTENUMRESOURCELANGUAGES$QWORD$PCHAR$PCHAR$ENUMRESLANGPROC$INT64$$LONGBOOL
    pushq %rbp
.seh_pushreg %rbp
    movq %rsp,%rbp ; <-- MOV instruction no longer removed.
.seh_endprologue
    xorl %eax,%eax
# Peephole Optimization: %rbp = %rsp; changed to minimise pipeline stall (MovXXX2MovXXX)
    leaq (%rsp),%rsp
    popq %rbp
    ret
.seh_endproc

----

This change occured simply from changing: "RegLoadedWithNewValue(CurrentReg, hp2) or not RegUsedAfterInstruction(CurrentReg, hp2, TmpUsedRegs)" to "RegEndOfLife(CurrentReg, taicpu(hp2))". Though the removal of the MOV instruction is harmless and technically correct since %rbp is no longer used, it shouldn't have happened because DeepMovOpt should have returned False on "popq %rbp", and the removal of the MOV instruction implies it erroneously returned True. This is also implied under the i386 tests, as with this patch, ../packages/regexpr/tests/testregexpr.pp now passes.

I'll investigate this further by analysing the assembly dumps of the compiler.
Tagscompiler, i386, optimization, patch, refactor, x86, x86_64
Fixed in Revision
FPCOldBugId
FPCTarget-
Attached Files

Activities

J. Gareth Moreton

2021-07-13 09:20

developer  

deepmov-optimisation.patch (4,361 bytes)   
Index: compiler/aoptobj.pas
===================================================================
--- compiler/aoptobj.pas	(revision 49603)
+++ compiler/aoptobj.pas	(working copy)
@@ -336,7 +336,7 @@
 
         { returns true if reg reaches it's end of life at p, this means it is either
           reloaded with a new value or it is deallocated afterwards }
-        function RegEndOfLife(reg: TRegister;p: taicpu): boolean;
+        function RegEndOfLife(reg: TRegister;p: taicpu): boolean; inline;
 
         { removes p from asml, updates registers and replaces it by a valid value, if this is the case true is returned }
         function RemoveCurrentP(var p : tai): boolean;
@@ -1483,7 +1483,7 @@
       end;
 
 
-    function TAOptObj.RegEndOfLife(reg : TRegister;p : taicpu) : boolean;
+    function TAOptObj.RegEndOfLife(reg : TRegister;p : taicpu) : boolean; inline;
       begin
          Result:=assigned(FindRegDealloc(reg,tai(p.Next))) or
            RegLoadedWithNewValue(reg,p);
Index: compiler/x86/aoptx86.pas
===================================================================
--- compiler/x86/aoptx86.pas	(revision 49603)
+++ compiler/x86/aoptx86.pas	(working copy)
@@ -2899,8 +2899,8 @@
 
                         TempRegUsed :=
                           CrossJump { Assume the register is in use if it crossed a conditional jump } or
-                          RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp2, TmpUsedRegs) or
-                          RegReadByInstruction(taicpu(p).oper[1]^.reg, hp1);
+                          RegReadByInstruction(taicpu(p).oper[1]^.reg, hp3) or
+                          RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp2, TmpUsedRegs);
 
                         case taicpu(p).oper[0]^.typ Of
                           top_reg:
@@ -3040,16 +3040,16 @@
                         Exit;
                       end;
                   else
-                    if MatchOpType(taicpu(p), top_reg, top_reg) then
+                    if taicpu(p).oper[0]^.typ = top_reg then
                       begin
-                        CurrentReg := taicpu(p).oper[1]^.reg;
-                        TransferUsedRegs(TmpUsedRegs);
-                        TmpUsedRegs[R_INTREGISTER].Update(tai(p.Next));
+                        CurrentReg := taicpu(p).oper[0]^.reg;
                         if
-                          not RegModifiedByInstruction(taicpu(p).oper[0]^.reg, hp1) and
-                          not RegModifiedBetween(taicpu(p).oper[0]^.reg, hp1, hp2) and
+                          not RegModifiedByInstruction(CurrentReg, hp3) and
+                          not RegModifiedBetween(CurrentReg, hp3, hp2) and
                           DeepMovOpt(taicpu(p), taicpu(hp2)) then
                           begin
+                            CurrentReg := taicpu(p).oper[1]^.reg;
+
                             { Just in case something didn't get modified (e.g. an
                               implicit register) }
                             if not RegReadByInstruction(CurrentReg, hp2) and
@@ -3057,15 +3057,10 @@
                                 the original MOV no matter what }
                               not CrossJump then
                               begin
-                                TransferUsedRegs(TmpUsedRegs);
-                                UpdateUsedRegs(TmpUsedRegs, tai(p.Next));
-                                UpdateUsedRegs(TmpUsedRegs, tai(hp1.Next));
-
-                                if
-                                  { Make sure the original register isn't still present
-                                    and has been written to (e.g. with SHRX) }
-                                  RegLoadedWithNewValue(CurrentReg, hp2) or
-                                  not RegUsedAfterInstruction(CurrentReg, hp2, TmpUsedRegs) then
+                                { RegEndOfLife returns True if the register is
+                                  deallocated before the next instruction or has
+                                  been loaded with a new value }
+                                if RegEndOfLife(CurrentReg, taicpu(hp2)) then
                                   begin
                                     { We can remove the original MOV }
                                     DebugMsg(SPeepholeOptimization + 'Mov2Nop 3b done',p);
deepmov-optimisation.patch (4,361 bytes)   

J. Gareth Moreton

2021-07-13 09:26

developer   ~0131870

Keeping priority at Normal and severity at Minor due to a possible compiler bug that gets fixed with this patch.

J. Gareth Moreton

2021-07-13 16:30

developer   ~0131878

So a bit weird - I haven't found anything in the assembly dumps that suggests a compiler bug yet - all I found was other instances of "mov %rsp,%rbp" not being removed - I'll keep investigating thuogh.

J. Gareth Moreton

2021-07-13 17:21

developer   ~0131879

Last edited: 2021-07-13 17:53

View 5 revisions

Ah, false alarm... there's no bug - it's just that RegUsedAfterInstruction actually checks the next instruction and allows the correct removal of the MOV instruction if the next instruction is "pop %rbp" (it did this check on the preceding "leaq (%rsp),%rsp" instruction after changing it from "leaq (%rbp),%rsp" - I'll add the optimisation that removes "lea (%rsp),%rsp" later, since it's a null operation; it doesn't do so currently because it's using the stack pointer).

So it's not an exact refactor, but removing "movq %rsp,%rbp" is probably something I can do in a subsequent update in a way that's broader and faster.

(Not sure what that fixed failure on i386 is about though)

Bart Broersma

2021-07-15 20:56

reporter   ~0131901

> # Peephole Optimization: %rbp = %rsp; changed to minimise pipeline stall (MovXXX2MovXXX)
OT: since we tend to use American English (e.g. color instead of colour): minimise should be changed to minimize ??

J. Gareth Moreton

2021-07-15 21:15

developer   ~0131902

What can I say? I'm born and bred in Britain, and Free Pascal isn't an American project, so I don't think it matters all that much. I have kept the comment header as "Peephole Optimization" though since that's a declared string constant and that part of the compiler is specificially called the "peephole optimizer".

Generally, end users do not ever see those comments in their own assembly dumps because it requires building the compiler with DEBUG_AOPTCPU defined.

J. Gareth Moreton

2021-07-20 11:05

developer   ~0131952

Does something break with this?

Issue History

Date Modified Username Field Change
2021-07-13 09:20 J. Gareth Moreton New Issue
2021-07-13 09:20 J. Gareth Moreton File Added: deepmov-optimisation.patch
2021-07-13 09:20 J. Gareth Moreton Tag Attached: patch
2021-07-13 09:20 J. Gareth Moreton Tag Attached: compiler
2021-07-13 09:20 J. Gareth Moreton Tag Attached: refactor
2021-07-13 09:20 J. Gareth Moreton Tag Attached: i386
2021-07-13 09:20 J. Gareth Moreton Tag Attached: x86_64
2021-07-13 09:20 J. Gareth Moreton Tag Attached: x86
2021-07-13 09:20 J. Gareth Moreton Tag Attached: optimization
2021-07-13 09:25 J. Gareth Moreton Description Updated View Revisions
2021-07-13 09:25 J. Gareth Moreton FPCTarget => -
2021-07-13 09:26 J. Gareth Moreton Note Added: 0131870
2021-07-13 16:30 J. Gareth Moreton Note Added: 0131878
2021-07-13 17:21 J. Gareth Moreton Note Added: 0131879
2021-07-13 17:22 J. Gareth Moreton Note Edited: 0131879 View Revisions
2021-07-13 17:24 J. Gareth Moreton Priority normal => low
2021-07-13 17:24 J. Gareth Moreton Severity minor => tweak
2021-07-13 17:24 J. Gareth Moreton Note Edited: 0131879 View Revisions
2021-07-13 17:36 J. Gareth Moreton Note Edited: 0131879 View Revisions
2021-07-13 17:53 J. Gareth Moreton Note Edited: 0131879 View Revisions
2021-07-14 21:11 J. Gareth Moreton Summary [Patch / Refactor] DeepMovOpt speed gains => [Patch / Refactor] x86: DeepMovOpt speed gains
2021-07-15 20:56 Bart Broersma Note Added: 0131901
2021-07-15 21:15 J. Gareth Moreton Note Added: 0131902
2021-07-20 11:05 J. Gareth Moreton Note Added: 0131952