View Issue Details

IDProjectCategoryView StatusLast Update
0038579FPCCompilerpublic2021-03-15 22:12
ReporterJ. Gareth Moreton Assigned ToFlorian  
PrioritylowSeveritytweakReproducibilityN/A
Status resolvedResolutionfixed 
Platformi386 and x86_64OSMicrosoft Windows 
Product Version3.3.1 
Fixed in Version3.3.1 
Summary0038579: [Patch] AddMov2LeaAdd and -Os change for SubMov2LeaSub
DescriptionThis patch adds the logical partner to the SubMov2LeaSub optimisation, namely AddMov2LeaAdd. It also modifies SubMov2LeaSub slightly to be selective in when to apply the optimisation under -Os (size over speed) settings, in that it will apply the optimisation if it can remove the Sub instruction (the same goes for the Add instruction in this new optimisation).

Tested on i386-win32 and x86_64-win64 and confirmed no regressions in test suite when run normally and when run with "-O4 -gl" options.
Steps To ReproduceApply patch and confirm correct compilation.
Additional InformationThere is a slight risk of the optimisation causing diminishing returns when applied multiple times in a row. For example, in the System unit under -O4:

.Lj749:
    addq $1,%rax
    movq %rax,%rcx
# Peephole Optimization: %rcx = %rax; changed to minimise pipeline stall (MovXXX2MovXXX)
    movq %rax,%rdx
    shrq $3,%rdx
    andq $7,%rcx
    movzbl (%rbx,%rdx),%r8d
    ...

Becomes...

.Lj749:
    leaq 1(%rax),%rcx
# Peephole Optimization: AddMov2LeaAdd
    leaq 1(%rax),%rdx
# Peephole Optimization: AddMov2LeaAdd
    addq $1,%rax
    shrq $3,%rdx
    andq $7,%rcx
    movzbl (%rbx,%rdx),%r8d
    ...

There comes a point where there is no speed gain because the processor runs out of available AGUs (used for LEA), while ALUs (used for MOV and ADD) are more numerous (Broadwell CPUs, for example, have 4 integer ALUs and 3 AGUs). Such a diminishing return is an edge case though, and might be possible to mitigate in the post-peephole stage by, say, changing ADD or MOV instructions to LEA or vice versa in order to balance out the AGU and ALU usage. In the example given though, it is likely there will still be a gain of 1 cycle, since the ADD instruction will run at the same time as the two LEA instructions, or simultaneously with the SHR and AND instructions.
Tagsi386, optimizations, patch.compiler, x86, x86_64
Fixed in Revision48989
FPCOldBugId
FPCTarget-
Attached Files

Relationships

child of 0038555 resolvedFlorian [Patch] SubMov2LeaSub optimisation improvement 

Activities

Sven Barth

2021-03-04 09:16

manager   ~0129370

Would you please provide a more useful title than a revision number?

J. Gareth Moreton

2021-03-04 09:17

developer   ~0129371

Last edited: 2021-03-04 09:22

View 2 revisions

Separated the SubMov2LeaSub change to a separate patch to keep it distinct.

(Also updated them because the patch was accidentally out of date and had a bug)

J. Gareth Moreton

2021-03-04 09:20

developer   ~0129372

addmov2lea.patch (6,683 bytes)   
Index: compiler/x86/aoptx86.pas
===================================================================
--- compiler/x86/aoptx86.pas	(revision 48872)
+++ compiler/x86/aoptx86.pas	(working copy)
@@ -7025,7 +7025,7 @@
 
     function TX86AsmOptimizer.OptPass2ADD(var p : tai) : boolean;
       var
-        hp1: tai;
+        hp1: tai; NewRef: TReference;
 
         { This entire nested function is used in an if-statement below, but we
           want to avoid all the used reg transfers and GetNextInstruction calls
@@ -7045,45 +7045,103 @@
 
       begin
         Result := False;
+        if not GetNextInstruction(p, hp1) or (hp1.typ <> ait_instruction) then
+          Exit;
 
-        { Change:
-            add     %reg2,%reg1
-            mov/s/z #(%reg1),%reg1  (%reg1 superregisters must be the same)
+        if (taicpu(p).opsize in [S_L{$ifdef x86_64}, S_Q{$endif}]) then
+          begin
+            { Change:
+                add     %reg2,%reg1
+                mov/s/z #(%reg1),%reg1  (%reg1 superregisters must be the same)
 
-          To:
-            mov/s/z #(%reg1,%reg2),%reg1
-        }
+              To:
+                mov/s/z #(%reg1,%reg2),%reg1
+            }
+            if MatchOpType(taicpu(p), top_reg, top_reg) and
+              MatchInstruction(hp1, [A_MOV, A_MOVZX, A_MOVSX{$ifdef x86_64}, A_MOVSXD{$endif}], []) and
+              MatchOpType(taicpu(hp1), top_ref, top_reg) and
+              (taicpu(hp1).oper[0]^.ref^.scalefactor <= 1) and
+              (
+                (
+                  (taicpu(hp1).oper[0]^.ref^.base = taicpu(p).oper[1]^.reg) and
+                  (taicpu(hp1).oper[0]^.ref^.index = NR_NO)
+                ) or (
+                  (taicpu(hp1).oper[0]^.ref^.index = taicpu(p).oper[1]^.reg) and
+                  (taicpu(hp1).oper[0]^.ref^.base = NR_NO)
+                )
+              ) and (
+                Reg1WriteOverwritesReg2Entirely(taicpu(p).oper[1]^.reg, taicpu(hp1).oper[1]^.reg) or
+                (
+                  { If the super registers ARE equal, then this MOV/S/Z does a partial write }
+                  not SuperRegistersEqual(taicpu(p).oper[1]^.reg, taicpu(hp1).oper[1]^.reg) and
+                  MemRegisterNotUsedLater
+                )
+              ) then
+              begin
+                taicpu(hp1).oper[0]^.ref^.base := taicpu(p).oper[1]^.reg;
+                taicpu(hp1).oper[0]^.ref^.index := taicpu(p).oper[0]^.reg;
 
-        if (taicpu(p).opsize in [S_L{$ifdef x86_64}, S_Q{$endif}]) and
-          MatchOpType(taicpu(p), top_reg, top_reg) and
-          GetNextInstruction(p, hp1) and
-          MatchInstruction(hp1, [A_MOV, A_MOVZX, A_MOVSX{$ifdef x86_64}, A_MOVSXD{$endif}], []) and
-          MatchOpType(taicpu(hp1), top_ref, top_reg) and
-          (taicpu(hp1).oper[0]^.ref^.scalefactor <= 1) and
-          (
-            (
-              (taicpu(hp1).oper[0]^.ref^.base = taicpu(p).oper[1]^.reg) and
-              (taicpu(hp1).oper[0]^.ref^.index = NR_NO)
-            ) or (
-              (taicpu(hp1).oper[0]^.ref^.index = taicpu(p).oper[1]^.reg) and
-              (taicpu(hp1).oper[0]^.ref^.base = NR_NO)
-            )
-          ) and (
-            Reg1WriteOverwritesReg2Entirely(taicpu(p).oper[1]^.reg, taicpu(hp1).oper[1]^.reg) or
-            (
-              { If the super registers ARE equal, then this MOV/S/Z does a partial write }
-              not SuperRegistersEqual(taicpu(p).oper[1]^.reg, taicpu(hp1).oper[1]^.reg) and
-              MemRegisterNotUsedLater
-            )
-          ) then
-          begin
-            taicpu(hp1).oper[0]^.ref^.base := taicpu(p).oper[1]^.reg;
-            taicpu(hp1).oper[0]^.ref^.index := taicpu(p).oper[0]^.reg;
+                DebugMsg(SPeepholeOptimization + 'AddMov2Mov done', p);
+                RemoveCurrentp(p, hp1);
+                Result := True;
+                Exit;
+              end;
 
-            DebugMsg(SPeepholeOptimization + 'AddMov2Mov done', p);
-            RemoveCurrentp(p, hp1);
-            Result := True;
-            Exit;
+            { Change:
+                addl/q $x,%reg1
+                movl/q %reg1,%reg2
+              To:
+                leal/q $x(%reg1),%reg2
+                addl/q $x,%reg1 (can be removed if %reg1 or the flags are not used afterwards)
+
+              Breaks the dependency chain.
+            }
+            if MatchOpType(taicpu(p),top_const,top_reg) and
+              MatchInstruction(hp1, A_MOV, [taicpu(p).opsize]) and
+              (taicpu(hp1).oper[1]^.typ = top_reg) and
+              MatchOperand(taicpu(hp1).oper[0]^, taicpu(p).oper[1]^.reg) and
+              (
+                { Don't do AddMov2LeaAdd under -Os, but do allow AddMov2Lea }
+                not (cs_opt_size in current_settings.optimizerswitches) or
+                (
+                  not RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp1, TmpUsedRegs) and
+                  RegUsedAfterInstruction(NR_DEFAULTFLAGS, hp1, TmpUsedRegs)
+                )
+              ) then
+              begin
+                { Change the MOV instruction to a LEA instruction, and update the
+                  first operand }
+
+                reference_reset(NewRef, 1, []);
+                NewRef.base := taicpu(p).oper[1]^.reg;
+                NewRef.scalefactor := 1;
+                NewRef.offset := taicpu(p).oper[0]^.val;
+
+                taicpu(hp1).opcode := A_LEA;
+                taicpu(hp1).loadref(0, NewRef);
+
+                TransferUsedRegs(TmpUsedRegs);
+                UpdateUsedRegs(TmpUsedRegs, tai(p.Next));
+                if RegUsedAfterInstruction(NewRef.base, hp1, TmpUsedRegs) or
+                  RegUsedAfterInstruction(NR_DEFAULTFLAGS, hp1, TmpUsedRegs) then
+                  begin
+                    { Move what is now the LEA instruction to before the SUB instruction }
+                    Asml.Remove(hp1);
+                    Asml.InsertBefore(hp1, p);
+                    AllocRegBetween(taicpu(hp1).oper[1]^.reg, hp1, p, UsedRegs);
+
+                    DebugMsg(SPeepholeOptimization + 'AddMov2LeaAdd', p);
+                    p := hp1;
+                  end
+                else
+                  begin
+                    { Since %reg1 or the flags aren't used afterwards, we can delete p completely }
+                    RemoveCurrentP(p, hp1);
+                    DebugMsg(SPeepholeOptimization + 'AddMov2Lea', p);
+                  end;
+
+                Result := True;
+              end;
           end;
       end;
 
addmov2lea.patch (6,683 bytes)   
submov2lea-upgrade.patch (1,197 bytes)   
@@ -7131,13 +7189,20 @@
           a CMP instruction if one follows.
         }
         Result := False;
-        if not (cs_opt_size in current_settings.optimizerswitches) and
-          (taicpu(p).opsize in [S_L{$ifdef x86_64}, S_Q{$endif x86_64}]) and
+        if (taicpu(p).opsize in [S_L{$ifdef x86_64}, S_Q{$endif x86_64}]) and
           MatchOpType(taicpu(p),top_const,top_reg) and
           GetNextInstruction(p, hp1) and
           MatchInstruction(hp1, A_MOV, [taicpu(p).opsize]) and
           (taicpu(hp1).oper[1]^.typ = top_reg) and
-          MatchOperand(taicpu(hp1).oper[0]^, taicpu(p).oper[1]^.reg) then
+          MatchOperand(taicpu(hp1).oper[0]^, taicpu(p).oper[1]^.reg) and
+          (
+            { Don't do SubMov2LeaSub under -Os, but do allow SubMov2Lea }
+            not (cs_opt_size in current_settings.optimizerswitches) or
+            (
+              not RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp1, TmpUsedRegs) and
+              RegUsedAfterInstruction(NR_DEFAULTFLAGS, hp1, TmpUsedRegs)
+            )
+          ) then
           begin
             { Change the MOV instruction to a LEA instruction, and update the
               first operand }
submov2lea-upgrade.patch (1,197 bytes)   

J. Gareth Moreton

2021-03-04 09:21

developer   ~0129374

Last edited: 2021-03-04 09:23

View 3 revisions

What's an example of a useful title, Sven, that isn't specified in another field?

(Oh, sorry, I got the title and build number back to front somehow)

Sven Barth

2021-03-04 13:15

manager   ~0129379

Thank you for fixing the title. :)

Florian

2021-03-15 22:12

administrator   ~0129698

Thanks, both applied.

Issue History

Date Modified Username Field Change
2021-03-04 09:15 J. Gareth Moreton New Issue
2021-03-04 09:15 J. Gareth Moreton File Added: addmov2lea.patch
2021-03-04 09:16 Sven Barth Note Added: 0129370
2021-03-04 09:17 J. Gareth Moreton File Deleted: addmov2lea.patch
2021-03-04 09:17 J. Gareth Moreton Note Added: 0129371
2021-03-04 09:17 J. Gareth Moreton File Added: addmov2lea.patch
2021-03-04 09:17 J. Gareth Moreton File Added: submov2lea-upgrade.patch
2021-03-04 09:20 J. Gareth Moreton File Deleted: addmov2lea.patch
2021-03-04 09:20 J. Gareth Moreton File Deleted: submov2lea-upgrade.patch
2021-03-04 09:20 J. Gareth Moreton Note Added: 0129372
2021-03-04 09:20 J. Gareth Moreton File Added: addmov2lea.patch
2021-03-04 09:20 J. Gareth Moreton File Added: submov2lea-upgrade.patch
2021-03-04 09:21 J. Gareth Moreton Note Added: 0129374
2021-03-04 09:22 J. Gareth Moreton Tag Attached: patch.compiler
2021-03-04 09:22 J. Gareth Moreton Tag Attached: optimizations
2021-03-04 09:22 J. Gareth Moreton Tag Attached: i386
2021-03-04 09:22 J. Gareth Moreton Tag Attached: x86
2021-03-04 09:22 J. Gareth Moreton Tag Attached: x86_64
2021-03-04 09:22 J. Gareth Moreton Note Edited: 0129371 View Revisions
2021-03-04 09:23 J. Gareth Moreton Priority normal => low
2021-03-04 09:23 J. Gareth Moreton Severity minor => tweak
2021-03-04 09:23 J. Gareth Moreton Build [Patch] AddMov2LeaAdd => r48871
2021-03-04 09:23 J. Gareth Moreton Summary r48871 => [Patch] AddMov2LeaAdd
2021-03-04 09:23 J. Gareth Moreton FPCTarget => -
2021-03-04 09:23 J. Gareth Moreton Note Edited: 0129374 View Revisions
2021-03-04 09:23 J. Gareth Moreton Note Edited: 0129374 View Revisions
2021-03-04 09:24 J. Gareth Moreton Summary [Patch] AddMov2LeaAdd => [Patch] AddMov2LeaAdd and -Os change for SubMov2LeaSub
2021-03-04 09:30 J. Gareth Moreton Relationship added related to 0038555
2021-03-04 09:31 J. Gareth Moreton Relationship deleted related to 0038555
2021-03-04 09:31 J. Gareth Moreton Relationship added child of 0038555
2021-03-04 13:15 Sven Barth Note Added: 0129379
2021-03-15 22:12 Florian Assigned To => Florian
2021-03-15 22:12 Florian Status new => resolved
2021-03-15 22:12 Florian Resolution open => fixed
2021-03-15 22:12 Florian Fixed in Version => 3.3.1
2021-03-15 22:12 Florian Fixed in Revision => 48989
2021-03-15 22:12 Florian Note Added: 0129698