View Issue Details

IDProjectCategoryView StatusLast Update
0037422FPCCompilerpublic2020-07-27 23:01
ReporterJ. Gareth Moreton Assigned ToFlorian  
PrioritylowSeverityminorReproducibilityN/A
Status resolvedResolutionfixed 
Platformi386 and x86_64OSMicrosoft Windows 
Product Version3.3.1 
Fixed in Version3.3.1 
Summary0037422: [Patch] MovOpMov2Op/Lea2Add/Lea2Sub consolidation
DescriptionThis patch addresses a minor issue in that there are two near-identical optimisations in OptPass1MOV and OptPass2MOV, and the question of which one gets run is largely just down to how many times pass 1 is run. Often, the pass 2 version never gets to run.

This patch removes the version in pass 2 and improves the version in pass 1 so it can do more in a single iteration (usually if the sequence is MOV/LEA/MOV, and the LEA can be converted into ADD or SUB). This is done by moving some LEA conversion code into a separate procedure and having IsFoldableArithOp call this conversion routine if the LEA instruction is eligible.
Steps To ReproduceApply patch and confirm correct compilation.
Additional InformationOften, a LEA instruction is coverted to ADD/SUB, but because it sits either side of a MOV, the MovOpMov2Op optimisation does not get noticed until the next iteration of pass 1, hence the potential in reducing the iteration count.

Time savings are currently marginal at best and the binary output size isn't improved, but it makes a notable difference in my personal work of reducing the number of iterations of pass 1 while ensuring the optimisation is as efficient as possible.

Also the compiler binaries are notably reduced in size:

ppc386: 3,156,480 -> 3,147,776 (8,704 byte saving)
ppcx64: 3,793,920 -> 3,782,656 (11,264 byte saving)
Tagscompiler, i386, optimizations, patch, refactor, x86, x86_64
Fixed in Revision45865
FPCOldBugId
FPCTarget-
Attached Files

Activities

Florian

2020-07-26 20:50

administrator   ~0124345

Last edited: 2020-07-26 20:51

View 2 revisions

I do not like the idea that IsFoldableArithOp has a side effect, this should be solved differently.

J. Gareth Moreton

2020-07-26 20:54

developer   ~0124346

Okay, I can fix that fairly easily.

Bi0T1N

2020-07-26 21:46

reporter   ~0124348

Additionally the InternalError code should be changed to 2020 ;)

J. Gareth Moreton

2020-07-26 21:52

developer   ~0124349

Last edited: 2020-07-26 21:56

View 2 revisions

Oh whoops - good catch. Thanks @Bi0T1N

In the meantime, I've made an alternative patch, but I'm just testing it to see if still does everything correctly.

J. Gareth Moreton

2020-07-27 01:08

developer   ~0124350

Last edited: 2020-07-27 01:09

View 2 revisions

I've rewritten the patch to not change IsFoldableArithOp, but for some reason I've lost the size saving on the binaries... not that it was too critical:

ppcx64: 3,793,920 -> 3,792,896 (1,024 byte saving)
ppc386: 3,156,480 -> 3,154,432 (2,048 byte saving)

Truthfully these values feel a bit more realistic and I might have had something configured differently, or my development branch was out of date and didn't have some new features that the master branch had.

Regarding the patch, I moved the MOV/LEA optimisation to before the arithmetic folding optimisation so it has a chance to change the LEA instruction beforehand, while the MOVX/LEA version uses some trickery with the if statement.

J. Gareth Moreton

2020-07-27 02:56

developer   ~0124351

Made a minor improvement to the aforementioned MOV/LEA optimisation so it handles 64-bit operands. This hasn't changed the disassembly for Lazarus (my test project) but it's shaved another kilobyte from ppcx64.exe (3,791,872 bytes), which means something went right!

Before, since the instruction is "leaq (%reg1,%reg2),%reg2", this would just be changed to "addq %reg1,%reg2" and merged with the mov in the next iteration of pass 1.
mov-op-lea-improvement.patch (14,983 bytes)   
Index: compiler/x86/aoptx86.pas
===================================================================
--- compiler/x86/aoptx86.pas	(revision 45864)
+++ compiler/x86/aoptx86.pas	(working copy)
@@ -105,6 +105,10 @@
         class function CanBeCMOV(p : tai) : boolean; static;
 
 
+        { Converts the LEA instruction to ADD/INC/SUB/DEC. Returns True if the
+          conversion was successful }
+        function ConvertLEA(const p : taicpu): Boolean;
+
         function DeepMOVOpt(const p_mov: taicpu; const hp: taicpu): Boolean;
 
         procedure DebugMsg(const s : string; p : tai);inline;
@@ -1773,6 +1777,55 @@
       end;
 
 
+    function TX86AsmOptimizer.ConvertLEA(const p: taicpu): Boolean;
+      var
+        l: asizeint;
+      begin
+        { Should have been checked previously }
+        if p.opcode <> A_LEA then
+          InternalError(2020072501);
+
+        with p.oper[0]^.ref^ do
+          begin
+            if (base <> p.oper[1]^.reg) or (index <> NR_NO) then
+              Exit(False);
+
+            l:=offset;
+            if (l=1) and UseIncDec then
+              begin
+                p.opcode:=A_INC;
+                p.loadreg(0,p.oper[1]^.reg);
+                p.ops:=1;
+                DebugMsg(SPeepholeOptimization + 'Lea2Inc done',p);
+              end
+            else if (l=-1) and UseIncDec then
+              begin
+                p.opcode:=A_DEC;
+                p.loadreg(0,p.oper[1]^.reg);
+                p.ops:=1;
+                DebugMsg(SPeepholeOptimization + 'Lea2Dec done',p);
+              end
+            else
+              begin
+                if (l<0) and (l<>-2147483648) then
+                  begin
+                    p.opcode:=A_SUB;
+                    p.loadConst(0,-l);
+                    DebugMsg(SPeepholeOptimization + 'Lea2Sub done',p);
+                  end
+                else
+                  begin
+                    p.opcode:=A_ADD;
+                    p.loadConst(0,l);
+                    DebugMsg(SPeepholeOptimization + 'Lea2Add done',p);
+                  end;
+              end;
+          end;
+
+        Result := True;
+      end;
+
+
     function TX86AsmOptimizer.DeepMOVOpt(const p_mov: taicpu; const hp: taicpu): Boolean;
       var
         CurrentReg, ReplaceReg: TRegister;
@@ -2755,14 +2808,59 @@
             AllocRegBetween(taicpu(p).oper[0]^.reg,p,hp1,usedregs);
             exit;
           end;
+
+        if MatchInstruction(hp1,A_LEA,[S_L{$ifdef x86_64},S_Q{$endif x86_64}]) then
+          begin
+            if MatchOpType(Taicpu(p),top_ref,top_reg) and
+               ((MatchReference(Taicpu(hp1).oper[0]^.ref^,Taicpu(hp1).oper[1]^.reg,Taicpu(p).oper[1]^.reg) and
+                 (Taicpu(hp1).oper[0]^.ref^.base<>Taicpu(p).oper[1]^.reg)
+                ) or
+                (MatchReference(Taicpu(hp1).oper[0]^.ref^,Taicpu(p).oper[1]^.reg,Taicpu(hp1).oper[1]^.reg) and
+                 (Taicpu(hp1).oper[0]^.ref^.index<>Taicpu(p).oper[1]^.reg)
+                )
+               ) then
+               { mov reg1,ref
+                 lea reg2,[reg1,reg2]
+
+                 to
+
+                 add reg2,ref}
+              begin
+                TransferUsedRegs(TmpUsedRegs);
+                { reg1 may not be used afterwards }
+                if not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp1, TmpUsedRegs)) then
+                  begin
+                    Taicpu(hp1).opcode:=A_ADD;
+                    Taicpu(hp1).oper[0]^.ref^:=Taicpu(p).oper[0]^.ref^;
+                    DebugMsg(SPeepholeOptimization + 'MovLea2Add done',hp1);
+                    RemoveCurrentp(p, hp1);
+                    result:=true;
+                    exit;
+                  end;
+              end;
+
+            { If the LEA instruction can be converted into an arithmetic instruction,
+              it may be possible to then fold it in the next optimisation, otherwise
+              there's nothing more that can be optimised here. }
+            if not ConvertLEA(taicpu(hp1)) then
+              Exit;
+
+          end;
+
         if (taicpu(p).oper[1]^.typ = top_reg) and
           (hp1.typ = ait_instruction) and
           GetNextInstruction(hp1, hp2) and
           MatchInstruction(hp2,A_MOV,[]) and
           (SuperRegistersEqual(taicpu(hp2).oper[0]^.reg,taicpu(p).oper[1]^.reg)) and
-          (IsFoldableArithOp(taicpu(hp1), taicpu(p).oper[1]^.reg) or
-           ((taicpu(p).opsize=S_L) and (taicpu(hp1).opsize=S_Q) and (taicpu(hp2).opsize=S_L) and
-            IsFoldableArithOp(taicpu(hp1), newreg(R_INTREGISTER,getsupreg(taicpu(p).oper[1]^.reg),R_SUBQ)))
+          (
+            IsFoldableArithOp(taicpu(hp1), taicpu(p).oper[1]^.reg)
+{$ifdef x86_64}
+            or
+            (
+              (taicpu(p).opsize=S_L) and (taicpu(hp1).opsize=S_Q) and (taicpu(hp2).opsize=S_L) and
+              IsFoldableArithOp(taicpu(hp1), newreg(R_INTREGISTER,getsupreg(taicpu(p).oper[1]^.reg),R_SUBQ))
+            )
+{$endif x86_64}
           ) then
           begin
             if OpsEqual(taicpu(hp2).oper[1]^, taicpu(p).oper[0]^) and
@@ -2911,6 +3009,7 @@
                     hp2.Free;
                   end;
               end;
+
           end;
         if MatchInstruction(hp1,A_BTS,A_BTR,[Taicpu(p).opsize]) and
           GetNextInstruction(hp1, hp2) and
@@ -2932,37 +3031,6 @@
             Result:=true;
             exit;
           end;
-
-        if MatchInstruction(hp1,A_LEA,[S_L]) and
-           MatchOpType(Taicpu(p),top_ref,top_reg) and
-           ((MatchReference(Taicpu(hp1).oper[0]^.ref^,Taicpu(hp1).oper[1]^.reg,Taicpu(p).oper[1]^.reg) and
-             (Taicpu(hp1).oper[0]^.ref^.base<>Taicpu(p).oper[1]^.reg)
-            ) or
-            (MatchReference(Taicpu(hp1).oper[0]^.ref^,Taicpu(p).oper[1]^.reg,Taicpu(hp1).oper[1]^.reg) and
-             (Taicpu(hp1).oper[0]^.ref^.index<>Taicpu(p).oper[1]^.reg)
-            )
-           ) then
-           { mov reg1,ref
-             lea reg2,[reg1,reg2]
-
-             to
-
-             add reg2,ref}
-          begin
-            TransferUsedRegs(TmpUsedRegs);
-            { reg1 may not be used afterwards }
-            if not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp1, TmpUsedRegs)) then
-              begin
-                Taicpu(hp1).opcode:=A_ADD;
-                Taicpu(hp1).oper[0]^.ref^:=Taicpu(p).oper[0]^.ref^;
-                DebugMsg(SPeepholeOptimization + 'MovLea2Add done',hp1);
-                asml.remove(p);
-                p.free;
-                p:=hp1;
-                result:=true;
-                exit;
-              end;
-          end;
       end;
 
 
@@ -3074,65 +3142,36 @@
            (taicpu(p).oper[1]^.reg <> NR_STACK_POINTER_REG) and
            (not(Assigned(taicpu(p).oper[0]^.ref^.Symbol))) then
           begin
-            if (taicpu(p).oper[0]^.ref^.base <> taicpu(p).oper[1]^.reg) and
-               (taicpu(p).oper[0]^.ref^.offset = 0) then
+            if (taicpu(p).oper[0]^.ref^.offset = 0) then
               begin
-                hp1:=taicpu.op_reg_reg(A_MOV,taicpu(p).opsize,taicpu(p).oper[0]^.ref^.base,
-                  taicpu(p).oper[1]^.reg);
-                InsertLLItem(p.previous,p.next, hp1);
-                DebugMsg(SPeepholeOptimization + 'Lea2Mov done',hp1);
-                p.free;
-                p:=hp1;
+                if (taicpu(p).oper[0]^.ref^.base <> taicpu(p).oper[1]^.reg) then
+                  begin
+                    hp1:=taicpu.op_reg_reg(A_MOV,taicpu(p).opsize,taicpu(p).oper[0]^.ref^.base,
+                      taicpu(p).oper[1]^.reg);
+                    InsertLLItem(p.previous,p.next, hp1);
+                    DebugMsg(SPeepholeOptimization + 'Lea2Mov done',hp1);
+                    p.free;
+                    p:=hp1;
+                  end
+                else
+                  begin
+                    DebugMsg(SPeepholeOptimization + 'Lea2Nop done',p);
+                    RemoveCurrentP(p);
+                  end;
                 Result:=true;
                 exit;
               end
-            else if (taicpu(p).oper[0]^.ref^.offset = 0) then
+            else if (
+              { continue to use lea to adjust the stack pointer,
+                it is the recommended way, but only if not optimizing for size }
+                (taicpu(p).oper[1]^.reg<>NR_STACK_POINTER_REG) or
+                (cs_opt_size in current_settings.optimizerswitches)
+              ) and
+              ConvertLEA(taicpu(p)) then
               begin
-                DebugMsg(SPeepholeOptimization + 'Lea2Nop done',p);
-                RemoveCurrentP(p);
                 Result:=true;
                 exit;
-              end
-            { continue to use lea to adjust the stack pointer,
-              it is the recommended way, but only if not optimizing for size }
-            else if (taicpu(p).oper[1]^.reg<>NR_STACK_POINTER_REG) or
-              (cs_opt_size in current_settings.optimizerswitches) then
-              with taicpu(p).oper[0]^.ref^ do
-                if (base = taicpu(p).oper[1]^.reg) then
-                  begin
-                    l:=offset;
-                    if (l=1) and UseIncDec then
-                      begin
-                        taicpu(p).opcode:=A_INC;
-                        taicpu(p).loadreg(0,taicpu(p).oper[1]^.reg);
-                        taicpu(p).ops:=1;
-                        DebugMsg(SPeepholeOptimization + 'Lea2Inc done',p);
-                      end
-                    else if (l=-1) and UseIncDec then
-                      begin
-                        taicpu(p).opcode:=A_DEC;
-                        taicpu(p).loadreg(0,taicpu(p).oper[1]^.reg);
-                        taicpu(p).ops:=1;
-                        DebugMsg(SPeepholeOptimization + 'Lea2Dec done',p);
-                      end
-                    else
-                      begin
-                        if (l<0) and (l<>-2147483648) then
-                          begin
-                            taicpu(p).opcode:=A_SUB;
-                            taicpu(p).loadConst(0,-l);
-                            DebugMsg(SPeepholeOptimization + 'Lea2Sub done',p);
-                          end
-                        else
-                          begin
-                            taicpu(p).opcode:=A_ADD;
-                            taicpu(p).loadConst(0,l);
-                            DebugMsg(SPeepholeOptimization + 'Lea2Add done',p);
-                          end;
-                      end;
-                    Result:=true;
-                    exit;
-                  end;
+              end;
           end;
         if GetNextInstruction(p,hp1) and
           MatchInstruction(hp1,A_MOV,[taicpu(p).opsize]) and
@@ -4617,70 +4656,6 @@
 *)
                   end;
               end;
-          end
-        else if (taicpu(p).oper[0]^.typ = top_ref) and
-          (hp1.typ = ait_instruction) and
-          { while the GetNextInstruction(hp1,hp2) call could be factored out,
-            doing it separately in both branches allows to do the cheap checks
-            with low probability earlier }
-          ((IsFoldableArithOp(taicpu(hp1),taicpu(p).oper[1]^.reg) and
-            GetNextInstruction(hp1,hp2) and
-            MatchInstruction(hp2,A_MOV,[])
-           ) or
-           ((taicpu(hp1).opcode=A_LEA) and
-             GetNextInstruction(hp1,hp2) and
-             MatchInstruction(hp2,A_MOV,[]) and
-            ((MatchReference(taicpu(hp1).oper[0]^.ref^,taicpu(p).oper[1]^.reg,NR_INVALID) and
-             (taicpu(hp1).oper[0]^.ref^.index<>taicpu(p).oper[1]^.reg)
-              ) or
-             (MatchReference(taicpu(hp1).oper[0]^.ref^,NR_INVALID,
-              taicpu(p).oper[1]^.reg) and
-             (taicpu(hp1).oper[0]^.ref^.base<>taicpu(p).oper[1]^.reg)) or
-             (MatchReferenceWithOffset(taicpu(hp1).oper[0]^.ref^,taicpu(p).oper[1]^.reg,NR_NO)) or
-             (MatchReferenceWithOffset(taicpu(hp1).oper[0]^.ref^,NR_NO,taicpu(p).oper[1]^.reg))
-            ) and
-            ((MatchOperand(taicpu(p).oper[1]^,taicpu(hp2).oper[0]^)) or not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,hp1,UsedRegs)))
-           )
-          ) and
-          MatchOperand(taicpu(hp1).oper[taicpu(hp1).ops-1]^,taicpu(hp2).oper[0]^) and
-          (taicpu(hp2).oper[1]^.typ = top_ref) then
-          begin
-            TransferUsedRegs(TmpUsedRegs);
-            UpdateUsedRegs(TmpUsedRegs,tai(p.next));
-            UpdateUsedRegs(TmpUsedRegs,tai(hp1.next));
-            if (RefsEqual(taicpu(hp2).oper[1]^.ref^,taicpu(p).oper[0]^.ref^) and
-              not(RegUsedAfterInstruction(taicpu(hp2).oper[0]^.reg,hp2,TmpUsedRegs))) then
-              { change   mov            (ref), reg
-                         add/sub/or/... reg2/$const, reg
-                         mov            reg, (ref)
-                         # release reg
-                to       add/sub/or/... reg2/$const, (ref)    }
-              begin
-                case taicpu(hp1).opcode of
-                  A_INC,A_DEC,A_NOT,A_NEG :
-                    taicpu(hp1).loadRef(0,taicpu(p).oper[0]^.ref^);
-                  A_LEA :
-                    begin
-                      taicpu(hp1).opcode:=A_ADD;
-                      if (taicpu(hp1).oper[0]^.ref^.index<>taicpu(p).oper[1]^.reg) and (taicpu(hp1).oper[0]^.ref^.index<>NR_NO) then
-                        taicpu(hp1).loadreg(0,taicpu(hp1).oper[0]^.ref^.index)
-                      else if (taicpu(hp1).oper[0]^.ref^.base<>taicpu(p).oper[1]^.reg) and (taicpu(hp1).oper[0]^.ref^.base<>NR_NO) then
-                        taicpu(hp1).loadreg(0,taicpu(hp1).oper[0]^.ref^.base)
-                      else
-                        taicpu(hp1).loadconst(0,taicpu(hp1).oper[0]^.ref^.offset);
-                      taicpu(hp1).loadRef(1,taicpu(p).oper[0]^.ref^);
-                      DebugMsg(SPeepholeOptimization + 'FoldLea done',hp1);
-                    end
-                  else
-                    taicpu(hp1).loadRef(1,taicpu(p).oper[0]^.ref^);
-                end;
-                asml.remove(p);
-                asml.remove(hp2);
-                p.free;
-                hp2.free;
-                p := hp1
-              end;
-            Exit;
 {$ifdef x86_64}
           end
         else if (taicpu(p).opsize = S_L) and
@@ -5382,7 +5357,14 @@
         reg_and_hp1_is_instr:=(taicpu(p).oper[1]^.typ = top_reg) and
           GetNextInstruction(p,hp1) and
           (hp1.typ = ait_instruction);
+
         if reg_and_hp1_is_instr and
+          (
+            (taicpu(hp1).opcode <> A_LEA) or
+            { If the LEA instruction can be converted into an arithmetic instruction,
+              it may be possible to then fold it. }
+            ConvertLEA(taicpu(hp1))
+          ) and
           IsFoldableArithOp(taicpu(hp1),taicpu(p).oper[1]^.reg) and
           GetNextInstruction(hp1,hp2) and
           MatchInstruction(hp2,A_MOV,[]) and
mov-op-lea-improvement.patch (14,983 bytes)   

Florian

2020-07-27 23:01

administrator   ~0124358

Thanks, applied with a small modification: lea is always used to adjust the stack if no size optimization is carried out.

Issue History

Date Modified Username Field Change
2020-07-26 18:22 J. Gareth Moreton New Issue
2020-07-26 18:22 J. Gareth Moreton File Added: mov-op-lea-improvement.patch
2020-07-26 18:24 J. Gareth Moreton Priority normal => low
2020-07-26 18:24 J. Gareth Moreton Description Updated View Revisions
2020-07-26 18:24 J. Gareth Moreton FPCTarget => -
2020-07-26 18:24 J. Gareth Moreton Tag Attached: patch
2020-07-26 18:24 J. Gareth Moreton Tag Attached: compiler
2020-07-26 18:24 J. Gareth Moreton Tag Attached: optimizations
2020-07-26 18:24 J. Gareth Moreton Tag Attached: x86_64
2020-07-26 18:24 J. Gareth Moreton Tag Attached: x86
2020-07-26 18:24 J. Gareth Moreton Tag Attached: i386
2020-07-26 18:24 J. Gareth Moreton Tag Attached: refactor
2020-07-26 20:50 Florian Note Added: 0124345
2020-07-26 20:51 Florian Note Edited: 0124345 View Revisions
2020-07-26 20:54 J. Gareth Moreton Note Added: 0124346
2020-07-26 21:46 Bi0T1N Note Added: 0124348
2020-07-26 21:52 J. Gareth Moreton Note Added: 0124349
2020-07-26 21:56 J. Gareth Moreton Note Edited: 0124349 View Revisions
2020-07-27 00:59 J. Gareth Moreton File Deleted: mov-op-lea-improvement.patch
2020-07-27 01:08 J. Gareth Moreton Note Added: 0124350
2020-07-27 01:08 J. Gareth Moreton File Added: mov-op-lea-improvement.patch
2020-07-27 01:09 J. Gareth Moreton Note Edited: 0124350 View Revisions
2020-07-27 02:47 J. Gareth Moreton File Deleted: mov-op-lea-improvement.patch
2020-07-27 02:56 J. Gareth Moreton Note Added: 0124351
2020-07-27 02:56 J. Gareth Moreton File Added: mov-op-lea-improvement.patch
2020-07-27 23:01 Florian Assigned To => Florian
2020-07-27 23:01 Florian Status new => resolved
2020-07-27 23:01 Florian Resolution open => fixed
2020-07-27 23:01 Florian Fixed in Version => 3.3.1
2020-07-27 23:01 Florian Fixed in Revision => 45865
2020-07-27 23:01 Florian Note Added: 0124358