View Issue Details

IDProjectCategoryView StatusLast Update
0036382FPCCompilerpublic2019-12-06 16:04
ReporterJ. Gareth MoretonAssigned To 
PrioritynormalSeverityminorReproducibilityN/A
Status newResolutionopen 
Platformi386 and x86_64OSMicrosoft WindowsOS Version10 Professional
Product Version3.3.1Product Buildr43611 
Target VersionFixed in Version 
Summary0036382: [Patch] x86 "OptPass1MOV" improvements
Description"OptPass1MOV-Improvement.patch" applies some optimisations to the peephole optimisation function of the same name. Repeated checks to the "GetNextInstruction_p Boolean" variable are factored out, since all of the optimisations bar the first one require a succeeding instruction.

More importantly, the deep 'MovMov2Mov' optimisations have been improved to be slightly faster (by not checking the immediate next instruction, as that's done elsewhere), allowing it to be performed even if the temporary register remains in use (it just preserves the first instruction in this case) and allowing the optimiser to check the 2nd next instruction under -O1 and -O2, since "GetNextInstructionUsingReg" only checks one instruction ahead unless -O3 is specified (which means we can remove the -O3 check that was present in OptPass1MOV). This optimisation was only made possible thanks to the x86-specific implementation of "RegModifiedByInstruction" over at 0036376.

"x86_mov_call.patch" is a subtle performance boost for the compiler: Pass 1 of the peephole optimiser under i386 and x86_64 will now check if the instruction is a MOV before any other (it has been removed from the case block and added to its own if statement). The reason this has been done is because MOV instructions appear disproportionately more frequently than any other instruction. About 25% of all instructions are MOVs of some kind.
Steps To ReproduceApply patches and confirm correct compilation and successful regression testing under i386 and x86_64 platforms.
Additional InformationThe improved optimisations produce smaller and faster code and can sometimes reduce the number of passes required in the peephole optimizer.

Optimizer performance would be greatly improved if register virtualisation is delayed, since the MovMov2Mov optimisation can often remove registers completely, but which have already been reserved earlier when the registers were devirtualised. Also, the optimisation cannot be performed if the stack is used for temporary storage (due to a lack of free registers), something that wouldn't be a problem when the registers are still virtual and assumed to not be modified by another thread. This would be an area of future research.
Tagscompiler, i386, optimizations, patch, x86, x86_64
Fixed in Revision
FPCOldBugId
FPCTarget-
Attached Files
  • x86_mov_call.patch (1,966 bytes)
    Index: compiler/i386/aoptcpu.pas
    ===================================================================
    --- compiler/i386/aoptcpu.pas	(revision 43611)
    +++ compiler/i386/aoptcpu.pas	(working copy)
    @@ -144,6 +144,15 @@
                       Result:=true;
                       exit;
                     end;
    +                
    +              { Subtle speed boost: MOV appears disproportionately more
    +                frequently than any other instruction }
    +              if taicpu(p).opcode = A_MOV then
    +                begin
    +                  Result:=OptPass1MOV(p);
    +                  Exit;
    +                end;
    +                
                   case taicpu(p).opcode Of
                     A_AND:
                       Result:=OptPass1And(p);
    @@ -154,9 +163,7 @@
                     A_FSTP,A_FISTP:
                       Result:=OptPass1FSTP(p);
                     A_LEA:
                       Result:=OptPass1LEA(p);
    -                A_MOV:
    -                  Result:=OptPass1MOV(p);
                     A_MOVSX,
                     A_MOVZX :
                       Result:=OptPass1Movx(p);
    Index: compiler/x86_64/aoptcpu.pas
    ===================================================================
    --- compiler/x86_64/aoptcpu.pas	(revision 43611)
    +++ compiler/x86_64/aoptcpu.pas	(working copy)
    @@ -71,11 +71,17 @@
             case p.typ of
               ait_instruction:
                 begin
    +              { Subtle speed boost: MOV appears disproportionately more
    +                frequently than any other instruction }
    +              if taicpu(p).opcode = A_MOV then
    +                begin
    +                  Result:=OptPass1MOV(p);
    +                  Exit;
    +                end;
    +				
                   case taicpu(p).opcode of
                     A_AND:
                       Result:=OptPass1AND(p);
    -                A_MOV:
    -                  Result:=OptPass1MOV(p);
                     A_MOVSX,
                     A_MOVZX:
                       Result:=OptPass1Movx(p);
    
    x86_mov_call.patch (1,966 bytes)
  • OptPass1MOV-Improvement-v2.patch (12,378 bytes)
    Index: compiler/x86/aoptx86.pas
    ===================================================================
    --- compiler/x86/aoptx86.pas	(revision 43644)
    +++ compiler/x86/aoptx86.pas	(working copy)
    @@ -1489,7 +1489,7 @@
         function TX86AsmOptimizer.OptPass1MOV(var p : tai) : boolean;
           var
             hp1, hp2: tai;
    -        GetNextInstruction_p: Boolean;
    +        GetNextInstruction_p, TempRegUsed: Boolean;
             PreMessage, RegName1, RegName2, InputVal, MaskNum: string;
             NewSize: topsize;
           begin
    @@ -1511,8 +1511,11 @@
                 exit;
               end;
     
    -        if GetNextInstruction_p and
    -          MatchInstruction(hp1,A_AND,[]) and
    +        { All the next optimisations require a next instruction }
    +        if not GetNextInstruction_p or (hp1.typ <> ait_instruction) then
    +          Exit;
    +
    +        if (taicpu(hp1).opcode = A_AND) and
               (taicpu(p).oper[1]^.typ = top_reg) and
               MatchOpType(taicpu(hp1),top_const,top_reg) then
               begin
    @@ -1651,8 +1654,7 @@
                   end;
               end;
             { Next instruction is also a MOV ? }
    -        if GetNextInstruction_p and
    -          MatchInstruction(hp1,A_MOV,[taicpu(p).opsize]) then
    +        if MatchInstruction(hp1,A_MOV,[taicpu(p).opsize]) then
               begin
                 if (taicpu(p).oper[1]^.typ = top_reg) and
                   MatchOperand(taicpu(p).oper[1]^,taicpu(hp1).oper[0]^) then
    @@ -1935,13 +1937,15 @@
                   end;
               end;
             { search further than the next instruction for a mov }
    -        if (cs_opt_level3 in current_settings.optimizerswitches) and
    +        if
               { check as much as possible before the expensive GetNextInstructionUsingReg call }
               (taicpu(p).oper[1]^.typ = top_reg) and
               (taicpu(p).oper[0]^.typ in [top_reg,top_const]) and
    +          not RegModifiedByInstruction(taicpu(p).oper[1]^.reg, hp1) and
               { we work with hp2 here, so hp1 can be still used later on when
                 checking for GetNextInstruction_p }
    -          GetNextInstructionUsingReg(p,hp2,taicpu(p).oper[1]^.reg) and
    +          { GetNextInstructionUsingReg only searches one instruction ahead unless -O3 is specified }
    +          GetNextInstructionUsingReg(hp1,hp2,taicpu(p).oper[1]^.reg) and
               MatchInstruction(hp2,A_MOV,[]) and
               MatchOperand(taicpu(p).oper[1]^,taicpu(hp2).oper[0]^) and
               ((taicpu(p).oper[0]^.typ=top_const) or
    @@ -1951,12 +1955,20 @@
               ) then
               begin
                 TransferUsedRegs(TmpUsedRegs);
    +            { We don't need to call UpdateUsedRegs for every instruction between
    +              p and hp2 because the register we're concerned about will not
    +              become deallocated (otherwise GetNextInstructionUsingReg would
    +              have stopped at an earlier instruction). [Kit] }
    +
                 { we have
                     mov x, %treg
                     mov %treg, y
                 }
    -            if not(RegInOp(taicpu(p).oper[1]^.reg,taicpu(hp2).oper[1]^)) and
    -               not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp2, TmpUsedRegs)) then
    +            TempRegUsed :=
    +              RegReadByInstruction(taicpu(p).oper[1]^.reg, hp1) or
    +              RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp2, TmpUsedRegs);
    +
    +            if not(RegInOp(taicpu(p).oper[1]^.reg,taicpu(hp2).oper[1]^)) then
                   { we've got
     
                     mov x, %treg
    @@ -1974,44 +1986,95 @@
     
                             mov %reg, y
                         }
    -                    AllocRegBetween(taicpu(p).oper[0]^.reg,p,hp2,usedregs);
    -                    taicpu(hp2).loadOper(0,taicpu(p).oper[0]^);
    -                    DebugMsg(SPeepholeOptimization + 'MovMov2Mov 6 done',p);
    -                    { take care of the register (de)allocs following p }
    -                    UpdateUsedRegs(tai(p.next));
    -                    asml.remove(p);
    -                    p.free;
    -                    p:=hp1;
    -                    Result:=true;
    -                    Exit;
    -                  end;
    -                top_const:
    -                  begin
    -                    { change
    -                        mov const, %treg
    -                        mov %treg, y
    +                    RegName1 := debug_regname(taicpu(hp2).oper[0]^.reg);
    +                    if taicpu(hp2).oper[1]^.reg = taicpu(p).oper[0]^.reg then
    +                      begin
    +                        { %reg = y - remove hp2 completely (doing it here instead of relying on the mov %reg,%reg optimisation might cut down on a pass iteration) }
    +                        DebugMsg(SPeepholeOptimization + debug_regname(taicpu(p).oper[0]^.reg) + ' = ' + RegName1 + '; removed unnecessary instruction (MovMov2MovNop 6b}',hp2);
    +                        asml.remove(hp2);
    +                        hp2.Free;
     
    -                        to
    -
    -                        mov const, y
    -                    }
    -                    if (taicpu(hp2).oper[1]^.typ=top_reg) or
    -                      ((taicpu(p).oper[0]^.val>=low(longint)) and (taicpu(p).oper[0]^.val<=high(longint))) then
    +                        if not TempRegUsed then
    +                          begin
    +                            { We can remove the original MOV too }
    +                            DebugMsg(SPeepholeOptimization + 'MovMov2NopNop 6b done',p);
    +                            { take care of the register (de)allocs following p }
    +                            UpdateUsedRegs(tai(p.next));
    +                            asml.remove(p);
    +                            p.free;
    +                            p:=hp1;
    +                            Result:=true;
    +                            Exit;
    +                          end;
    +                      end
    +                    else
                           begin
    +                        AllocRegBetween(taicpu(p).oper[0]^.reg,p,hp2,usedregs);
                             taicpu(hp2).loadOper(0,taicpu(p).oper[0]^);
    -                        DebugMsg(SPeepholeOptimization + 'MovMov2Mov 7 done',p);
    -                        { take care of the register (de)allocs following p }
    -                        UpdateUsedRegs(tai(p.next));
    -                        asml.remove(p);
    -                        p.free;
    -                        p:=hp1;
    -                        Result:=true;
    -                        Exit;
    +
    +                        if TempRegUsed then
    +                          begin
    +                            { Don't remove the first instruction if the temporary register is in use }
    +                            DebugMsg(SPeepholeOptimization + RegName1 + ' = ' + debug_regname(taicpu(p).oper[0]^.reg) + '; changed to minimise pipeline stall (MovMov2Mov 6a}',hp2);
    +
    +                            { No need to set Result to True. If there's another instruction later on
    +                              that can be optimised, it will be detected when the main Pass 1 loop
    +                              reaches what is now hp2 and passes it through OptPass1MOV. [Kit] };
    +                          end
    +                        else
    +                          begin
    +                            DebugMsg(SPeepholeOptimization + 'MovMov2Mov 6 done',p);
    +                            { take care of the register (de)allocs following p }
    +                            UpdateUsedRegs(tai(p.next));
    +                            asml.remove(p);
    +                            p.free;
    +                            p:=hp1;
    +                            Result:=true;
    +                            Exit;
    +                          end;
                           end;
                       end;
    -                else
    -                  Internalerror(2019103001);
    -              end;
    +                top_const:
    +                  if not (cs_opt_size in current_settings.optimizerswitches) or (taicpu(hp2).opsize = S_B) then
    +                    begin
    +                      { change
    +                          mov const, %treg
    +                          mov %treg, y
    +
    +                          to
    +
    +                          mov const, y
    +                      }
    +                      if (taicpu(hp2).oper[1]^.typ=top_reg) or
    +                        ((taicpu(p).oper[0]^.val>=low(longint)) and (taicpu(p).oper[0]^.val<=high(longint))) then
    +                        begin
    +                          RegName1 := debug_regname(taicpu(hp2).oper[0]^.reg);
    +                          taicpu(hp2).loadOper(0,taicpu(p).oper[0]^);
    +                          if TempRegUsed then
    +                            begin
    +                              { Don't remove the first instruction if the temporary register is in use }
    +                              DebugMsg(SPeepholeOptimization + RegName1 + ' = ' + debug_tostr(taicpu(p).oper[0]^.val) + '; changed to minimise pipeline stall (MovMov2Mov 7a)',hp2);
    +
    +                              { No need to set Result to True. If there's another instruction later on
    +                                that can be optimised, it will be detected when the main Pass 1 loop
    +                                reaches what is now hp2 and passes it through OptPass1MOV. [Kit] };
    +                            end
    +                          else
    +                            begin
    +                              DebugMsg(SPeepholeOptimization + 'MovMov2Mov 7 done',p);
    +                              { take care of the register (de)allocs following p }
    +                              UpdateUsedRegs(tai(p.next));
    +                              asml.remove(p);
    +                              p.free;
    +                              p:=hp1;
    +                              Result:=true;
    +                              Exit;
    +                            end;
    +                        end;
    +                    end;
    +                  else
    +                    Internalerror(2019103001);
    +                end;
               end;
             { Change
                mov %reg1, %reg2
    @@ -2024,8 +2087,7 @@
     
                to avoid a write/read penalty
             }
    -        if GetNextInstruction_p and
    -           MatchOpType(taicpu(p),top_reg,top_reg) and
    +        if MatchOpType(taicpu(p),top_reg,top_reg) and
                ((MatchInstruction(hp1,A_OR,A_AND,A_TEST,[]) and
                 MatchOperand(taicpu(p).oper[1]^,taicpu(hp1).oper[0]^) and
                 (taicpu(hp1).oper[1]^.typ = top_reg) and
    @@ -2095,8 +2157,7 @@
               parameter or to the temporary storage room for the function
               result)
             }
    -        if GetNextInstruction_p and
    -          IsExitCode(hp1) and
    +        if IsExitCode(hp1) and
               MatchOpType(taicpu(p),top_reg,top_ref) and
               (taicpu(p).oper[1]^.ref^.base = current_procinfo.FramePointer) and
               not(assigned(current_procinfo.procdef.funcretsym) and
    @@ -2111,8 +2172,7 @@
                 Result:=true;
                 exit;
               end;
    -        if GetNextInstruction_p and
    -          MatchOpType(taicpu(p),top_reg,top_ref) and
    +        if MatchOpType(taicpu(p),top_reg,top_ref) and
               MatchInstruction(hp1,A_CMP,A_TEST,[taicpu(p).opsize]) and
               (taicpu(hp1).oper[1]^.typ = top_ref) and
                RefsEqual(taicpu(p).oper[1]^.ref^, taicpu(hp1).oper[1]^.ref^) then
    @@ -2131,8 +2191,7 @@
                 AllocRegBetween(taicpu(p).oper[0]^.reg,p,hp1,usedregs);
                 exit;
               end;
    -        if GetNextInstruction_p and
    -          (taicpu(p).oper[1]^.typ = top_reg) and
    +        if (taicpu(p).oper[1]^.typ = top_reg) and
               (hp1.typ = ait_instruction) and
               GetNextInstruction(hp1, hp2) and
               MatchInstruction(hp2,A_MOV,[]) and
    @@ -2289,8 +2348,7 @@
                       end;
                   end;
               end;
    -        if GetNextInstruction_p and
    -          MatchInstruction(hp1,A_BTS,A_BTR,[Taicpu(p).opsize]) and
    +        if MatchInstruction(hp1,A_BTS,A_BTR,[Taicpu(p).opsize]) and
               GetNextInstruction(hp1, hp2) and
               MatchInstruction(hp2,A_OR,[Taicpu(p).opsize]) and
               MatchOperand(Taicpu(p).oper[0]^,0) and
    @@ -2311,8 +2369,7 @@
                 exit;
               end;
     
    -        if GetNextInstruction_p and
    -           MatchInstruction(hp1,A_LEA,[S_L]) and
    +        if MatchInstruction(hp1,A_LEA,[S_L]) and
                MatchOpType(Taicpu(p),top_ref,top_reg) and
                ((MatchReference(Taicpu(hp1).oper[0]^.ref^,Taicpu(hp1).oper[1]^.reg,Taicpu(p).oper[1]^.reg) and
                  (Taicpu(hp1).oper[0]^.ref^.base<>Taicpu(p).oper[1]^.reg)
    

Relationships

child of 0036376 resolvedFlorian [Patch] x86 implementation of RegModifiedByInstruction 

Activities

J. Gareth Moreton

2019-11-30 00:07

developer  

x86_mov_call.patch (1,966 bytes)
Index: compiler/i386/aoptcpu.pas
===================================================================
--- compiler/i386/aoptcpu.pas	(revision 43611)
+++ compiler/i386/aoptcpu.pas	(working copy)
@@ -144,6 +144,15 @@
                   Result:=true;
                   exit;
                 end;
+                
+              { Subtle speed boost: MOV appears disproportionately more
+                frequently than any other instruction }
+              if taicpu(p).opcode = A_MOV then
+                begin
+                  Result:=OptPass1MOV(p);
+                  Exit;
+                end;
+                
               case taicpu(p).opcode Of
                 A_AND:
                   Result:=OptPass1And(p);
@@ -154,9 +163,7 @@
                 A_FSTP,A_FISTP:
                   Result:=OptPass1FSTP(p);
                 A_LEA:
                   Result:=OptPass1LEA(p);
-                A_MOV:
-                  Result:=OptPass1MOV(p);
                 A_MOVSX,
                 A_MOVZX :
                   Result:=OptPass1Movx(p);
Index: compiler/x86_64/aoptcpu.pas
===================================================================
--- compiler/x86_64/aoptcpu.pas	(revision 43611)
+++ compiler/x86_64/aoptcpu.pas	(working copy)
@@ -71,11 +71,17 @@
         case p.typ of
           ait_instruction:
             begin
+              { Subtle speed boost: MOV appears disproportionately more
+                frequently than any other instruction }
+              if taicpu(p).opcode = A_MOV then
+                begin
+                  Result:=OptPass1MOV(p);
+                  Exit;
+                end;
+				
               case taicpu(p).opcode of
                 A_AND:
                   Result:=OptPass1AND(p);
-                A_MOV:
-                  Result:=OptPass1MOV(p);
                 A_MOVSX,
                 A_MOVZX:
                   Result:=OptPass1Movx(p);
x86_mov_call.patch (1,966 bytes)

Florian

2019-12-01 22:23

administrator   ~0119570

I tried it, but on i386-linux it freezes the test suite runs. Maybe it just triggers another bug but at least it does not seem to work right now. Does the full testsuite work for you on i386?

J. Gareth Moreton

2019-12-01 22:25

developer   ~0119571

No worries, thanks. I'll do some investigating. Can't get it perfect every time. Any particular test that it freezes on?

Florian

2019-12-01 22:40

administrator   ~0119572

No :( that's the weired part: the dotest program just stalls.

J. Gareth Moreton

2019-12-01 22:43

developer   ~0119573

That is definitely very weird. I'll let you know what I find and submit a fixed patch if needs be. I'll re-run all of my test suites overnight for Windows, and if that doesn't yield anything, I'll try to get things working with Linux. I'm trying to think what could possibly cause an infinite loop, unless I made a mistake in RegModifiedByInstruction.

Florian

2019-12-01 23:19

administrator   ~0119575

in r43623, I committed a modified RegModifiedByInstruction as the old implementation had some problems. But it does not fix the stall.

J. Gareth Moreton

2019-12-01 23:33

developer   ~0119576

What was the problem, curiously? I would check the diff, but my tests are running and I didn't pull today's changes first.

J. Gareth Moreton

2019-12-02 11:57

developer   ~0119577

Okay, on i386-win32 and x86_64-win64, there are no regressions in the test suite (logs are binary equal), so it's specific to Linux.

J. Gareth Moreton

2019-12-04 10:58

developer   ~0119608

Last edited: 2019-12-04 10:58

View 2 revisions

Hi Florian - can you confirm if the hang still happens? I built yesterday's trunk with the patch applied and the tests started running without a hitch (unless I messed up the script - "sudo make full TEST_FPC=/usr/local/lib/fpc/3.3.1/ppc386")

J. Gareth Moreton

2019-12-06 03:55

developer   ~0119650

I made a minor update to how the new optimisation checks for register usage. It should ultimately be a cheaper check while still achieving the same result. I am not sure if it fixes the crash on i386-linux (which I have yet to reproduce).

OptPass1MOV-Improvement-v2.patch (12,378 bytes)
Index: compiler/x86/aoptx86.pas
===================================================================
--- compiler/x86/aoptx86.pas	(revision 43644)
+++ compiler/x86/aoptx86.pas	(working copy)
@@ -1489,7 +1489,7 @@
     function TX86AsmOptimizer.OptPass1MOV(var p : tai) : boolean;
       var
         hp1, hp2: tai;
-        GetNextInstruction_p: Boolean;
+        GetNextInstruction_p, TempRegUsed: Boolean;
         PreMessage, RegName1, RegName2, InputVal, MaskNum: string;
         NewSize: topsize;
       begin
@@ -1511,8 +1511,11 @@
             exit;
           end;
 
-        if GetNextInstruction_p and
-          MatchInstruction(hp1,A_AND,[]) and
+        { All the next optimisations require a next instruction }
+        if not GetNextInstruction_p or (hp1.typ <> ait_instruction) then
+          Exit;
+
+        if (taicpu(hp1).opcode = A_AND) and
           (taicpu(p).oper[1]^.typ = top_reg) and
           MatchOpType(taicpu(hp1),top_const,top_reg) then
           begin
@@ -1651,8 +1654,7 @@
               end;
           end;
         { Next instruction is also a MOV ? }
-        if GetNextInstruction_p and
-          MatchInstruction(hp1,A_MOV,[taicpu(p).opsize]) then
+        if MatchInstruction(hp1,A_MOV,[taicpu(p).opsize]) then
           begin
             if (taicpu(p).oper[1]^.typ = top_reg) and
               MatchOperand(taicpu(p).oper[1]^,taicpu(hp1).oper[0]^) then
@@ -1935,13 +1937,15 @@
               end;
           end;
         { search further than the next instruction for a mov }
-        if (cs_opt_level3 in current_settings.optimizerswitches) and
+        if
           { check as much as possible before the expensive GetNextInstructionUsingReg call }
           (taicpu(p).oper[1]^.typ = top_reg) and
           (taicpu(p).oper[0]^.typ in [top_reg,top_const]) and
+          not RegModifiedByInstruction(taicpu(p).oper[1]^.reg, hp1) and
           { we work with hp2 here, so hp1 can be still used later on when
             checking for GetNextInstruction_p }
-          GetNextInstructionUsingReg(p,hp2,taicpu(p).oper[1]^.reg) and
+          { GetNextInstructionUsingReg only searches one instruction ahead unless -O3 is specified }
+          GetNextInstructionUsingReg(hp1,hp2,taicpu(p).oper[1]^.reg) and
           MatchInstruction(hp2,A_MOV,[]) and
           MatchOperand(taicpu(p).oper[1]^,taicpu(hp2).oper[0]^) and
           ((taicpu(p).oper[0]^.typ=top_const) or
@@ -1951,12 +1955,20 @@
           ) then
           begin
             TransferUsedRegs(TmpUsedRegs);
+            { We don't need to call UpdateUsedRegs for every instruction between
+              p and hp2 because the register we're concerned about will not
+              become deallocated (otherwise GetNextInstructionUsingReg would
+              have stopped at an earlier instruction). [Kit] }
+
             { we have
                 mov x, %treg
                 mov %treg, y
             }
-            if not(RegInOp(taicpu(p).oper[1]^.reg,taicpu(hp2).oper[1]^)) and
-               not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp2, TmpUsedRegs)) then
+            TempRegUsed :=
+              RegReadByInstruction(taicpu(p).oper[1]^.reg, hp1) or
+              RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp2, TmpUsedRegs);
+
+            if not(RegInOp(taicpu(p).oper[1]^.reg,taicpu(hp2).oper[1]^)) then
               { we've got
 
                 mov x, %treg
@@ -1974,44 +1986,95 @@
 
                         mov %reg, y
                     }
-                    AllocRegBetween(taicpu(p).oper[0]^.reg,p,hp2,usedregs);
-                    taicpu(hp2).loadOper(0,taicpu(p).oper[0]^);
-                    DebugMsg(SPeepholeOptimization + 'MovMov2Mov 6 done',p);
-                    { take care of the register (de)allocs following p }
-                    UpdateUsedRegs(tai(p.next));
-                    asml.remove(p);
-                    p.free;
-                    p:=hp1;
-                    Result:=true;
-                    Exit;
-                  end;
-                top_const:
-                  begin
-                    { change
-                        mov const, %treg
-                        mov %treg, y
+                    RegName1 := debug_regname(taicpu(hp2).oper[0]^.reg);
+                    if taicpu(hp2).oper[1]^.reg = taicpu(p).oper[0]^.reg then
+                      begin
+                        { %reg = y - remove hp2 completely (doing it here instead of relying on the mov %reg,%reg optimisation might cut down on a pass iteration) }
+                        DebugMsg(SPeepholeOptimization + debug_regname(taicpu(p).oper[0]^.reg) + ' = ' + RegName1 + '; removed unnecessary instruction (MovMov2MovNop 6b}',hp2);
+                        asml.remove(hp2);
+                        hp2.Free;
 
-                        to
-
-                        mov const, y
-                    }
-                    if (taicpu(hp2).oper[1]^.typ=top_reg) or
-                      ((taicpu(p).oper[0]^.val>=low(longint)) and (taicpu(p).oper[0]^.val<=high(longint))) then
+                        if not TempRegUsed then
+                          begin
+                            { We can remove the original MOV too }
+                            DebugMsg(SPeepholeOptimization + 'MovMov2NopNop 6b done',p);
+                            { take care of the register (de)allocs following p }
+                            UpdateUsedRegs(tai(p.next));
+                            asml.remove(p);
+                            p.free;
+                            p:=hp1;
+                            Result:=true;
+                            Exit;
+                          end;
+                      end
+                    else
                       begin
+                        AllocRegBetween(taicpu(p).oper[0]^.reg,p,hp2,usedregs);
                         taicpu(hp2).loadOper(0,taicpu(p).oper[0]^);
-                        DebugMsg(SPeepholeOptimization + 'MovMov2Mov 7 done',p);
-                        { take care of the register (de)allocs following p }
-                        UpdateUsedRegs(tai(p.next));
-                        asml.remove(p);
-                        p.free;
-                        p:=hp1;
-                        Result:=true;
-                        Exit;
+
+                        if TempRegUsed then
+                          begin
+                            { Don't remove the first instruction if the temporary register is in use }
+                            DebugMsg(SPeepholeOptimization + RegName1 + ' = ' + debug_regname(taicpu(p).oper[0]^.reg) + '; changed to minimise pipeline stall (MovMov2Mov 6a}',hp2);
+
+                            { No need to set Result to True. If there's another instruction later on
+                              that can be optimised, it will be detected when the main Pass 1 loop
+                              reaches what is now hp2 and passes it through OptPass1MOV. [Kit] };
+                          end
+                        else
+                          begin
+                            DebugMsg(SPeepholeOptimization + 'MovMov2Mov 6 done',p);
+                            { take care of the register (de)allocs following p }
+                            UpdateUsedRegs(tai(p.next));
+                            asml.remove(p);
+                            p.free;
+                            p:=hp1;
+                            Result:=true;
+                            Exit;
+                          end;
                       end;
                   end;
-                else
-                  Internalerror(2019103001);
-              end;
+                top_const:
+                  if not (cs_opt_size in current_settings.optimizerswitches) or (taicpu(hp2).opsize = S_B) then
+                    begin
+                      { change
+                          mov const, %treg
+                          mov %treg, y
+
+                          to
+
+                          mov const, y
+                      }
+                      if (taicpu(hp2).oper[1]^.typ=top_reg) or
+                        ((taicpu(p).oper[0]^.val>=low(longint)) and (taicpu(p).oper[0]^.val<=high(longint))) then
+                        begin
+                          RegName1 := debug_regname(taicpu(hp2).oper[0]^.reg);
+                          taicpu(hp2).loadOper(0,taicpu(p).oper[0]^);
+                          if TempRegUsed then
+                            begin
+                              { Don't remove the first instruction if the temporary register is in use }
+                              DebugMsg(SPeepholeOptimization + RegName1 + ' = ' + debug_tostr(taicpu(p).oper[0]^.val) + '; changed to minimise pipeline stall (MovMov2Mov 7a)',hp2);
+
+                              { No need to set Result to True. If there's another instruction later on
+                                that can be optimised, it will be detected when the main Pass 1 loop
+                                reaches what is now hp2 and passes it through OptPass1MOV. [Kit] };
+                            end
+                          else
+                            begin
+                              DebugMsg(SPeepholeOptimization + 'MovMov2Mov 7 done',p);
+                              { take care of the register (de)allocs following p }
+                              UpdateUsedRegs(tai(p.next));
+                              asml.remove(p);
+                              p.free;
+                              p:=hp1;
+                              Result:=true;
+                              Exit;
+                            end;
+                        end;
+                    end;
+                  else
+                    Internalerror(2019103001);
+                end;
           end;
         { Change
            mov %reg1, %reg2
@@ -2024,8 +2087,7 @@
 
            to avoid a write/read penalty
         }
-        if GetNextInstruction_p and
-           MatchOpType(taicpu(p),top_reg,top_reg) and
+        if MatchOpType(taicpu(p),top_reg,top_reg) and
            ((MatchInstruction(hp1,A_OR,A_AND,A_TEST,[]) and
             MatchOperand(taicpu(p).oper[1]^,taicpu(hp1).oper[0]^) and
             (taicpu(hp1).oper[1]^.typ = top_reg) and
@@ -2095,8 +2157,7 @@
           parameter or to the temporary storage room for the function
           result)
         }
-        if GetNextInstruction_p and
-          IsExitCode(hp1) and
+        if IsExitCode(hp1) and
           MatchOpType(taicpu(p),top_reg,top_ref) and
           (taicpu(p).oper[1]^.ref^.base = current_procinfo.FramePointer) and
           not(assigned(current_procinfo.procdef.funcretsym) and
@@ -2111,8 +2172,7 @@
             Result:=true;
             exit;
           end;
-        if GetNextInstruction_p and
-          MatchOpType(taicpu(p),top_reg,top_ref) and
+        if MatchOpType(taicpu(p),top_reg,top_ref) and
           MatchInstruction(hp1,A_CMP,A_TEST,[taicpu(p).opsize]) and
           (taicpu(hp1).oper[1]^.typ = top_ref) and
            RefsEqual(taicpu(p).oper[1]^.ref^, taicpu(hp1).oper[1]^.ref^) then
@@ -2131,8 +2191,7 @@
             AllocRegBetween(taicpu(p).oper[0]^.reg,p,hp1,usedregs);
             exit;
           end;
-        if GetNextInstruction_p and
-          (taicpu(p).oper[1]^.typ = top_reg) and
+        if (taicpu(p).oper[1]^.typ = top_reg) and
           (hp1.typ = ait_instruction) and
           GetNextInstruction(hp1, hp2) and
           MatchInstruction(hp2,A_MOV,[]) and
@@ -2289,8 +2348,7 @@
                   end;
               end;
           end;
-        if GetNextInstruction_p and
-          MatchInstruction(hp1,A_BTS,A_BTR,[Taicpu(p).opsize]) and
+        if MatchInstruction(hp1,A_BTS,A_BTR,[Taicpu(p).opsize]) and
           GetNextInstruction(hp1, hp2) and
           MatchInstruction(hp2,A_OR,[Taicpu(p).opsize]) and
           MatchOperand(Taicpu(p).oper[0]^,0) and
@@ -2311,8 +2369,7 @@
             exit;
           end;
 
-        if GetNextInstruction_p and
-           MatchInstruction(hp1,A_LEA,[S_L]) and
+        if MatchInstruction(hp1,A_LEA,[S_L]) and
            MatchOpType(Taicpu(p),top_ref,top_reg) and
            ((MatchReference(Taicpu(hp1).oper[0]^.ref^,Taicpu(hp1).oper[1]^.reg,Taicpu(p).oper[1]^.reg) and
              (Taicpu(hp1).oper[0]^.ref^.base<>Taicpu(p).oper[1]^.reg)

J. Gareth Moreton

2019-12-06 04:49

developer   ~0119651

Some interesting insight here... I tried the idea of an optimisation where if you write to a register, and the register is immediately deallocated afterwards, then you can remove the instruction because it is effectively non-functional. That's the theory... in practice, it causes an unspecified runtime error while running ppc2.

In other words, "if (taicpu(p).oper[1]^.typ = top_reg) and not RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, p, UsedRegs) then" doesn't work.

My guess is that the registers aren't as cleanly tracked as they should be.

J. Gareth Moreton

2019-12-06 16:04

developer   ~0119663

Ignore that last message... turns out that RegUsedAfterInstruction always returns False if the instruction loads a new value to the register, and for MOV instructions to the function result, the register is often deallocated right before the instruction. Makes things a little bit complicated in that regard, but there might be ways to work around it, or otherwise adjust the positions of where registers are allocated and deallocated for more accurate optimisation, since there are situations where registers are written to needlessly.

Issue History

Date Modified Username Field Change
2019-11-30 00:07 J. Gareth Moreton New Issue
2019-11-30 00:07 J. Gareth Moreton File Added: OptPass1MOV-Improvement.patch
2019-11-30 00:07 J. Gareth Moreton File Added: x86_mov_call.patch
2019-11-30 00:07 J. Gareth Moreton Relationship added child of 0036376
2019-11-30 00:08 J. Gareth Moreton Tag Attached: patch
2019-11-30 00:08 J. Gareth Moreton Tag Attached: compiler
2019-11-30 00:08 J. Gareth Moreton Tag Attached: optimizations
2019-11-30 00:08 J. Gareth Moreton Tag Attached: x86_64
2019-11-30 00:08 J. Gareth Moreton Tag Attached: x86
2019-11-30 00:08 J. Gareth Moreton Tag Attached: i386
2019-11-30 00:11 J. Gareth Moreton Additional Information Updated View Revisions
2019-11-30 00:11 J. Gareth Moreton FPCTarget => -
2019-11-30 00:12 J. Gareth Moreton Additional Information Updated View Revisions
2019-11-30 00:12 J. Gareth Moreton File Deleted: OptPass1MOV-Improvement.patch
2019-11-30 00:13 J. Gareth Moreton File Added: OptPass1MOV-Improvement.patch
2019-12-01 22:23 Florian Note Added: 0119570
2019-12-01 22:25 J. Gareth Moreton Note Added: 0119571
2019-12-01 22:40 Florian Note Added: 0119572
2019-12-01 22:43 J. Gareth Moreton Note Added: 0119573
2019-12-01 23:19 Florian Note Added: 0119575
2019-12-01 23:33 J. Gareth Moreton Note Added: 0119576
2019-12-02 11:57 J. Gareth Moreton Note Added: 0119577
2019-12-04 10:58 J. Gareth Moreton Note Added: 0119608
2019-12-04 10:58 J. Gareth Moreton Note Edited: 0119608 View Revisions
2019-12-06 03:55 J. Gareth Moreton File Added: OptPass1MOV-Improvement-v2.patch
2019-12-06 03:55 J. Gareth Moreton Note Added: 0119650
2019-12-06 04:49 J. Gareth Moreton Note Added: 0119651
2019-12-06 04:50 J. Gareth Moreton File Deleted: OptPass1MOV-Improvement.patch
2019-12-06 16:04 J. Gareth Moreton Note Added: 0119663