View Issue Details

IDProjectCategoryView StatusLast Update
0034679FPCCompilerpublic2019-01-20 18:54
ReporterJ. Gareth MoretonAssigned ToFlorian 
PrioritylowSeveritytweakReproducibilityN/A
Status resolvedResolutionfixed 
Platformx86_64-win64OSMicrosoft WindowsOS Version10 Professional
Product Version3.3.1Product Buildx86_64-win64 
Target Version3.3.1Fixed in Version3.3.1 
Summary0034679: [Patch / Refactor] TmpUsedRegs object pooling and optimisation
DescriptionThis patch serves to make a small speed saving and reduce maintenance costs by using a paradigm known as 'object pooling' in the peephole optimizer on x86 systems. All references to TmpUsedRegs are changed to point to a common object that is created with the instance of the optimizer (which runs in a single thread). This serves to reduce time wasted on constantly creating and destroying a number of instances of TUsedRegs whenever the peephole optimizer wishes to look ahead with register usage.

Some new base functions are introduced, namely two versions of TransferUsedRegs that copy the used registers from one TAllUsedRegs collection to another, but without creating a new object (it calls the constructor as a regular method in order to update the internal state - no new object is instantiated - comments explain what's going on). One version copies the internal states of all register types, while the second copies only one, specified by a parameter - this version is also inlined (as is GetUsedRegs) because it collapses into a single line of code. For example, "OptPass1MOV" frequently calls:

TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);

If there's concern that this increases the maintenance factor and hence the risk of bugs, or an optimisation uses more than one type of register, then the first parameter can be removed - for example:

TransferUsedRegs(TmpUsedRegs);

...and the code will still compile successfully with no change in function, although it will run slightly slower.
Steps To ReproduceApply patch and confirm correct compilation.
Additional InformationAs a side-effect, a couple of memory leaks were fixed with this patch, because "OptPass1VMOVAP" and "OptPass1VOP" failed to call ReleaseUsedRegs after calling CopyUsedRegs (which creates instances of TUsedRegs). Thus maintenance costs are reduced because there is now no need to remember to call ReleaseUsedRegs every time TmpUsedRegs is used; the pooled instances are freed when the optimizer's destructor is called.

Time savings are marginal at best - no more than a few seconds with very large projects.

On win64, the size of the compiler increases by only 512 bytes, largely due to the inlined methods. It isn't larger because the version of TransferUsedRegs that omits the register type is never used. There is, however, potential for further savings in both speed and size with future peephole optimisations, because, for example, the following pair of instructions frequently appear:

movq %rbx,%rax
movq (%rax),%rax

...which could be collapsed into just:

movq (%rbx),%rax
Tags64-bit, compiler, optimization, patch, refactoring, x86, x86_64-win64
Fixed in Revision
FPCOldBugId0
FPCTarget
Attached Files
  • tmpusedregs-obj-pool.patch (18,992 bytes)
    Index: compiler/aoptobj.pas
    ===================================================================
    --- compiler/aoptobj.pas	(revision 40520)
    +++ compiler/aoptobj.pas	(working copy)
    @@ -270,6 +270,9 @@
             Procedure UpdateUsedRegs(p : Tai);
             class procedure UpdateUsedRegs(var Regs: TAllUsedRegs; p: Tai);
             Function CopyUsedRegs(var dest : TAllUsedRegs) : boolean;
    +        procedure RestoreUsedRegs(const Regs : TAllUsedRegs);
    +        procedure TransferUsedRegs(var dest: TAllUsedRegs);
    +        procedure TransferUsedRegs(typ: TRegisterType; var dest: TAllUsedRegs);
             class Procedure ReleaseUsedRegs(const regs : TAllUsedRegs);
             class Function RegInUsedRegs(reg : TRegister;regs : TAllUsedRegs) : boolean;
             class Procedure IncludeRegInUsedRegs(reg : TRegister;var regs : TAllUsedRegs);
    @@ -457,7 +460,7 @@
           End;
     
     
    -    Function TUsedRegs.GetUsedRegs: TRegSet;
    +    Function TUsedRegs.GetUsedRegs: TRegSet; inline;
           Begin
             GetUsedRegs := UsedRegs;
           End;
    @@ -945,6 +948,39 @@
           end;
     
     
    +      procedure TAOptObj.RestoreUsedRegs(const Regs: TAllUsedRegs);
    +      var
    +        i : TRegisterType;
    +      begin
    +        { Note that the constructor Create_Regset is being called as a regular
    +          method - it is not instantiating a new object.  This is because it is
    +          the only published means to modify the internal state en-masse. [Kit] }
    +        for i:=low(TRegisterType) to high(TRegisterType) do
    +          UsedRegs[i].Create_Regset(i,Regs[i].GetUsedRegs);
    +      end;
    +
    +
    +      procedure TAOptObj.TransferUsedRegs(var dest: TAllUsedRegs);
    +      var
    +        i : TRegisterType;
    +      begin
    +        { Note that the constructor Create_Regset is being called as a regular
    +          method - it is not instantiating a new object.  This is because it is
    +          the only published means to modify the internal state en-masse. [Kit] }
    +        for i:=low(TRegisterType) to high(TRegisterType) do
    +          dest[i].Create_Regset(i, UsedRegs[i].GetUsedRegs);
    +      end;
    +
    +
    +      procedure TAOptObj.TransferUsedRegs(typ: TRegisterType; var dest: TAllUsedRegs); inline;
    +      begin
    +        { Note that the constructor Create_Regset is being called as a regular
    +          method - it is not instantiating a new object.  This is because it is
    +          the only published means to modify the internal state en-masse. [Kit] }
    +        dest[typ].Create_Regset(typ, UsedRegs[typ].GetUsedRegs);
    +      end;
    +
    +
           class procedure TAOptObj.ReleaseUsedRegs(const regs: TAllUsedRegs);
             var
               i : TRegisterType;
    Index: compiler/x86/aoptx86.pas
    ===================================================================
    --- compiler/x86/aoptx86.pas	(revision 40520)
    +++ compiler/x86/aoptx86.pas	(working copy)
    @@ -30,15 +30,24 @@
         uses
           globtype,
           cpubase,
    -      aasmtai,aasmcpu,
    +      aasmtai,aasmcpu,aasmdata,
           cgbase,cgutils,
           aopt,aoptobj;
     
         type
    +
    +      { TX86AsmOptimizer }
    +
           TX86AsmOptimizer = class(TAsmOptimizer)
    +        constructor Create(_AsmL: TAsmList); override;
    +        destructor Destroy; override;
             function RegLoadedWithNewValue(reg : tregister; hp : tai) : boolean; override;
             function InstructionLoadsFromReg(const reg : TRegister; const hp : tai) : boolean; override;
             function RegReadByInstruction(reg : TRegister; hp : tai) : boolean;
    +      public
    +        { Pooled object that can be used by optimisation procedures to evaluate
    +          future register usage without upsetting the current state. }
    +        TmpUsedRegs: TAllUsedRegs;
           protected
             { checks whether loading a new value in reg1 overwrites the entirety of reg2 }
             function Reg1WriteOverwritesReg2Entirely(reg1, reg2: tregister): boolean;
    @@ -584,6 +593,20 @@
     {$endif DEBUG_AOPTCPU}
     
     
    +    constructor TX86AsmOptimizer.Create(_AsmL: TAsmList);
    +      begin
    +        inherited Create(_AsmL);
    +        CreateUsedRegs(TmpUsedRegs);
    +      end;
    +
    +
    +    destructor TX86AsmOptimizer.Destroy;
    +      begin
    +        ReleaseUsedRegs(TmpUsedRegs);
    +        inherited Destroy;
    +      end;
    +
    +
         function TX86AsmOptimizer.Reg1WriteOverwritesReg2Entirely(reg1, reg2: tregister): boolean;
           begin
             if not SuperRegistersEqual(reg1,reg2) then
    @@ -1032,7 +1055,6 @@
     
         function TX86AsmOptimizer.OptPass1MOVAP(var p : tai) : boolean;
           var
    -        TmpUsedRegs : TAllUsedRegs;
             hp1,hp2 : tai;
           begin
             result:=false;
    @@ -1059,7 +1081,7 @@
                          addsX/subsX/... reg3,reg
               }
               begin
    -            CopyUsedRegs(TmpUsedRegs);
    +            TransferUsedRegs(R_MMREGISTER, TmpUsedRegs);
                 UpdateUsedRegs(TmpUsedRegs, tai(p.next));
                 UpdateUsedRegs(TmpUsedRegs, tai(hp1.next));
                 If not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,hp2,TmpUsedRegs)) then
    @@ -1081,7 +1103,6 @@
                     p:=hp1;
                     result:=true;
                   end;
    -            ReleaseUsedRegs(TmpUsedRegs);
               end
             end;
     
    @@ -1088,7 +1109,6 @@
     
         function TX86AsmOptimizer.OptPass1VMOVAP(var p : tai) : boolean;
           var
    -        TmpUsedRegs : TAllUsedRegs;
             hp1,hp2 : tai;
           begin
             result:=false;
    @@ -1116,7 +1136,7 @@
                           dealloc reg2
                           =>
                           vmova* reg1,reg3 }
    -                    CopyUsedRegs(TmpUsedRegs);
    +                    TransferUsedRegs(R_MMREGISTER, TmpUsedRegs);
                         UpdateUsedRegs(TmpUsedRegs, tai(p.next));
                         if not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,hp1,TmpUsedRegs)) then
                           begin
    @@ -1145,7 +1165,7 @@
                       MatchInstruction(hp2,A_VMOVAPD,A_VMOVAPS,[S_NO]) and
                       MatchOperand(taicpu(p).oper[0]^,taicpu(hp2).oper[1]^) then
                       begin
    -                    CopyUsedRegs(TmpUsedRegs);
    +                    TransferUsedRegs(R_MMREGISTER, TmpUsedRegs);
                         UpdateUsedRegs(TmpUsedRegs, tai(p.next));
                         UpdateUsedRegs(TmpUsedRegs, tai(hp1.next));
                         if not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,hp2,TmpUsedRegs))
    @@ -1166,7 +1186,6 @@
     
         function TX86AsmOptimizer.OptPass1VOP(var p : tai) : boolean;
           var
    -        TmpUsedRegs : TAllUsedRegs;
             hp1 : tai;
           begin
             result:=false;
    @@ -1186,7 +1205,7 @@
               MatchOperand(taicpu(p).oper[2]^,taicpu(hp1).oper[0]^) and
               (taicpu(hp1).oper[1]^.typ=top_reg) then
               begin
    -            CopyUsedRegs(TmpUsedRegs);
    +            TransferUsedRegs(R_MMREGISTER, TmpUsedRegs);
                 UpdateUsedRegs(TmpUsedRegs, tai(p.next));
                 if not(RegUsedAfterInstruction(taicpu(hp1).oper[0]^.reg,hp1,TmpUsedRegs)
                  ) then
    @@ -1204,7 +1223,6 @@
         function TX86AsmOptimizer.OptPass1MOV(var p : tai) : boolean;
           var
             hp1, hp2: tai;
    -        TmpUsedRegs : TAllUsedRegs;
             GetNextInstruction_p: Boolean;
             PreMessage, RegName1, RegName2, InputVal, MaskNum: string;
             NewSize: topsize;
    @@ -1337,10 +1355,10 @@
                         taicpu(p).oper[1]^ := taicpu(hp1).oper[1]^;
     
                         { Safeguard if "and" is followed by a conditional command }
    -                    CopyUsedRegs(TmpUsedRegs);
    -                    UpdateUsedRegs(TmpUsedRegs,tai(hp1.next));
    +                    TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
    +                    UpdateUsedRegs(TmpUsedRegs,tai(p.next));
     
    -                    if (RegUsedAfterInstruction(NR_DEFAULTFLAGS, tai(hp1.next), TmpUsedRegs)) then
    +                    if (RegUsedAfterInstruction(NR_DEFAULTFLAGS, hp1, TmpUsedRegs)) then
                           begin
                             { At this point, the "and" command is effectively equivalent to
                               "test %reg,%reg". This will be handled separately by the
    @@ -1359,8 +1377,6 @@
                           end;
     
                         Result := True;
    -
    -                    ReleaseUsedRegs(TmpUsedRegs);
                         Exit;
     
                       end;
    @@ -1371,7 +1387,7 @@
               (taicpu(p).oper[1]^.typ = top_reg) and
               MatchOperand(taicpu(p).oper[1]^,taicpu(hp1).oper[0]^) then
               begin
    -            CopyUsedRegs(TmpUsedRegs);
    +            TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
                 UpdateUsedRegs(TmpUsedRegs, tai(p.Next));
                 { we have
                     mov x, %treg
    @@ -1402,7 +1418,6 @@
                         DebugMsg(SPeepholeOptimization + 'MovMov2Mov 2 done',p);
                         asml.remove(hp1);
                         hp1.free;
    -                    ReleaseUsedRegs(TmpUsedRegs);
                         Result:=true;
                         Exit;
                       end;
    @@ -1425,7 +1440,6 @@
                             DebugMsg(SPeepholeOptimization + 'MovMov2Mov 5 done',p);
                             asml.remove(hp1);
                             hp1.free;
    -                        ReleaseUsedRegs(TmpUsedRegs);
                             Result:=true;
                             Exit;
                           end;
    @@ -1445,12 +1459,10 @@
                           DebugMsg(SPeepholeOptimization + 'MovMov2Mov 3 done',p);
                           asml.remove(hp1);
                           hp1.free;
    -                      ReleaseUsedRegs(TmpUsedRegs);
                           Result:=true;
                           Exit;
                         end;
                   end;
    -            ReleaseUsedRegs(TmpUsedRegs);
               end
             else
               { Change
    @@ -1486,7 +1498,7 @@
                        test/or/and %reg2, %reg2
                     }
                     begin
    -                  CopyUsedRegs(TmpUsedRegs);
    +                  TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
                       { reg1 will be used after the first instruction,
                         so update the allocation info                  }
                       AllocRegBetween(taicpu(p).oper[0]^.reg,p,hp1,usedregs);
    @@ -1512,7 +1524,6 @@
                             asml.remove(p);
                             p.free;
                             p := hp1;
    -                        ReleaseUsedRegs(TmpUsedRegs);
                             Exit;
                           end
                         else
    @@ -1532,7 +1543,6 @@
                             taicpu(hp1).loadoper(1,taicpu(p).oper[0]^);
                             DebugMsg(SPeepholeOptimization + 'MovTestJxx2MovTestJxx done',p);
                           end;
    -                  ReleaseUsedRegs(TmpUsedRegs);
                     end
                 end
             else
    @@ -1606,7 +1616,7 @@
                           end
                         else
                           begin
    -                        CopyUsedRegs(TmpUsedRegs);
    +                        TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
                             UpdateUsedRegs(TmpUsedRegs, tai(hp1.next));
                             if (taicpu(p).oper[1]^.typ = top_ref) and
                               { mov reg1, mem1
    @@ -1631,7 +1641,6 @@
                                 AllocRegBetween(taicpu(p).oper[0]^.reg,p,hp1,UsedRegs);
                                 DebugMsg(SPeepholeOptimization + 'MovMovCmp2MovCmp done',hp1);
                               end;
    -                        ReleaseUsedRegs(TmpUsedRegs);
                           end;
                       end
                     else if (taicpu(p).oper[1]^.typ=top_ref) and
    @@ -1643,7 +1652,7 @@
                       end
                     else
                       begin
    -                    CopyUsedRegs(TmpUsedRegs);
    +                    TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
                         if GetNextInstruction(hp1, hp2) and
                           MatchOpType(taicpu(p),top_ref,top_reg) and
                           MatchOperand(taicpu(p).oper[1]^,taicpu(hp1).oper[0]^) and
    @@ -1715,7 +1724,6 @@
                             end
     {$endif i386}
                             ;
    -                    ReleaseUsedRegs(TmpUsedRegs);
                       end;
                   end
     (*          { movl [mem1],reg1
    @@ -1801,7 +1809,7 @@
                     to
                              add/sub/or/... reg3/$const, reg/ref      }
                   begin
    -                CopyUsedRegs(TmpUsedRegs);
    +                TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
                     UpdateUsedRegs(TmpUsedRegs, tai(p.next));
                     UpdateUsedRegs(TmpUsedRegs, tai(hp1.next));
                     If not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,hp2,TmpUsedRegs)) then
    @@ -1855,7 +1863,6 @@
                         hp2.Free;
                         p := hp1;
                       end;
    -                ReleaseUsedRegs(TmpUsedRegs);
                   end
     {$ifndef x86_64}
                 else if MatchOpType(taicpu(hp2),top_reg,top_reg) and
    @@ -1875,7 +1882,7 @@
                              add/sub/or/... reg3/$const, reg3
                   }
                   begin
    -                CopyUsedRegs(TmpUsedRegs);
    +                TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
                     UpdateUsedRegs(TmpUsedRegs, tai(p.next));
                     UpdateUsedRegs(TmpUsedRegs, tai(hp1.next));
                     If not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,hp2,TmpUsedRegs)) then
    @@ -1932,7 +1939,6 @@
                         hp2.Free;
     //                    p := hp1;
                       end;
    -                ReleaseUsedRegs(TmpUsedRegs);
                   end;
     {$endif x86_64}
               end
    @@ -1973,7 +1979,7 @@
     
                  add reg2,ref}
               begin
    -            CopyUsedRegs(TmpUsedRegs);
    +            TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
                 { reg1 may not be used afterwards }
                 if not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp1, TmpUsedRegs)) then
                   begin
    @@ -1984,7 +1990,6 @@
                     p.free;
                     p:=hp1;
                   end;
    -            ReleaseUsedRegs(TmpUsedRegs);
               end;
           end;
     
    @@ -2044,7 +2049,6 @@
     
         function TX86AsmOptimizer.OptPass1OP(var p : tai) : boolean;
           var
    -        TmpUsedRegs : TAllUsedRegs;
             hp1 : tai;
           begin
             result:=false;
    @@ -2065,7 +2069,7 @@
               MatchOperand(taicpu(p).oper[0]^,taicpu(hp1).oper[1]^) and
               (taicpu(p).oper[0]^.typ=top_reg) then
               begin
    -            CopyUsedRegs(TmpUsedRegs);
    +            TransferUsedRegs(R_MMREGISTER, TmpUsedRegs);
                 UpdateUsedRegs(TmpUsedRegs, tai(p.next));
                 if not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,hp1,TmpUsedRegs)) then
                   begin
    @@ -2076,7 +2080,6 @@
                     hp1.Free;
                     result:=true;
                   end;
    -            ReleaseUsedRegs(TmpUsedRegs);
               end;
           end;
     
    @@ -2085,7 +2088,6 @@
           var
             hp1 : tai;
             l : ASizeInt;
    -        TmpUsedRegs : TAllUsedRegs;
           begin
             Result:=false;
             { removes seg register prefixes from LEA operations, as they
    @@ -2167,7 +2169,7 @@
               MatchOpType(Taicpu(hp1),top_reg,top_reg) and
               (taicpu(p).oper[1]^.reg<>NR_STACK_POINTER_REG) then
               begin
    -            CopyUsedRegs(TmpUsedRegs);
    +            TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
                 UpdateUsedRegs(TmpUsedRegs, tai(p.next));
                 if not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,hp1,TmpUsedRegs)) then
                   begin
    @@ -2177,7 +2179,6 @@
                     hp1.Free;
                     result:=true;
                   end;
    -            ReleaseUsedRegs(TmpUsedRegs);
               end;
           end;
     
    @@ -2405,7 +2406,6 @@
     
         function TX86AsmOptimizer.OptPass1SETcc(var p: tai): boolean;
           var
    -        TmpUsedRegs : TAllUsedRegs;
             hp1,hp2,next: tai; SetC, JumpC: TAsmCond;
           begin
             Result:=false;
    @@ -2432,10 +2432,9 @@
               begin
                 next := tai(p.Next);
     
    -            CopyUsedRegs(TmpUsedRegs);
    +            TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
                 UpdateUsedRegs(TmpUsedRegs, next);
                 UpdateUsedRegs(TmpUsedRegs, tai(hp1.next));
    -            UpdateUsedRegs(TmpUsedRegs, tai(hp2.next));
     
                 asml.Remove(hp1);
                 hp1.Free;
    @@ -2463,8 +2462,6 @@
                     p := hp2;
                   end;
     
    -            ReleaseUsedRegs(TmpUsedRegs);
    -
                 DebugMsg(SPeepholeOptimization + 'SETcc/TEST/Jcc -> Jcc',p);
               end;
           end;
    @@ -2472,7 +2469,6 @@
     
         function TX86AsmOptimizer.OptPass2MOV(var p : tai) : boolean;
           var
    -       TmpUsedRegs : TAllUsedRegs;
            hp1,hp2: tai;
     {$ifdef x86_64}
            hp3: tai;
    @@ -2497,9 +2493,8 @@
                 { Don't remove the MOV command without first checking that reg2 isn't used afterwards,
                   or unless supreg(reg3) = supreg(reg2)). [Kit] }
     
    -            CopyUsedRegs(TmpUsedRegs);
    +            TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
                 UpdateUsedRegs(TmpUsedRegs, tai(p.next));
    -            UpdateUsedRegs(TmpUsedRegs, tai(hp1.next));
     
                 if (getsupreg(taicpu(p).oper[1]^.reg) = getsupreg(taicpu(hp1).oper[1]^.reg)) or
                   not RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp1, TmpUsedRegs)
    @@ -2511,7 +2506,6 @@
                     Result:=true;
                   end;
     
    -            ReleaseUsedRegs(TmpUsedRegs);
                 exit;
               end
             else if MatchOpType(taicpu(p),top_reg,top_reg) and
    @@ -2569,7 +2563,7 @@
               MatchOperand(taicpu(hp1).oper[taicpu(hp1).ops-1]^,taicpu(hp2).oper[0]^) and
               (taicpu(hp2).oper[1]^.typ = top_ref) then
               begin
    -            CopyUsedRegs(TmpUsedRegs);
    +            TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
                 UpdateUsedRegs(TmpUsedRegs,tai(p.next));
                 UpdateUsedRegs(TmpUsedRegs,tai(hp1.next));
                 if (RefsEqual(taicpu(hp2).oper[1]^.ref^,taicpu(p).oper[0]^.ref^) and
    @@ -2604,7 +2598,6 @@
                     hp2.free;
                     p := hp1
                   end;
    -            ReleaseUsedRegs(TmpUsedRegs);
                 Exit;
     {$ifdef x86_64}
               end
    @@ -2706,7 +2699,6 @@
     
         function TX86AsmOptimizer.OptPass2Imul(var p : tai) : boolean;
           var
    -        TmpUsedRegs : TAllUsedRegs;
             hp1 : tai;
           begin
             Result:=false;
    @@ -2723,8 +2715,8 @@
                ((taicpu(hp1).oper[1]^.reg = taicpu(p).oper[1]^.reg) or
                 ((taicpu(hp1).opsize=S_L) and (taicpu(p).opsize=S_Q) and SuperRegistersEqual(taicpu(hp1).oper[1]^.reg,taicpu(p).oper[1]^.reg))) then
               begin
    -            CopyUsedRegs(TmpUsedRegs);
    +            TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
                 if not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,p,TmpUsedRegs)) then
                   { change
                       mov reg1,reg2
                       imul y,reg2 to imul y,reg1,reg2 }
    @@ -2737,7 +2729,6 @@
                     hp1.free;
                     result:=true;
                   end;
    -            ReleaseUsedRegs(TmpUsedRegs);
               end;
           end;
     
    

Activities

J. Gareth Moreton

2018-12-11 10:00

developer  

tmpusedregs-obj-pool.patch (18,992 bytes)
Index: compiler/aoptobj.pas
===================================================================
--- compiler/aoptobj.pas	(revision 40520)
+++ compiler/aoptobj.pas	(working copy)
@@ -270,6 +270,9 @@
         Procedure UpdateUsedRegs(p : Tai);
         class procedure UpdateUsedRegs(var Regs: TAllUsedRegs; p: Tai);
         Function CopyUsedRegs(var dest : TAllUsedRegs) : boolean;
+        procedure RestoreUsedRegs(const Regs : TAllUsedRegs);
+        procedure TransferUsedRegs(var dest: TAllUsedRegs);
+        procedure TransferUsedRegs(typ: TRegisterType; var dest: TAllUsedRegs);
         class Procedure ReleaseUsedRegs(const regs : TAllUsedRegs);
         class Function RegInUsedRegs(reg : TRegister;regs : TAllUsedRegs) : boolean;
         class Procedure IncludeRegInUsedRegs(reg : TRegister;var regs : TAllUsedRegs);
@@ -457,7 +460,7 @@
       End;
 
 
-    Function TUsedRegs.GetUsedRegs: TRegSet;
+    Function TUsedRegs.GetUsedRegs: TRegSet; inline;
       Begin
         GetUsedRegs := UsedRegs;
       End;
@@ -945,6 +948,39 @@
       end;
 
 
+      procedure TAOptObj.RestoreUsedRegs(const Regs: TAllUsedRegs);
+      var
+        i : TRegisterType;
+      begin
+        { Note that the constructor Create_Regset is being called as a regular
+          method - it is not instantiating a new object.  This is because it is
+          the only published means to modify the internal state en-masse. [Kit] }
+        for i:=low(TRegisterType) to high(TRegisterType) do
+          UsedRegs[i].Create_Regset(i,Regs[i].GetUsedRegs);
+      end;
+
+
+      procedure TAOptObj.TransferUsedRegs(var dest: TAllUsedRegs);
+      var
+        i : TRegisterType;
+      begin
+        { Note that the constructor Create_Regset is being called as a regular
+          method - it is not instantiating a new object.  This is because it is
+          the only published means to modify the internal state en-masse. [Kit] }
+        for i:=low(TRegisterType) to high(TRegisterType) do
+          dest[i].Create_Regset(i, UsedRegs[i].GetUsedRegs);
+      end;
+
+
+      procedure TAOptObj.TransferUsedRegs(typ: TRegisterType; var dest: TAllUsedRegs); inline;
+      begin
+        { Note that the constructor Create_Regset is being called as a regular
+          method - it is not instantiating a new object.  This is because it is
+          the only published means to modify the internal state en-masse. [Kit] }
+        dest[typ].Create_Regset(typ, UsedRegs[typ].GetUsedRegs);
+      end;
+
+
       class procedure TAOptObj.ReleaseUsedRegs(const regs: TAllUsedRegs);
         var
           i : TRegisterType;
Index: compiler/x86/aoptx86.pas
===================================================================
--- compiler/x86/aoptx86.pas	(revision 40520)
+++ compiler/x86/aoptx86.pas	(working copy)
@@ -30,15 +30,24 @@
     uses
       globtype,
       cpubase,
-      aasmtai,aasmcpu,
+      aasmtai,aasmcpu,aasmdata,
       cgbase,cgutils,
       aopt,aoptobj;
 
     type
+
+      { TX86AsmOptimizer }
+
       TX86AsmOptimizer = class(TAsmOptimizer)
+        constructor Create(_AsmL: TAsmList); override;
+        destructor Destroy; override;
         function RegLoadedWithNewValue(reg : tregister; hp : tai) : boolean; override;
         function InstructionLoadsFromReg(const reg : TRegister; const hp : tai) : boolean; override;
         function RegReadByInstruction(reg : TRegister; hp : tai) : boolean;
+      public
+        { Pooled object that can be used by optimisation procedures to evaluate
+          future register usage without upsetting the current state. }
+        TmpUsedRegs: TAllUsedRegs;
       protected
         { checks whether loading a new value in reg1 overwrites the entirety of reg2 }
         function Reg1WriteOverwritesReg2Entirely(reg1, reg2: tregister): boolean;
@@ -584,6 +593,20 @@
 {$endif DEBUG_AOPTCPU}
 
 
+    constructor TX86AsmOptimizer.Create(_AsmL: TAsmList);
+      begin
+        inherited Create(_AsmL);
+        CreateUsedRegs(TmpUsedRegs);
+      end;
+
+
+    destructor TX86AsmOptimizer.Destroy;
+      begin
+        ReleaseUsedRegs(TmpUsedRegs);
+        inherited Destroy;
+      end;
+
+
     function TX86AsmOptimizer.Reg1WriteOverwritesReg2Entirely(reg1, reg2: tregister): boolean;
       begin
         if not SuperRegistersEqual(reg1,reg2) then
@@ -1032,7 +1055,6 @@
 
     function TX86AsmOptimizer.OptPass1MOVAP(var p : tai) : boolean;
       var
-        TmpUsedRegs : TAllUsedRegs;
         hp1,hp2 : tai;
       begin
         result:=false;
@@ -1059,7 +1081,7 @@
                      addsX/subsX/... reg3,reg
           }
           begin
-            CopyUsedRegs(TmpUsedRegs);
+            TransferUsedRegs(R_MMREGISTER, TmpUsedRegs);
             UpdateUsedRegs(TmpUsedRegs, tai(p.next));
             UpdateUsedRegs(TmpUsedRegs, tai(hp1.next));
             If not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,hp2,TmpUsedRegs)) then
@@ -1081,7 +1103,6 @@
                 p:=hp1;
                 result:=true;
               end;
-            ReleaseUsedRegs(TmpUsedRegs);
           end
         end;
 
@@ -1088,7 +1109,6 @@
 
     function TX86AsmOptimizer.OptPass1VMOVAP(var p : tai) : boolean;
       var
-        TmpUsedRegs : TAllUsedRegs;
         hp1,hp2 : tai;
       begin
         result:=false;
@@ -1116,7 +1136,7 @@
                       dealloc reg2
                       =>
                       vmova* reg1,reg3 }
-                    CopyUsedRegs(TmpUsedRegs);
+                    TransferUsedRegs(R_MMREGISTER, TmpUsedRegs);
                     UpdateUsedRegs(TmpUsedRegs, tai(p.next));
                     if not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,hp1,TmpUsedRegs)) then
                       begin
@@ -1145,7 +1165,7 @@
                   MatchInstruction(hp2,A_VMOVAPD,A_VMOVAPS,[S_NO]) and
                   MatchOperand(taicpu(p).oper[0]^,taicpu(hp2).oper[1]^) then
                   begin
-                    CopyUsedRegs(TmpUsedRegs);
+                    TransferUsedRegs(R_MMREGISTER, TmpUsedRegs);
                     UpdateUsedRegs(TmpUsedRegs, tai(p.next));
                     UpdateUsedRegs(TmpUsedRegs, tai(hp1.next));
                     if not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,hp2,TmpUsedRegs))
@@ -1166,7 +1186,6 @@
 
     function TX86AsmOptimizer.OptPass1VOP(var p : tai) : boolean;
       var
-        TmpUsedRegs : TAllUsedRegs;
         hp1 : tai;
       begin
         result:=false;
@@ -1186,7 +1205,7 @@
           MatchOperand(taicpu(p).oper[2]^,taicpu(hp1).oper[0]^) and
           (taicpu(hp1).oper[1]^.typ=top_reg) then
           begin
-            CopyUsedRegs(TmpUsedRegs);
+            TransferUsedRegs(R_MMREGISTER, TmpUsedRegs);
             UpdateUsedRegs(TmpUsedRegs, tai(p.next));
             if not(RegUsedAfterInstruction(taicpu(hp1).oper[0]^.reg,hp1,TmpUsedRegs)
              ) then
@@ -1204,7 +1223,6 @@
     function TX86AsmOptimizer.OptPass1MOV(var p : tai) : boolean;
       var
         hp1, hp2: tai;
-        TmpUsedRegs : TAllUsedRegs;
         GetNextInstruction_p: Boolean;
         PreMessage, RegName1, RegName2, InputVal, MaskNum: string;
         NewSize: topsize;
@@ -1337,10 +1355,10 @@
                     taicpu(p).oper[1]^ := taicpu(hp1).oper[1]^;
 
                     { Safeguard if "and" is followed by a conditional command }
-                    CopyUsedRegs(TmpUsedRegs);
-                    UpdateUsedRegs(TmpUsedRegs,tai(hp1.next));
+                    TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
+                    UpdateUsedRegs(TmpUsedRegs,tai(p.next));
 
-                    if (RegUsedAfterInstruction(NR_DEFAULTFLAGS, tai(hp1.next), TmpUsedRegs)) then
+                    if (RegUsedAfterInstruction(NR_DEFAULTFLAGS, hp1, TmpUsedRegs)) then
                       begin
                         { At this point, the "and" command is effectively equivalent to
                           "test %reg,%reg". This will be handled separately by the
@@ -1359,8 +1377,6 @@
                       end;
 
                     Result := True;
-
-                    ReleaseUsedRegs(TmpUsedRegs);
                     Exit;
 
                   end;
@@ -1371,7 +1387,7 @@
           (taicpu(p).oper[1]^.typ = top_reg) and
           MatchOperand(taicpu(p).oper[1]^,taicpu(hp1).oper[0]^) then
           begin
-            CopyUsedRegs(TmpUsedRegs);
+            TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
             UpdateUsedRegs(TmpUsedRegs, tai(p.Next));
             { we have
                 mov x, %treg
@@ -1402,7 +1418,6 @@
                     DebugMsg(SPeepholeOptimization + 'MovMov2Mov 2 done',p);
                     asml.remove(hp1);
                     hp1.free;
-                    ReleaseUsedRegs(TmpUsedRegs);
                     Result:=true;
                     Exit;
                   end;
@@ -1425,7 +1440,6 @@
                         DebugMsg(SPeepholeOptimization + 'MovMov2Mov 5 done',p);
                         asml.remove(hp1);
                         hp1.free;
-                        ReleaseUsedRegs(TmpUsedRegs);
                         Result:=true;
                         Exit;
                       end;
@@ -1445,12 +1459,10 @@
                       DebugMsg(SPeepholeOptimization + 'MovMov2Mov 3 done',p);
                       asml.remove(hp1);
                       hp1.free;
-                      ReleaseUsedRegs(TmpUsedRegs);
                       Result:=true;
                       Exit;
                     end;
               end;
-            ReleaseUsedRegs(TmpUsedRegs);
           end
         else
           { Change
@@ -1486,7 +1498,7 @@
                    test/or/and %reg2, %reg2
                 }
                 begin
-                  CopyUsedRegs(TmpUsedRegs);
+                  TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
                   { reg1 will be used after the first instruction,
                     so update the allocation info                  }
                   AllocRegBetween(taicpu(p).oper[0]^.reg,p,hp1,usedregs);
@@ -1512,7 +1524,6 @@
                         asml.remove(p);
                         p.free;
                         p := hp1;
-                        ReleaseUsedRegs(TmpUsedRegs);
                         Exit;
                       end
                     else
@@ -1532,7 +1543,6 @@
                         taicpu(hp1).loadoper(1,taicpu(p).oper[0]^);
                         DebugMsg(SPeepholeOptimization + 'MovTestJxx2MovTestJxx done',p);
                       end;
-                  ReleaseUsedRegs(TmpUsedRegs);
                 end
             end
         else
@@ -1606,7 +1616,7 @@
                       end
                     else
                       begin
-                        CopyUsedRegs(TmpUsedRegs);
+                        TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
                         UpdateUsedRegs(TmpUsedRegs, tai(hp1.next));
                         if (taicpu(p).oper[1]^.typ = top_ref) and
                           { mov reg1, mem1
@@ -1631,7 +1641,6 @@
                             AllocRegBetween(taicpu(p).oper[0]^.reg,p,hp1,UsedRegs);
                             DebugMsg(SPeepholeOptimization + 'MovMovCmp2MovCmp done',hp1);
                           end;
-                        ReleaseUsedRegs(TmpUsedRegs);
                       end;
                   end
                 else if (taicpu(p).oper[1]^.typ=top_ref) and
@@ -1643,7 +1652,7 @@
                   end
                 else
                   begin
-                    CopyUsedRegs(TmpUsedRegs);
+                    TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
                     if GetNextInstruction(hp1, hp2) and
                       MatchOpType(taicpu(p),top_ref,top_reg) and
                       MatchOperand(taicpu(p).oper[1]^,taicpu(hp1).oper[0]^) and
@@ -1715,7 +1724,6 @@
                         end
 {$endif i386}
                         ;
-                    ReleaseUsedRegs(TmpUsedRegs);
                   end;
               end
 (*          { movl [mem1],reg1
@@ -1801,7 +1809,7 @@
                 to
                          add/sub/or/... reg3/$const, reg/ref      }
               begin
-                CopyUsedRegs(TmpUsedRegs);
+                TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
                 UpdateUsedRegs(TmpUsedRegs, tai(p.next));
                 UpdateUsedRegs(TmpUsedRegs, tai(hp1.next));
                 If not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,hp2,TmpUsedRegs)) then
@@ -1855,7 +1863,6 @@
                     hp2.Free;
                     p := hp1;
                   end;
-                ReleaseUsedRegs(TmpUsedRegs);
               end
 {$ifndef x86_64}
             else if MatchOpType(taicpu(hp2),top_reg,top_reg) and
@@ -1875,7 +1882,7 @@
                          add/sub/or/... reg3/$const, reg3
               }
               begin
-                CopyUsedRegs(TmpUsedRegs);
+                TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
                 UpdateUsedRegs(TmpUsedRegs, tai(p.next));
                 UpdateUsedRegs(TmpUsedRegs, tai(hp1.next));
                 If not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,hp2,TmpUsedRegs)) then
@@ -1932,7 +1939,6 @@
                     hp2.Free;
 //                    p := hp1;
                   end;
-                ReleaseUsedRegs(TmpUsedRegs);
               end;
 {$endif x86_64}
           end
@@ -1973,7 +1979,7 @@
 
              add reg2,ref}
           begin
-            CopyUsedRegs(TmpUsedRegs);
+            TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
             { reg1 may not be used afterwards }
             if not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp1, TmpUsedRegs)) then
               begin
@@ -1984,7 +1990,6 @@
                 p.free;
                 p:=hp1;
               end;
-            ReleaseUsedRegs(TmpUsedRegs);
           end;
       end;
 
@@ -2044,7 +2049,6 @@
 
     function TX86AsmOptimizer.OptPass1OP(var p : tai) : boolean;
       var
-        TmpUsedRegs : TAllUsedRegs;
         hp1 : tai;
       begin
         result:=false;
@@ -2065,7 +2069,7 @@
           MatchOperand(taicpu(p).oper[0]^,taicpu(hp1).oper[1]^) and
           (taicpu(p).oper[0]^.typ=top_reg) then
           begin
-            CopyUsedRegs(TmpUsedRegs);
+            TransferUsedRegs(R_MMREGISTER, TmpUsedRegs);
             UpdateUsedRegs(TmpUsedRegs, tai(p.next));
             if not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,hp1,TmpUsedRegs)) then
               begin
@@ -2076,7 +2080,6 @@
                 hp1.Free;
                 result:=true;
               end;
-            ReleaseUsedRegs(TmpUsedRegs);
           end;
       end;
 
@@ -2085,7 +2088,6 @@
       var
         hp1 : tai;
         l : ASizeInt;
-        TmpUsedRegs : TAllUsedRegs;
       begin
         Result:=false;
         { removes seg register prefixes from LEA operations, as they
@@ -2167,7 +2169,7 @@
           MatchOpType(Taicpu(hp1),top_reg,top_reg) and
           (taicpu(p).oper[1]^.reg<>NR_STACK_POINTER_REG) then
           begin
-            CopyUsedRegs(TmpUsedRegs);
+            TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
             UpdateUsedRegs(TmpUsedRegs, tai(p.next));
             if not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,hp1,TmpUsedRegs)) then
               begin
@@ -2177,7 +2179,6 @@
                 hp1.Free;
                 result:=true;
               end;
-            ReleaseUsedRegs(TmpUsedRegs);
           end;
       end;
 
@@ -2405,7 +2406,6 @@
 
     function TX86AsmOptimizer.OptPass1SETcc(var p: tai): boolean;
       var
-        TmpUsedRegs : TAllUsedRegs;
         hp1,hp2,next: tai; SetC, JumpC: TAsmCond;
       begin
         Result:=false;
@@ -2432,10 +2432,9 @@
           begin
             next := tai(p.Next);
 
-            CopyUsedRegs(TmpUsedRegs);
+            TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
             UpdateUsedRegs(TmpUsedRegs, next);
             UpdateUsedRegs(TmpUsedRegs, tai(hp1.next));
-            UpdateUsedRegs(TmpUsedRegs, tai(hp2.next));
 
             asml.Remove(hp1);
             hp1.Free;
@@ -2463,8 +2462,6 @@
                 p := hp2;
               end;
 
-            ReleaseUsedRegs(TmpUsedRegs);
-
             DebugMsg(SPeepholeOptimization + 'SETcc/TEST/Jcc -> Jcc',p);
           end;
       end;
@@ -2472,7 +2469,6 @@
 
     function TX86AsmOptimizer.OptPass2MOV(var p : tai) : boolean;
       var
-       TmpUsedRegs : TAllUsedRegs;
        hp1,hp2: tai;
 {$ifdef x86_64}
        hp3: tai;
@@ -2497,9 +2493,8 @@
             { Don't remove the MOV command without first checking that reg2 isn't used afterwards,
               or unless supreg(reg3) = supreg(reg2)). [Kit] }
 
-            CopyUsedRegs(TmpUsedRegs);
+            TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
             UpdateUsedRegs(TmpUsedRegs, tai(p.next));
-            UpdateUsedRegs(TmpUsedRegs, tai(hp1.next));
 
             if (getsupreg(taicpu(p).oper[1]^.reg) = getsupreg(taicpu(hp1).oper[1]^.reg)) or
               not RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp1, TmpUsedRegs)
@@ -2511,7 +2506,6 @@
                 Result:=true;
               end;
 
-            ReleaseUsedRegs(TmpUsedRegs);
             exit;
           end
         else if MatchOpType(taicpu(p),top_reg,top_reg) and
@@ -2569,7 +2563,7 @@
           MatchOperand(taicpu(hp1).oper[taicpu(hp1).ops-1]^,taicpu(hp2).oper[0]^) and
           (taicpu(hp2).oper[1]^.typ = top_ref) then
           begin
-            CopyUsedRegs(TmpUsedRegs);
+            TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
             UpdateUsedRegs(TmpUsedRegs,tai(p.next));
             UpdateUsedRegs(TmpUsedRegs,tai(hp1.next));
             if (RefsEqual(taicpu(hp2).oper[1]^.ref^,taicpu(p).oper[0]^.ref^) and
@@ -2604,7 +2598,6 @@
                 hp2.free;
                 p := hp1
               end;
-            ReleaseUsedRegs(TmpUsedRegs);
             Exit;
 {$ifdef x86_64}
           end
@@ -2706,7 +2699,6 @@
 
     function TX86AsmOptimizer.OptPass2Imul(var p : tai) : boolean;
       var
-        TmpUsedRegs : TAllUsedRegs;
         hp1 : tai;
       begin
         Result:=false;
@@ -2723,8 +2715,8 @@
            ((taicpu(hp1).oper[1]^.reg = taicpu(p).oper[1]^.reg) or
             ((taicpu(hp1).opsize=S_L) and (taicpu(p).opsize=S_Q) and SuperRegistersEqual(taicpu(hp1).oper[1]^.reg,taicpu(p).oper[1]^.reg))) then
           begin
-            CopyUsedRegs(TmpUsedRegs);
+            TransferUsedRegs(R_INTREGISTER, TmpUsedRegs);
             if not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,p,TmpUsedRegs)) then
               { change
                   mov reg1,reg2
                   imul y,reg2 to imul y,reg1,reg2 }
@@ -2737,7 +2729,6 @@
                 hp1.free;
                 result:=true;
               end;
-            ReleaseUsedRegs(TmpUsedRegs);
           end;
       end;
 

Florian

2019-01-20 18:54

administrator   ~0113529

Applied, with some modifications though: extended to all CPUs, no register type, as you suspected, I think this is too dangerous, TmpUsedRegs is declared in TAsmOptimizer.

Issue History

Date Modified Username Field Change
2018-12-11 10:00 J. Gareth Moreton New Issue
2018-12-11 10:00 J. Gareth Moreton Status new => assigned
2018-12-11 10:00 J. Gareth Moreton Assigned To => Florian
2018-12-11 10:00 J. Gareth Moreton File Added: tmpusedregs-obj-pool.patch
2018-12-11 10:01 J. Gareth Moreton Priority normal => low
2018-12-11 10:01 J. Gareth Moreton Severity minor => tweak
2018-12-11 10:04 J. Gareth Moreton Additional Information Updated View Revisions
2018-12-11 10:05 J. Gareth Moreton Tag Attached: 64-bit
2018-12-11 10:05 J. Gareth Moreton Tag Attached: compiler
2018-12-11 10:05 J. Gareth Moreton Tag Attached: optimization
2018-12-11 10:05 J. Gareth Moreton Tag Attached: patch
2018-12-11 10:05 J. Gareth Moreton Tag Attached: refactoring
2018-12-11 10:05 J. Gareth Moreton Tag Attached: x86
2018-12-11 10:05 J. Gareth Moreton Tag Attached: x86_64-win64
2018-12-11 10:07 J. Gareth Moreton Additional Information Updated View Revisions
2018-12-11 11:02 J. Gareth Moreton Summary [Refactor] TmpUsedRegs object pooling and optimisation => [Patch / Refactor] TmpUsedRegs object pooling and optimisation
2019-01-20 18:54 Florian Note Added: 0113529
2019-01-20 18:54 Florian Status assigned => resolved
2019-01-20 18:54 Florian Fixed in Version => 3.3.1
2019-01-20 18:54 Florian Resolution open => fixed