View Issue Details

ID: 0036437
Project: FPC
Category: Compiler
View Status: public
Last Update: 2020-01-04 13:08
Reporter: J. Gareth Moreton
Assigned To: J. Gareth Moreton
Priority: low
Severity: tweak
Reproducibility: N/A
Status: closed
Resolution: suspended
Platform: Cross-platform
OS: Microsoft Windows
OS Version: 10 Professional
Product Version: 3.3.1
Product Build: r43679
Target Version:
Fixed in Version:
Summary: 0036437: [Patch] Efficiency boosts in post-peephole optimisation stage
Description: This patch makes some general-purpose changes to the post-peephole optimisation stage. Since this stage is only meant as a final chance to convert instructions to more efficient forms, the platform-specific PostPeepholeOptsCPU is no longer called when the current list entry is not an instruction.

As an additional note, keeping track of the registers one instruction ahead proved to be less efficient both in terms of compiler speed and instruction conversion. On x86 platforms, certain "mov $0,%reg" instructions weren't being converted to "xor %reg,%reg" because a comparison instruction immediately followed (the FLAGS register gets allocated, implying that using xor would scramble it, even though it's not an issue here).
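
The effect of the tracking order can be illustrated with a toy model. This is only a sketch and not the compiler's code: the types, the routine names (UpdateFlagsInUse, CountConversions) and the four-entry list are hypothetical stand-ins, and FLAGS is the only register tracked. It assumes the situation described above, where the FLAGS allocation marker for a comparison sits between the "mov $0,%reg" and the comparison itself.

```pascal
program PostPeepholeOrderSketch;

{$mode objfpc}{$H+}

const
  EntryCount = 4;

type
  { Hypothetical stand-ins for the compiler's list entries. }
  TEntryKind = (ekInstruction, ekAllocFlags, ekDeallocFlags);

  TEntry = record
    Kind: TEntryKind;
    Instr: string; { only used when Kind = ekInstruction }
  end;

  TEntryList = array[0..EntryCount - 1] of TEntry;

function MakeEntry(AKind: TEntryKind; const AInstr: string): TEntry;
begin
  Result.Kind := AKind;
  Result.Instr := AInstr;
end;

{ Applies the allocation markers from Index up to, but not including,
  the next instruction. }
procedure UpdateFlagsInUse(const List: TEntryList; Index: Integer;
  var FlagsInUse: Boolean);
begin
  while (Index < EntryCount) and (List[Index].Kind <> ekInstruction) do
    begin
      FlagsInUse := (List[Index].Kind = ekAllocFlags);
      Inc(Index);
    end;
end;

{ Counts how many "mov $0" entries get converted to "xor".
  LookAhead = True applies the markers that follow an entry BEFORE the
  entry is examined; False applies them afterwards. }
function CountConversions(List: TEntryList; LookAhead: Boolean): Integer;
var
  i: Integer;
  FlagsInUse: Boolean;
begin
  Result := 0;
  FlagsInUse := False;
  { Account for any markers that precede the first instruction. }
  UpdateFlagsInUse(List, 0, FlagsInUse);
  for i := 0 to EntryCount - 1 do
    begin
      if LookAhead then
        UpdateFlagsInUse(List, i + 1, FlagsInUse);
      if (List[i].Kind = ekInstruction) and
         (List[i].Instr = 'mov $0,%eax') and
         not FlagsInUse then
        begin
          { xor overwrites FLAGS, so it is only used while FLAGS is free. }
          List[i].Instr := 'xor %eax,%eax';
          Inc(Result);
        end;
      if not LookAhead then
        UpdateFlagsInUse(List, i + 1, FlagsInUse);
    end;
end;

var
  Code: TEntryList;
begin
  Code[0] := MakeEntry(ekInstruction, 'mov $0,%eax');
  Code[1] := MakeEntry(ekAllocFlags, '');
  Code[2] := MakeEntry(ekInstruction, 'cmp %ecx,%edx');
  Code[3] := MakeEntry(ekDeallocFlags, '');
  WriteLn('look-ahead tracking: ', CountConversions(Code, True), ' conversion(s)');
  WriteLn('in-order tracking:   ', CountConversions(Code, False), ' conversion(s)');
end.
```

In this model the look-ahead variant reports 0 conversions, because FLAGS is already marked as in use when the mov is examined, while the in-order variant reports 1.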
Steps To Reproduce: Apply patch and confirm correct compilation and testing.
Additional Information: i386-win32 and x86_64-win64 compile without problems, and "make fullcycle" is successful. Individual PostPeepholeOptsCPU routines were evaluated to determine whether the register tracking changes would cause any adverse effects; none were found.
Tags: compiler, optimizations, patch
Fixed in Revision:
FPCOldBugId:
FPCTarget: -
Attached Files
  • PostPeepholeRegisters.patch (32,906 bytes)
    Index: compiler/aarch64/aoptcpu.pas
    ===================================================================
    --- compiler/aarch64/aoptcpu.pas	(revision 43679)
    +++ compiler/aarch64/aoptcpu.pas	(working copy)
    @@ -550,15 +550,12 @@
       function TCpuAsmOptimizer.PostPeepHoleOptsCpu(var p: tai): boolean;
         begin
           result := false;
    -      if p.typ=ait_instruction then
    -        begin
    -          case taicpu(p).opcode of
    -            A_CMP:
    -              Result:=OptPostCMP(p);
    -            else
    -              ;
    -          end;
    -        end;
    +      case taicpu(p).opcode of
    +        A_CMP:
    +          Result:=OptPostCMP(p);
    +        else
    +          ;
    +      end;
         end;
     
     begin
    Index: compiler/aoptobj.pas
    ===================================================================
    --- compiler/aoptobj.pas	(revision 43679)
    +++ compiler/aoptobj.pas	(working copy)
    @@ -2497,14 +2497,15 @@
             ClearUsedRegs;
             while (p <> BlockEnd) Do
               begin
    -            UpdateUsedRegs(tai(p.next));
    -            if PostPeepHoleOptsCpu(p) then
    -              continue;
    -            if assigned(p) then
    +            if (p.typ = ait_instruction) and PostPeepHoleOptsCpu(p) then
                   begin
    -                UpdateUsedRegs(p);
    -                p:=tai(p.next);
    +                if (p.typ <> ait_instruction) then
    +                  UpdateUsedRegs(p);
    +                Continue;
                   end;
    +
    +            UpdateUsedRegs(tai(p.Next));
    +            GetNextInstruction(p, p);
               end;
           end;
     
    Index: compiler/arm/aoptcpu.pas
    ===================================================================
    --- compiler/arm/aoptcpu.pas	(revision 43679)
    +++ compiler/arm/aoptcpu.pas	(working copy)
    @@ -3064,156 +3064,153 @@
         begin
           result:=false;
     
    -      if p.typ = ait_instruction then
    +      if MatchInstruction(p, A_MOV, [C_None], [PF_None]) and
    +        (taicpu(p).oper[1]^.typ=top_const) and
    +        (taicpu(p).oper[1]^.val >= 0) and
    +        (taicpu(p).oper[1]^.val < 256) and
    +        (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
             begin
    -          if MatchInstruction(p, A_MOV, [C_None], [PF_None]) and
    -            (taicpu(p).oper[1]^.typ=top_const) and
    -            (taicpu(p).oper[1]^.val >= 0) and
    -            (taicpu(p).oper[1]^.val < 256) and
    -            (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
    -            begin
    -              DebugMsg('Peephole Mov2Movs done', p);
    -              asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
    -              asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
    -              IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
    -              taicpu(p).oppostfix:=PF_S;
    -              result:=true;
    -            end
    -          else if MatchInstruction(p, A_MVN, [C_None], [PF_None]) and
    -            (taicpu(p).oper[1]^.typ=top_reg) and
    -            (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
    -            begin
    -              DebugMsg('Peephole Mvn2Mvns done', p);
    -              asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
    -              asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
    -              IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
    -              taicpu(p).oppostfix:=PF_S;
    -              result:=true;
    -            end
    -          else if MatchInstruction(p, A_RSB, [C_None], [PF_None]) and
    -            (taicpu(p).ops = 3) and
    -            (taicpu(p).oper[2]^.typ=top_const) and
    -            (taicpu(p).oper[2]^.val=0) and
    -            (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
    -            begin
    -              DebugMsg('Peephole Rsb2Rsbs done', p);
    -              asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
    -              asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
    -              IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
    -              taicpu(p).oppostfix:=PF_S;
    -              result:=true;
    -            end
    -          else if MatchInstruction(p, [A_ADD,A_SUB], [C_None], [PF_None]) and
    -            (taicpu(p).ops = 3) and
    -            MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
    -            (not MatchOperand(taicpu(p).oper[0]^, NR_STACK_POINTER_REG)) and
    -            (taicpu(p).oper[2]^.typ=top_const) and
    -            (taicpu(p).oper[2]^.val >= 0) and
    -            (taicpu(p).oper[2]^.val < 256) and
    -            (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
    -            begin
    -              DebugMsg('Peephole AddSub2*s done', p);
    -              asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
    -              asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
    -              IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
    -              taicpu(p).loadconst(1,taicpu(p).oper[2]^.val);
    -              taicpu(p).oppostfix:=PF_S;
    -              taicpu(p).ops := 2;
    -              result:=true;
    -            end
    -          else if MatchInstruction(p, [A_ADD,A_SUB], [C_None], [PF_None]) and
    -            (taicpu(p).ops = 2) and
    -            (taicpu(p).oper[1]^.typ=top_reg) and
    -            (not MatchOperand(taicpu(p).oper[0]^, NR_STACK_POINTER_REG)) and
    -            (not MatchOperand(taicpu(p).oper[1]^, NR_STACK_POINTER_REG)) and
    -            (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
    -            begin
    -              DebugMsg('Peephole AddSub2*s done', p);
    -              asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
    -              asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
    -              IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
    -              taicpu(p).oppostfix:=PF_S;
    -              result:=true;
    -            end
    -          else if MatchInstruction(p, [A_ADD], [C_None], [PF_None]) and
    -            (taicpu(p).ops = 3) and
    -            MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
    -            (taicpu(p).oper[2]^.typ=top_reg) then
    -            begin
    -              DebugMsg('Peephole AddRRR2AddRR done', p);
    -              taicpu(p).ops := 2;
    -              taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg);
    -              result:=true;
    -            end
    -          else if MatchInstruction(p, [A_AND,A_ORR,A_EOR,A_BIC,A_LSL,A_LSR,A_ASR,A_ROR], [C_None], [PF_None]) and
    -            (taicpu(p).ops = 3) and
    -            MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
    -            (taicpu(p).oper[2]^.typ=top_reg) and
    -            (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
    -            begin
    -              DebugMsg('Peephole opXXY2opsXY done', p);
    -              asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
    -              asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
    -              IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
    -              taicpu(p).ops := 2;
    -              taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg);
    -              taicpu(p).oppostfix:=PF_S;
    -              result:=true;
    -            end
    -          else if MatchInstruction(p, [A_AND,A_ORR,A_EOR,A_BIC,A_LSL,A_LSR,A_ASR,A_ROR], [C_None], [PF_S]) and
    -            (taicpu(p).ops = 3) and
    -            MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
    -            (taicpu(p).oper[2]^.typ in [top_reg,top_const]) then
    -            begin
    -              DebugMsg('Peephole opXXY2opXY done', p);
    -              taicpu(p).ops := 2;
    -              if taicpu(p).oper[2]^.typ=top_reg then
    -                taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg)
    -              else
    -                taicpu(p).loadconst(1,taicpu(p).oper[2]^.val);
    -              result:=true;
    -            end
    -          else if MatchInstruction(p, [A_AND,A_ORR,A_EOR], [C_None], [PF_None,PF_S]) and
    -            (taicpu(p).ops = 3) and
    -            MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[2]^) and
    -            (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
    -            begin
    -              DebugMsg('Peephole opXYX2opsXY done', p);
    -              asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
    -              asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
    -              IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
    -              taicpu(p).oppostfix:=PF_S;
    -              taicpu(p).ops := 2;
    -              result:=true;
    -            end
    -          else if MatchInstruction(p, [A_MOV], [C_None], [PF_None]) and
    -            (taicpu(p).ops=3) and
    -            (taicpu(p).oper[2]^.typ=top_shifterop) and
    -            (taicpu(p).oper[2]^.shifterop^.shiftmode in [SM_LSL,SM_LSR,SM_ASR,SM_ROR]) and
    -            //MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
    -            (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
    -            begin
    -              DebugMsg('Peephole Mov2Shift done', p);
    -              asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
    -              asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
    -              IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
    -              taicpu(p).oppostfix:=PF_S;
    +          DebugMsg('Peephole Mov2Movs done', p);
    +          asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
    +          asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
    +          IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
    +          taicpu(p).oppostfix:=PF_S;
    +          result:=true;
    +        end
    +      else if MatchInstruction(p, A_MVN, [C_None], [PF_None]) and
    +        (taicpu(p).oper[1]^.typ=top_reg) and
    +        (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
    +        begin
    +          DebugMsg('Peephole Mvn2Mvns done', p);
    +          asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
    +          asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
    +          IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
    +          taicpu(p).oppostfix:=PF_S;
    +          result:=true;
    +        end
    +      else if MatchInstruction(p, A_RSB, [C_None], [PF_None]) and
    +        (taicpu(p).ops = 3) and
    +        (taicpu(p).oper[2]^.typ=top_const) and
    +        (taicpu(p).oper[2]^.val=0) and
    +        (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
    +        begin
    +          DebugMsg('Peephole Rsb2Rsbs done', p);
    +          asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
    +          asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
    +          IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
    +          taicpu(p).oppostfix:=PF_S;
    +          result:=true;
    +        end
    +      else if MatchInstruction(p, [A_ADD,A_SUB], [C_None], [PF_None]) and
    +        (taicpu(p).ops = 3) and
    +        MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
    +        (not MatchOperand(taicpu(p).oper[0]^, NR_STACK_POINTER_REG)) and
    +        (taicpu(p).oper[2]^.typ=top_const) and
    +        (taicpu(p).oper[2]^.val >= 0) and
    +        (taicpu(p).oper[2]^.val < 256) and
    +        (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
    +        begin
    +          DebugMsg('Peephole AddSub2*s done', p);
    +          asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
    +          asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
    +          IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
    +          taicpu(p).loadconst(1,taicpu(p).oper[2]^.val);
    +          taicpu(p).oppostfix:=PF_S;
    +          taicpu(p).ops := 2;
    +          result:=true;
    +        end
    +      else if MatchInstruction(p, [A_ADD,A_SUB], [C_None], [PF_None]) and
    +        (taicpu(p).ops = 2) and
    +        (taicpu(p).oper[1]^.typ=top_reg) and
    +        (not MatchOperand(taicpu(p).oper[0]^, NR_STACK_POINTER_REG)) and
    +        (not MatchOperand(taicpu(p).oper[1]^, NR_STACK_POINTER_REG)) and
    +        (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
    +        begin
    +          DebugMsg('Peephole AddSub2*s done', p);
    +          asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
    +          asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
    +          IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
    +          taicpu(p).oppostfix:=PF_S;
    +          result:=true;
    +        end
    +      else if MatchInstruction(p, [A_ADD], [C_None], [PF_None]) and
    +        (taicpu(p).ops = 3) and
    +        MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
    +        (taicpu(p).oper[2]^.typ=top_reg) then
    +        begin
    +          DebugMsg('Peephole AddRRR2AddRR done', p);
    +          taicpu(p).ops := 2;
    +          taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg);
    +          result:=true;
    +        end
    +      else if MatchInstruction(p, [A_AND,A_ORR,A_EOR,A_BIC,A_LSL,A_LSR,A_ASR,A_ROR], [C_None], [PF_None]) and
    +        (taicpu(p).ops = 3) and
    +        MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
    +        (taicpu(p).oper[2]^.typ=top_reg) and
    +        (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
    +        begin
    +          DebugMsg('Peephole opXXY2opsXY done', p);
    +          asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
    +          asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
    +          IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
    +          taicpu(p).ops := 2;
    +          taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg);
    +          taicpu(p).oppostfix:=PF_S;
    +          result:=true;
    +        end
    +      else if MatchInstruction(p, [A_AND,A_ORR,A_EOR,A_BIC,A_LSL,A_LSR,A_ASR,A_ROR], [C_None], [PF_S]) and
    +        (taicpu(p).ops = 3) and
    +        MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
    +        (taicpu(p).oper[2]^.typ in [top_reg,top_const]) then
    +        begin
    +          DebugMsg('Peephole opXXY2opXY done', p);
    +          taicpu(p).ops := 2;
    +          if taicpu(p).oper[2]^.typ=top_reg then
    +            taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg)
    +          else
    +            taicpu(p).loadconst(1,taicpu(p).oper[2]^.val);
    +          result:=true;
    +        end
    +      else if MatchInstruction(p, [A_AND,A_ORR,A_EOR], [C_None], [PF_None,PF_S]) and
    +        (taicpu(p).ops = 3) and
    +        MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[2]^) and
    +        (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
    +        begin
    +          DebugMsg('Peephole opXYX2opsXY done', p);
    +          asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
    +          asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
    +          IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
    +          taicpu(p).oppostfix:=PF_S;
    +          taicpu(p).ops := 2;
    +          result:=true;
    +        end
    +      else if MatchInstruction(p, [A_MOV], [C_None], [PF_None]) and
    +        (taicpu(p).ops=3) and
    +        (taicpu(p).oper[2]^.typ=top_shifterop) and
    +        (taicpu(p).oper[2]^.shifterop^.shiftmode in [SM_LSL,SM_LSR,SM_ASR,SM_ROR]) and
    +        //MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
    +        (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
    +        begin
    +          DebugMsg('Peephole Mov2Shift done', p);
    +          asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
    +          asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
    +          IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
    +          taicpu(p).oppostfix:=PF_S;
     
    -              case taicpu(p).oper[2]^.shifterop^.shiftmode of
    -                SM_LSL: taicpu(p).opcode:=A_LSL;
    -                SM_LSR: taicpu(p).opcode:=A_LSR;
    -                SM_ASR: taicpu(p).opcode:=A_ASR;
    -                SM_ROR: taicpu(p).opcode:=A_ROR;
    -                else
    -                  internalerror(2019050912);
    -              end;
    +          case taicpu(p).oper[2]^.shifterop^.shiftmode of
    +            SM_LSL: taicpu(p).opcode:=A_LSL;
    +            SM_LSR: taicpu(p).opcode:=A_LSR;
    +            SM_ASR: taicpu(p).opcode:=A_ASR;
    +            SM_ROR: taicpu(p).opcode:=A_ROR;
    +            else
    +              internalerror(2019050912);
    +          end;
     
    -              if taicpu(p).oper[2]^.shifterop^.rs<>NR_NO then
    -                taicpu(p).loadreg(2, taicpu(p).oper[2]^.shifterop^.rs)
    -              else
    -                taicpu(p).loadconst(2, taicpu(p).oper[2]^.shifterop^.shiftimm);
    -              result:=true;
    -            end
    +          if taicpu(p).oper[2]^.shifterop^.rs<>NR_NO then
    +            taicpu(p).loadreg(2, taicpu(p).oper[2]^.shifterop^.rs)
    +          else
    +            taicpu(p).loadconst(2, taicpu(p).oper[2]^.shifterop^.shiftimm);
    +          result:=true;
             end;
         end;
     
    Index: compiler/i386/aoptcpu.pas
    ===================================================================
    --- compiler/i386/aoptcpu.pas	(revision 43679)
    +++ compiler/i386/aoptcpu.pas	(working copy)
    @@ -263,77 +263,70 @@
             hp1: tai;
           begin
             Result:=false;
    -        case p.Typ Of
    -          Ait_Instruction:
    -            begin
    -              if InsContainsSegRef(taicpu(p)) then
    -                Exit;
    -              case taicpu(p).opcode Of
    -                A_CALL:
    -                  Result:=PostPeepHoleOptCall(p);
    -                A_LEA:
    -                  Result:=PostPeepholeOptLea(p);
    -                A_CMP:
    -                  Result:=PostPeepholeOptCmp(p);
    -                A_MOV:
    -                  Result:=PostPeepholeOptMov(p);
    -                A_MOVZX:
    -                  { if register vars are on, it's possible there is code like }
    -                  {   "cmpl $3,%eax; movzbl 8(%ebp),%ebx; je .Lxxx"           }
    -                  { so we can't safely replace the movzx then with xor/mov,   }
    -                  { since that would change the flags (JM)                    }
    -                  if not(cs_opt_regvar in current_settings.optimizerswitches) then
    +        if InsContainsSegRef(taicpu(p)) then
    +          Exit;
    +        case taicpu(p).opcode Of
    +          A_CALL:
    +            Result:=PostPeepHoleOptCall(p);
    +          A_LEA:
    +            Result:=PostPeepholeOptLea(p);
    +          A_CMP:
    +            Result:=PostPeepholeOptCmp(p);
    +          A_MOV:
    +            Result:=PostPeepholeOptMov(p);
    +          A_MOVZX:
    +            { if register vars are on, it's possible there is code like }
    +            {   "cmpl $3,%eax; movzbl 8(%ebp),%ebx; je .Lxxx"           }
    +            { so we can't safely replace the movzx then with xor/mov,   }
    +            { since that would change the flags (JM)                    }
    +            if not(cs_opt_regvar in current_settings.optimizerswitches) then
    +              begin
    +                if (taicpu(p).oper[1]^.typ = top_reg) then
    +                  if (taicpu(p).oper[0]^.typ = top_reg)
    +                    then
    +                      case taicpu(p).opsize of
    +                        S_BL:
    +                          begin
    +                            if IsGP32Reg(taicpu(p).oper[1]^.reg) and
    +                               not(cs_opt_size in current_settings.optimizerswitches) and
    +                               (current_settings.optimizecputype = cpu_Pentium) then
    +                                {Change "movzbl %reg1, %reg2" to
    +                                 "xorl %reg2, %reg2; movb %reg1, %reg2" for Pentium and
    +                                 PentiumMMX}
    +                              begin
    +                                hp1 := taicpu.op_reg_reg(A_XOR, S_L,
    +                                            taicpu(p).oper[1]^.reg, taicpu(p).oper[1]^.reg);
    +                                InsertLLItem(p.previous, p, hp1);
    +                                taicpu(p).opcode := A_MOV;
    +                                taicpu(p).changeopsize(S_B);
    +                                setsubreg(taicpu(p).oper[1]^.reg,R_SUBL);
    +                            end;
    +                        end;
    +                      else
    +                        ;
    +                    end
    +                  else if (taicpu(p).oper[0]^.typ = top_ref) and
    +                      (taicpu(p).oper[0]^.ref^.base <> taicpu(p).oper[1]^.reg) and
    +                      (taicpu(p).oper[0]^.ref^.index <> taicpu(p).oper[1]^.reg) and
    +                      not(cs_opt_size in current_settings.optimizerswitches) and
    +                      IsGP32Reg(taicpu(p).oper[1]^.reg) and
    +                      (current_settings.optimizecputype = cpu_Pentium) and
    +                      (taicpu(p).opsize = S_BL) then
    +                    {changes "movzbl mem, %reg" to "xorl %reg, %reg; movb mem, %reg8" for
    +                      Pentium and PentiumMMX}
                         begin
    -                      if (taicpu(p).oper[1]^.typ = top_reg) then
    -                        if (taicpu(p).oper[0]^.typ = top_reg)
    -                          then
    -                            case taicpu(p).opsize of
    -                              S_BL:
    -                                begin
    -                                  if IsGP32Reg(taicpu(p).oper[1]^.reg) and
    -                                     not(cs_opt_size in current_settings.optimizerswitches) and
    -                                     (current_settings.optimizecputype = cpu_Pentium) then
    -                                      {Change "movzbl %reg1, %reg2" to
    -                                       "xorl %reg2, %reg2; movb %reg1, %reg2" for Pentium and
    -                                       PentiumMMX}
    -                                    begin
    -                                      hp1 := taicpu.op_reg_reg(A_XOR, S_L,
    -                                                  taicpu(p).oper[1]^.reg, taicpu(p).oper[1]^.reg);
    -                                      InsertLLItem(p.previous, p, hp1);
    -                                      taicpu(p).opcode := A_MOV;
    -                                      taicpu(p).changeopsize(S_B);
    -                                      setsubreg(taicpu(p).oper[1]^.reg,R_SUBL);
    -                                    end;
    -                                end;
    -                              else
    -                                ;
    -                            end
    -                          else if (taicpu(p).oper[0]^.typ = top_ref) and
    -                              (taicpu(p).oper[0]^.ref^.base <> taicpu(p).oper[1]^.reg) and
    -                              (taicpu(p).oper[0]^.ref^.index <> taicpu(p).oper[1]^.reg) and
    -                              not(cs_opt_size in current_settings.optimizerswitches) and
    -                              IsGP32Reg(taicpu(p).oper[1]^.reg) and
    -                              (current_settings.optimizecputype = cpu_Pentium) and
    -                              (taicpu(p).opsize = S_BL) then
    -                            {changes "movzbl mem, %reg" to "xorl %reg, %reg; movb mem, %reg8" for
    -                              Pentium and PentiumMMX}
    -                            begin
    -                              hp1 := taicpu.Op_reg_reg(A_XOR, S_L, taicpu(p).oper[1]^.reg,
    -                                          taicpu(p).oper[1]^.reg);
    -                              taicpu(p).opcode := A_MOV;
    -                              taicpu(p).changeopsize(S_B);
    -                              setsubreg(taicpu(p).oper[1]^.reg,R_SUBL);
    -                              InsertLLItem(p.previous, p, hp1);
    -                            end;
    -                   end;
    -                A_TEST, A_OR:
    -                  Result:=PostPeepholeOptTestOr(p);
    -                else
    -                  ;
    -              end;
    -            end;
    +                      hp1 := taicpu.Op_reg_reg(A_XOR, S_L, taicpu(p).oper[1]^.reg,
    +                                  taicpu(p).oper[1]^.reg);
    +                      taicpu(p).opcode := A_MOV;
    +                      taicpu(p).changeopsize(S_B);
    +                      setsubreg(taicpu(p).oper[1]^.reg,R_SUBL);
    +                      InsertLLItem(p.previous, p, hp1);
    +                    end;
    +               end;
    +          A_TEST, A_OR:
    +            Result:=PostPeepholeOptTestOr(p);
               else
    -            ;
    +            { Do nothing };
             end;
           end;
     
    Index: compiler/i8086/aoptcpu.pas
    ===================================================================
    --- compiler/i8086/aoptcpu.pas	(revision 43679)
    +++ compiler/i8086/aoptcpu.pas	(working copy)
    @@ -151,22 +151,15 @@
         function TCpuAsmOptimizer.PostPeepHoleOptsCpu(var p: tai): boolean;
           begin
             result := false;
    -        case p.typ of
    -          ait_instruction:
    -            begin
    -              case taicpu(p).opcode of
    -                {A_MOV commented out, because it still breaks some i8086 code :( }
    -                {A_MOV:
    -                  Result:=PostPeepholeOptMov(p);}
    -                A_CMP:
    -                  Result:=PostPeepholeOptCmp(p);
    -                A_OR,
    -                A_TEST:
    -                  Result:=PostPeepholeOptTestOr(p);
    -                else
    -                  ;
    -              end;
    -            end;
    +        case taicpu(p).opcode of
    +          {A_MOV commented out, because it still breaks some i8086 code :( }
    +          {A_MOV:
    +            Result:=PostPeepholeOptMov(p);}
    +          A_CMP:
    +            Result:=PostPeepholeOptCmp(p);
    +          A_OR,
    +          A_TEST:
    +            Result:=PostPeepholeOptTestOr(p);
               else
                 ;
             end;
    Index: compiler/jvm/aoptcpu.pas
    ===================================================================
    --- compiler/jvm/aoptcpu.pas	(revision 43679)
    +++ compiler/jvm/aoptcpu.pas	(working copy)
    @@ -175,9 +175,7 @@
     
       function TCpuAsmOptimizer.PostPeepHoleOptsCpu(var p: tai): boolean;
         begin
    -      result:=
    -        (p.typ=ait_instruction) and
    -        RemoveLoadLoadSwap(p);
    +      Result := RemoveLoadLoadSwap(p);
         end;
     
     begin
    Index: compiler/powerpc/aoptcpu.pas
    ===================================================================
    --- compiler/powerpc/aoptcpu.pas	(revision 43679)
    +++ compiler/powerpc/aoptcpu.pas	(working copy)
    @@ -441,70 +441,63 @@
           next1: tai;
         begin
           result := false;
    -      case p.typ of
    -        ait_instruction:
    +      case taicpu(p).opcode of
    +        A_RLWINM_:
               begin
    -            case taicpu(p).opcode of
    -              A_RLWINM_:
    +            // rlwinm_ is cracked on the G5, andi_/andis_ aren't
    +            if (taicpu(p).oper[2]^.val = 0) then
    +              if (taicpu(p).oper[3]^.val < 16) and
    +                 (taicpu(p).oper[4]^.val < 16) then
                     begin
    -                  // rlwinm_ is cracked on the G5, andi_/andis_ aren't
    -                  if (taicpu(p).oper[2]^.val = 0) then
    -                    if (taicpu(p).oper[3]^.val < 16) and
    -                       (taicpu(p).oper[4]^.val < 16) then
    -                      begin
    -                        taicpu(p).opcode := A_ANDIS_;
    -                        taicpu(p).oper[2]^.val := word(
    -                          ((1 shl (16-taicpu(p).oper[3]^.val)) - 1) xor
    -                          ((1 shl (15-taicpu(p).oper[4]^.val)) - 1));
    -                        taicpu(p).freeop(3);
    -                        taicpu(p).freeop(4);
    -                        taicpu(p).ops := 3;
    -                        taicpu(p).opercnt := 3;
    -                      end
    -                    else if (taicpu(p).oper[3]^.val >= 16) and
    -                       (taicpu(p).oper[4]^.val >= 16) then
    -                      begin
    -                        taicpu(p).opcode := A_ANDI_;
    -                        taicpu(p).oper[2]^.val := word(rlwinm2mask(taicpu(p).oper[3]^.val,taicpu(p).oper[4]^.val));
    -                        taicpu(p).freeop(3);
    -                        taicpu(p).freeop(4);
    -                        taicpu(p).ops := 3;
    -                        taicpu(p).opercnt := 3;
    -                      end;
    +                  taicpu(p).opcode := A_ANDIS_;
    +                  taicpu(p).oper[2]^.val := word(
    +                    ((1 shl (16-taicpu(p).oper[3]^.val)) - 1) xor
    +                    ((1 shl (15-taicpu(p).oper[4]^.val)) - 1));
    +                  taicpu(p).freeop(3);
    +                  taicpu(p).freeop(4);
    +                  taicpu(p).ops := 3;
    +                  taicpu(p).opercnt := 3;
    +                end
    +              else if (taicpu(p).oper[3]^.val >= 16) and
    +                 (taicpu(p).oper[4]^.val >= 16) then
    +                begin
    +                  taicpu(p).opcode := A_ANDI_;
    +                  taicpu(p).oper[2]^.val := word(rlwinm2mask(taicpu(p).oper[3]^.val,taicpu(p).oper[4]^.val));
    +                  taicpu(p).freeop(3);
    +                  taicpu(p).freeop(4);
    +                  taicpu(p).ops := 3;
    +                  taicpu(p).opercnt := 3;
                     end;
    -              else
    -                ;
    -            end;
    -
    -            // change "integer operation with destination reg" followed by a
    -            // comparison to zero of that reg, with a variant of that integer
    -            // operation which sets the flags (if it exists)
    -            if not(result) and
    -               (taicpu(p).ops >= 2) and
    -               (taicpu(p).oper[0]^.typ = top_reg) and
    -               (taicpu(p).oper[1]^.typ = top_reg) and
    -               getnextinstruction(p,next1) and
    -               (next1.typ = ait_instruction) and
    -               (taicpu(next1).opcode = A_CMPWI) and
    -               // make sure it the result goes to cr0
    -               (((taicpu(next1).ops = 2) and
    -                 (taicpu(next1).oper[1]^.val = 0) and
    -                 (taicpu(next1).oper[0]^.reg = taicpu(p).oper[0]^.reg)) or
    -                ((taicpu(next1).ops = 3) and
    -                 (taicpu(next1).oper[2]^.val = 0) and
    -                 (taicpu(next1).oper[0]^.typ = top_reg) and
    -                 (getsupreg(taicpu(next1).oper[0]^.reg) = RS_CR0) and
    -                 (taicpu(next1).oper[1]^.reg = taicpu(p).oper[0]^.reg))) and
    -               changetomodifyflags(taicpu(p)) then
    -              begin
    -                asml.remove(next1);
    -                next1.free;
    -                result := true;
    -              end;
               end;
             else
               ;
           end;
    +
    +      // change "integer operation with destination reg" followed by a
    +      // comparison to zero of that reg, with a variant of that integer
    +      // operation which sets the flags (if it exists)
    +      if not(result) and
    +         (taicpu(p).ops >= 2) and
    +         (taicpu(p).oper[0]^.typ = top_reg) and
    +         (taicpu(p).oper[1]^.typ = top_reg) and
    +         getnextinstruction(p,next1) and
    +         (next1.typ = ait_instruction) and
    +         (taicpu(next1).opcode = A_CMPWI) and
    +         // make sure it the result goes to cr0
    +         (((taicpu(next1).ops = 2) and
    +           (taicpu(next1).oper[1]^.val = 0) and
    +           (taicpu(next1).oper[0]^.reg = taicpu(p).oper[0]^.reg)) or
    +          ((taicpu(next1).ops = 3) and
    +           (taicpu(next1).oper[2]^.val = 0) and
    +           (taicpu(next1).oper[0]^.typ = top_reg) and
    +           (getsupreg(taicpu(next1).oper[0]^.reg) = RS_CR0) and
    +           (taicpu(next1).oper[1]^.reg = taicpu(p).oper[0]^.reg))) and
    +         changetomodifyflags(taicpu(p)) then
    +        begin
    +          asml.remove(next1);
    +          next1.free;
    +          result := true;
    +        end;
         end;
     
     begin
    Index: compiler/riscv32/aoptcpu.pas
    ===================================================================
    --- compiler/riscv32/aoptcpu.pas	(revision 43679)
    +++ compiler/riscv32/aoptcpu.pas	(working copy)
    @@ -63,17 +63,8 @@
     
     
       function TCpuAsmOptimizer.PostPeepHoleOptsCpu(var p: tai): boolean;
    -    var
    -      next1: tai;
         begin
    -      result := false;
    -      case p.typ of
    -        ait_instruction:
    -          begin
    -          end;
    -        else
    -          ;
    -      end;
    +      Result := False;
         end;
     
     begin
    Index: compiler/x86_64/aoptcpu.pas
    ===================================================================
    --- compiler/x86_64/aoptcpu.pas	(revision 43679)
    +++ compiler/x86_64/aoptcpu.pas	(working copy)
    @@ -168,29 +168,22 @@
         function TCpuAsmOptimizer.PostPeepHoleOptsCpu(var p: tai): boolean;
           begin
             result := false;
    -        case p.typ of
    -          ait_instruction:
    -            begin
    -              case taicpu(p).opcode of
    -                A_MOV:
    -                  Result:=PostPeepholeOptMov(p);
    -                A_MOVZX:
    -                  Result:=PostPeepholeOptMovzx(p);
    -                A_CMP:
    -                  Result:=PostPeepholeOptCmp(p);
    -                A_OR,
    -                A_TEST:
    -                  Result:=PostPeepholeOptTestOr(p);
    -                A_XOR:
    -                  Result:=PostPeepholeOptXor(p);
    -                A_CALL:
    -                  Result:=PostPeepholeOptCall(p);
    -                A_LEA:
    -                  Result:=PostPeepholeOptLea(p);
    -                else
    -                  ;
    -              end;
    -            end;
    +        case taicpu(p).opcode of
    +          A_MOV:
    +            Result:=PostPeepholeOptMov(p);
    +          A_MOVZX:
    +            Result:=PostPeepholeOptMovzx(p);
    +          A_CMP:
    +            Result:=PostPeepholeOptCmp(p);
    +          A_OR,
    +          A_TEST:
    +            Result:=PostPeepholeOptTestOr(p);
    +          A_XOR:
    +            Result:=PostPeepholeOptXor(p);
    +          A_CALL:
    +            Result:=PostPeepholeOptCall(p);
    +          A_LEA:
    +            Result:=PostPeepholeOptLea(p);
               else
                 ;
             end;
    

Activities

J. Gareth Moreton

2019-12-14 10:56

developer

PostPeepholeRegisters.patch (32,906 bytes)
Index: compiler/aarch64/aoptcpu.pas
===================================================================
--- compiler/aarch64/aoptcpu.pas	(revision 43679)
+++ compiler/aarch64/aoptcpu.pas	(working copy)
@@ -550,15 +550,12 @@
   function TCpuAsmOptimizer.PostPeepHoleOptsCpu(var p: tai): boolean;
     begin
       result := false;
-      if p.typ=ait_instruction then
-        begin
-          case taicpu(p).opcode of
-            A_CMP:
-              Result:=OptPostCMP(p);
-            else
-              ;
-          end;
-        end;
+      case taicpu(p).opcode of
+        A_CMP:
+          Result:=OptPostCMP(p);
+        else
+          ;
+      end;
     end;
 
 begin
Index: compiler/aoptobj.pas
===================================================================
--- compiler/aoptobj.pas	(revision 43679)
+++ compiler/aoptobj.pas	(working copy)
@@ -2497,14 +2497,15 @@
         ClearUsedRegs;
         while (p <> BlockEnd) Do
           begin
-            UpdateUsedRegs(tai(p.next));
-            if PostPeepHoleOptsCpu(p) then
-              continue;
-            if assigned(p) then
+            if (p.typ = ait_instruction) and PostPeepHoleOptsCpu(p) then
               begin
-                UpdateUsedRegs(p);
-                p:=tai(p.next);
+                if (p.typ <> ait_instruction) then
+                  UpdateUsedRegs(p);
+                Continue;
               end;
+
+            UpdateUsedRegs(tai(p.Next));
+            GetNextInstruction(p, p);
           end;
       end;
 
Index: compiler/arm/aoptcpu.pas
===================================================================
--- compiler/arm/aoptcpu.pas	(revision 43679)
+++ compiler/arm/aoptcpu.pas	(working copy)
@@ -3064,156 +3064,153 @@
     begin
       result:=false;
 
-      if p.typ = ait_instruction then
+      if MatchInstruction(p, A_MOV, [C_None], [PF_None]) and
+        (taicpu(p).oper[1]^.typ=top_const) and
+        (taicpu(p).oper[1]^.val >= 0) and
+        (taicpu(p).oper[1]^.val < 256) and
+        (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
         begin
-          if MatchInstruction(p, A_MOV, [C_None], [PF_None]) and
-            (taicpu(p).oper[1]^.typ=top_const) and
-            (taicpu(p).oper[1]^.val >= 0) and
-            (taicpu(p).oper[1]^.val < 256) and
-            (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
-            begin
-              DebugMsg('Peephole Mov2Movs done', p);
-              asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
-              asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
-              IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
-              taicpu(p).oppostfix:=PF_S;
-              result:=true;
-            end
-          else if MatchInstruction(p, A_MVN, [C_None], [PF_None]) and
-            (taicpu(p).oper[1]^.typ=top_reg) and
-            (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
-            begin
-              DebugMsg('Peephole Mvn2Mvns done', p);
-              asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
-              asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
-              IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
-              taicpu(p).oppostfix:=PF_S;
-              result:=true;
-            end
-          else if MatchInstruction(p, A_RSB, [C_None], [PF_None]) and
-            (taicpu(p).ops = 3) and
-            (taicpu(p).oper[2]^.typ=top_const) and
-            (taicpu(p).oper[2]^.val=0) and
-            (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
-            begin
-              DebugMsg('Peephole Rsb2Rsbs done', p);
-              asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
-              asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
-              IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
-              taicpu(p).oppostfix:=PF_S;
-              result:=true;
-            end
-          else if MatchInstruction(p, [A_ADD,A_SUB], [C_None], [PF_None]) and
-            (taicpu(p).ops = 3) and
-            MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
-            (not MatchOperand(taicpu(p).oper[0]^, NR_STACK_POINTER_REG)) and
-            (taicpu(p).oper[2]^.typ=top_const) and
-            (taicpu(p).oper[2]^.val >= 0) and
-            (taicpu(p).oper[2]^.val < 256) and
-            (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
-            begin
-              DebugMsg('Peephole AddSub2*s done', p);
-              asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
-              asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
-              IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
-              taicpu(p).loadconst(1,taicpu(p).oper[2]^.val);
-              taicpu(p).oppostfix:=PF_S;
-              taicpu(p).ops := 2;
-              result:=true;
-            end
-          else if MatchInstruction(p, [A_ADD,A_SUB], [C_None], [PF_None]) and
-            (taicpu(p).ops = 2) and
-            (taicpu(p).oper[1]^.typ=top_reg) and
-            (not MatchOperand(taicpu(p).oper[0]^, NR_STACK_POINTER_REG)) and
-            (not MatchOperand(taicpu(p).oper[1]^, NR_STACK_POINTER_REG)) and
-            (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
-            begin
-              DebugMsg('Peephole AddSub2*s done', p);
-              asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
-              asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
-              IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
-              taicpu(p).oppostfix:=PF_S;
-              result:=true;
-            end
-          else if MatchInstruction(p, [A_ADD], [C_None], [PF_None]) and
-            (taicpu(p).ops = 3) and
-            MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
-            (taicpu(p).oper[2]^.typ=top_reg) then
-            begin
-              DebugMsg('Peephole AddRRR2AddRR done', p);
-              taicpu(p).ops := 2;
-              taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg);
-              result:=true;
-            end
-          else if MatchInstruction(p, [A_AND,A_ORR,A_EOR,A_BIC,A_LSL,A_LSR,A_ASR,A_ROR], [C_None], [PF_None]) and
-            (taicpu(p).ops = 3) and
-            MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
-            (taicpu(p).oper[2]^.typ=top_reg) and
-            (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
-            begin
-              DebugMsg('Peephole opXXY2opsXY done', p);
-              asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
-              asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
-              IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
-              taicpu(p).ops := 2;
-              taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg);
-              taicpu(p).oppostfix:=PF_S;
-              result:=true;
-            end
-          else if MatchInstruction(p, [A_AND,A_ORR,A_EOR,A_BIC,A_LSL,A_LSR,A_ASR,A_ROR], [C_None], [PF_S]) and
-            (taicpu(p).ops = 3) and
-            MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
-            (taicpu(p).oper[2]^.typ in [top_reg,top_const]) then
-            begin
-              DebugMsg('Peephole opXXY2opXY done', p);
-              taicpu(p).ops := 2;
-              if taicpu(p).oper[2]^.typ=top_reg then
-                taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg)
-              else
-                taicpu(p).loadconst(1,taicpu(p).oper[2]^.val);
-              result:=true;
-            end
-          else if MatchInstruction(p, [A_AND,A_ORR,A_EOR], [C_None], [PF_None,PF_S]) and
-            (taicpu(p).ops = 3) and
-            MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[2]^) and
-            (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
-            begin
-              DebugMsg('Peephole opXYX2opsXY done', p);
-              asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
-              asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
-              IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
-              taicpu(p).oppostfix:=PF_S;
-              taicpu(p).ops := 2;
-              result:=true;
-            end
-          else if MatchInstruction(p, [A_MOV], [C_None], [PF_None]) and
-            (taicpu(p).ops=3) and
-            (taicpu(p).oper[2]^.typ=top_shifterop) and
-            (taicpu(p).oper[2]^.shifterop^.shiftmode in [SM_LSL,SM_LSR,SM_ASR,SM_ROR]) and
-            //MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
-            (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
-            begin
-              DebugMsg('Peephole Mov2Shift done', p);
-              asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
-              asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
-              IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
-              taicpu(p).oppostfix:=PF_S;
+          DebugMsg('Peephole Mov2Movs done', p);
+          asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
+          asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
+          IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
+          taicpu(p).oppostfix:=PF_S;
+          result:=true;
+        end
+      else if MatchInstruction(p, A_MVN, [C_None], [PF_None]) and
+        (taicpu(p).oper[1]^.typ=top_reg) and
+        (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
+        begin
+          DebugMsg('Peephole Mvn2Mvns done', p);
+          asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
+          asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
+          IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
+          taicpu(p).oppostfix:=PF_S;
+          result:=true;
+        end
+      else if MatchInstruction(p, A_RSB, [C_None], [PF_None]) and
+        (taicpu(p).ops = 3) and
+        (taicpu(p).oper[2]^.typ=top_const) and
+        (taicpu(p).oper[2]^.val=0) and
+        (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
+        begin
+          DebugMsg('Peephole Rsb2Rsbs done', p);
+          asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
+          asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
+          IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
+          taicpu(p).oppostfix:=PF_S;
+          result:=true;
+        end
+      else if MatchInstruction(p, [A_ADD,A_SUB], [C_None], [PF_None]) and
+        (taicpu(p).ops = 3) and
+        MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
+        (not MatchOperand(taicpu(p).oper[0]^, NR_STACK_POINTER_REG)) and
+        (taicpu(p).oper[2]^.typ=top_const) and
+        (taicpu(p).oper[2]^.val >= 0) and
+        (taicpu(p).oper[2]^.val < 256) and
+        (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
+        begin
+          DebugMsg('Peephole AddSub2*s done', p);
+          asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
+          asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
+          IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
+          taicpu(p).loadconst(1,taicpu(p).oper[2]^.val);
+          taicpu(p).oppostfix:=PF_S;
+          taicpu(p).ops := 2;
+          result:=true;
+        end
+      else if MatchInstruction(p, [A_ADD,A_SUB], [C_None], [PF_None]) and
+        (taicpu(p).ops = 2) and
+        (taicpu(p).oper[1]^.typ=top_reg) and
+        (not MatchOperand(taicpu(p).oper[0]^, NR_STACK_POINTER_REG)) and
+        (not MatchOperand(taicpu(p).oper[1]^, NR_STACK_POINTER_REG)) and
+        (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
+        begin
+          DebugMsg('Peephole AddSub2*s done', p);
+          asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
+          asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
+          IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
+          taicpu(p).oppostfix:=PF_S;
+          result:=true;
+        end
+      else if MatchInstruction(p, [A_ADD], [C_None], [PF_None]) and
+        (taicpu(p).ops = 3) and
+        MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
+        (taicpu(p).oper[2]^.typ=top_reg) then
+        begin
+          DebugMsg('Peephole AddRRR2AddRR done', p);
+          taicpu(p).ops := 2;
+          taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg);
+          result:=true;
+        end
+      else if MatchInstruction(p, [A_AND,A_ORR,A_EOR,A_BIC,A_LSL,A_LSR,A_ASR,A_ROR], [C_None], [PF_None]) and
+        (taicpu(p).ops = 3) and
+        MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
+        (taicpu(p).oper[2]^.typ=top_reg) and
+        (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
+        begin
+          DebugMsg('Peephole opXXY2opsXY done', p);
+          asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
+          asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
+          IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
+          taicpu(p).ops := 2;
+          taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg);
+          taicpu(p).oppostfix:=PF_S;
+          result:=true;
+        end
+      else if MatchInstruction(p, [A_AND,A_ORR,A_EOR,A_BIC,A_LSL,A_LSR,A_ASR,A_ROR], [C_None], [PF_S]) and
+        (taicpu(p).ops = 3) and
+        MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
+        (taicpu(p).oper[2]^.typ in [top_reg,top_const]) then
+        begin
+          DebugMsg('Peephole opXXY2opXY done', p);
+          taicpu(p).ops := 2;
+          if taicpu(p).oper[2]^.typ=top_reg then
+            taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg)
+          else
+            taicpu(p).loadconst(1,taicpu(p).oper[2]^.val);
+          result:=true;
+        end
+      else if MatchInstruction(p, [A_AND,A_ORR,A_EOR], [C_None], [PF_None,PF_S]) and
+        (taicpu(p).ops = 3) and
+        MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[2]^) and
+        (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
+        begin
+          DebugMsg('Peephole opXYX2opsXY done', p);
+          asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
+          asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
+          IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
+          taicpu(p).oppostfix:=PF_S;
+          taicpu(p).ops := 2;
+          result:=true;
+        end
+      else if MatchInstruction(p, [A_MOV], [C_None], [PF_None]) and
+        (taicpu(p).ops=3) and
+        (taicpu(p).oper[2]^.typ=top_shifterop) and
+        (taicpu(p).oper[2]^.shifterop^.shiftmode in [SM_LSL,SM_LSR,SM_ASR,SM_ROR]) and
+        //MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and
+        (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then
+        begin
+          DebugMsg('Peephole Mov2Shift done', p);
+          asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p);
+          asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p);
+          IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs);
+          taicpu(p).oppostfix:=PF_S;
 
-              case taicpu(p).oper[2]^.shifterop^.shiftmode of
-                SM_LSL: taicpu(p).opcode:=A_LSL;
-                SM_LSR: taicpu(p).opcode:=A_LSR;
-                SM_ASR: taicpu(p).opcode:=A_ASR;
-                SM_ROR: taicpu(p).opcode:=A_ROR;
-                else
-                  internalerror(2019050912);
-              end;
+          case taicpu(p).oper[2]^.shifterop^.shiftmode of
+            SM_LSL: taicpu(p).opcode:=A_LSL;
+            SM_LSR: taicpu(p).opcode:=A_LSR;
+            SM_ASR: taicpu(p).opcode:=A_ASR;
+            SM_ROR: taicpu(p).opcode:=A_ROR;
+            else
+              internalerror(2019050912);
+          end;
 
-              if taicpu(p).oper[2]^.shifterop^.rs<>NR_NO then
-                taicpu(p).loadreg(2, taicpu(p).oper[2]^.shifterop^.rs)
-              else
-                taicpu(p).loadconst(2, taicpu(p).oper[2]^.shifterop^.shiftimm);
-              result:=true;
-            end
+          if taicpu(p).oper[2]^.shifterop^.rs<>NR_NO then
+            taicpu(p).loadreg(2, taicpu(p).oper[2]^.shifterop^.rs)
+          else
+            taicpu(p).loadconst(2, taicpu(p).oper[2]^.shifterop^.shiftimm);
+          result:=true;
         end;
     end;
 
Index: compiler/i386/aoptcpu.pas
===================================================================
--- compiler/i386/aoptcpu.pas	(revision 43679)
+++ compiler/i386/aoptcpu.pas	(working copy)
@@ -263,77 +263,70 @@
         hp1: tai;
       begin
         Result:=false;
-        case p.Typ Of
-          Ait_Instruction:
-            begin
-              if InsContainsSegRef(taicpu(p)) then
-                Exit;
-              case taicpu(p).opcode Of
-                A_CALL:
-                  Result:=PostPeepHoleOptCall(p);
-                A_LEA:
-                  Result:=PostPeepholeOptLea(p);
-                A_CMP:
-                  Result:=PostPeepholeOptCmp(p);
-                A_MOV:
-                  Result:=PostPeepholeOptMov(p);
-                A_MOVZX:
-                  { if register vars are on, it's possible there is code like }
-                  {   "cmpl $3,%eax; movzbl 8(%ebp),%ebx; je .Lxxx"           }
-                  { so we can't safely replace the movzx then with xor/mov,   }
-                  { since that would change the flags (JM)                    }
-                  if not(cs_opt_regvar in current_settings.optimizerswitches) then
+        if InsContainsSegRef(taicpu(p)) then
+          Exit;
+        case taicpu(p).opcode Of
+          A_CALL:
+            Result:=PostPeepHoleOptCall(p);
+          A_LEA:
+            Result:=PostPeepholeOptLea(p);
+          A_CMP:
+            Result:=PostPeepholeOptCmp(p);
+          A_MOV:
+            Result:=PostPeepholeOptMov(p);
+          A_MOVZX:
+            { if register vars are on, it's possible there is code like }
+            {   "cmpl $3,%eax; movzbl 8(%ebp),%ebx; je .Lxxx"           }
+            { so we can't safely replace the movzx then with xor/mov,   }
+            { since that would change the flags (JM)                    }
+            if not(cs_opt_regvar in current_settings.optimizerswitches) then
+              begin
+                if (taicpu(p).oper[1]^.typ = top_reg) then
+                  if (taicpu(p).oper[0]^.typ = top_reg)
+                    then
+                      case taicpu(p).opsize of
+                        S_BL:
+                          begin
+                            if IsGP32Reg(taicpu(p).oper[1]^.reg) and
+                               not(cs_opt_size in current_settings.optimizerswitches) and
+                               (current_settings.optimizecputype = cpu_Pentium) then
+                                {Change "movzbl %reg1, %reg2" to
+                                 "xorl %reg2, %reg2; movb %reg1, %reg2" for Pentium and
+                                 PentiumMMX}
+                              begin
+                                hp1 := taicpu.op_reg_reg(A_XOR, S_L,
+                                            taicpu(p).oper[1]^.reg, taicpu(p).oper[1]^.reg);
+                                InsertLLItem(p.previous, p, hp1);
+                                taicpu(p).opcode := A_MOV;
+                                taicpu(p).changeopsize(S_B);
+                                setsubreg(taicpu(p).oper[1]^.reg,R_SUBL);
+                            end;
+                        end;
+                      else
+                        ;
+                    end
+                  else if (taicpu(p).oper[0]^.typ = top_ref) and
+                      (taicpu(p).oper[0]^.ref^.base <> taicpu(p).oper[1]^.reg) and
+                      (taicpu(p).oper[0]^.ref^.index <> taicpu(p).oper[1]^.reg) and
+                      not(cs_opt_size in current_settings.optimizerswitches) and
+                      IsGP32Reg(taicpu(p).oper[1]^.reg) and
+                      (current_settings.optimizecputype = cpu_Pentium) and
+                      (taicpu(p).opsize = S_BL) then
+                    {changes "movzbl mem, %reg" to "xorl %reg, %reg; movb mem, %reg8" for
+                      Pentium and PentiumMMX}
                     begin
-                      if (taicpu(p).oper[1]^.typ = top_reg) then
-                        if (taicpu(p).oper[0]^.typ = top_reg)
-                          then
-                            case taicpu(p).opsize of
-                              S_BL:
-                                begin
-                                  if IsGP32Reg(taicpu(p).oper[1]^.reg) and
-                                     not(cs_opt_size in current_settings.optimizerswitches) and
-                                     (current_settings.optimizecputype = cpu_Pentium) then
-                                      {Change "movzbl %reg1, %reg2" to
-                                       "xorl %reg2, %reg2; movb %reg1, %reg2" for Pentium and
-                                       PentiumMMX}
-                                    begin
-                                      hp1 := taicpu.op_reg_reg(A_XOR, S_L,
-                                                  taicpu(p).oper[1]^.reg, taicpu(p).oper[1]^.reg);
-                                      InsertLLItem(p.previous, p, hp1);
-                                      taicpu(p).opcode := A_MOV;
-                                      taicpu(p).changeopsize(S_B);
-                                      setsubreg(taicpu(p).oper[1]^.reg,R_SUBL);
-                                    end;
-                                end;
-                              else
-                                ;
-                            end
-                          else if (taicpu(p).oper[0]^.typ = top_ref) and
-                              (taicpu(p).oper[0]^.ref^.base <> taicpu(p).oper[1]^.reg) and
-                              (taicpu(p).oper[0]^.ref^.index <> taicpu(p).oper[1]^.reg) and
-                              not(cs_opt_size in current_settings.optimizerswitches) and
-                              IsGP32Reg(taicpu(p).oper[1]^.reg) and
-                              (current_settings.optimizecputype = cpu_Pentium) and
-                              (taicpu(p).opsize = S_BL) then
-                            {changes "movzbl mem, %reg" to "xorl %reg, %reg; movb mem, %reg8" for
-                              Pentium and PentiumMMX}
-                            begin
-                              hp1 := taicpu.Op_reg_reg(A_XOR, S_L, taicpu(p).oper[1]^.reg,
-                                          taicpu(p).oper[1]^.reg);
-                              taicpu(p).opcode := A_MOV;
-                              taicpu(p).changeopsize(S_B);
-                              setsubreg(taicpu(p).oper[1]^.reg,R_SUBL);
-                              InsertLLItem(p.previous, p, hp1);
-                            end;
-                   end;
-                A_TEST, A_OR:
-                  Result:=PostPeepholeOptTestOr(p);
-                else
-                  ;
-              end;
-            end;
+                      hp1 := taicpu.Op_reg_reg(A_XOR, S_L, taicpu(p).oper[1]^.reg,
+                                  taicpu(p).oper[1]^.reg);
+                      taicpu(p).opcode := A_MOV;
+                      taicpu(p).changeopsize(S_B);
+                      setsubreg(taicpu(p).oper[1]^.reg,R_SUBL);
+                      InsertLLItem(p.previous, p, hp1);
+                    end;
+               end;
+          A_TEST, A_OR:
+            Result:=PostPeepholeOptTestOr(p);
           else
-            ;
+            { Do nothing };
         end;
       end;
 
Index: compiler/i8086/aoptcpu.pas
===================================================================
--- compiler/i8086/aoptcpu.pas	(revision 43679)
+++ compiler/i8086/aoptcpu.pas	(working copy)
@@ -151,22 +151,15 @@
     function TCpuAsmOptimizer.PostPeepHoleOptsCpu(var p: tai): boolean;
       begin
         result := false;
-        case p.typ of
-          ait_instruction:
-            begin
-              case taicpu(p).opcode of
-                {A_MOV commented out, because it still breaks some i8086 code :( }
-                {A_MOV:
-                  Result:=PostPeepholeOptMov(p);}
-                A_CMP:
-                  Result:=PostPeepholeOptCmp(p);
-                A_OR,
-                A_TEST:
-                  Result:=PostPeepholeOptTestOr(p);
-                else
-                  ;
-              end;
-            end;
+        case taicpu(p).opcode of
+          {A_MOV commented out, because it still breaks some i8086 code :( }
+          {A_MOV:
+            Result:=PostPeepholeOptMov(p);}
+          A_CMP:
+            Result:=PostPeepholeOptCmp(p);
+          A_OR,
+          A_TEST:
+            Result:=PostPeepholeOptTestOr(p);
           else
             ;
         end;
Index: compiler/jvm/aoptcpu.pas
===================================================================
--- compiler/jvm/aoptcpu.pas	(revision 43679)
+++ compiler/jvm/aoptcpu.pas	(working copy)
@@ -175,9 +175,7 @@
 
   function TCpuAsmOptimizer.PostPeepHoleOptsCpu(var p: tai): boolean;
     begin
-      result:=
-        (p.typ=ait_instruction) and
-        RemoveLoadLoadSwap(p);
+      Result := RemoveLoadLoadSwap(p);
     end;
 
 begin
Index: compiler/powerpc/aoptcpu.pas
===================================================================
--- compiler/powerpc/aoptcpu.pas	(revision 43679)
+++ compiler/powerpc/aoptcpu.pas	(working copy)
@@ -441,70 +441,63 @@
       next1: tai;
     begin
       result := false;
-      case p.typ of
-        ait_instruction:
+      case taicpu(p).opcode of
+        A_RLWINM_:
           begin
-            case taicpu(p).opcode of
-              A_RLWINM_:
+            // rlwinm_ is cracked on the G5, andi_/andis_ aren't
+            if (taicpu(p).oper[2]^.val = 0) then
+              if (taicpu(p).oper[3]^.val < 16) and
+                 (taicpu(p).oper[4]^.val < 16) then
                 begin
-                  // rlwinm_ is cracked on the G5, andi_/andis_ aren't
-                  if (taicpu(p).oper[2]^.val = 0) then
-                    if (taicpu(p).oper[3]^.val < 16) and
-                       (taicpu(p).oper[4]^.val < 16) then
-                      begin
-                        taicpu(p).opcode := A_ANDIS_;
-                        taicpu(p).oper[2]^.val := word(
-                          ((1 shl (16-taicpu(p).oper[3]^.val)) - 1) xor
-                          ((1 shl (15-taicpu(p).oper[4]^.val)) - 1));
-                        taicpu(p).freeop(3);
-                        taicpu(p).freeop(4);
-                        taicpu(p).ops := 3;
-                        taicpu(p).opercnt := 3;
-                      end
-                    else if (taicpu(p).oper[3]^.val >= 16) and
-                       (taicpu(p).oper[4]^.val >= 16) then
-                      begin
-                        taicpu(p).opcode := A_ANDI_;
-                        taicpu(p).oper[2]^.val := word(rlwinm2mask(taicpu(p).oper[3]^.val,taicpu(p).oper[4]^.val));
-                        taicpu(p).freeop(3);
-                        taicpu(p).freeop(4);
-                        taicpu(p).ops := 3;
-                        taicpu(p).opercnt := 3;
-                      end;
+                  taicpu(p).opcode := A_ANDIS_;
+                  taicpu(p).oper[2]^.val := word(
+                    ((1 shl (16-taicpu(p).oper[3]^.val)) - 1) xor
+                    ((1 shl (15-taicpu(p).oper[4]^.val)) - 1));
+                  taicpu(p).freeop(3);
+                  taicpu(p).freeop(4);
+                  taicpu(p).ops := 3;
+                  taicpu(p).opercnt := 3;
+                end
+              else if (taicpu(p).oper[3]^.val >= 16) and
+                 (taicpu(p).oper[4]^.val >= 16) then
+                begin
+                  taicpu(p).opcode := A_ANDI_;
+                  taicpu(p).oper[2]^.val := word(rlwinm2mask(taicpu(p).oper[3]^.val,taicpu(p).oper[4]^.val));
+                  taicpu(p).freeop(3);
+                  taicpu(p).freeop(4);
+                  taicpu(p).ops := 3;
+                  taicpu(p).opercnt := 3;
                 end;
-              else
-                ;
-            end;
-
-            // change "integer operation with destination reg" followed by a
-            // comparison to zero of that reg, with a variant of that integer
-            // operation which sets the flags (if it exists)
-            if not(result) and
-               (taicpu(p).ops >= 2) and
-               (taicpu(p).oper[0]^.typ = top_reg) and
-               (taicpu(p).oper[1]^.typ = top_reg) and
-               getnextinstruction(p,next1) and
-               (next1.typ = ait_instruction) and
-               (taicpu(next1).opcode = A_CMPWI) and
-               // make sure it the result goes to cr0
-               (((taicpu(next1).ops = 2) and
-                 (taicpu(next1).oper[1]^.val = 0) and
-                 (taicpu(next1).oper[0]^.reg = taicpu(p).oper[0]^.reg)) or
-                ((taicpu(next1).ops = 3) and
-                 (taicpu(next1).oper[2]^.val = 0) and
-                 (taicpu(next1).oper[0]^.typ = top_reg) and
-                 (getsupreg(taicpu(next1).oper[0]^.reg) = RS_CR0) and
-                 (taicpu(next1).oper[1]^.reg = taicpu(p).oper[0]^.reg))) and
-               changetomodifyflags(taicpu(p)) then
-              begin
-                asml.remove(next1);
-                next1.free;
-                result := true;
-              end;
           end;
         else
           ;
       end;
+
+      // change "integer operation with destination reg" followed by a
+      // comparison to zero of that reg, with a variant of that integer
+      // operation which sets the flags (if it exists)
+      if not(result) and
+         (taicpu(p).ops >= 2) and
+         (taicpu(p).oper[0]^.typ = top_reg) and
+         (taicpu(p).oper[1]^.typ = top_reg) and
+         getnextinstruction(p,next1) and
+         (next1.typ = ait_instruction) and
+         (taicpu(next1).opcode = A_CMPWI) and
+         // make sure it the result goes to cr0
+         (((taicpu(next1).ops = 2) and
+           (taicpu(next1).oper[1]^.val = 0) and
+           (taicpu(next1).oper[0]^.reg = taicpu(p).oper[0]^.reg)) or
+          ((taicpu(next1).ops = 3) and
+           (taicpu(next1).oper[2]^.val = 0) and
+           (taicpu(next1).oper[0]^.typ = top_reg) and
+           (getsupreg(taicpu(next1).oper[0]^.reg) = RS_CR0) and
+           (taicpu(next1).oper[1]^.reg = taicpu(p).oper[0]^.reg))) and
+         changetomodifyflags(taicpu(p)) then
+        begin
+          asml.remove(next1);
+          next1.free;
+          result := true;
+        end;
     end;
 
 begin
Index: compiler/riscv32/aoptcpu.pas
===================================================================
--- compiler/riscv32/aoptcpu.pas	(revision 43679)
+++ compiler/riscv32/aoptcpu.pas	(working copy)
@@ -63,17 +63,8 @@
 
 
   function TCpuAsmOptimizer.PostPeepHoleOptsCpu(var p: tai): boolean;
-    var
-      next1: tai;
     begin
-      result := false;
-      case p.typ of
-        ait_instruction:
-          begin
-          end;
-        else
-          ;
-      end;
+      Result := False;
     end;
 
 begin
Index: compiler/x86_64/aoptcpu.pas
===================================================================
--- compiler/x86_64/aoptcpu.pas	(revision 43679)
+++ compiler/x86_64/aoptcpu.pas	(working copy)
@@ -168,29 +168,22 @@
     function TCpuAsmOptimizer.PostPeepHoleOptsCpu(var p: tai): boolean;
       begin
         result := false;
-        case p.typ of
-          ait_instruction:
-            begin
-              case taicpu(p).opcode of
-                A_MOV:
-                  Result:=PostPeepholeOptMov(p);
-                A_MOVZX:
-                  Result:=PostPeepholeOptMovzx(p);
-                A_CMP:
-                  Result:=PostPeepholeOptCmp(p);
-                A_OR,
-                A_TEST:
-                  Result:=PostPeepholeOptTestOr(p);
-                A_XOR:
-                  Result:=PostPeepholeOptXor(p);
-                A_CALL:
-                  Result:=PostPeepholeOptCall(p);
-                A_LEA:
-                  Result:=PostPeepholeOptLea(p);
-                else
-                  ;
-              end;
-            end;
+        case taicpu(p).opcode of
+          A_MOV:
+            Result:=PostPeepholeOptMov(p);
+          A_MOVZX:
+            Result:=PostPeepholeOptMovzx(p);
+          A_CMP:
+            Result:=PostPeepholeOptCmp(p);
+          A_OR,
+          A_TEST:
+            Result:=PostPeepholeOptTestOr(p);
+          A_XOR:
+            Result:=PostPeepholeOptXor(p);
+          A_CALL:
+            Result:=PostPeepholeOptCall(p);
+          A_LEA:
+            Result:=PostPeepholeOptLea(p);
           else
             ;
         end;

J. Gareth Moreton

2019-12-16 02:25

developer   ~0119872

Last edited: 2019-12-16 02:30

Completed a full regression run with a number of patches combined, namely this one, the two over at 0036382, and a patch I sent Florian privately that addresses an internal error for x86_64-darwin - no regressions noted on i386-win32 and x86_64-win64. Will attempt i386-linux tonight.

J. Gareth Moreton

2019-12-16 15:57

developer   ~0119885

One new failure in i386-linux: webtbs/tw2377.pp - now to determine if that is due to my additions or me unintentionally pressing something on the keyboard during that test (it tests the Keyboard unit).

J. Gareth Moreton

2019-12-16 18:50

developer   ~0119889

Can't seem to reproduce the failure, so it seems to have been a glitch elsewhere. Granted, my changes were to the peephole optimiser, so a failure on this test is nonsensical anyway. Still, it might need another party to test.

Florian

2019-12-29 10:35

administrator   ~0120122

I do not like the patch for two (connected) reasons:
  - it makes PostPeepHoleOpts behave differently from the other helpers, and thus harder to understand
  - even if the first point might not be valid: if the parameter is ensured to be a taicpu, it should have this type

J. Gareth Moreton

2019-12-29 15:15

developer   ~0120132

I guess the points are valid. I always thought the idea of the post-peephole stage was to convert instructions into more efficient forms after all other optimisations are complete. I'll see what I can do to make it easier to understand.

J. Gareth Moreton

2019-12-29 15:25

developer   ~0120134

I don't think the parameter type can be easily changed from tai to taicpu though, because if an instruction gets deleted, what appears next in the list may not be an instruction. The GetNextInstruction method cannot be used to skip ahead because it doesn't update UsedRegs, so something else is required. All I can do is come up with a potential showcase and hope it isn't too complex.
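
To illustrate the point about the parameter type, here is a minimal stand-alone sketch. It does not use the real compiler classes: TEntry, TInstruction, Unlink, Append and PostOptCpu are hypothetical stand-ins for tai, taicpu and PostPeepHoleOptsCpu. The generic loop guarantees that the hook is only entered on an instruction, but the hook may delete that instruction and leave p on whatever follows, which in this example is a label. That is why the parameter keeps the wider type.

```pascal
program PostPeepholeSketch;

{$mode objfpc}

{ Hypothetical, simplified stand-ins for tai/taicpu; these are NOT the
  real FPC compiler declarations. }
type
  TEntryKind = (ekMarker, ekInstruction, ekLabel);

  TEntry = class
    Kind: TEntryKind;
    Prev, Next: TEntry;
    constructor Create(AKind: TEntryKind);
  end;

  TInstruction = class(TEntry)
    Opcode: string;
    constructor Create(const AOpcode: string);
  end;

constructor TEntry.Create(AKind: TEntryKind);
begin
  inherited Create;
  Kind := AKind;
end;

constructor TInstruction.Create(const AOpcode: string);
begin
  inherited Create(ekInstruction);
  Opcode := AOpcode;
end;

{ Appends e after Tail and makes it the new tail }
procedure Append(var Tail: TEntry; e: TEntry);
begin
  Tail.Next := e;
  e.Prev := Tail;
  Tail := e;
end;

{ Detaches e from its neighbours without freeing it }
procedure Unlink(e: TEntry);
begin
  if Assigned(e.Prev) then e.Prev.Next := e.Next;
  if Assigned(e.Next) then e.Next.Prev := e.Prev;
end;

{ Stand-in for the CPU hook. On entry p is always an instruction (the
  caller checked), but on exit it may point at any kind of entry. }
function PostOptCpu(var p: TEntry): Boolean;
var
  Dead: TEntry;
begin
  Result := False;
  if TInstruction(p).Opcode = 'nop' then
  begin
    Dead := p;
    p := p.Next;        { may be a label or nil, hence the wide type }
    Unlink(Dead);
    Dead.Free;
    Result := True;
  end;
end;

var
  Head, Tail, p, Nxt: TEntry;
  Remaining: Integer;
begin
  Head := TEntry.Create(ekMarker);   { sentinel, so removal never moves Head }
  Tail := Head;
  Append(Tail, TInstruction.Create('nop'));
  Append(Tail, TEntry.Create(ekLabel));
  Append(Tail, TInstruction.Create('mov'));

  p := Head.Next;
  while Assigned(p) do
  begin
    { The type check is done once here, in the generic loop }
    if (p.Kind = ekInstruction) and PostOptCpu(p) then
      Continue;         { p has already moved on; re-check its kind }
    p := p.Next;
  end;

  { Count what is left after the sentinel, freeing entries as we go }
  Remaining := 0;
  p := Head.Next;
  while Assigned(p) do
  begin
    Inc(Remaining);
    Nxt := p.Next;
    p.Free;
    p := Nxt;
  end;
  Head.Free;
  WriteLn('remaining entries: ', Remaining);   { prints "remaining entries: 2" }
end.
```

After the 'nop' is removed, p refers to the label, so the loop has to re-test the entry kind before calling the hook again. A taicpu-typed var parameter could not represent that state without some other way of skipping ahead.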

J. Gareth Moreton

2020-01-04 13:08

developer   ~0120208

Suspending this one for now because there is a possibility of improving this on all peephole stages - requires further investigation.

Issue History

Date Modified Username Field Change
2019-12-14 10:56 J. Gareth Moreton New Issue
2019-12-14 10:56 J. Gareth Moreton File Added: PostPeepholeRegisters.patch
2019-12-14 10:57 J. Gareth Moreton Tag Attached: patch
2019-12-14 10:57 J. Gareth Moreton Tag Attached: compiler
2019-12-14 10:57 J. Gareth Moreton Tag Attached: optimizations
2019-12-14 10:57 J. Gareth Moreton Priority normal => low
2019-12-14 10:57 J. Gareth Moreton Severity minor => tweak
2019-12-14 10:57 J. Gareth Moreton FPCTarget => -
2019-12-16 02:25 J. Gareth Moreton Note Added: 0119872
2019-12-16 02:30 J. Gareth Moreton Note Edited: 0119872 View Revisions
2019-12-16 15:57 J. Gareth Moreton Note Added: 0119885
2019-12-16 18:50 J. Gareth Moreton Note Added: 0119889
2019-12-29 10:35 Florian Note Added: 0120122
2019-12-29 15:15 J. Gareth Moreton Note Added: 0120132
2019-12-29 15:25 J. Gareth Moreton Note Added: 0120134
2020-01-04 13:08 J. Gareth Moreton Assigned To => J. Gareth Moreton
2020-01-04 13:08 J. Gareth Moreton Status new => closed
2020-01-04 13:08 J. Gareth Moreton Resolution open => suspended
2020-01-04 13:08 J. Gareth Moreton Note Added: 0120208