View Issue Details

IDProjectCategoryView StatusLast Update
0036687FPCCompilerpublic2020-02-22 19:38
ReporterJ. Gareth MoretonAssigned ToFlorian 
PrioritynormalSeverityminorReproducibilityN/A
Status resolvedResolutionfixed 
Platformi386 and x86_64OSMicrosoft WindowsOS Version10 Professional
Product Version3.3.1Product Buildr44140 
Target VersionFixed in Version3.3.1 
Summary0036687: [Patch] Processor-aware MOVZX optimisation cleanup
DescriptionThis patch resolves some issues where MOVZX and MOV/AND optimisations were 'fighting' each other. Now, if the compiler is optimising for the Pentium II or later (always the case with x86-64) or for size, it will favour MOVZX operations (under i8086, it will check to see if the CPU is at least an 80386, the processor that the instruction was introduced on); otherwise, it will favour the MOV/AND optimisations.

In cases where a register is zero-extended to itself, when optimising for size, it will favour using AND if the register in question is EAX, since this compiles into fewer bytes of machine code. This optimisation is not favoured for speed because it risks a partial write penalty.
Steps To ReproduceApply patch and confirm correct compilation and better use of MOVZX (or lack of, depending on the optimisation settings).
Additional InformationThis patch was tested as part of a mass-patch regression test alongside 0036669, 0036670, 0036675 and 0036680 and passed successfully, using the options "-O4 -CpCOREAVX2" when building the compiler for i386-win32 and x86_64-win64.

Additionally, optimisations "var16", "var17" and "var18" were removed because there aren't any 64-bit versions of MOVZX (the 32-bit versions do this implicitly), so these optimisations should logically never execute. If it turns out they were executed, Internal Error 2017050704 will now be raised, because the compiler should not be generating such 64-bit MOVZX instructions.
Tagscompiler, i386, optimization, patch, x86_64
Fixed in Revision44233
FPCOldBugId
FPCTarget-
Attached Files
  • movzx-choices.patch (17,667 bytes)
    Index: compiler/x86/aoptx86.pas
    ===================================================================
    --- compiler/x86/aoptx86.pas	(revision 44124)
    +++ compiler/x86/aoptx86.pas	(working copy)
    @@ -43,6 +43,8 @@
             function GetNextInstructionUsingReg(Current: tai; out Next: tai; reg: TRegister): Boolean;
             function RegModifiedByInstruction(Reg: TRegister; p1: tai): boolean; override;
           protected
    +        class function IsMOVZXAcceptable: Boolean; static; inline;
    +
             { checks whether loading a new value in reg1 overwrites the entirety of reg2 }
             function Reg1WriteOverwritesReg2Entirely(reg1, reg2: tregister): boolean;
             { checks whether reading the value in reg1 depends on the value of reg2. This
    @@ -842,6 +844,25 @@
           end;
     {$endif DEBUG_AOPTCPU}
     
    +    class function TX86AsmOptimizer.IsMOVZXAcceptable: Boolean; inline;
    +      begin
    +{$ifdef x86_64}
    +        { Always fine on x86-64 }
    +        Result := True;
    +{$else x86_64}
    +        Result :=
    +{$ifdef i8086}
    +          (current_settings.cputype >= cpu_386) and
    +{$endif i8086}
    +          (
    +            { Always accept if optimising for size }
    +            (cs_opt_size in current_settings.optimizerswitches) or
    +            { From the Pentium II onwards, MOVZX only takes 1 cycle. [Kit] }
    +            (current_settings.optimizecputype >= cpu_Pentium2)
    +          );
    +{$endif x86_64}
    +      end;
    +
         function TX86AsmOptimizer.Reg1WriteOverwritesReg2Entirely(reg1, reg2: tregister): boolean;
           begin
             if not SuperRegistersEqual(reg1,reg2) then
    @@ -1787,17 +1808,13 @@
                           change it to a MOVZX instruction when optimising for speed.
                         }
                         if not (cs_opt_size in current_settings.optimizerswitches) and
    -{$ifdef i8086}
    -                      { MOVZX was only introduced on the 386 }
    -                      (current_settings.cputype >= cpu_386) and
    -{$endif i8086}
    -                      (
    -                        (taicpu(hp1).opsize < taicpu(p).opsize)
    +                      IsMOVZXAcceptable and
    +                      (taicpu(hp1).opsize < taicpu(p).opsize)
     {$ifdef x86_64}
    -                        { operations already implicitly set the upper 64 bits to zero }
    -                        and not ((taicpu(hp1).opsize = S_L) and (taicpu(p).opsize = S_Q))
    +                      { operations already implicitly set the upper 64 bits to zero }
    +                      and not ((taicpu(hp1).opsize = S_L) and (taicpu(p).opsize = S_Q))
     {$endif x86_64}
    -                      ) then
    +                      then
                           begin
                             CurrentReg := taicpu(hp1).oper[1]^.reg;
     
    @@ -1907,7 +1924,8 @@
                         ;
                     end;
                   end
    -            else if (taicpu(p).oper[1]^.typ = top_reg) and (taicpu(hp1).oper[1]^.typ = top_reg) and
    +            else if IsMOVZXAcceptable and
    +              (taicpu(p).oper[1]^.typ = top_reg) and (taicpu(hp1).oper[1]^.typ = top_reg) and
                   (taicpu(p).oper[0]^.typ <> top_const) and { MOVZX only supports registers and memory, not immediates (use MOV for that!) }
                   (getsupreg(taicpu(p).oper[1]^.reg) = getsupreg(taicpu(hp1).oper[1]^.reg))
                   then
    @@ -5086,103 +5104,133 @@
                         ;
                     end;
                   end;
    -            { changes some movzx constructs to faster synonims (all examples
    +            { changes some movzx constructs to faster synonyms (all examples
                   are given with eax/ax, but are also valid for other registers)}
                 if (taicpu(p).oper[1]^.typ = top_reg) then
                   if (taicpu(p).oper[0]^.typ = top_reg) then
    -                case taicpu(p).opsize of
    -                  S_BW:
    -                    begin
    -                      if (getsupreg(taicpu(p).oper[0]^.reg)=getsupreg(taicpu(p).oper[1]^.reg)) and
    -                         not(cs_opt_size in current_settings.optimizerswitches) then
    -                        {Change "movzbw %al, %ax" to "andw $0x0ffh, %ax"}
    +                begin
    +                  case taicpu(p).opsize of
    +                    { Technically, movzbw %al,%ax cannot be encoded in 32/64-bit mode
    +                      (the machine code is equivalent to movzbl %al,%eax), but the
    +                      code generator still generates that assembler instruction and
    +                      it is silently converted.  This should probably be checked.
    +                      [Kit] }
    +                    S_BW:
    +                      begin
    +                        if (getsupreg(taicpu(p).oper[0]^.reg)=getsupreg(taicpu(p).oper[1]^.reg)) and
    +                          (
    +                            not IsMOVZXAcceptable
    +                            { and $0xff,%ax has a smaller encoding but risks a partial write penalty }
    +                            or (
    +                              (cs_opt_size in current_settings.optimizerswitches) and
    +                              (taicpu(p).oper[1]^.reg = NR_AX)
    +                            )
    +                          ) then
    +                          {Change "movzbw %al, %ax" to "andw $0x0ffh, %ax"}
    +                          begin
    +                            DebugMsg(SPeepholeOptimization + 'var7',p);
    +                            taicpu(p).opcode := A_AND;
    +                            taicpu(p).changeopsize(S_W);
    +                            taicpu(p).loadConst(0,$ff);
    +                            Result := True;
    +                          end
    +                        else if not IsMOVZXAcceptable and
    +                          GetNextInstruction(p, hp1) and
    +                          (tai(hp1).typ = ait_instruction) and
    +                          (taicpu(hp1).opcode = A_AND) and
    +                          (taicpu(hp1).oper[0]^.typ = top_const) and
    +                          (taicpu(hp1).oper[1]^.typ = top_reg) and
    +                          (taicpu(hp1).oper[1]^.reg = taicpu(p).oper[1]^.reg) then
    +                        { Change "movzbw %reg1, %reg2; andw $const, %reg2"
    +                          to "movw %reg1, reg2; andw $(const1 and $ff), %reg2"}
    +                          begin
    +                            DebugMsg(SPeepholeOptimization + 'var8',p);
    +                            taicpu(p).opcode := A_MOV;
    +                            taicpu(p).changeopsize(S_W);
    +                            setsubreg(taicpu(p).oper[0]^.reg,R_SUBW);
    +                            taicpu(hp1).loadConst(0,taicpu(hp1).oper[0]^.val and $ff);
    +                            Result := True;
    +                          end;
    +                      end;
    +{$ifndef i8086} { movzbl %al,%eax cannot be encoded in 16-bit mode (the machine code is equivalent to movzbw %al,%ax }
    +                    S_BL:
    +                      begin
    +                        if (getsupreg(taicpu(p).oper[0]^.reg)=getsupreg(taicpu(p).oper[1]^.reg)) and
    +                          (
    +                            not IsMOVZXAcceptable
    +                            { and $0xff,%eax has a smaller encoding but risks a partial write penalty }
    +                            or (
    +                              (cs_opt_size in current_settings.optimizerswitches) and
    +                              (taicpu(p).oper[1]^.reg = NR_EAX)
    +                            )
    +                          ) then
    +                          { Change "movzbl %al, %eax" to "andl $0x0ffh, %eax" }
    +                          begin
    +                            DebugMsg(SPeepholeOptimization + 'var9',p);
    +                            taicpu(p).opcode := A_AND;
    +                            taicpu(p).changeopsize(S_L);
    +                            taicpu(p).loadConst(0,$ff);
    +                            Result := True;
    +                          end
    +                        else if not IsMOVZXAcceptable and
    +                          GetNextInstruction(p, hp1) and
    +                          (tai(hp1).typ = ait_instruction) and
    +                          (taicpu(hp1).opcode = A_AND) and
    +                          (taicpu(hp1).oper[0]^.typ = top_const) and
    +                          (taicpu(hp1).oper[1]^.typ = top_reg) and
    +                          (taicpu(hp1).oper[1]^.reg = taicpu(p).oper[1]^.reg) then
    +                          { Change "movzbl %reg1, %reg2; andl $const, %reg2"
    +                            to "movl %reg1, reg2; andl $(const1 and $ff), %reg2"}
    +                          begin
    +                            DebugMsg(SPeepholeOptimization + 'var10',p);
    +                            taicpu(p).opcode := A_MOV;
    +                            taicpu(p).changeopsize(S_L);
    +                            { do not use R_SUBWHOLE
    +                              as movl %rdx,%eax
    +                              is invalid in assembler PM }
    +                            setsubreg(taicpu(p).oper[0]^.reg, R_SUBD);
    +                            taicpu(hp1).loadConst(0,taicpu(hp1).oper[0]^.val and $ff);
    +                            Result := True;
    +                          end;
    +                      end;
    +{$endif i8086}
    +                    S_WL:
    +                      if not IsMOVZXAcceptable then
                             begin
    -                          taicpu(p).opcode := A_AND;
    -                          taicpu(p).changeopsize(S_W);
    -                          taicpu(p).loadConst(0,$ff);
    -                          DebugMsg(SPeepholeOptimization + 'var7',p);
    -                        end
    -                      else if GetNextInstruction(p, hp1) and
    -                        (tai(hp1).typ = ait_instruction) and
    -                        (taicpu(hp1).opcode = A_AND) and
    -                        (taicpu(hp1).oper[0]^.typ = top_const) and
    -                        (taicpu(hp1).oper[1]^.typ = top_reg) and
    -                        (taicpu(hp1).oper[1]^.reg = taicpu(p).oper[1]^.reg) then
    -                      { Change "movzbw %reg1, %reg2; andw $const, %reg2"
    -                        to "movw %reg1, reg2; andw $(const1 and $ff), %reg2"}
    -                        begin
    -                          DebugMsg(SPeepholeOptimization + 'var8',p);
    -                          taicpu(p).opcode := A_MOV;
    -                          taicpu(p).changeopsize(S_W);
    -                          setsubreg(taicpu(p).oper[0]^.reg,R_SUBW);
    -                          taicpu(hp1).loadConst(0,taicpu(hp1).oper[0]^.val and $ff);
    +                          if (getsupreg(taicpu(p).oper[0]^.reg)=getsupreg(taicpu(p).oper[1]^.reg)) then
    +                            { Change "movzwl %ax, %eax" to "andl $0x0ffffh, %eax" }
    +                            begin
    +                              DebugMsg(SPeepholeOptimization + 'var11',p);
    +                              taicpu(p).opcode := A_AND;
    +                              taicpu(p).changeopsize(S_L);
    +                              taicpu(p).loadConst(0,$ffff);
    +                              Result := True;
    +                            end
    +                          else if GetNextInstruction(p, hp1) and
    +                            (tai(hp1).typ = ait_instruction) and
    +                            (taicpu(hp1).opcode = A_AND) and
    +                            (taicpu(hp1).oper[0]^.typ = top_const) and
    +                            (taicpu(hp1).oper[1]^.typ = top_reg) and
    +                            (taicpu(hp1).oper[1]^.reg = taicpu(p).oper[1]^.reg) then
    +                            { Change "movzwl %reg1, %reg2; andl $const, %reg2"
    +                              to "movl %reg1, reg2; andl $(const1 and $ffff), %reg2"}
    +                            begin
    +                              DebugMsg(SPeepholeOptimization + 'var12',p);
    +                              taicpu(p).opcode := A_MOV;
    +                              taicpu(p).changeopsize(S_L);
    +                              { do not use R_SUBWHOLE
    +                                as movl %rdx,%eax
    +                                is invalid in assembler PM }
    +                              setsubreg(taicpu(p).oper[0]^.reg, R_SUBD);
    +                              taicpu(hp1).loadConst(0,taicpu(hp1).oper[0]^.val and $ffff);
    +                              Result := True;
    +                            end;
                             end;
    -                    end;
    -                  S_BL:
    -                    begin
    -                      if (getsupreg(taicpu(p).oper[0]^.reg)=getsupreg(taicpu(p).oper[1]^.reg)) and
    -                         not(cs_opt_size in current_settings.optimizerswitches) then
    -                        { Change "movzbl %al, %eax" to "andl $0x0ffh, %eax" }
    -                        begin
    -                          taicpu(p).opcode := A_AND;
    -                          taicpu(p).changeopsize(S_L);
    -                          taicpu(p).loadConst(0,$ff)
    -                        end
    -                      else if GetNextInstruction(p, hp1) and
    -                        (tai(hp1).typ = ait_instruction) and
    -                        (taicpu(hp1).opcode = A_AND) and
    -                        (taicpu(hp1).oper[0]^.typ = top_const) and
    -                        (taicpu(hp1).oper[1]^.typ = top_reg) and
    -                        (taicpu(hp1).oper[1]^.reg = taicpu(p).oper[1]^.reg) then
    -                        { Change "movzbl %reg1, %reg2; andl $const, %reg2"
    -                          to "movl %reg1, reg2; andl $(const1 and $ff), %reg2"}
    -                        begin
    -                          DebugMsg(SPeepholeOptimization + 'var10',p);
    -                          taicpu(p).opcode := A_MOV;
    -                          taicpu(p).changeopsize(S_L);
    -                          { do not use R_SUBWHOLE
    -                            as movl %rdx,%eax
    -                            is invalid in assembler PM }
    -                          setsubreg(taicpu(p).oper[0]^.reg, R_SUBD);
    -                          taicpu(hp1).loadConst(0,taicpu(hp1).oper[0]^.val and $ff);
    -                        end
    -                    end;
    -{$ifndef i8086}
    -                  S_WL:
    -                    begin
    -                      if (getsupreg(taicpu(p).oper[0]^.reg)=getsupreg(taicpu(p).oper[1]^.reg)) and
    -                        not(cs_opt_size in current_settings.optimizerswitches) then
    -                        { Change "movzwl %ax, %eax" to "andl $0x0ffffh, %eax" }
    -                        begin
    -                          DebugMsg(SPeepholeOptimization + 'var11',p);
    -                          taicpu(p).opcode := A_AND;
    -                          taicpu(p).changeopsize(S_L);
    -                          taicpu(p).loadConst(0,$ffff);
    -                        end
    -                      else if GetNextInstruction(p, hp1) and
    -                        (tai(hp1).typ = ait_instruction) and
    -                        (taicpu(hp1).opcode = A_AND) and
    -                        (taicpu(hp1).oper[0]^.typ = top_const) and
    -                        (taicpu(hp1).oper[1]^.typ = top_reg) and
    -                        (taicpu(hp1).oper[1]^.reg = taicpu(p).oper[1]^.reg) then
    -                        { Change "movzwl %reg1, %reg2; andl $const, %reg2"
    -                          to "movl %reg1, reg2; andl $(const1 and $ffff), %reg2"}
    -                        begin
    -                          DebugMsg(SPeepholeOptimization + 'var12',p);
    -                          taicpu(p).opcode := A_MOV;
    -                          taicpu(p).changeopsize(S_L);
    -                          { do not use R_SUBWHOLE
    -                            as movl %rdx,%eax
    -                            is invalid in assembler PM }
    -                          setsubreg(taicpu(p).oper[0]^.reg, R_SUBD);
    -                          taicpu(hp1).loadConst(0,taicpu(hp1).oper[0]^.val and $ffff);
    -                        end;
    -                    end;
    -{$endif i8086}
    -                  else
    -                    ;
    +                    else
    +                      InternalError(2017050705);
    +                  end;
                     end
    -              else if (taicpu(p).oper[0]^.typ = top_ref) then
    +              else if not IsMOVZXAcceptable and (taicpu(p).oper[0]^.typ = top_ref) then
                       begin
                         if GetNextInstruction(p, hp1) and
                           (tai(hp1).typ = ait_instruction) and
    @@ -5210,31 +5258,10 @@
                                   taicpu(hp1).changeopsize(S_W);
                                   taicpu(hp1).loadConst(0,taicpu(hp1).oper[0]^.val and $ff);
                                 end;
    -{$ifdef x86_64}
    -                          S_BQ:
    -                            begin
    -                              DebugMsg(SPeepholeOptimization + 'var16',p);
    -                              taicpu(hp1).changeopsize(S_Q);
    -                              taicpu(hp1).loadConst(
    -                                0, taicpu(hp1).oper[0]^.val and $ff);
    -                            end;
    -                          S_WQ:
    -                            begin
    -                              DebugMsg(SPeepholeOptimization + 'var17',p);
    -                              taicpu(hp1).changeopsize(S_Q);
    -                              taicpu(hp1).loadConst(0, taicpu(hp1).oper[0]^.val and $ffff);
    -                            end;
    -                          S_LQ:
    -                            begin
    -                              DebugMsg(SPeepholeOptimization + 'var18',p);
    -                              taicpu(hp1).changeopsize(S_Q);
    -                              taicpu(hp1).loadConst(
    -                                0, taicpu(hp1).oper[0]^.val and $ffffffff);
    -                            end;
    -{$endif x86_64}
                               else
                                 Internalerror(2017050704)
                             end;
    +                        Result := True;
                           end;
                       end;
               end;
    
    movzx-choices.patch (17,667 bytes)
  • movx-new-tests.patch (707 bytes)
    Index: tests/test/cg/tcnvint3a.pp
    ===================================================================
    --- tests/test/cg/tcnvint3a.pp	(nonexistent)
    +++ tests/test/cg/tcnvint3a.pp	(working copy)
    @@ -0,0 +1,3 @@
    +{ %OPT=-O4 }
    +{ Tests the peephole optimiser where constant optimisation and branch prediction is concerned }
    +{$i tcnvint3.pp}
    Index: tests/test/cg/tcnvint3c.pp
    ===================================================================
    --- tests/test/cg/tcnvint3c.pp	(nonexistent)
    +++ tests/test/cg/tcnvint3c.pp	(working copy)
    @@ -0,0 +1,3 @@
    +{ %OPT=-O4 -Oonoconstprop }
    +{ Tests the peephole optimiser where constant optimisation and branch prediction is concerned }
    +{$i tcnvint3.pp}
    
    movx-new-tests.patch (707 bytes)
  • movxx-self-extra.patch (8,867 bytes)
    Index: compiler/x86/aoptx86.pas
    ===================================================================
    --- compiler/x86/aoptx86.pas	(revision 44195)
    +++ compiler/x86/aoptx86.pas	(working copy)
    @@ -1914,30 +1914,143 @@
             { Depending on the DeepMOVOpt above, it may turn out that hp1 completely
               overwrites the original destination register.  e.g.
     
    -          movl   %reg1d,%reg2d
    -          movslq %reg1d,%reg2q
    +          movl   ###,%reg2d
    +          movslq ###,%reg2q (### doesn't have to be the same as the first one)
     
    -          In this case, we can remove the MOV
    +          In this case, we can remove the MOV (Go to "Mov2Nop 5" below)
             }
             if (taicpu(p).oper[1]^.typ = top_reg) and
               MatchInstruction(hp1, [A_LEA, A_MOV, A_MOVSX, A_MOVZX{$ifdef x86_64}, A_MOVSXD{$endif x86_64}], []) and
    -          { The RegInOp check makes sure that movb r/m,%reg1b; movzbl %reg1b,%reg1l"
    -            and "movl r/m,%reg1; leal $1(%reg1,%reg2),%reg1" etc. are not incorrectly
    -            optimised }
               (taicpu(hp1).oper[1]^.typ = top_reg) and
    -          not RegInOp(taicpu(p).oper[1]^.reg, taicpu(hp1).oper[0]^) and
               Reg1WriteOverwritesReg2Entirely(taicpu(hp1).oper[1]^.reg, taicpu(p).oper[1]^.reg) then
    -          begin
    -            DebugMsg(SPeepholeOptimization + 'Mov2Nop 5 done',p);
    -            { take care of the register (de)allocs following p }
    -            UpdateUsedRegs(tai(p.next));
    -            asml.remove(p);
    -            p.free;
    -            p:=hp1;
    -            Result := True;
    -            Exit;
    -          end;
    +            begin
    +              if RegInOp(taicpu(p).oper[1]^.reg, taicpu(hp1).oper[0]^) then
    +                begin
    +                  if (taicpu(hp1).oper[0]^.typ = top_reg) then
    +                    case taicpu(p).oper[0]^.typ of
    +                      top_const:
    +                        { We have something like:
     
    +                          movb   $x,   %regb
    +                          movzbl %regb,%regd
    +
    +                          Change to:
    +
    +                          movl   $x,   %regd
    +                        }
    +                        begin
    +                          case taicpu(hp1).opsize of
    +                            S_BW:
    +                              begin
    +                                if (taicpu(hp1).opcode = A_MOVSX) and
    +                                  (taicpu(p).oper[0]^.val > $7F) then
    +                                  taicpu(p).oper[0]^.val := taicpu(p).oper[0]^.val - $100; { Convert to signed }
    +
    +                                setsubreg(taicpu(p).oper[1]^.reg, R_SUBW);
    +                                taicpu(p).opsize := S_W;
    +                              end;
    +                            S_BL:
    +                              begin
    +                                if (taicpu(hp1).opcode = A_MOVSX) and
    +                                  (taicpu(p).oper[0]^.val > $7F) then
    +                                  taicpu(p).oper[0]^.val := taicpu(p).oper[0]^.val - $100; { Convert to signed }
    +
    +                                setsubreg(taicpu(p).oper[1]^.reg, R_SUBD);
    +                                taicpu(p).opsize := S_L;
    +                              end;
    +                            S_WL:
    +                              begin
    +                                if (taicpu(hp1).opcode = A_MOVSX) and
    +                                  (taicpu(p).oper[0]^.val > $7FFF) then
    +                                  taicpu(p).oper[0]^.val := taicpu(p).oper[0]^.val - $10000; { Convert to signed }
    +
    +                                setsubreg(taicpu(p).oper[1]^.reg, R_SUBD);
    +                                taicpu(p).opsize := S_L;
    +                              end;
    +{$ifdef x86_64}
    +                            S_BQ:
    +                              begin
    +                                if (taicpu(hp1).opcode = A_MOVSX) and
    +                                  (taicpu(p).oper[0]^.val > $7F) then
    +                                  taicpu(p).oper[0]^.val := taicpu(p).oper[0]^.val - $100; { Convert to signed }
    +
    +                                setsubreg(taicpu(p).oper[1]^.reg, R_SUBQ);
    +                                taicpu(p).opsize := S_Q;
    +                              end;
    +                            S_WQ:
    +                              begin
    +                                if (taicpu(hp1).opcode = A_MOVSX) and
    +                                  (taicpu(p).oper[0]^.val > $7FFF) then
    +                                  taicpu(p).oper[0]^.val := taicpu(p).oper[0]^.val - $10000; { Convert to signed }
    +
    +                                setsubreg(taicpu(p).oper[1]^.reg, R_SUBQ);
    +                                taicpu(p).opsize := S_Q;
    +                              end;
    +                            S_LQ:
    +                              begin
    +                                if (taicpu(hp1).opcode = A_MOVSXD) and { Note it's MOVSXD, not MOVSX }
    +                                  (taicpu(p).oper[0]^.val > $7FFFFFFF) then
    +                                  taicpu(p).oper[0]^.val := taicpu(p).oper[0]^.val - $100000000; { Convert to signed }
    +
    +                                setsubreg(taicpu(p).oper[1]^.reg, R_SUBQ);
    +                                taicpu(p).opsize := S_Q;
    +                              end;
    +{$endif x86_64}
    +                            else
    +                              { If hp1 was a MOV instruction, it should have been
    +                                optimised already }
    +                              InternalError(2020021001);
    +                          end;
    +                          DebugMsg(SPeepholeOptimization + 'MovMovXX2MovXX 2 done',p);
    +                          asml.Remove(hp1);
    +                          hp1.Free;
    +                          Result := True;
    +                          Exit;
    +                        end;
    +                      top_ref:
    +                        { We have something like:
    +
    +                          movb   mem,  %regb
    +                          movzbl %regb,%regd
    +
    +                          Change to:
    +
    +                          movzbl mem,  %regd
    +                        }
    +                        if IsMOVZXAcceptable or (taicpu(hp1).opcode <> A_MOVZX) then
    +                          begin
    +                            DebugMsg(SPeepholeOptimization + 'MovMovXX2MovXX 1 done',p);
    +                            taicpu(hp1).loadref(0, taicpu(p).oper[0]^.ref^);
    +                            { take care of the register (de)allocs following p }
    +                            UpdateUsedRegs(tai(p.next));
    +                            asml.remove(p);
    +                            p.free;
    +                            p:=hp1;
    +                            Result := True;
    +                            Exit;
    +                          end;
    +                      else
    +                        if (taicpu(hp1).opcode <> A_MOV) and (taicpu(hp1).opcode <> A_LEA) then
    +                          { Just to make a saving, since there are no more optimisations with MOVZX and MOVSX/D }
    +                          Exit;
    +                  end;
    +                end
    +             { The RegInOp check makes sure that movl r/m,%reg1l; movzbl (%reg1l),%reg1l"
    +               and "movl r/m,%reg1; leal $1(%reg1,%reg2),%reg1" etc. are not incorrectly
    +               optimised }
    +              else
    +                begin
    +                  DebugMsg(SPeepholeOptimization + 'Mov2Nop 5 done',p);
    +                  { take care of the register (de)allocs following p }
    +                  UpdateUsedRegs(tai(p.next));
    +                  asml.remove(p);
    +                  p.free;
    +                  p:=hp1;
    +                  Result := True;
    +                  Exit;
    +                end;
    +            end;
    +
             if (taicpu(hp1).opcode = A_AND) and
               (taicpu(p).oper[1]^.typ = top_reg) and
               MatchOpType(taicpu(hp1),top_const,top_reg) then
    @@ -2339,27 +2452,8 @@
                     Result:=true;
                     exit;
                   end;
    -            {
    -              mov*  x,reg1
    -              mov*  y,reg1
     
    -              to
    -
    -              mov*  y,reg1
    -            }
    -            if (taicpu(p).oper[1]^.typ=top_reg) and
    -              MatchOperand(taicpu(p).oper[1]^,taicpu(hp1).oper[1]^) and
    -              not(RegInOp(taicpu(p).oper[1]^.reg,taicpu(hp1).oper[0]^)) then
    -              begin
    -                DebugMsg(SPeepholeOptimization + 'MovMov2Mov 4 done',p);
    -                { take care of the register (de)allocs following p }
    -                UpdateUsedRegs(tai(p.next));
    -                asml.remove(p);
    -                p.free;
    -                p:=hp1;
    -                Result:=true;
    -                exit;
    -              end;
    +              { mov x,reg1; mov y,reg1 -> mov y,reg1 is handled by the Mov2Nop 5 optimisation }
               end;
     
             { search further than the next instruction for a mov }
    
    movxx-self-extra.patch (8,867 bytes)

Activities

J. Gareth Moreton

2020-02-10 00:41

developer  

movzx-choices.patch (17,667 bytes)
Index: compiler/x86/aoptx86.pas
===================================================================
--- compiler/x86/aoptx86.pas	(revision 44124)
+++ compiler/x86/aoptx86.pas	(working copy)
@@ -43,6 +43,8 @@
         function GetNextInstructionUsingReg(Current: tai; out Next: tai; reg: TRegister): Boolean;
         function RegModifiedByInstruction(Reg: TRegister; p1: tai): boolean; override;
       protected
+        class function IsMOVZXAcceptable: Boolean; static; inline;
+
         { checks whether loading a new value in reg1 overwrites the entirety of reg2 }
         function Reg1WriteOverwritesReg2Entirely(reg1, reg2: tregister): boolean;
         { checks whether reading the value in reg1 depends on the value of reg2. This
@@ -842,6 +844,25 @@
       end;
 {$endif DEBUG_AOPTCPU}
 
+    class function TX86AsmOptimizer.IsMOVZXAcceptable: Boolean; inline;
+      begin
+{$ifdef x86_64}
+        { Always fine on x86-64 }
+        Result := True;
+{$else x86_64}
+        Result :=
+{$ifdef i8086}
+          (current_settings.cputype >= cpu_386) and
+{$endif i8086}
+          (
+            { Always accept if optimising for size }
+            (cs_opt_size in current_settings.optimizerswitches) or
+            { From the Pentium II onwards, MOVZX only takes 1 cycle. [Kit] }
+            (current_settings.optimizecputype >= cpu_Pentium2)
+          );
+{$endif x86_64}
+      end;
+
     function TX86AsmOptimizer.Reg1WriteOverwritesReg2Entirely(reg1, reg2: tregister): boolean;
       begin
         if not SuperRegistersEqual(reg1,reg2) then
@@ -1787,17 +1808,13 @@
                       change it to a MOVZX instruction when optimising for speed.
                     }
                     if not (cs_opt_size in current_settings.optimizerswitches) and
-{$ifdef i8086}
-                      { MOVZX was only introduced on the 386 }
-                      (current_settings.cputype >= cpu_386) and
-{$endif i8086}
-                      (
-                        (taicpu(hp1).opsize < taicpu(p).opsize)
+                      IsMOVZXAcceptable and
+                      (taicpu(hp1).opsize < taicpu(p).opsize)
 {$ifdef x86_64}
-                        { operations already implicitly set the upper 64 bits to zero }
-                        and not ((taicpu(hp1).opsize = S_L) and (taicpu(p).opsize = S_Q))
+                      { operations already implicitly set the upper 64 bits to zero }
+                      and not ((taicpu(hp1).opsize = S_L) and (taicpu(p).opsize = S_Q))
 {$endif x86_64}
-                      ) then
+                      then
                       begin
                         CurrentReg := taicpu(hp1).oper[1]^.reg;
 
@@ -1907,7 +1924,8 @@
                     ;
                 end;
               end
-            else if (taicpu(p).oper[1]^.typ = top_reg) and (taicpu(hp1).oper[1]^.typ = top_reg) and
+            else if IsMOVZXAcceptable and
+              (taicpu(p).oper[1]^.typ = top_reg) and (taicpu(hp1).oper[1]^.typ = top_reg) and
               (taicpu(p).oper[0]^.typ <> top_const) and { MOVZX only supports registers and memory, not immediates (use MOV for that!) }
               (getsupreg(taicpu(p).oper[1]^.reg) = getsupreg(taicpu(hp1).oper[1]^.reg))
               then
@@ -5086,103 +5104,133 @@
                     ;
                 end;
               end;
-            { changes some movzx constructs to faster synonims (all examples
+            { changes some movzx constructs to faster synonyms (all examples
               are given with eax/ax, but are also valid for other registers)}
             if (taicpu(p).oper[1]^.typ = top_reg) then
               if (taicpu(p).oper[0]^.typ = top_reg) then
-                case taicpu(p).opsize of
-                  S_BW:
-                    begin
-                      if (getsupreg(taicpu(p).oper[0]^.reg)=getsupreg(taicpu(p).oper[1]^.reg)) and
-                         not(cs_opt_size in current_settings.optimizerswitches) then
-                        {Change "movzbw %al, %ax" to "andw $0x0ffh, %ax"}
+                begin
+                  case taicpu(p).opsize of
+                    { Technically, movzbw %al,%ax cannot be encoded in 32/64-bit mode
+                      (the machine code is equivalent to movzbl %al,%eax), but the
+                      code generator still generates that assembler instruction and
+                      it is silently converted.  This should probably be checked.
+                      [Kit] }
+                    S_BW:
+                      begin
+                        if (getsupreg(taicpu(p).oper[0]^.reg)=getsupreg(taicpu(p).oper[1]^.reg)) and
+                          (
+                            not IsMOVZXAcceptable
+                            { and $0xff,%ax has a smaller encoding but risks a partial write penalty }
+                            or (
+                              (cs_opt_size in current_settings.optimizerswitches) and
+                              (taicpu(p).oper[1]^.reg = NR_AX)
+                            )
+                          ) then
+                          {Change "movzbw %al, %ax" to "andw $0x0ffh, %ax"}
+                          begin
+                            DebugMsg(SPeepholeOptimization + 'var7',p);
+                            taicpu(p).opcode := A_AND;
+                            taicpu(p).changeopsize(S_W);
+                            taicpu(p).loadConst(0,$ff);
+                            Result := True;
+                          end
+                        else if not IsMOVZXAcceptable and
+                          GetNextInstruction(p, hp1) and
+                          (tai(hp1).typ = ait_instruction) and
+                          (taicpu(hp1).opcode = A_AND) and
+                          (taicpu(hp1).oper[0]^.typ = top_const) and
+                          (taicpu(hp1).oper[1]^.typ = top_reg) and
+                          (taicpu(hp1).oper[1]^.reg = taicpu(p).oper[1]^.reg) then
+                        { Change "movzbw %reg1, %reg2; andw $const, %reg2"
+                          to "movw %reg1, reg2; andw $(const1 and $ff), %reg2"}
+                          begin
+                            DebugMsg(SPeepholeOptimization + 'var8',p);
+                            taicpu(p).opcode := A_MOV;
+                            taicpu(p).changeopsize(S_W);
+                            setsubreg(taicpu(p).oper[0]^.reg,R_SUBW);
+                            taicpu(hp1).loadConst(0,taicpu(hp1).oper[0]^.val and $ff);
+                            Result := True;
+                          end;
+                      end;
+{$ifndef i8086} { movzbl %al,%eax cannot be encoded in 16-bit mode (the machine code is equivalent to movzbw %al,%ax }
+                    S_BL:
+                      begin
+                        if (getsupreg(taicpu(p).oper[0]^.reg)=getsupreg(taicpu(p).oper[1]^.reg)) and
+                          (
+                            not IsMOVZXAcceptable
+                            { and $0xff,%eax has a smaller encoding but risks a partial write penalty }
+                            or (
+                              (cs_opt_size in current_settings.optimizerswitches) and
+                              (taicpu(p).oper[1]^.reg = NR_EAX)
+                            )
+                          ) then
+                          { Change "movzbl %al, %eax" to "andl $0x0ffh, %eax" }
+                          begin
+                            DebugMsg(SPeepholeOptimization + 'var9',p);
+                            taicpu(p).opcode := A_AND;
+                            taicpu(p).changeopsize(S_L);
+                            taicpu(p).loadConst(0,$ff);
+                            Result := True;
+                          end
+                        else if not IsMOVZXAcceptable and
+                          GetNextInstruction(p, hp1) and
+                          (tai(hp1).typ = ait_instruction) and
+                          (taicpu(hp1).opcode = A_AND) and
+                          (taicpu(hp1).oper[0]^.typ = top_const) and
+                          (taicpu(hp1).oper[1]^.typ = top_reg) and
+                          (taicpu(hp1).oper[1]^.reg = taicpu(p).oper[1]^.reg) then
+                          { Change "movzbl %reg1, %reg2; andl $const, %reg2"
+                            to "movl %reg1, reg2; andl $(const1 and $ff), %reg2"}
+                          begin
+                            DebugMsg(SPeepholeOptimization + 'var10',p);
+                            taicpu(p).opcode := A_MOV;
+                            taicpu(p).changeopsize(S_L);
+                            { do not use R_SUBWHOLE
+                              as movl %rdx,%eax
+                              is invalid in assembler PM }
+                            setsubreg(taicpu(p).oper[0]^.reg, R_SUBD);
+                            taicpu(hp1).loadConst(0,taicpu(hp1).oper[0]^.val and $ff);
+                            Result := True;
+                          end;
+                      end;
+{$endif i8086}
+                    S_WL:
+                      if not IsMOVZXAcceptable then
                         begin
-                          taicpu(p).opcode := A_AND;
-                          taicpu(p).changeopsize(S_W);
-                          taicpu(p).loadConst(0,$ff);
-                          DebugMsg(SPeepholeOptimization + 'var7',p);
-                        end
-                      else if GetNextInstruction(p, hp1) and
-                        (tai(hp1).typ = ait_instruction) and
-                        (taicpu(hp1).opcode = A_AND) and
-                        (taicpu(hp1).oper[0]^.typ = top_const) and
-                        (taicpu(hp1).oper[1]^.typ = top_reg) and
-                        (taicpu(hp1).oper[1]^.reg = taicpu(p).oper[1]^.reg) then
-                      { Change "movzbw %reg1, %reg2; andw $const, %reg2"
-                        to "movw %reg1, reg2; andw $(const1 and $ff), %reg2"}
-                        begin
-                          DebugMsg(SPeepholeOptimization + 'var8',p);
-                          taicpu(p).opcode := A_MOV;
-                          taicpu(p).changeopsize(S_W);
-                          setsubreg(taicpu(p).oper[0]^.reg,R_SUBW);
-                          taicpu(hp1).loadConst(0,taicpu(hp1).oper[0]^.val and $ff);
+                          if (getsupreg(taicpu(p).oper[0]^.reg)=getsupreg(taicpu(p).oper[1]^.reg)) then
+                            { Change "movzwl %ax, %eax" to "andl $0x0ffffh, %eax" }
+                            begin
+                              DebugMsg(SPeepholeOptimization + 'var11',p);
+                              taicpu(p).opcode := A_AND;
+                              taicpu(p).changeopsize(S_L);
+                              taicpu(p).loadConst(0,$ffff);
+                              Result := True;
+                            end
+                          else if GetNextInstruction(p, hp1) and
+                            (tai(hp1).typ = ait_instruction) and
+                            (taicpu(hp1).opcode = A_AND) and
+                            (taicpu(hp1).oper[0]^.typ = top_const) and
+                            (taicpu(hp1).oper[1]^.typ = top_reg) and
+                            (taicpu(hp1).oper[1]^.reg = taicpu(p).oper[1]^.reg) then
+                            { Change "movzwl %reg1, %reg2; andl $const, %reg2"
+                              to "movl %reg1, reg2; andl $(const1 and $ffff), %reg2"}
+                            begin
+                              DebugMsg(SPeepholeOptimization + 'var12',p);
+                              taicpu(p).opcode := A_MOV;
+                              taicpu(p).changeopsize(S_L);
+                              { do not use R_SUBWHOLE
+                                as movl %rdx,%eax
+                                is invalid in assembler PM }
+                              setsubreg(taicpu(p).oper[0]^.reg, R_SUBD);
+                              taicpu(hp1).loadConst(0,taicpu(hp1).oper[0]^.val and $ffff);
+                              Result := True;
+                            end;
                         end;
-                    end;
-                  S_BL:
-                    begin
-                      if (getsupreg(taicpu(p).oper[0]^.reg)=getsupreg(taicpu(p).oper[1]^.reg)) and
-                         not(cs_opt_size in current_settings.optimizerswitches) then
-                        { Change "movzbl %al, %eax" to "andl $0x0ffh, %eax" }
-                        begin
-                          taicpu(p).opcode := A_AND;
-                          taicpu(p).changeopsize(S_L);
-                          taicpu(p).loadConst(0,$ff)
-                        end
-                      else if GetNextInstruction(p, hp1) and
-                        (tai(hp1).typ = ait_instruction) and
-                        (taicpu(hp1).opcode = A_AND) and
-                        (taicpu(hp1).oper[0]^.typ = top_const) and
-                        (taicpu(hp1).oper[1]^.typ = top_reg) and
-                        (taicpu(hp1).oper[1]^.reg = taicpu(p).oper[1]^.reg) then
-                        { Change "movzbl %reg1, %reg2; andl $const, %reg2"
-                          to "movl %reg1, reg2; andl $(const1 and $ff), %reg2"}
-                        begin
-                          DebugMsg(SPeepholeOptimization + 'var10',p);
-                          taicpu(p).opcode := A_MOV;
-                          taicpu(p).changeopsize(S_L);
-                          { do not use R_SUBWHOLE
-                            as movl %rdx,%eax
-                            is invalid in assembler PM }
-                          setsubreg(taicpu(p).oper[0]^.reg, R_SUBD);
-                          taicpu(hp1).loadConst(0,taicpu(hp1).oper[0]^.val and $ff);
-                        end
-                    end;
-{$ifndef i8086}
-                  S_WL:
-                    begin
-                      if (getsupreg(taicpu(p).oper[0]^.reg)=getsupreg(taicpu(p).oper[1]^.reg)) and
-                        not(cs_opt_size in current_settings.optimizerswitches) then
-                        { Change "movzwl %ax, %eax" to "andl $0x0ffffh, %eax" }
-                        begin
-                          DebugMsg(SPeepholeOptimization + 'var11',p);
-                          taicpu(p).opcode := A_AND;
-                          taicpu(p).changeopsize(S_L);
-                          taicpu(p).loadConst(0,$ffff);
-                        end
-                      else if GetNextInstruction(p, hp1) and
-                        (tai(hp1).typ = ait_instruction) and
-                        (taicpu(hp1).opcode = A_AND) and
-                        (taicpu(hp1).oper[0]^.typ = top_const) and
-                        (taicpu(hp1).oper[1]^.typ = top_reg) and
-                        (taicpu(hp1).oper[1]^.reg = taicpu(p).oper[1]^.reg) then
-                        { Change "movzwl %reg1, %reg2; andl $const, %reg2"
-                          to "movl %reg1, reg2; andl $(const1 and $ffff), %reg2"}
-                        begin
-                          DebugMsg(SPeepholeOptimization + 'var12',p);
-                          taicpu(p).opcode := A_MOV;
-                          taicpu(p).changeopsize(S_L);
-                          { do not use R_SUBWHOLE
-                            as movl %rdx,%eax
-                            is invalid in assembler PM }
-                          setsubreg(taicpu(p).oper[0]^.reg, R_SUBD);
-                          taicpu(hp1).loadConst(0,taicpu(hp1).oper[0]^.val and $ffff);
-                        end;
-                    end;
-{$endif i8086}
-                  else
-                    ;
+                    else
+                      InternalError(2017050705);
+                  end;
                 end
-              else if (taicpu(p).oper[0]^.typ = top_ref) then
+              else if not IsMOVZXAcceptable and (taicpu(p).oper[0]^.typ = top_ref) then
                   begin
                     if GetNextInstruction(p, hp1) and
                       (tai(hp1).typ = ait_instruction) and
@@ -5210,31 +5258,10 @@
                               taicpu(hp1).changeopsize(S_W);
                               taicpu(hp1).loadConst(0,taicpu(hp1).oper[0]^.val and $ff);
                             end;
-{$ifdef x86_64}
-                          S_BQ:
-                            begin
-                              DebugMsg(SPeepholeOptimization + 'var16',p);
-                              taicpu(hp1).changeopsize(S_Q);
-                              taicpu(hp1).loadConst(
-                                0, taicpu(hp1).oper[0]^.val and $ff);
-                            end;
-                          S_WQ:
-                            begin
-                              DebugMsg(SPeepholeOptimization + 'var17',p);
-                              taicpu(hp1).changeopsize(S_Q);
-                              taicpu(hp1).loadConst(0, taicpu(hp1).oper[0]^.val and $ffff);
-                            end;
-                          S_LQ:
-                            begin
-                              DebugMsg(SPeepholeOptimization + 'var18',p);
-                              taicpu(hp1).changeopsize(S_Q);
-                              taicpu(hp1).loadConst(
-                                0, taicpu(hp1).oper[0]^.val and $ffffffff);
-                            end;
-{$endif x86_64}
                           else
                             Internalerror(2017050704)
                         end;
+                        Result := True;
                       end;
                   end;
           end;
movzx-choices.patch (17,667 bytes)

J. Gareth Moreton

2020-02-10 02:24

developer   ~0120988

One thing to note is that, under i386, the compiler seems to optimise for the Pentium III by default, which favours MOVZX, so if you wish to test situations where the optimisation is not favoured, you need to explicitly set it (e.g. by specifying -Op80386).

J. Gareth Moreton

2020-02-10 18:35

developer   ~0120999

I've added an extra patch that restores a couple of coincidental optimisations with MOVSX and MOVZX instructions. "movxx-self-extra.patch" requires "movzx-choices.patch" to work though due to the use of the new 'IsMOVZXAcceptable' function.

J. Gareth Moreton

2020-02-11 02:23

developer   ~0121008

Full regression suites for i386-win32 and x86_64-win64 have passed, which included both of the above patches. I might add some extra code later on just to be doubly sure an internal error doesn't get triggered in some contrived situations (is part of movxx-self-extra.patch).

Florian

2020-02-12 20:42

administrator   ~0121056

Here I have several failures in the regression tests with the second part of the patch applied?

Namely:

test/cg/tcnvint3b
tbs/tb0212
webtbs/tw2332
webtbs/tw6129

Do you test with -O4?

J. Gareth Moreton

2020-02-12 23:10

developer   ~0121069

I'll do some investigating. Thanks for the feedback. I'm just hoping my script isn't still faulty.

At least the first part works. Can't get it right all the time!

J. Gareth Moreton

2020-02-13 03:26

developer   ~0121076

I haven't been able to reproduce the failures yet under windows for "-O4" or "-O4 -CpCOREAVX2". I'll give Linux a bash next. I have an idea as to what it could be though.

J. Gareth Moreton

2020-02-13 18:17

developer   ~0121086

I haven't had much luck in reproducing the regressions yet. What parameters are you using? So far, all of win32 and win64 seem to be fine, and I've run x86_64-linux after building the compiler for "-O4" and "-O4 -CpCOREAVX2" separately. I'm about to try i386-linux. Hopefully we can get to the bottom of this!

J. Gareth Moreton

2020-02-16 08:55

developer   ~0121117

Does this patch work any better?

Florian

2020-02-16 10:49

administrator   ~0121119

No neither, but I found the problem: mov*x does not support constants, so the initial check must be something like (note the check of taicpu(p).oper[0]^.typ, also the
comment was misleading: it mentioned only registers while the code was applied to all operands):

        { Depending on the DeepMOVOpt above, it may turn out that hp1 completely
          overwrites the original destination register. e.g.

          movl r/m,%reg2d
          movslq r/m,%reg2q

          In this case, we can remove the MOV (Go to "Mov2Nop 5" below)
        }
        if (taicpu(p).oper[1]^.typ = top_reg) and
          MatchInstruction(hp1, [A_LEA, A_MOV, A_MOVSX, A_MOVZX{$ifdef x86_64}, A_MOVSXD{$endif x86_64}], []) and
          (taicpu(p).oper[0]^.typ in [top_ref,top_reg]) and
          (taicpu(hp1).oper[1]^.typ = top_reg) and

J. Gareth Moreton

2020-02-16 15:30

developer   ~0121125

Last edited: 2020-02-16 15:46

View 3 revisions

Aah, good stuff. I'll make the changes in that case, and the comment too.

I do wonder if I have something messed up in my patch system, because my code does look out for constants correctly, specificially constructs like:

movb $1,%al
movzbl %al,%eax

Converting it to:

movl $1,%eax

J. Gareth Moreton

2020-02-16 19:58

developer   ~0121130

Last edited: 2020-02-16 20:14

View 2 revisions

I'm determined to get this right. The optimisation with constants should work... it better work! At least the 4 tests you listed appear to compile and run without any problems.

Code might need to be refactored later on.

J. Gareth Moreton

2020-02-16 21:29

developer   ~0121131

test/cg/tcnvint3b seems successful on i386-win32 and x86_64-win64 - I need to work out why I get different test results to you sometimes though:

dst : LOC_REGISTER src : LOC_REGISTER
Testing dst : s32bit src : s64bit...Passed.
Testing dst : s8bit src : s64bit...Passed.
Testing dst : s16bit src : s32bit...Passed.
dst : LOC_REGISTER src : LOC_REFERENCE
Testing dst : s32bit src : s64bit...Passed.
Testing dst : s16bit src : s32bit...Passed.
Testing dst : s16bit src : s32bit...Passed.
Testing dst : s8bit src : s16bit...Passed.
Testing dst : u16bit src : u32bit...Passed.
Testing dst : u8bit src : u16bit...Passed.
type conversion dst_size > src_size
dst : LOC_REGISTER src : LOC_REGISTER
Testing dst : u16bit src : s8bit, u8bit... Passed.
Testing dst : u32bit src : s8bit, u8bit, s16bit, u16bit... Passed.
Testing dst : s16bit src : s8bit, u8bit...Passed.
Testing dst : s32bit src : s8bit, u8bit. s16bit, u16bit...Passed.
Testing dst : s64bit src : s8bit, u8bit. s16bit, u16bit, s32bit, u32bit...Passed.
type conversion dst_size > src_size
dst : LOC_REGISTER src : LOC_REFERENCE
Testing dst : u16bit src : s8bit, u8bit... Passed.
Testing dst : u32bit src : s8bit, u8bit, s16bit, u16bit... Passed.
Testing dst : s16bit src : s8bit, u8bit...Passed.
Testing dst : s32bit src : s8bit, u8bit. s16bit, u16bit...Passed.
Testing dst : s64bit src : s8bit, u8bit. s16bit, u16bit, s32bit, u32bit...Passed.

(This is with the patch I uploaded at 21:02)

J. Gareth Moreton

2020-02-17 23:15

developer   ~0121155

Okay, I found a failure when building the above test with -O4 added:

type conversion src_size > dst_size
dst : LOC_REGISTER src : LOC_REGISTER
Testing dst : s32bit src : s64bit...Passed.
Testing dst : s8bit src : s64bit...Passed.
Testing dst : s16bit src : s32bit...Passed.
dst : LOC_REGISTER src : LOC_REFERENCE
Testing dst : s32bit src : s64bit...Passed.
Testing dst : s16bit src : s32bit...Passed.
Testing dst : s16bit src : s32bit...Passed.
Testing dst : s8bit src : s16bit...Passed.
Testing dst : u16bit src : u32bit...Passed.
Testing dst : u8bit src : u16bit...Passed.
type conversion dst_size > src_size
dst : LOC_REGISTER src : LOC_REGISTER
Testing dst : u16bit src : s8bit, u8bit... Passed.
Testing dst : u32bit src : s8bit, u8bit, s16bit, u16bit... Passed.
Testing dst : s16bit src : s8bit, u8bit...Passed.
Testing dst : s32bit src : s8bit, u8bit. s16bit, u16bit...Passed.
Testing dst : s64bit src : s8bit, u8bit. s16bit, u16bit, s32bit, u32bit...Passed.
type conversion dst_size > src_size
dst : LOC_REGISTER src : LOC_REFERENCE
Testing dst : u16bit src : s8bit, u8bit... Failure.

So, I finally reproduced it!

J. Gareth Moreton

2020-02-18 01:21

developer   ~0121157

Found the problem - it wasn't converting mov; movsx pairs properly when the mov instruction was writing a negative constant. I'm just testing my new patch now before I upload it.

I've also written two new tests... variants of tcnvint3.pp that specify "-O4" and "-O4 -Oonoconstprop" explicitly.

(On another note, I've seen some situations where it's possible to predict a branch with certainty, which is why I mentioned "branch prediction" in the comments of the new tests. For example

movw $65240,%ax
cmpw $65240,%ax
je @lbl

The branch is always taken and so the comparison can be removed and je changed to jmp. Just need to be careful with the flags)

J. Gareth Moreton

2020-02-18 01:26

developer  

movx-new-tests.patch (707 bytes)
Index: tests/test/cg/tcnvint3a.pp
===================================================================
--- tests/test/cg/tcnvint3a.pp	(nonexistent)
+++ tests/test/cg/tcnvint3a.pp	(working copy)
@@ -0,0 +1,3 @@
+{ %OPT=-O4 }
+{ Tests the peephole optimiser where constant optimisation and branch prediction is concerned }
+{$i tcnvint3.pp}
Index: tests/test/cg/tcnvint3c.pp
===================================================================
--- tests/test/cg/tcnvint3c.pp	(nonexistent)
+++ tests/test/cg/tcnvint3c.pp	(working copy)
@@ -0,0 +1,3 @@
+{ %OPT=-O4 -Oonoconstprop }
+{ Tests the peephole optimiser where constant optimisation and branch prediction is concerned }
+{$i tcnvint3.pp}
movx-new-tests.patch (707 bytes)

J. Gareth Moreton

2020-02-18 09:36

developer   ~0121158

Here we go! Got it working at last (I hope!)

Full Windows regression tests yielded no additional failures (I included my two new tests, but specified TEST_OPT="-O4" anyway).

movxx-self-extra.patch (8,867 bytes)
Index: compiler/x86/aoptx86.pas
===================================================================
--- compiler/x86/aoptx86.pas	(revision 44195)
+++ compiler/x86/aoptx86.pas	(working copy)
@@ -1914,30 +1914,143 @@
         { Depending on the DeepMOVOpt above, it may turn out that hp1 completely
           overwrites the original destination register.  e.g.
 
-          movl   %reg1d,%reg2d
-          movslq %reg1d,%reg2q
+          movl   ###,%reg2d
+          movslq ###,%reg2q (### doesn't have to be the same as the first one)
 
-          In this case, we can remove the MOV
+          In this case, we can remove the MOV (Go to "Mov2Nop 5" below)
         }
         if (taicpu(p).oper[1]^.typ = top_reg) and
           MatchInstruction(hp1, [A_LEA, A_MOV, A_MOVSX, A_MOVZX{$ifdef x86_64}, A_MOVSXD{$endif x86_64}], []) and
-          { The RegInOp check makes sure that movb r/m,%reg1b; movzbl %reg1b,%reg1l"
-            and "movl r/m,%reg1; leal $1(%reg1,%reg2),%reg1" etc. are not incorrectly
-            optimised }
           (taicpu(hp1).oper[1]^.typ = top_reg) and
-          not RegInOp(taicpu(p).oper[1]^.reg, taicpu(hp1).oper[0]^) and
           Reg1WriteOverwritesReg2Entirely(taicpu(hp1).oper[1]^.reg, taicpu(p).oper[1]^.reg) then
-          begin
-            DebugMsg(SPeepholeOptimization + 'Mov2Nop 5 done',p);
-            { take care of the register (de)allocs following p }
-            UpdateUsedRegs(tai(p.next));
-            asml.remove(p);
-            p.free;
-            p:=hp1;
-            Result := True;
-            Exit;
-          end;
+            begin
+              if RegInOp(taicpu(p).oper[1]^.reg, taicpu(hp1).oper[0]^) then
+                begin
+                  if (taicpu(hp1).oper[0]^.typ = top_reg) then
+                    case taicpu(p).oper[0]^.typ of
+                      top_const:
+                        { We have something like:
 
+                          movb   $x,   %regb
+                          movzbl %regb,%regd
+
+                          Change to:
+
+                          movl   $x,   %regd
+                        }
+                        begin
+                          case taicpu(hp1).opsize of
+                            S_BW:
+                              begin
+                                if (taicpu(hp1).opcode = A_MOVSX) and
+                                  (taicpu(p).oper[0]^.val > $7F) then
+                                  taicpu(p).oper[0]^.val := taicpu(p).oper[0]^.val - $100; { Convert to signed }
+
+                                setsubreg(taicpu(p).oper[1]^.reg, R_SUBW);
+                                taicpu(p).opsize := S_W;
+                              end;
+                            S_BL:
+                              begin
+                                if (taicpu(hp1).opcode = A_MOVSX) and
+                                  (taicpu(p).oper[0]^.val > $7F) then
+                                  taicpu(p).oper[0]^.val := taicpu(p).oper[0]^.val - $100; { Convert to signed }
+
+                                setsubreg(taicpu(p).oper[1]^.reg, R_SUBD);
+                                taicpu(p).opsize := S_L;
+                              end;
+                            S_WL:
+                              begin
+                                if (taicpu(hp1).opcode = A_MOVSX) and
+                                  (taicpu(p).oper[0]^.val > $7FFF) then
+                                  taicpu(p).oper[0]^.val := taicpu(p).oper[0]^.val - $10000; { Convert to signed }
+
+                                setsubreg(taicpu(p).oper[1]^.reg, R_SUBD);
+                                taicpu(p).opsize := S_L;
+                              end;
+{$ifdef x86_64}
+                            S_BQ:
+                              begin
+                                if (taicpu(hp1).opcode = A_MOVSX) and
+                                  (taicpu(p).oper[0]^.val > $7F) then
+                                  taicpu(p).oper[0]^.val := taicpu(p).oper[0]^.val - $100; { Convert to signed }
+
+                                setsubreg(taicpu(p).oper[1]^.reg, R_SUBQ);
+                                taicpu(p).opsize := S_Q;
+                              end;
+                            S_WQ:
+                              begin
+                                if (taicpu(hp1).opcode = A_MOVSX) and
+                                  (taicpu(p).oper[0]^.val > $7FFF) then
+                                  taicpu(p).oper[0]^.val := taicpu(p).oper[0]^.val - $10000; { Convert to signed }
+
+                                setsubreg(taicpu(p).oper[1]^.reg, R_SUBQ);
+                                taicpu(p).opsize := S_Q;
+                              end;
+                            S_LQ:
+                              begin
+                                if (taicpu(hp1).opcode = A_MOVSXD) and { Note it's MOVSXD, not MOVSX }
+                                  (taicpu(p).oper[0]^.val > $7FFFFFFF) then
+                                  taicpu(p).oper[0]^.val := taicpu(p).oper[0]^.val - $100000000; { Convert to signed }
+
+                                setsubreg(taicpu(p).oper[1]^.reg, R_SUBQ);
+                                taicpu(p).opsize := S_Q;
+                              end;
+{$endif x86_64}
+                            else
+                              { If hp1 was a MOV instruction, it should have been
+                                optimised already }
+                              InternalError(2020021001);
+                          end;
+                          DebugMsg(SPeepholeOptimization + 'MovMovXX2MovXX 2 done',p);
+                          asml.Remove(hp1);
+                          hp1.Free;
+                          Result := True;
+                          Exit;
+                        end;
+                      top_ref:
+                        { We have something like:
+
+                          movb   mem,  %regb
+                          movzbl %regb,%regd
+
+                          Change to:
+
+                          movzbl mem,  %regd
+                        }
+                        if IsMOVZXAcceptable or (taicpu(hp1).opcode <> A_MOVZX) then
+                          begin
+                            DebugMsg(SPeepholeOptimization + 'MovMovXX2MovXX 1 done',p);
+                            taicpu(hp1).loadref(0, taicpu(p).oper[0]^.ref^);
+                            { take care of the register (de)allocs following p }
+                            UpdateUsedRegs(tai(p.next));
+                            asml.remove(p);
+                            p.free;
+                            p:=hp1;
+                            Result := True;
+                            Exit;
+                          end;
+                      else
+                        if (taicpu(hp1).opcode <> A_MOV) and (taicpu(hp1).opcode <> A_LEA) then
+                          { Just to make a saving, since there are no more optimisations with MOVZX and MOVSX/D }
+                          Exit;
+                  end;
+                end
+             { The RegInOp check makes sure that movl r/m,%reg1l; movzbl (%reg1l),%reg1l"
+               and "movl r/m,%reg1; leal $1(%reg1,%reg2),%reg1" etc. are not incorrectly
+               optimised }
+              else
+                begin
+                  DebugMsg(SPeepholeOptimization + 'Mov2Nop 5 done',p);
+                  { take care of the register (de)allocs following p }
+                  UpdateUsedRegs(tai(p.next));
+                  asml.remove(p);
+                  p.free;
+                  p:=hp1;
+                  Result := True;
+                  Exit;
+                end;
+            end;
+
         if (taicpu(hp1).opcode = A_AND) and
           (taicpu(p).oper[1]^.typ = top_reg) and
           MatchOpType(taicpu(hp1),top_const,top_reg) then
@@ -2339,27 +2452,8 @@
                 Result:=true;
                 exit;
               end;
-            {
-              mov*  x,reg1
-              mov*  y,reg1
 
-              to
-
-              mov*  y,reg1
-            }
-            if (taicpu(p).oper[1]^.typ=top_reg) and
-              MatchOperand(taicpu(p).oper[1]^,taicpu(hp1).oper[1]^) and
-              not(RegInOp(taicpu(p).oper[1]^.reg,taicpu(hp1).oper[0]^)) then
-              begin
-                DebugMsg(SPeepholeOptimization + 'MovMov2Mov 4 done',p);
-                { take care of the register (de)allocs following p }
-                UpdateUsedRegs(tai(p.next));
-                asml.remove(p);
-                p.free;
-                p:=hp1;
-                Result:=true;
-                exit;
-              end;
+              { mov x,reg1; mov y,reg1 -> mov y,reg1 is handled by the Mov2Nop 5 optimisation }
           end;
 
         { search further than the next instruction for a mov }
movxx-self-extra.patch (8,867 bytes)

Florian

2020-02-22 19:38

administrator   ~0121195

For linux, it still needed fixes, besides this, I applied it, thanks.

Issue History

Date Modified Username Field Change
2020-02-10 00:41 J. Gareth Moreton New Issue
2020-02-10 00:41 J. Gareth Moreton File Added: movzx-choices.patch
2020-02-10 00:48 J. Gareth Moreton Tag Attached: compiler
2020-02-10 00:48 J. Gareth Moreton Tag Attached: patch
2020-02-10 00:48 J. Gareth Moreton Tag Attached: i386
2020-02-10 00:48 J. Gareth Moreton Tag Attached: x86_64
2020-02-10 00:48 J. Gareth Moreton Tag Attached: optimization
2020-02-10 02:24 J. Gareth Moreton Note Added: 0120988
2020-02-10 18:35 J. Gareth Moreton File Added: movxx-self-extra.patch
2020-02-10 18:35 J. Gareth Moreton Note Added: 0120999
2020-02-11 02:23 J. Gareth Moreton Note Added: 0121008
2020-02-12 20:42 Florian Note Added: 0121056
2020-02-12 23:10 J. Gareth Moreton Note Added: 0121069
2020-02-13 03:26 J. Gareth Moreton Note Added: 0121076
2020-02-13 18:17 J. Gareth Moreton Note Added: 0121086
2020-02-16 08:55 J. Gareth Moreton File Deleted: movxx-self-extra.patch
2020-02-16 08:55 J. Gareth Moreton File Added: movxx-self-extra.patch
2020-02-16 08:55 J. Gareth Moreton Note Added: 0121117
2020-02-16 10:49 Florian Note Added: 0121119
2020-02-16 15:30 J. Gareth Moreton Note Added: 0121125
2020-02-16 15:46 J. Gareth Moreton Note Edited: 0121125 View Revisions
2020-02-16 15:46 J. Gareth Moreton Note Edited: 0121125 View Revisions
2020-02-16 19:56 J. Gareth Moreton File Deleted: movxx-self-extra.patch
2020-02-16 19:58 J. Gareth Moreton File Added: movxx-self-extra.patch
2020-02-16 19:58 J. Gareth Moreton Note Added: 0121130
2020-02-16 20:14 J. Gareth Moreton Note Edited: 0121130 View Revisions
2020-02-16 20:33 J. Gareth Moreton File Deleted: movxx-self-extra.patch
2020-02-16 21:02 J. Gareth Moreton File Added: movxx-self-extra.patch
2020-02-16 21:29 J. Gareth Moreton Note Added: 0121131
2020-02-17 23:15 J. Gareth Moreton Note Added: 0121155
2020-02-18 01:21 J. Gareth Moreton Note Added: 0121157
2020-02-18 01:26 J. Gareth Moreton File Added: movx-new-tests.patch
2020-02-18 01:26 J. Gareth Moreton File Deleted: movxx-self-extra.patch
2020-02-18 09:36 J. Gareth Moreton File Added: movxx-self-extra.patch
2020-02-18 09:36 J. Gareth Moreton Note Added: 0121158
2020-02-22 19:38 Florian Assigned To => Florian
2020-02-22 19:38 Florian Status new => resolved
2020-02-22 19:38 Florian Resolution open => fixed
2020-02-22 19:38 Florian Fixed in Version => 3.3.1
2020-02-22 19:38 Florian Fixed in Revision => 44233
2020-02-22 19:38 Florian FPCTarget => -
2020-02-22 19:38 Florian Note Added: 0121195