View Issue Details

ID: 0036551
Project: FPC
Category: Compiler
View Status: public
Last Update: 2020-01-12 10:21
Reporter: J. Gareth Moreton
Assigned To: Florian
Priority: low
Severity: minor
Reproducibility: N/A
Status: resolved
Resolution: fixed
Platform: i386 and x86_64
OS: Microsoft Windows
OS Version: 10 Professional
Product Version: 3.3.1
Product Build: r43888
Target Version:
Fixed in Version: 3.3.1
Summary: 0036551: [Patch] EAX -> EDX:EAX sign extension shortcuts, and MOVSX shortcuts for AX register
Description: Find attached two patches that search for instruction combinations that sign-extend EAX to EDX:EAX (and related forms) and attempt to convert them into more optimal encodings via the CWDE/cwtl, CDQ/cltd, CDQE/cltq and CQO/cqto opcodes (Intel/AT&T mnemonics respectively).

mov-cdq.patch (all optimisations take place in OptPass2MOV):

- movl %eax,%edx; sarl $31,%edx -> cltd
- movq %rax,%rdx; sarq $63,%rdx -> cqto (x86_64 only)

- movl %edx,%eax; sarl $31,%edx -> movl %edx,%eax; cltd (-Os only due to cltd's dependency on %eax)
- movq %rdx,%rax; sarq $63,%rdx -> movq %rdx,%rax; cqto (x86_64 only; -Os only due to cqto's dependency on %rax)

---

A more complex one: any of the following combinations (these sequences sign-extend a 32-bit integer to a 64-bit integer, so they are common around Int64 types under i386):

movl r/m,%edx; movl %edx,%eax; sarl $31,%edx
movl r/m,%eax; movl %eax,%edx; sarl $31,%edx
movl r/m,%edx; movl r/m,%eax; sarl $31,%edx
movl r/m,%eax; movl r/m,%edx; sarl $31,%edx

...become...

movl r/m,%eax <-- always %eax, even if it was %edx before
cltd

----

A very complex one, not performed on x86_64 (the first 4 instructions are the implementation of Abs() for a LongInt on x86 platforms that don't have CMOV available; x86_64 always has CMOV available):

movl %eax,%edx
sarl $31,%eax
xorl %eax,%edx
subl %eax,%edx
(instruction that uses %edx)
(%eax and %edx deallocated)

...becomes...

cltd
xorl %edx,%eax <-- note the registers have swapped
subl %edx,%eax
(instruction that uses %eax) <-- %eax rather than %edx

----

movsx-cdq.patch (all optimisations take place in a new PostPeepholeOptMOVSX routine):

movswl %ax,%eax -> cwtl
movslq %eax,%rax -> cltq (x86_64 only)
Steps To Reproduce: Apply the patches and confirm correct compilation as well as a size saving with no speed penalty (except when -Os is specified).
Additional Information: A 64-bit version of the "movl r/m" optimisation is present in mov-cdq.patch but commented out. The 32-bit combinations arise when sign-extending a LongInt to Int64, among other situations; the equivalent instructions that sign-extend %rax to %rdx:%rax are never generated because there is no Int128 type, but the code is left in the patch in a commented-out state should the need for it arise in the future.

Extensive testing has been performed for -Os under i386-win32, i386-linux, x86_64-win64 and x86_64-linux.
Tags: compiler, i386, optimizations, patch, x86, x86_64
Fixed in Revision: 43917, 43918
FPCOldBugId:
FPCTarget: -
Attached Files
  • mov-cdq.patch (14,136 bytes)
    Index: compiler/x86/aoptx86.pas
    ===================================================================
    --- compiler/x86/aoptx86.pas	(revision 43880)
    +++ compiler/x86/aoptx86.pas	(working copy)
    @@ -3363,9 +3363,10 @@
            end;
     
           var
    -       hp1,hp2: tai;
    -{$ifdef x86_64}
    -       hp3: tai;
    +       hp1,hp2, hp3: tai;
    +{$ifndef x86_64}
    +       hp4: tai;
    +       OperIdx: Integer;
     {$endif x86_64}
           begin
             Result:=false;
    @@ -3510,6 +3511,291 @@
                 Result:=true;
                 exit;
               end
    +        else if MatchOpType(taicpu(p),top_reg,top_reg) and
    +          MatchInstruction(hp1, A_SAR, []) then
    +          begin
    +            if MatchOperand(taicpu(hp1).oper[0]^, 31) then
    +              begin
    +                { the use of %edx also covers the opsize being S_L }
    +                if MatchOperand(taicpu(hp1).oper[1]^, NR_EDX) then
    +                  begin
    +                    { Note it has to be specifically "movl %eax,%edx", and those specific sub-registers }
    +                    if (taicpu(p).oper[0]^.reg = NR_EAX) and
    +                      (taicpu(p).oper[1]^.reg = NR_EDX) then
    +                      begin
    +                        { Change:
    +                            movl %eax,%edx
    +                            sarl $31,%edx
    +                          To:
    +                            cltd
    +                        }
    +                        DebugMsg(SPeepholeOptimization + 'MovSar2Cltd', p);
    +                        Asml.Remove(hp1);
    +                        hp1.Free;
    +                        taicpu(p).opcode := A_CDQ;
    +                        taicpu(p).opsize := S_NO;
    +                        taicpu(p).clearop(1);
    +                        taicpu(p).clearop(0);
    +                        taicpu(p).ops:=0;
    +                        Result := True;
    +                      end
    +                    else if (cs_opt_size in current_settings.optimizerswitches) and
    +                      (taicpu(p).oper[0]^.reg = NR_EDX) and
    +                      (taicpu(p).oper[1]^.reg = NR_EAX) then
    +                      begin
    +                        { Change:
    +                            movl %edx,%eax
    +                            sarl $31,%edx
    +                          To:
    +                            movl %edx,%eax
    +                            cltd
    +
    +                          Note that this creates a dependency between the two instructions,
    +                            so only perform if optimising for size.
    +                        }
    +                        DebugMsg(SPeepholeOptimization + 'MovSar2MovCltd', p);
    +                        taicpu(hp1).opcode := A_CDQ;
    +                        taicpu(hp1).opsize := S_NO;
    +                        taicpu(hp1).clearop(1);
    +                        taicpu(hp1).clearop(0);
    +                        taicpu(hp1).ops:=0;
    +                      end;
    +{$ifndef x86_64}
    +                  end
    +                { Don't bother if CMOV is supported, because a more optimal
    +                  sequence would have been generated for the Abs() intrinsic }
    +                else if not(CPUX86_HAS_CMOV in cpu_capabilities[current_settings.cputype]) and
    +                  { the use of %eax also covers the opsize being S_L }
    +                  MatchOperand(taicpu(hp1).oper[1]^, NR_EAX) and
    +                  (taicpu(p).oper[0]^.reg = NR_EAX) and
    +                  (taicpu(p).oper[1]^.reg = NR_EDX) and
    +                  GetNextInstruction(hp1, hp2) and
    +                  MatchInstruction(hp2, A_XOR, [S_L]) and
    +                  MatchOperand(taicpu(hp2).oper[0]^, NR_EAX) and
    +                  MatchOperand(taicpu(hp2).oper[1]^, NR_EDX) and
    +
    +                  GetNextInstruction(hp2, hp3) and
    +                  MatchInstruction(hp3, A_SUB, [S_L]) and
    +                  MatchOperand(taicpu(hp3).oper[0]^, NR_EAX) and
    +                  MatchOperand(taicpu(hp3).oper[1]^, NR_EDX) then
    +                  begin
    +                    { Change:
    +                        movl %eax,%edx
    +                        sarl $31,%eax
    +                        xorl %eax,%edx
    +                        subl %eax,%edx
    +                        (Instruction that uses %edx)
    +                        (%eax deallocated)
    +                        (%edx deallocated)
    +                      To:
    +                        cltd
    +                        xorl %edx,%eax  <-- Note the registers have swapped
    +                        subl %edx,%eax
    +                        (Instruction that uses %eax) <-- %eax rather than %edx
    +                    }
    +
    +                    TransferUsedRegs(TmpUsedRegs);
    +                    UpdateUsedRegs(TmpUsedRegs, tai(p.Next));
    +                    UpdateUsedRegs(TmpUsedRegs, tai(hp1.Next));
    +                    UpdateUsedRegs(TmpUsedRegs, tai(hp2.Next));
    +
    +                    if not RegUsedAfterInstruction(NR_EAX, hp3, TmpUsedRegs) then
    +                      begin
    +                        if GetNextInstruction(hp3, hp4) and
    +                          not RegModifiedByInstruction(NR_EDX, hp4) and
    +                          not RegUsedAfterInstruction(NR_EDX, hp4, TmpUsedRegs) then
    +                          begin
    +                            DebugMsg(SPeepholeOptimization + 'abs() intrinsic optimisation', p);
    +
    +                            taicpu(p).opcode := A_CDQ;
    +                            taicpu(p).clearop(1);
    +                            taicpu(p).clearop(0);
    +                            taicpu(p).ops:=0;
    +
    +                            AsmL.Remove(hp1);
    +                            hp1.Free;
    +
    +                            taicpu(hp2).loadreg(0, NR_EDX);
    +                            taicpu(hp2).loadreg(1, NR_EAX);
    +
    +                            taicpu(hp3).loadreg(0, NR_EDX);
    +                            taicpu(hp3).loadreg(1, NR_EAX);
    +
    +                            AllocRegBetween(NR_EAX, hp3, hp4, TmpUsedRegs);
    +                            { Convert references in the following instruction (hp4) from %edx to %eax }
    +                            for OperIdx := 0 to taicpu(hp4).ops - 1 do
    +                              with taicpu(hp4).oper[OperIdx]^ do
    +                                case typ of
    +                                  top_reg:
    +                                    if reg = NR_EDX then
    +                                      reg := NR_EAX;
    +                                  top_ref:
    +                                    begin
    +                                      if ref^.base = NR_EDX then
    +                                        ref^.base := NR_EAX;
    +                                      if ref^.index = NR_EDX then
    +                                        ref^.index := NR_EAX;
    +                                    end;
    +                                  else
    +                                    { Do nothing };
    +                                end;
    +                          end;
    +                      end;
    +{$else x86_64}
    +                  end;
    +              end
    +            else if MatchOperand(taicpu(hp1).oper[0]^, 63) and
    +              { the use of %rdx also covers the opsize being S_Q }
    +              MatchOperand(taicpu(hp1).oper[1]^, NR_RDX) then
    +              begin
    +                { Note it has to be specifically "movq %rax,%rdx", and those specific sub-registers }
    +                if (taicpu(p).oper[0]^.reg = NR_RAX) and
    +                  (taicpu(p).oper[1]^.reg = NR_RDX) then
    +                  begin
    +                    { Change:
    +                        movq %rax,%rdx
    +                        sarq $63,%rdx
    +                      To:
    +                        cqto
    +                    }
    +                    DebugMsg(SPeepholeOptimization + 'MovSar2Cqto', p);
    +                    Asml.Remove(hp1);
    +                    hp1.Free;
    +                    taicpu(p).opcode := A_CQO;
    +                    taicpu(p).opsize := S_NO;
    +                    taicpu(p).clearop(1);
    +                    taicpu(p).clearop(0);
    +                    taicpu(p).ops:=0;
    +                    Result := True;
    +                  end
    +                else if (cs_opt_size in current_settings.optimizerswitches) and
    +                  (taicpu(p).oper[0]^.reg = NR_RDX) and
    +                  (taicpu(p).oper[1]^.reg = NR_RAX) then
    +                  begin
    +                    { Change:
    +                        movq %rdx,%rax
    +                        sarq $63,%rdx
    +                      To:
    +                        movq %rdx,%rax
    +                        cqto
    +
    +                      Note that this creates a dependency between the two instructions,
    +                        so only perform if optimising for size.
    +                    }
    +                    DebugMsg(SPeepholeOptimization + 'MovSar2MovCqto', p);
    +                    taicpu(hp1).opcode := A_CQO;
    +                    taicpu(hp1).opsize := S_NO;
    +                    taicpu(hp1).clearop(1);
    +                    taicpu(hp1).clearop(0);
    +                    taicpu(hp1).ops:=0;
    +{$endif x86_64}
    +                  end;
    +              end;
    +          end
    +        else if MatchInstruction(hp1, A_MOV, []) and
    +          (taicpu(hp1).oper[1]^.typ = top_reg) then
    +          { Though "GetNextInstruction" could be factored out, along with
    +            the instructions that depend on hp2, it is an expensive call that
    +            should be delayed for as long as possible, hence we do cheaper
    +            checks first that are likely to be False. [Kit] }
    +          begin
    +
    +            if MatchOperand(taicpu(p).oper[1]^, NR_EDX) and
    +              (
    +                (
    +                  (taicpu(hp1).oper[1]^.reg = NR_EAX) and
    +                  (
    +                    MatchOperand(taicpu(hp1).oper[0]^, taicpu(p).oper[0]^) or
    +                    MatchOperand(taicpu(hp1).oper[0]^, NR_EDX)
    +                  )
    +                ) or
    +                (
    +                  (taicpu(hp1).oper[1]^.reg = NR_EDX) and
    +                  (
    +                    MatchOperand(taicpu(hp1).oper[0]^, taicpu(p).oper[0]^) or
    +                    MatchOperand(taicpu(hp1).oper[0]^, NR_EAX)
    +                  )
    +                )
    +              ) and
    +              GetNextInstruction(hp1, hp2) and
    +              MatchInstruction(hp2, A_SAR, []) and
    +              MatchOperand(taicpu(hp2).oper[0]^, 31) then
    +              begin
    +                if MatchOperand(taicpu(hp2).oper[1]^, NR_EDX) then
    +                  begin
    +                    { Change:
    +                        movl r/m,%edx         movl r/m,%eax         movl r/m,%edx         movl r/m,%eax
    +                        movl %edx,%eax   or   movl %eax,%edx   or   movl r/m,%eax    or   movl r/m,%edx
    +                        sarl $31,%edx         sarl $31,%edx         sarl $31,%edx         sarl $31,%edx
    +                      To:
    +                        movl r/m,%eax    <- Note the change in register
    +                        cltd
    +                    }
    +                    DebugMsg(SPeepholeOptimization + 'MovMovSar2MovCltd', p);
    +
    +                    AllocRegBetween(NR_EAX, p, hp1, UsedRegs);
    +                    taicpu(p).loadreg(1, NR_EAX);
    +
    +                    taicpu(hp1).opcode := A_CDQ;
    +                    taicpu(hp1).clearop(1);
    +                    taicpu(hp1).clearop(0);
    +                    taicpu(hp1).ops:=0;
    +
    +                    AsmL.Remove(hp2);
    +                    hp2.Free;
    +(*
    +{$ifdef x86_64}
    +                  end
    +                else if MatchOperand(taicpu(hp2).oper[1]^, NR_RDX) and
    +                  { This code sequence does not get generated - however it might become useful
    +                    if and when 128-bit signed integer types make an appearance, so the code
    +                    is kept here for when it is eventually needed. [Kit] }
    +                  (
    +                    (
    +                      (taicpu(hp1).oper[1]^.reg = NR_RAX) and
    +                      (
    +                        MatchOperand(taicpu(hp1).oper[0]^, taicpu(p).oper[0]^) or
    +                        MatchOperand(taicpu(hp1).oper[0]^, NR_RDX)
    +                      )
    +                    ) or
    +                    (
    +                      (taicpu(hp1).oper[1]^.reg = NR_RDX) and
    +                      (
    +                        MatchOperand(taicpu(hp1).oper[0]^, taicpu(p).oper[0]^) or
    +                        MatchOperand(taicpu(hp1).oper[0]^, NR_RAX)
    +                      )
    +                    )
    +                  ) and
    +                  GetNextInstruction(hp1, hp2) and
    +                  MatchInstruction(hp2, A_SAR, [S_Q]) and
    +                  MatchOperand(taicpu(hp2).oper[0]^, 63) and
    +                  MatchOperand(taicpu(hp2).oper[1]^, NR_RDX) then
    +                  begin
    +                    { Change:
    +                        movq r/m,%rdx         movq r/m,%rax         movq r/m,%rdx         movq r/m,%rax
    +                        movq %rdx,%rax   or   movq %rax,%rdx   or   movq r/m,%rax    or   movq r/m,%rdx
    +                        sarq $63,%rdx         sarq $63,%rdx         sarq $63,%rdx         sarq $63,%rdx
    +                      To:
    +                        movq r/m,%rax    <- Note the change in register
    +                        cqto
    +                    }
    +                    DebugMsg(SPeepholeOptimization + 'MovMovSar2MovCqto', p);
    +
    +                    AllocRegBetween(NR_RAX, p, hp1, UsedRegs);
    +                    taicpu(p).loadreg(1, NR_RAX);
    +
    +                    taicpu(hp1).opcode := A_CQO;
    +                    taicpu(hp1).clearop(1);
    +                    taicpu(hp1).clearop(0);
    +                    taicpu(hp1).ops:=0;
    +
    +                    AsmL.Remove(hp2);
    +                    hp2.Free;
    +{$endif x86_64}
    +*)
    +                  end;
    +              end;
    +          end
             else if (taicpu(p).oper[0]^.typ = top_ref) and
               (hp1.typ = ait_instruction) and
               { while the GetNextInstruction(hp1,hp2) call could be factored out,
    
  • movsx-cdq.patch (3,262 bytes)
    Index: compiler/i386/aoptcpu.pas
    ===================================================================
    --- compiler/i386/aoptcpu.pas	(revision 43888)
    +++ compiler/i386/aoptcpu.pas	(working copy)
    @@ -324,6 +324,8 @@
                        end;
                     A_TEST, A_OR:
                       Result:=PostPeepholeOptTestOr(p);
    +                A_MOVSX:
    +                  Result:=PostPeepholeOptMOVSX(p);
                     else
                       ;
                   end;
    Index: compiler/x86/aoptx86.pas
    ===================================================================
    --- compiler/x86/aoptx86.pas	(revision 43880)
    +++ compiler/x86/aoptx86.pas	(working copy)
    @@ -88,6 +88,7 @@
             function PostPeepholeOptMovzx(var p : tai) : Boolean;
             function PostPeepholeOptXor(var p : tai) : Boolean;
     {$endif}
    +        function PostPeepholeOptMOVSX(var p : tai) : boolean;
             function PostPeepholeOptCmp(var p : tai) : Boolean;
             function PostPeepholeOptTestOr(var p : tai) : Boolean;
             function PostPeepholeOptCall(var p : tai) : Boolean;
    @@ -4747,6 +4748,50 @@
           end;
     
     
    +    function TX86AsmOptimizer.PostPeepholeOptMOVSX(var p : tai) : boolean;
    +      begin
    +        Result := False;
    +        if not MatchOpType(taicpu(p), top_reg, top_reg) then
    +          Exit;
    +
    +        { Convert:
    +            movswl %ax,%eax  -> cwtl
    +            movslq %eax,%rax -> cdqe
    +
    +            NOTE: Don't convert movswl %al,%ax to cbw, because cbw and cwde
    +              refer to the same opcode and depends only on the assembler's
    +              current operand-size attribute. [Kit]
    +        }
    +        with taicpu(p) do
    +          case opsize of
    +            S_WL:
    +              if (oper[0]^.reg = NR_AX) and (oper[1]^.reg = NR_EAX) then
    +                begin
    +                  DebugMsg(SPeepholeOptimization + 'Converted movswl %ax,%eax to cwtl', p);
    +                  opcode := A_CWDE;
    +                  clearop(0);
    +                  clearop(1);
    +                  ops := 0;
    +                  Result := True;
    +                end;
    +{$ifdef x86_64}
    +            S_LQ:
    +              if (oper[0]^.reg = NR_EAX) and (oper[1]^.reg = NR_RAX) then
    +                begin
    +                  DebugMsg(SPeepholeOptimization + 'Converted movslq %eax,%rax to cltq', p);
    +                  opcode := A_CDQE;
    +                  clearop(0);
    +                  clearop(1);
    +                  ops := 0;
    +                  Result := True;
    +                end;
    +{$endif x86_64}
    +            else
    +              ;
    +          end;
    +      end;
    +
    +
         function TX86AsmOptimizer.PostPeepholeOptCmp(var p : tai) : Boolean;
           begin
             Result:=false;
    Index: compiler/x86_64/aoptcpu.pas
    ===================================================================
    --- compiler/x86_64/aoptcpu.pas	(revision 43888)
    +++ compiler/x86_64/aoptcpu.pas	(working copy)
    @@ -174,6 +174,8 @@
                   case taicpu(p).opcode of
                     A_MOV:
                       Result:=PostPeepholeOptMov(p);
    +                A_MOVSX:
    +                  Result:=PostPeepholeOptMOVSX(p);
                     A_MOVZX:
                       Result:=PostPeepholeOptMovzx(p);
                     A_CMP:
    

Activities

J. Gareth Moreton (developer), 2020-01-11 00:07

mov-cdq.patch (14,136 bytes)
+                  ) and
+                  GetNextInstruction(hp1, hp2) and
+                  MatchInstruction(hp2, A_SAR, [S_Q]) and
+                  MatchOperand(taicpu(hp2).oper[0]^, 63) and
+                  MatchOperand(taicpu(hp2).oper[1]^, NR_RDX) then
+                  begin
+                    { Change:
+                        movq r/m,%rdx         movq r/m,%rax         movq r/m,%rdx         movq r/m,%rax
+                        movq %rdx,%rax   or   movq %rax,%rdx   or   movq r/m,%rax    or   movq r/m,%rdx
+                        sarq $63,%rdx         sarq $63,%rdx         sarq $63,%rdx         sarq $63,%rdx
+                      To:
+                        movq r/m,%rax    <- Note the change in register
+                        cqto
+                    }
+                    DebugMsg(SPeepholeOptimization + 'MovMovSar2MovCqto', p);
+
+                    AllocRegBetween(NR_RAX, p, hp1, UsedRegs);
+                    taicpu(p).loadreg(1, NR_RAX);
+
+                    taicpu(hp1).opcode := A_CQO;
+                    taicpu(hp1).clearop(1);
+                    taicpu(hp1).clearop(0);
+                    taicpu(hp1).ops:=0;
+
+                    AsmL.Remove(hp2);
+                    hp2.Free;
+{$endif x86_64}
+*)
+                  end;
+              end;
+          end
         else if (taicpu(p).oper[0]^.typ = top_ref) and
           (hp1.typ = ait_instruction) and
           { while the GetNextInstruction(hp1,hp2) call could be factored out,
mov-cdq.patch (14,136 bytes)

Florian

2020-01-11 09:56

administrator   ~0120322

A first comment: please do not add { do nothing } comments. Such comments are obvious and add no value. The correct comment would be: { else clause added to avoid that the compiler complains about uncovered cases }, but I do not think this is needed either.

J. Gareth Moreton

2020-01-11 13:00

developer   ~0120324

Oh okay. I just felt they were a bit more informative than a lone semicolon: a less-informed programmer might be tempted to remove the else block, whereas the comment makes it clear that the block is deliberate.

J. Gareth Moreton

2020-01-11 15:12

developer   ~0120328

Updated movsx-cdq.patch to remove "{ Do nothing }" comments.

J. Gareth Moreton

2020-01-11 15:58

developer  

movsx-cdq.patch (3,262 bytes)
Index: compiler/i386/aoptcpu.pas
===================================================================
--- compiler/i386/aoptcpu.pas	(revision 43888)
+++ compiler/i386/aoptcpu.pas	(working copy)
@@ -324,6 +324,8 @@
                    end;
                 A_TEST, A_OR:
                   Result:=PostPeepholeOptTestOr(p);
+                A_MOVSX:
+                  Result:=PostPeepholeOptMOVSX(p);
                 else
                   ;
               end;
Index: compiler/x86/aoptx86.pas
===================================================================
--- compiler/x86/aoptx86.pas	(revision 43880)
+++ compiler/x86/aoptx86.pas	(working copy)
@@ -88,6 +88,7 @@
         function PostPeepholeOptMovzx(var p : tai) : Boolean;
         function PostPeepholeOptXor(var p : tai) : Boolean;
 {$endif}
+        function PostPeepholeOptMOVSX(var p : tai) : boolean;
         function PostPeepholeOptCmp(var p : tai) : Boolean;
         function PostPeepholeOptTestOr(var p : tai) : Boolean;
         function PostPeepholeOptCall(var p : tai) : Boolean;
@@ -4747,6 +4748,50 @@
       end;
 
 
+    function TX86AsmOptimizer.PostPeepholeOptMOVSX(var p : tai) : boolean;
+      begin
+        Result := False;
+        if not MatchOpType(taicpu(p), top_reg, top_reg) then
+          Exit;
+
+        { Convert:
+            movswl %ax,%eax  -> cwtl
+            movslq %eax,%rax -> cdqe
+
+            NOTE: Don't convert movsbw %al,%ax to cbw, because cbw and cwde
+              share the same opcode; which one is meant depends only on the
+              assembler's current operand-size attribute. [Kit]
+        }
+        with taicpu(p) do
+          case opsize of
+            S_WL:
+              if (oper[0]^.reg = NR_AX) and (oper[1]^.reg = NR_EAX) then
+                begin
+                  DebugMsg(SPeepholeOptimization + 'Converted movswl %ax,%eax to cwtl', p);
+                  opcode := A_CWDE;
+                  clearop(0);
+                  clearop(1);
+                  ops := 0;
+                  Result := True;
+                end;
+{$ifdef x86_64}
+            S_LQ:
+              if (oper[0]^.reg = NR_EAX) and (oper[1]^.reg = NR_RAX) then
+                begin
+                  DebugMsg(SPeepholeOptimization + 'Converted movslq %eax,%rax to cltq', p);
+                  opcode := A_CDQE;
+                  clearop(0);
+                  clearop(1);
+                  ops := 0;
+                  Result := True;
+                end;
+{$endif x86_64}
+            else
+              ;
+          end;
+      end;
+
+
     function TX86AsmOptimizer.PostPeepholeOptCmp(var p : tai) : Boolean;
       begin
         Result:=false;
Index: compiler/x86_64/aoptcpu.pas
===================================================================
--- compiler/x86_64/aoptcpu.pas	(revision 43888)
+++ compiler/x86_64/aoptcpu.pas	(working copy)
@@ -174,6 +174,8 @@
               case taicpu(p).opcode of
                 A_MOV:
                   Result:=PostPeepholeOptMov(p);
+                A_MOVSX:
+                  Result:=PostPeepholeOptMOVSX(p);
                 A_MOVZX:
                   Result:=PostPeepholeOptMovzx(p);
                 A_CMP:

Florian

2020-01-12 10:21

administrator   ~0120352

Thanks, applied. Removed another "do nothing".

Issue History

Date Modified Username Field Change
2020-01-11 00:07 J. Gareth Moreton New Issue
2020-01-11 00:07 J. Gareth Moreton File Added: mov-cdq.patch
2020-01-11 00:07 J. Gareth Moreton File Added: movsx-cdq.patch
2020-01-11 00:08 J. Gareth Moreton Priority normal => low
2020-01-11 00:08 J. Gareth Moreton FPCTarget => -
2020-01-11 00:08 J. Gareth Moreton Tag Attached: patch
2020-01-11 00:08 J. Gareth Moreton Tag Attached: compiler
2020-01-11 00:08 J. Gareth Moreton Tag Attached: optimizations
2020-01-11 00:08 J. Gareth Moreton Tag Attached: x86_64
2020-01-11 00:08 J. Gareth Moreton Tag Attached: x86
2020-01-11 00:08 J. Gareth Moreton Tag Attached: i386
2020-01-11 00:11 J. Gareth Moreton Description Updated View Revisions
2020-01-11 00:14 J. Gareth Moreton Description Updated View Revisions
2020-01-11 00:22 J. Gareth Moreton Description Updated View Revisions
2020-01-11 09:56 Florian Note Added: 0120322
2020-01-11 13:00 J. Gareth Moreton Note Added: 0120324
2020-01-11 15:12 J. Gareth Moreton File Deleted: movsx-cdq.patch
2020-01-11 15:12 J. Gareth Moreton File Added: movsx-cdq.patch
2020-01-11 15:12 J. Gareth Moreton Note Added: 0120328
2020-01-11 15:58 J. Gareth Moreton File Deleted: movsx-cdq.patch
2020-01-11 15:58 J. Gareth Moreton File Added: movsx-cdq.patch
2020-01-12 10:21 Florian Assigned To => Florian
2020-01-12 10:21 Florian Status new => resolved
2020-01-12 10:21 Florian Resolution open => fixed
2020-01-12 10:21 Florian Fixed in Version => 3.3.1
2020-01-12 10:21 Florian Fixed in Revision => 43917,43918
2020-01-12 10:21 Florian Note Added: 0120352