View Issue Details

IDProjectCategoryView StatusLast Update
0038882FPCCompilerpublic2021-05-15 17:42
ReporterJ. Gareth Moreton Assigned ToFlorian  
PrioritylowSeverityminorReproducibilityN/A
Status resolvedResolutionfixed 
Platformi386 and x86_64OSMicrosoft Windows 
Product Version3.3.1 
Fixed in Version3.3.1 
Summary0038882: [Patch] x86 MOVZX/CMP optimisation
DescriptionThis patch makes an addition to the in-depth OptPass2Movx routine so that it can accommodate CMP instructions. In other words, if there are calls to MOVZX that then manipulate the register and then use CMP to compare it to a constant, the Peephole Optimizer will attempt to convert the MOVZX instruction to a MOV instruction and decrease the size of the sub-register if the constant sizes are small enough. Generally it won't make the code any faster, but it will decrease its size in some places without penalty.
Steps To ReproduceApply patch and confirm correct code compilation and slight code size reduction
Additional InformationAn example from sfpux80:

Before:

    .seh_proc SFPUX80_$$_FLOATX80_IS_SIGNALING_NAN$FLOATX80$$BYTE
    ...
    movzwl 8(%rsp),%eax
    andl $32767,%eax
    cmpl $32767,%eax

After:

    .seh_proc SFPUX80_$$_FLOATX80_IS_SIGNALING_NAN$FLOATX80$$BYTE
    ...
# Peephole Optimization: Movzx2Mov 1a
    movw 8(%rsp),%ax
    andw $32767,%ax
    cmpw $32767,%ax

On this sequence alone, by reducing the register size from %eax and %ax, 5 bytes are saved: one fewer byte from using MOV instead of MOVZX, and 2 bytes each for ANDW and CMPW because the constants are now 16-bit instead of 32-bit.
Tagscompiler, i386, optimizations, patch, x86, x86_64
Fixed in Revision49366
FPCOldBugId
FPCTarget-
Attached Files

Activities

J. Gareth Moreton

2021-05-13 01:50

developer  

MovzCmp.patch (7,525 bytes)   
Index: compiler/x86/aoptx86.pas
===================================================================
--- compiler/x86/aoptx86.pas	(revision 49349)
+++ compiler/x86/aoptx86.pas	(working copy)
@@ -5156,22 +5156,19 @@
         InstrMax, Index: Integer;
         UpperLimit, TrySmallerLimit: TCgInt;
 
+        PreMessage: string;
+
         { Data flow analysis }
         TestValMin, TestValMax: TCgInt;
-        SmallerOverflow: Boolean;
+        SmallerOverflow, FirstCheck: Boolean;
 
       begin
         Result := False;
         p_removed := False;
 
-        { This is anything but quick! }
-        if not(cs_opt_level2 in current_settings.optimizerswitches) then
-          Exit;
-
         SetLength(InstrList, 0);
         InstrMax := -1;
         ThisReg := taicpu(p).oper[1]^.reg;
-        hp1 := p;
 
         case taicpu(p).opsize of
           S_BW, S_BL:
@@ -5193,6 +5190,33 @@
             InternalError(2020112301);
         end;
 
+        { With MinSize and MaxSize set, we can check some other optimisations
+          first, before attempting the expensive data flow analysis }
+        FirstCheck := GetNextInstructionUsingReg(p, hp1, ThisReg) and
+          (hp1.typ = ait_instruction) and
+          (
+            { Under -O1 and -O2, GetNextInstructionUsingReg may return an
+              instruction that doesn't actually contain ThisReg }
+            (cs_opt_level3 in current_settings.optimizerswitches) or
+            RegInInstruction(ThisReg, hp1)
+          );
+
+        { Data-flow analysis won't get anywhere since there was no instruction match }
+        if not FirstCheck then
+          Exit;
+
+        { Check for:
+            movz ##,%reg
+            cmp  $y,%reg
+            (%reg deallocated)
+
+          Change register size so movz becomes mov
+        }
+
+        { This is anything but quick! }
+        if not(cs_opt_level2 in current_settings.optimizerswitches) then
+          Exit;
+
         TestValMin := 0;
         TestValMax := UpperLimit;
         TrySmallerLimit := UpperLimit;
@@ -5200,15 +5224,20 @@
         SmallerOverflow := False;
         RegChanged := False;
 
-        while GetNextInstructionUsingReg(hp1, hp1, ThisReg) and
-          (hp1.typ = ait_instruction) and
+        while FirstCheck { No need to waste checking for the next instruction again } or
           (
-            { Under -O1 and -O2, GetNextInstructionUsingReg may return an
-              instruction that doesn't actually contain ThisReg }
-            (cs_opt_level3 in current_settings.optimizerswitches) or
-            RegInInstruction(ThisReg, hp1)
+            GetNextInstructionUsingReg(hp1, hp1, ThisReg) and
+            (hp1.typ = ait_instruction) and
+            (
+              { Under -O1 and -O2, GetNextInstructionUsingReg may return an
+                instruction that doesn't actually contain ThisReg }
+              (cs_opt_level3 in current_settings.optimizerswitches) or
+              RegInInstruction(ThisReg, hp1)
+            )
           ) do
           begin
+            FirstCheck := False;
+
             case taicpu(hp1).opcode of
               A_INC,A_DEC:
                 begin
@@ -5228,6 +5257,91 @@
                     end;
                 end;
 
+              A_CMP:
+                begin
+                  { Smallest signed value for MinSize }
+                  TrySmallerLimit := not (UpperLimit shr 1);
+
+                  if (taicpu(hp1).oper[1]^.typ <> top_reg) or
+                    { Has to be an exact match on the register }
+                    (taicpu(hp1).oper[1]^.reg <> ThisReg) or
+                    (taicpu(hp1).oper[0]^.typ <> top_const) or
+                    { Make sure the comparison value is not smaller than the
+                      smallest allowed signed value for the minimum size (e.g.
+                      -128 for 8-bit) }
+                    (taicpu(hp1).oper[0]^.val < TrySmallerLimit) then
+                    Break;
+
+                  TestValMin := TestValMin - taicpu(hp1).oper[0]^.val;
+                  TestValMax := TestValMax - taicpu(hp1).oper[0]^.val;
+
+                  if (TestValMin < TrySmallerLimit) or (TestValMax < TrySmallerLimit) or
+                    (TestValMin > UpperLimit) or (TestValMax > UpperLimit) then
+                    { Overflow }
+                    Break;
+
+                  { Check to see if the active register is used afterwards }
+                  TransferUsedRegs(TmpUsedRegs);
+                  IncludeRegInUsedRegs(ThisReg, TmpUsedRegs);
+                  if not RegUsedAfterInstruction(ThisReg, hp1, TmpUsedRegs) then
+                    begin
+                      case MinSize of
+                        S_B:
+                          TargetSubReg := R_SUBL;
+                        S_W:
+                          TargetSubReg := R_SUBW;
+                        else
+                          InternalError(2021051002);
+                      end;
+
+                      { Update the register to its new size }
+                      ThisReg := newreg(R_INTREGISTER, getsupreg(ThisReg), TargetSubReg);
+
+                      taicpu(hp1).oper[1]^.reg := ThisReg;
+                      taicpu(hp1).opsize := MinSize;
+
+                      { Convert the input MOVZX to a MOV }
+                      if (taicpu(p).oper[0]^.typ = top_reg) and
+                        SuperRegistersEqual(taicpu(p).oper[0]^.reg, ThisReg) then
+                        begin
+                          { Or remove it completely! }
+                          DebugMsg(SPeepholeOptimization + 'Movzx2Nop 1a', p);
+                          RemoveCurrentP(p);
+                          p_removed := True;
+                        end
+                      else
+                        begin
+                          DebugMsg(SPeepholeOptimization + 'Movzx2Mov 1a', p);
+                          taicpu(p).opcode := A_MOV;
+                          taicpu(p).oper[1]^.reg := ThisReg;
+                          taicpu(p).opsize := MinSize;
+                        end;
+
+                      if (InstrMax >= 0) then
+                        begin
+                          for Index := 0 to InstrMax do
+                            begin
+
+                              { If p_removed is true, then the original MOV/Z was removed
+                                and removing the AND instruction may not be safe if it
+                                appears first }
+                              if (InstrList[Index].oper[InstrList[Index].ops - 1]^.typ <> top_reg) then
+                                InternalError(2020112311);
+
+                              if InstrList[Index].oper[0]^.typ = top_reg then
+                                InstrList[Index].oper[0]^.reg := ThisReg;
+
+                              InstrList[Index].oper[InstrList[Index].ops - 1]^.reg := ThisReg;
+                              InstrList[Index].opsize := MinSize;
+                            end;
+
+                        end;
+
+                      Result := True;
+                      Exit;
+                    end;
+                end;
+
               { OR and XOR are not included because they can too easily fool
                 the data flow analysis (they can cause non-linear behaviour) }
               A_ADD,A_SUB,A_AND,A_SHL,A_SHR:
MovzCmp.patch (7,525 bytes)   

Florian

2021-05-14 20:50

administrator   ~0130880

Thanks, applied.

Do-wan Kim

2021-05-15 03:40

reporter   ~0130882

It makes errors on building win32.

sysstr.inc(633,24) Error: Asm: [mov ???,mem8] invalid combination of opcode and operands
sysstr.inc(633,24) Error: Asm: [cmp ???,imm8s] invalid combination of opcode and operands
sysstr.inc(645,20) Error: Asm: [mov ???,mem8] invalid combination of opcode and operands
sysstr.inc(645,20) Error: Asm: [cmp ???,imm8s] invalid combination of opcode and operands

J. Gareth Moreton

2021-05-15 04:54

developer   ~0130883

Oh nuts, thank you. I'll re-test.

J. Gareth Moreton

2021-05-15 04:55

developer   ~0130884

Reopened to fix i386-win32 bugs.

J. Gareth Moreton

2021-05-15 06:49

developer   ~0130887

Fixed the problem. Basically I didn't check to see if the registers could be represented in 8-bit (EAX, ECX, EDX and EBX only). Running tests now.

I did some extra clean-up at the same time, since there was residual code from a previous idea that I later decided against.

J. Gareth Moreton

2021-05-15 11:49

developer   ~0130895

Here we go - tested it on i386-win32 successfully.
movzx-cmp-fix.patch (5,250 bytes)   
Index: compiler/x86/aoptx86.pas
===================================================================
--- compiler/x86/aoptx86.pas	(revision 49369)
+++ compiler/x86/aoptx86.pas	(working copy)
@@ -5160,12 +5160,16 @@
 
         { Data flow analysis }
         TestValMin, TestValMax: TCgInt;
-        SmallerOverflow, FirstCheck: Boolean;
+        SmallerOverflow: Boolean;
 
       begin
         Result := False;
         p_removed := False;
 
+        { This is anything but quick! }
+        if not(cs_opt_level2 in current_settings.optimizerswitches) then
+          Exit;
+
         SetLength(InstrList, 0);
         InstrMax := -1;
         ThisReg := taicpu(p).oper[1]^.reg;
@@ -5173,6 +5177,12 @@
         case taicpu(p).opsize of
           S_BW, S_BL:
             begin
+{$if defined(i386) or defined(i8086)}
+              { If the target size is 8-bit, make sure we can actually encode it }
+              if not (GetSupReg(ThisReg) in [RS_EAX,RS_EBX,RS_ECX,RS_EDX]) then
+                Exit;
+{$endif i386 or i8086}
+
               UpperLimit := $FF;
               MinSize := S_B;
               if taicpu(p).opsize = S_BW then
@@ -5190,33 +5200,6 @@
             InternalError(2020112301);
         end;
 
-        { With MinSize and MaxSize set, we can check some other optimisations
-          first, before attempting the expensive data flow analysis }
-        FirstCheck := GetNextInstructionUsingReg(p, hp1, ThisReg) and
-          (hp1.typ = ait_instruction) and
-          (
-            { Under -O1 and -O2, GetNextInstructionUsingReg may return an
-              instruction that doesn't actually contain ThisReg }
-            (cs_opt_level3 in current_settings.optimizerswitches) or
-            RegInInstruction(ThisReg, hp1)
-          );
-
-        { Data-flow analysis won't get anywhere since there was no instruction match }
-        if not FirstCheck then
-          Exit;
-
-        { Check for:
-            movz ##,%reg
-            cmp  $y,%reg
-            (%reg deallocated)
-
-          Change register size so movz becomes mov
-        }
-
-        { This is anything but quick! }
-        if not(cs_opt_level2 in current_settings.optimizerswitches) then
-          Exit;
-
         TestValMin := 0;
         TestValMax := UpperLimit;
         TrySmallerLimit := UpperLimit;
@@ -5224,19 +5207,17 @@
         SmallerOverflow := False;
         RegChanged := False;
 
-        while FirstCheck { No need to waste checking for the next instruction again } or
+        hp1 := p;
+
+        while GetNextInstructionUsingReg(hp1, hp1, ThisReg) and
+          (hp1.typ = ait_instruction) and
           (
-            GetNextInstructionUsingReg(hp1, hp1, ThisReg) and
-            (hp1.typ = ait_instruction) and
-            (
-              { Under -O1 and -O2, GetNextInstructionUsingReg may return an
-                instruction that doesn't actually contain ThisReg }
-              (cs_opt_level3 in current_settings.optimizerswitches) or
-              RegInInstruction(ThisReg, hp1)
-            )
+            { Under -O1 and -O2, GetNextInstructionUsingReg may return an
+              instruction that doesn't actually contain ThisReg }
+            (cs_opt_level3 in current_settings.optimizerswitches) or
+            RegInInstruction(ThisReg, hp1)
           ) do
           begin
-            FirstCheck := False;
 
             case taicpu(hp1).opcode of
               A_INC,A_DEC:
@@ -5259,9 +5240,6 @@
 
               A_CMP:
                 begin
-                  { Smallest signed value for MinSize }
-                  TrySmallerLimit := not (UpperLimit shr 1);
-
                   if (taicpu(hp1).oper[1]^.typ <> top_reg) or
                     { Has to be an exact match on the register }
                     (taicpu(hp1).oper[1]^.reg <> ThisReg) or
@@ -5269,7 +5247,11 @@
                     { Make sure the comparison value is not smaller than the
                       smallest allowed signed value for the minimum size (e.g.
                       -128 for 8-bit) }
-                    (taicpu(hp1).oper[0]^.val < TrySmallerLimit) then
+                    not (
+                      ((taicpu(hp1).oper[0]^.val and UpperLimit) = taicpu(hp1).oper[0]^.val) or
+                      { Is it in the negative range? }
+                      (((not taicpu(hp1).oper[0]^.val) and (UpperLimit shr 1)) = (not taicpu(hp1).oper[0]^.val))
+                    ) then
                     Break;
 
                   TestValMin := TestValMin - taicpu(hp1).oper[0]^.val;
@@ -5295,7 +5277,7 @@
                       end;
 
                       { Update the register to its new size }
-                      ThisReg := newreg(R_INTREGISTER, getsupreg(ThisReg), TargetSubReg);
+                      setsubreg(ThisReg, TargetSubReg);
 
                       taicpu(hp1).oper[1]^.reg := ThisReg;
                       taicpu(hp1).opsize := MinSize;
@@ -5604,7 +5586,7 @@
                   end;
 
                   { Update the register to its new size }
-                  ThisReg := newreg(R_INTREGISTER, getsupreg(ThisReg), TargetSubReg);
+                  setsubreg(ThisReg, TargetSubReg);
 
                   if TargetSize = MinSize then
                     begin
movzx-cmp-fix.patch (5,250 bytes)   

Florian

2021-05-15 16:03

administrator   ~0130899

Fix, applied. Sorry forgot to mention you in the commit message :(

J. Gareth Moreton

2021-05-15 17:42

developer   ~0130901

I'm sure I'll live without that one mention! Thanks though.

Issue History

Date Modified Username Field Change
2021-05-13 01:50 J. Gareth Moreton New Issue
2021-05-13 01:50 J. Gareth Moreton File Added: MovzCmp.patch
2021-05-13 01:51 J. Gareth Moreton Priority normal => low
2021-05-13 01:51 J. Gareth Moreton Additional Information Updated View Revisions
2021-05-13 01:51 J. Gareth Moreton FPCTarget => -
2021-05-13 02:05 J. Gareth Moreton Tag Attached: patch
2021-05-13 02:05 J. Gareth Moreton Tag Attached: compiler
2021-05-13 02:05 J. Gareth Moreton Tag Attached: optimizations
2021-05-13 02:05 J. Gareth Moreton Tag Attached: i386
2021-05-13 02:05 J. Gareth Moreton Tag Attached: x86
2021-05-13 02:05 J. Gareth Moreton Tag Attached: x86_64
2021-05-14 20:50 Florian Assigned To => Florian
2021-05-14 20:50 Florian Status new => resolved
2021-05-14 20:50 Florian Resolution open => fixed
2021-05-14 20:50 Florian Fixed in Version => 3.3.1
2021-05-14 20:50 Florian Fixed in Revision => 49366
2021-05-14 20:50 Florian Note Added: 0130880
2021-05-15 03:40 Do-wan Kim Note Added: 0130882
2021-05-15 04:54 J. Gareth Moreton Note Added: 0130883
2021-05-15 04:55 J. Gareth Moreton Assigned To Florian => J. Gareth Moreton
2021-05-15 04:55 J. Gareth Moreton Status resolved => feedback
2021-05-15 04:55 J. Gareth Moreton Resolution fixed => open
2021-05-15 04:55 J. Gareth Moreton Note Added: 0130884
2021-05-15 06:49 J. Gareth Moreton Note Added: 0130887
2021-05-15 11:49 J. Gareth Moreton Note Added: 0130895
2021-05-15 11:49 J. Gareth Moreton File Added: movzx-cmp-fix.patch
2021-05-15 11:50 J. Gareth Moreton Assigned To J. Gareth Moreton =>
2021-05-15 11:50 J. Gareth Moreton Status feedback => new
2021-05-15 16:03 Florian Assigned To => Florian
2021-05-15 16:03 Florian Status new => resolved
2021-05-15 16:03 Florian Resolution open => fixed
2021-05-15 16:03 Florian Note Added: 0130899
2021-05-15 17:42 J. Gareth Moreton Note Added: 0130901