View Issue Details

IDProjectCategoryView StatusLast Update
0037390FPCCompilerpublic2020-08-09 20:56
ReporterJ. Gareth Moreton Assigned ToFlorian  
PrioritynormalSeverityminorReproducibilityN/A
Status resolvedResolutionfixed 
Platformi386 and x86_64OSMicrosoft Windows 
Product Version3.3.1 
Fixed in Version3.3.1 
Summary0037390: [Patch] Long-range MOV + MOVS/Z optimisation
DescriptionThere are some situations where a register is written to, and then it is zero- or sign-extended a few instructions later, without the previous value having been read or modified. Using GetNextInstructionUsingReg, this patch modifies these pairs so the MOV instruction is changed to the following MOVS/Z instruction (or kept as MOV if a constant is being written), and the MOVS/Z instruction removed. For example:

movw %cx,%dx
movq U_$IDECOMMANDS_$$_IDECOMMANDLIST(%rip),%rcx (%rcx is reused)
movzwl %dx,%edx

Becomes...

movzwl %cx,%edx
movq U_$IDECOMMANDS_$$_IDECOMMANDLIST(%rip),%rcx
Steps To ReproduceApply patch and confirm correct compilation.
Additional InformationWhen coupled with 0037389 (and can also help its potential false dependency problems), this single optimisation can sometimes permit a cascade of additional operations. For example, observed in the TDependencyGraphOptDialog.UpdateInfo method in the Lazarus source:

    movw %cx,%dx
.Ll1277:
    movq U_$IDECOMMANDS_$$_IDECOMMANDLIST(%rip),%rcx
    movzwl %dx,%edx
    movq U_$IDECOMMANDS_$$_IDECOMMANDLIST(%rip),%rax
    movq (%rax),%rax

Becomes:

# Peephole Optimization: MovMovs/z2Mov/s/z done
    movzwl %cx,%edx
# Peephole Optimization: MovMov2MovMov 2
.Ll1277:
    movq U_$IDECOMMANDS_$$_IDECOMMANDLIST(%rip),%rcx
# Peephole Optimization: Removed movs/z instruction and extended earlier write (MovMovs/z2Mov/s/z)
# Peephole Optimization: Mov2Nop 3 done
# Peephole Optimization: %rax = %rcx; changed to minimise pipeline stall (MovXXX2MovXXX)
    movq (%rcx),%rax

(5 instructions become 3)
Tagscompiler, i386, optimizations, patch, x86, x86_64
Fixed in Revision46346
FPCOldBugId
FPCTarget-
Attached Files

Activities

J. Gareth Moreton

2020-07-19 17:39

developer  

movsz-optimisation.patch (14,650 bytes)   
Index: compiler/x86/aoptx86.pas
===================================================================
--- compiler/x86/aoptx86.pas	(revision 45802)
+++ compiler/x86/aoptx86.pas	(working copy)
@@ -2550,129 +2550,172 @@
           { we work with hp2 here, so hp1 can be still used later on when
             checking for GetNextInstruction_p }
           { GetNextInstructionUsingReg only searches one instruction ahead unless -O3 is specified }
-          GetNextInstructionUsingReg(hp1,hp2,taicpu(p).oper[1]^.reg) and
-          MatchInstruction(hp2,A_MOV,[]) and
-          MatchOperand(taicpu(p).oper[1]^,taicpu(hp2).oper[0]^) and
-          ((taicpu(p).oper[0]^.typ=top_const) or
-           ((taicpu(p).oper[0]^.typ=top_reg) and
-            not(RegUsedBetween(taicpu(p).oper[0]^.reg, p, hp2))
-           )
-          ) then
+          GetNextInstructionUsingReg(hp1,hp2,taicpu(p).oper[1]^.reg) then
           begin
-            { we have
-                mov x, %treg
-                mov %treg, y
-            }
 
-            TransferUsedRegs(TmpUsedRegs);
-            TmpUsedRegs[R_INTREGISTER].Update(tai(p.Next));
+            case taicpu(hp2).opcode of
+              A_MOV:
+                if MatchOperand(taicpu(hp2).oper[0]^,taicpu(p).oper[1]^.reg) and
+                  ((taicpu(p).oper[0]^.typ=top_const) or
+                   ((taicpu(p).oper[0]^.typ=top_reg) and
+                    not(RegUsedBetween(taicpu(p).oper[0]^.reg, p, hp2))
+                   )
+                  ) then
+                  begin
+                    { we have
+                        mov x, %treg
+                        mov %treg, y
+                    }
 
-            { We don't need to call UpdateUsedRegs for every instruction between
-              p and hp2 because the register we're concerned about will not
-              become deallocated (otherwise GetNextInstructionUsingReg would
-              have stopped at an earlier instruction). [Kit] }
+                    TransferUsedRegs(TmpUsedRegs);
+                    TmpUsedRegs[R_INTREGISTER].Update(tai(p.Next));
 
-            TempRegUsed :=
-              RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp2, TmpUsedRegs) or
-              RegReadByInstruction(taicpu(p).oper[1]^.reg, hp1);
+                    { We don't need to call UpdateUsedRegs for every instruction between
+                      p and hp2 because the register we're concerned about will not
+                      become deallocated (otherwise GetNextInstructionUsingReg would
+                      have stopped at an earlier instruction). [Kit] }
 
-            case taicpu(p).oper[0]^.typ Of
-              top_reg:
-                begin
-                  { change
-                      mov %reg, %treg
-                      mov %treg, y
+                    TempRegUsed :=
+                      RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp2, TmpUsedRegs) or
+                      RegReadByInstruction(taicpu(p).oper[1]^.reg, hp1);
 
-                      to
+                    case taicpu(p).oper[0]^.typ Of
+                      top_reg:
+                        begin
+                          { change
+                              mov %reg, %treg
+                              mov %treg, y
 
-                      mov %reg, y
-                  }
-                  CurrentReg := taicpu(p).oper[0]^.reg; { Saves on a handful of pointer dereferences }
-                  RegName1 := debug_regname(taicpu(hp2).oper[0]^.reg);
-                  if taicpu(hp2).oper[1]^.reg = CurrentReg then
-                    begin
-                      { %reg = y - remove hp2 completely (doing it here instead of relying on
-                        the "mov %reg,%reg" optimisation might cut down on a pass iteration) }
+                              to
 
-                      if TempRegUsed then
-                        begin
-                          DebugMsg(SPeepholeOptimization + debug_regname(CurrentReg) + ' = ' + RegName1 + '; removed unnecessary instruction (MovMov2MovNop 6b}',hp2);
-                          AllocRegBetween(CurrentReg, p, hp2, UsedRegs);
-                          asml.remove(hp2);
-                          hp2.Free;
-                        end
-                      else
-                        begin
-                          asml.remove(hp2);
-                          hp2.Free;
+                              mov %reg, y
+                          }
+                          CurrentReg := taicpu(p).oper[0]^.reg; { Saves on a handful of pointer dereferences }
+                          RegName1 := debug_regname(taicpu(hp2).oper[0]^.reg);
+                          if taicpu(hp2).oper[1]^.reg = CurrentReg then
+                            begin
+                              { %reg = y - remove hp2 completely (doing it here instead of relying on
+                                the "mov %reg,%reg" optimisation might cut down on a pass iteration) }
 
-                          { We can remove the original MOV too }
-                          DebugMsg(SPeepholeOptimization + 'MovMov2NopNop 6b done',p);
-                          RemoveCurrentP(p, hp1);
-                          Result:=true;
-                          Exit;
-                        end;
-                    end
-                  else
-                    begin
-                      AllocRegBetween(CurrentReg, p, hp2, UsedRegs);
-                      taicpu(hp2).loadReg(0, CurrentReg);
-                      if TempRegUsed then
-                        begin
-                          { Don't remove the first instruction if the temporary register is in use }
-                          DebugMsg(SPeepholeOptimization + RegName1 + ' = ' + debug_regname(CurrentReg) + '; changed to minimise pipeline stall (MovMov2Mov 6a}',hp2);
+                              if TempRegUsed then
+                                begin
+                                  DebugMsg(SPeepholeOptimization + debug_regname(CurrentReg) + ' = ' + RegName1 + '; removed unnecessary instruction (MovMov2MovNop 6b}',hp2);
+                                  AllocRegBetween(CurrentReg, p, hp2, UsedRegs);
+                                  asml.remove(hp2);
+                                  hp2.Free;
+                                end
+                              else
+                                begin
+                                  asml.remove(hp2);
+                                  hp2.Free;
 
-                          { No need to set Result to True. If there's another instruction later on
-                            that can be optimised, it will be detected when the main Pass 1 loop
-                            reaches what is now hp2 and passes it through OptPass1MOV. [Kit] };
-                        end
-                      else
-                        begin
-                          DebugMsg(SPeepholeOptimization + 'MovMov2Mov 6 done',p);
-                          RemoveCurrentP(p, hp1);
-                          Result:=true;
-                          Exit;
+                                  { We can remove the original MOV too }
+                                  DebugMsg(SPeepholeOptimization + 'MovMov2NopNop 6b done',p);
+                                  RemoveCurrentP(p, hp1);
+                                  Result:=true;
+                                  Exit;
+                                end;
+                            end
+                          else
+                            begin
+                              AllocRegBetween(CurrentReg, p, hp2, UsedRegs);
+                              taicpu(hp2).loadReg(0, CurrentReg);
+                              if TempRegUsed then
+                                begin
+                                  { Don't remove the first instruction if the temporary register is in use }
+                                  DebugMsg(SPeepholeOptimization + RegName1 + ' = ' + debug_regname(CurrentReg) + '; changed to minimise pipeline stall (MovMov2Mov 6a}',hp2);
+
+                                  { No need to set Result to True. If there's another instruction later on
+                                    that can be optimised, it will be detected when the main Pass 1 loop
+                                    reaches what is now hp2 and passes it through OptPass1MOV. [Kit] };
+                                end
+                              else
+                                begin
+                                  DebugMsg(SPeepholeOptimization + 'MovMov2Mov 6 done',p);
+                                  RemoveCurrentP(p, hp1);
+                                  Result:=true;
+                                  Exit;
+                                end;
+                            end;
                         end;
-                    end;
-                end;
-              top_const:
-                if not (cs_opt_size in current_settings.optimizerswitches) or (taicpu(hp2).opsize = S_B) then
-                  begin
-                    { change
-                        mov const, %treg
-                        mov %treg, y
+                      top_const:
+                        if not (cs_opt_size in current_settings.optimizerswitches) or (taicpu(hp2).opsize = S_B) then
+                          begin
+                            { change
+                                mov const, %treg
+                                mov %treg, y
 
-                        to
+                                to
 
-                        mov const, y
-                    }
-                    if (taicpu(hp2).oper[1]^.typ=top_reg) or
-                      ((taicpu(p).oper[0]^.val>=low(longint)) and (taicpu(p).oper[0]^.val<=high(longint))) then
-                      begin
-                        RegName1 := debug_regname(taicpu(hp2).oper[0]^.reg);
-                        taicpu(hp2).loadOper(0,taicpu(p).oper[0]^);
+                                mov const, y
+                            }
+                            if (taicpu(hp2).oper[1]^.typ=top_reg) or
+                              ((taicpu(p).oper[0]^.val>=low(longint)) and (taicpu(p).oper[0]^.val<=high(longint))) then
+                              begin
+                                RegName1 := debug_regname(taicpu(hp2).oper[0]^.reg);
+                                taicpu(hp2).loadOper(0,taicpu(p).oper[0]^);
 
-                        if TempRegUsed then
-                          begin
-                            { Don't remove the first instruction if the temporary register is in use }
-                            DebugMsg(SPeepholeOptimization + RegName1 + ' = ' + debug_tostr(taicpu(p).oper[0]^.val) + '; changed to minimise pipeline stall (MovMov2Mov 7a)',hp2);
+                                if TempRegUsed then
+                                  begin
+                                    { Don't remove the first instruction if the temporary register is in use }
+                                    DebugMsg(SPeepholeOptimization + RegName1 + ' = ' + debug_tostr(taicpu(p).oper[0]^.val) + '; changed to minimise pipeline stall (MovMov2Mov 7a)',hp2);
 
-                            { No need to set Result to True. If there's another instruction later on
-                              that can be optimised, it will be detected when the main Pass 1 loop
-                              reaches what is now hp2 and passes it through OptPass1MOV. [Kit] };
-                          end
+                                    { No need to set Result to True. If there's another instruction later on
+                                      that can be optimised, it will be detected when the main Pass 1 loop
+                                      reaches what is now hp2 and passes it through OptPass1MOV. [Kit] };
+                                  end
+                                else
+                                  begin
+                                    DebugMsg(SPeepholeOptimization + 'MovMov2Mov 7 done',p);
+                                    RemoveCurrentP(p, hp1);
+                                    Result:=true;
+                                    Exit;
+                                  end;
+                              end;
+                          end;
                         else
-                          begin
-                            DebugMsg(SPeepholeOptimization + 'MovMov2Mov 7 done',p);
-                            RemoveCurrentP(p, hp1);
-                            Result:=true;
-                            Exit;
-                          end;
+                          Internalerror(2019103001);
                       end;
                   end;
-                else
-                  Internalerror(2019103001);
-              end;
+              A_MOVZX, A_MOVSX{$ifdef x86_64}, A_MOVSXD{$endif x86_64}:
+                if MatchOpType(taicpu(hp2), top_reg, top_reg) and
+                  MatchOperand(taicpu(hp2).oper[0]^, taicpu(p).oper[1]^.reg) and
+                  SuperRegistersEqual(taicpu(hp2).oper[1]^.reg, taicpu(p).oper[1]^.reg) then
+                  begin
+                    {
+                      Change from:
+                        mov    ###, %reg
+                        ...
+                        movs/z %reg,%reg  (Same register, just different sizes)
+
+                      To:
+                        movs/z ###, %reg  (Longer version)
+                        ...
+                        (remove)
+                    }
+                    DebugMsg(SPeepholeOptimization + 'MovMovs/z2Mov/s/z done', p);
+                    taicpu(p).oper[1]^.reg := taicpu(hp2).oper[1]^.reg;
+
+                    { Keep the first instruction as mov if ### is a constant }
+                    if taicpu(p).oper[0]^.typ = top_const then
+                      taicpu(p).opsize := reg2opsize(taicpu(hp2).oper[1]^.reg)
+                    else
+                      begin
+                        taicpu(p).opcode := taicpu(hp2).opcode;
+                        taicpu(p).opsize := taicpu(hp2).opsize;
+                      end;
+
+                    DebugMsg(SPeepholeOptimization + 'Removed movs/z instruction and extended earlier write (MovMovs/z2Mov/s/z)', hp2);
+                    AllocRegBetween(taicpu(hp2).oper[1]^.reg, p, hp2, UsedRegs);
+                    AsmL.Remove(hp2);
+                    hp2.Free;
+
+                    Result := True;
+                    Exit;
+                  end;
+              else
+                ;
+            end;
           end;
 
         if (aoc_MovAnd2Mov_3 in OptsToCheck) and
movsz-optimisation.patch (14,650 bytes)   

J. Gareth Moreton

2020-07-19 17:40

developer   ~0124171

i386-win32 and x86_64-win64 regression tests have passed for the combined patches of this issue and 0037389

Issue History

Date Modified Username Field Change
2020-07-19 17:39 J. Gareth Moreton New Issue
2020-07-19 17:39 J. Gareth Moreton File Added: movsz-optimisation.patch
2020-07-19 17:40 J. Gareth Moreton Note Added: 0124171
2020-07-19 17:40 J. Gareth Moreton Tag Attached: patch
2020-07-19 17:40 J. Gareth Moreton Tag Attached: compiler
2020-07-19 17:40 J. Gareth Moreton Tag Attached: optimizations
2020-07-19 17:40 J. Gareth Moreton Tag Attached: x86_64
2020-07-19 17:40 J. Gareth Moreton Tag Attached: x86
2020-07-19 17:40 J. Gareth Moreton Tag Attached: i386
2020-08-09 20:56 Florian Assigned To => Florian
2020-08-09 20:56 Florian Status new => resolved
2020-08-09 20:56 Florian Resolution open => fixed
2020-08-09 20:56 Florian Fixed in Version => 3.3.1
2020-08-09 20:56 Florian Fixed in Revision => 46346
2020-08-09 20:56 Florian FPCTarget => -