View Issue Details

IDProjectCategoryView StatusLast Update
0038908FPCCompilerpublic2021-05-20 20:46
ReporterJ. Gareth Moreton Assigned ToFlorian  
PrioritylowSeverityminorReproducibilityN/A
Status resolvedResolutionfixed 
Platformi386 and x86_64OSMicrosoft Windows 
Product Version3.3.1 
Fixed in Version3.3.1 
Summary0038908: [Patch] TEST chain shortcutting
DescriptionOccasionally, you get assembly language where "test %reg,%reg" is called, followed by a conditional branch, only at the destination label, the same register is tested again with a branch that is often the exact same condition. This patch provides an optimisation that shortcuts such checks so the control flow jumps to the last label in the chain.
Steps To ReproduceApply patch and confirm correct compilation.
Additional InformationIt can couple quite well with 0038907 and other optimisations if the intermediate label becomes dead as a result - for example, in SysUtils before:

    movq (%rdi),%rax
    testq %rax,%rax
    je .Lj6927
    movq -8(%rax),%rax
.Lj6927:
    testq %rax,%rax
    je .Lj6929

After:

    movq (%rdi),%rax
    testq %rax,%rax <-- (0038907 doesn't get applied here because %rax is used in the reference below)
# Peephole Optimization: TEST/Jcc/Lbl/TEST/Jcc -> TEST/Jcc, redirecting first jump
    je .Lj6929
# Peephole Optimization: MOV/CMP -> CMP (memory check)
    cmpq $0,-8(%rax)
    je .Lj6929
Tagscompiler, i386, optimization, patch, x86, x86_64
Fixed in Revision49385
FPCOldBugId
FPCTarget-
Attached Files

Activities

J. Gareth Moreton

2021-05-19 07:39

developer   ~0130948

Just to reassure everyone, I have tested this on i386 and x86_64 with most configurations.

J. Gareth Moreton

2021-05-19 07:43

developer   ~0130950

test-shortcut.patch (5,677 bytes)   
Index: compiler/i386/aoptcpu.pas
===================================================================
--- compiler/i386/aoptcpu.pas	(revision 49374)
+++ compiler/i386/aoptcpu.pas	(working copy)
@@ -166,6 +166,8 @@
                 A_MOVSX,
                 A_MOVZX :
                   Result:=OptPass1Movx(p);
+                A_TEST:
+                  Result:=OptPass1Test(p);
                 A_PUSH:
                   begin
                     if (taicpu(p).opsize = S_W) and
Index: compiler/x86/aoptx86.pas
===================================================================
--- compiler/x86/aoptx86.pas	(revision 49374)
+++ compiler/x86/aoptx86.pas	(working copy)
@@ -122,6 +122,7 @@
         function PrePeepholeOptSxx(var p : tai) : boolean;
         function PrePeepholeOptIMUL(var p : tai) : boolean;
 
+        function OptPass1Test(var p: tai): boolean;
         function OptPass1Add(var p: tai): boolean;
         function OptPass1AND(var p : tai) : boolean;
         function OptPass1_V_MOVAP(var p : tai) : boolean;
@@ -3218,6 +3219,74 @@
       end;
 
 
+    function TX86AsmOptimizer.OptPass1Test(var p: tai) : boolean;
+      var
+        hp1, p_label, p_dist, hp1_dist: tai;
+        JumpLabel, JumpLabel_dist: TAsmLabel;
+      begin
+        Result := False;
+        { Search for:
+            test  %reg,%reg
+            j(c1) @lbl1
+            ...
+          @lbl:
+            test %reg,%reg (same register)
+            j(c2) @lbl2
+
+          If c2 is a subset of c1, change to:
+            test  %reg,%reg
+            j(c1) @lbl2
+            (@lbl1 may become a dead label as a result)
+        }
+
+        if MatchOpType(taicpu(p), top_reg, top_reg) and
+          (taicpu(p).oper[0]^.reg = taicpu(p).oper[1]^.reg) and
+          GetNextInstruction(p, hp1) and
+          MatchInstruction(hp1, A_JCC, []) and
+          (taicpu(hp1).oper[0]^.typ = top_ref) then
+          begin
+            JumpLabel := TAsmLabel(taicpu(hp1).oper[0]^.ref^.symbol);
+            p_label := nil;
+            if Assigned(JumpLabel) then
+              p_label := getlabelwithsym(JumpLabel);
+
+            if Assigned(p_label) and
+              GetNextInstruction(p_label, p_dist) and
+              MatchInstruction(p_dist, A_TEST, []) and
+              { It's fine if the second test uses smaller sub-registers }
+              (taicpu(p_dist).opsize <= taicpu(p).opsize) and
+              MatchOpType(taicpu(p_dist), top_reg, top_reg) and
+              SuperRegistersEqual(taicpu(p_dist).oper[0]^.reg, taicpu(p).oper[0]^.reg) and
+              SuperRegistersEqual(taicpu(p_dist).oper[1]^.reg, taicpu(p).oper[1]^.reg) and
+              GetNextInstruction(p_dist, hp1_dist) and
+              MatchInstruction(hp1_dist, A_JCC, []) then
+              begin
+                JumpLabel_dist := TAsmLabel(taicpu(hp1_dist).oper[0]^.ref^.symbol);
+
+                if JumpLabel = JumpLabel_dist then
+                  { This is an infinite loop }
+                  Exit;
+
+                { Best optimisation when the second condition is a subset (or equal) to the first }
+                if condition_in(taicpu(hp1_dist).condition, taicpu(hp1).condition) then
+                  begin
+                    if Assigned(JumpLabel_dist) then
+                      JumpLabel_dist.IncRefs;
+
+                    if Assigned(JumpLabel) then
+                      JumpLabel.DecRefs;
+
+                    DebugMsg(SPeepholeOptimization + 'TEST/Jcc/@Lbl/TEST/Jcc -> TEST/Jcc, redirecting first jump', hp1);
+                    taicpu(hp1).loadref(0, taicpu(hp1_dist).oper[0]^.ref^);
+                    Result := True;
+                    Exit;
+                  end;
+              end;
+          end;
+
+      end;
+
+
     function TX86AsmOptimizer.OptPass1Add(var p : tai) : boolean;
       var
         hp1 : tai;
@@ -4297,6 +4466,7 @@
        var
          v: TCGInt;
          hp1, hp2: tai;
+         FirstMatch: Boolean;
        begin
          Result:=false;
 
@@ -4312,6 +4382,7 @@
                MatchInstruction(hp1,A_Jcc,A_SETcc,[]) then
                begin
                  hp2 := p;
+                 FirstMatch := True;
                  { When dealing with "cmp $0,%reg", only ZF and SF contain
                    anything meaningful once it's converted to "test %reg,%reg";
                    additionally, some jumps will always (or never) branch, so
@@ -4319,9 +4390,13 @@
                    comparison, optimising the conditions if possible.
                    Similarly with SETcc... those that are always set to 0 or 1
                    are changed to MOV instructions }
-                 while GetNextInstruction(hp2, hp1) and
-                   MatchInstruction(hp1,A_Jcc,A_SETcc,[]) do
+                 while FirstMatch or { Saves calling GetNextInstruction unnecessarily }
+                   (
+                     GetNextInstruction(hp2, hp1) and
+                     MatchInstruction(hp1,A_Jcc,A_SETcc,[])
+                   ) do
                    begin
+                     FirstMatch := False;
                      case taicpu(hp1).condition of
                        C_B, C_C, C_NAE, C_O:
                          { For B/NAE:
Index: compiler/x86_64/aoptcpu.pas
===================================================================
--- compiler/x86_64/aoptcpu.pas	(revision 49374)
+++ compiler/x86_64/aoptcpu.pas	(working copy)
@@ -145,6 +145,8 @@
                 A_XORPD,
                 A_PXOR:
                   Result:=OptPass1PXor(p);
+                A_TEST:
+                  Result:=OptPass1Test(p);
                 else
                   ;
               end;
test-shortcut.patch (5,677 bytes)   

Florian

2021-05-20 20:46

administrator   ~0130978

Thanks, applied.

Issue History

Date Modified Username Field Change
2021-05-19 07:37 J. Gareth Moreton New Issue
2021-05-19 07:37 J. Gareth Moreton File Added: test-shortcut.patch
2021-05-19 07:38 J. Gareth Moreton Priority normal => low
2021-05-19 07:38 J. Gareth Moreton Additional Information Updated View Revisions
2021-05-19 07:38 J. Gareth Moreton FPCTarget => -
2021-05-19 07:39 J. Gareth Moreton Additional Information Updated View Revisions
2021-05-19 07:39 J. Gareth Moreton Note Added: 0130948
2021-05-19 07:39 J. Gareth Moreton Tag Attached: patch
2021-05-19 07:39 J. Gareth Moreton Tag Attached: compiler
2021-05-19 07:39 J. Gareth Moreton Tag Attached: optimization
2021-05-19 07:39 J. Gareth Moreton Tag Attached: i386
2021-05-19 07:39 J. Gareth Moreton Tag Attached: x86
2021-05-19 07:39 J. Gareth Moreton Tag Attached: x86_64
2021-05-19 07:41 J. Gareth Moreton Additional Information Updated View Revisions
2021-05-19 07:41 J. Gareth Moreton File Deleted: test-shortcut.patch
2021-05-19 07:43 J. Gareth Moreton Note Added: 0130950
2021-05-19 07:43 J. Gareth Moreton File Added: test-shortcut.patch
2021-05-20 20:46 Florian Assigned To => Florian
2021-05-20 20:46 Florian Status new => resolved
2021-05-20 20:46 Florian Resolution open => fixed
2021-05-20 20:46 Florian Fixed in Version => 3.3.1
2021-05-20 20:46 Florian Fixed in Revision => 49385
2021-05-20 20:46 Florian Note Added: 0130978