View Issue Details

IDProjectCategoryView StatusLast Update
0034859FPCCompilerpublic2021-04-15 03:53
ReporterJ. Gareth Moreton Assigned To 
PrioritynormalSeveritytweakReproducibilityN/A
Status newResolutionopen 
Platformx86_64-win64OSMicrosoft Windows 
Product Version3.3.1 
Target Version3.3.1 
Summary0034859: [Patch] Speed boosts to linear list case blocks
DescriptionThis patch makes a number of minor improvements to the compiler where linear lists are concerned with case blocks. Benchmarks on "tests/bench/bcase.pp" suggest a 5% speed increase (although this improvement is saturated by the jump tree and jump table tests, which are not affected by these changes).
Steps To ReproduceCompile "tests/bench/bcase.pp" with the trunk compiler, then apply the patch and compile the benchmark test again. Observe reduction in total time and also confirm satisfactory code quality.

(Note that the first six "Domain of 1024" tests utilise linear comparison lists due to the presence of negative values in the enumeration)
Additional InformationThe improvements to the linear list:

- The lower bound check and first subtraction are merged into a single subtraction operation, since they always contained the same value.

- Except for the last 4, single-value branches now have a "jl" or "jb" conditional jump to immediately go to the else block after each value (at the last 4, it's faster to just go through the last few comparisons and save the CPU from having to deal with extra branch predictions). For case blocks with a lot of single-value branches, this saves the program from stepping through each one in turn until it reaches a range or the end of the block.

- For linear lists and linear comparison lists, the compiler attempts to insert the last branch block first (it won't if the label has been shortcut), since this allows for the elimination of a jump and alignment hint (enough to cause "tests/bench/bcase.pp" to be 512 bytes smaller). To accommodate for this, the "shortcut" field has been removed from TCaseBlock and replaced with a "Flags" field that can hold more information (see "compiler/nset.pas").

- Some refactoring and optimisations in the genlinearlist routine for x86 so programs compile faster. Main foci are in minimising the number of comparisons done against variables of type TConstExprInt, and also attempting to save the peephole optimiser some work (e.g. directly inserting "test %reg, %reg" when it would otherwise insert "cmp 0, %reg"), since this reduces the number of function calls and increases the chance of reducing the number of optimiser passes on -O3.

- The "last branch first" policy on linear comparison lists is cross-platform, but currently written in a way that requires a platform-specific peephole optimiser to finish the job.
Tags64-bit, compiler, node, optimization, patch, x86, x86_64-win64
Fixed in Revision
FPCOldBugId0
FPCTarget
Attached Files

Relationships

child of 0034762 resolvedFlorian [Patch] Speed improvement in case blocks 

Activities

J. Gareth Moreton

2019-01-13 06:06

developer  

bcase-sample-run.txt (7,870 bytes)   
TRUNK
-----

Case node compilation and timing test
-------------------------------------
            Byte domain, entirely covered; equal polling - Pass - average iteration duration: 5.850 ns
           Byte domain, entirely covered; first weighted - Pass - average iteration duration: 5.937 ns
            Byte domain, entirely covered; last weighted - Pass - average iteration duration: 6.374 ns
     Byte domain, almost entirely covered; equal polling - Pass - average iteration duration: 5.442 ns
    Byte domain, almost entirely covered; first weighted - Pass - average iteration duration: 5.966 ns
     Byte domain, almost entirely covered; last weighted - Pass - average iteration duration: 5.995 ns
     Single entry with default value; 1/256 match chance - Pass - average iteration duration: 2.910 ns
       Single entry with default value; 75% match chance - Pass - average iteration duration: 3.289 ns
       Single entry with default value; 25% match chance - Pass - average iteration duration: 3.492 ns
        Single entry with else block; 1/256 match chance - Pass - average iteration duration: 3.405 ns
          Single entry with else block; 75% match chance - Pass - average iteration duration: 3.376 ns
          Single entry with else block; 25% match chance - Pass - average iteration duration: 3.434 ns
                     Single entry of "0:" and else block - Pass - average iteration duration: 3.376 ns
                Single entry of "-1:" with default value - Pass - average iteration duration: 3.347 ns
                    Single entry of "-4:" and else block - Pass - average iteration duration: 2.939 ns
                   Single entry of "0..5" and else block - Pass - average iteration duration: 2.998 ns
                  Single entry of "0..50" and else block - Pass - average iteration duration: 3.347 ns
                   Single entry of "1..5" and else block - Pass - average iteration duration: 3.289 ns
                  Single entry of "1..50" and else block - Pass - average iteration duration: 2.939 ns
                  Single entry of "-1..5" and else block - Pass - average iteration duration: 3.638 ns
                 Single entry of "-1..50" and else block - Pass - average iteration duration: 3.580 ns
      Two labels, one with extreme spread, equal polling - Pass - average iteration duration: 2.910 ns
    Two labels, one with extreme spread, 50% else chance - Pass - average iteration duration: 2.823 ns
   Two labels, sparse values, equal polling across range - Pass - average iteration duration: 2.939 ns
             Two labels, sparse values, always triggered - Pass - average iteration duration: 3.638 ns
         Domain of 1024, 12 sparse labels, equal polling - Pass - average iteration duration: 6.956 ns
  Domain of 1024, 12 sparse labels, 75% particular match - Pass - average iteration duration: 4.336 ns
    Domain of 1024, 12 sparse labels, 75% midpoint match - Pass - average iteration duration: 6.985 ns
         Domain of 1024, 39 sparse labels, equal polling - Pass - average iteration duration: 9.983 ns
  Domain of 1024, 39 sparse labels, 75% particular match - Pass - average iteration duration: 5.035 ns
    Domain of 1024, 39 sparse labels, 75% midpoint match - Pass - average iteration duration: 7.625 ns
         Domain of 1024, 71 sparse labels, equal polling - Pass - average iteration duration: 6.199 ns
  Domain of 1024, 71 sparse labels, 75% particular match - Pass - average iteration duration: 5.792 ns
    Domain of 1024, 71 sparse labels, 75% midpoint match - Pass - average iteration duration: 4.511 ns
                            Linear list depends on input - Pass - average iteration duration: 3.463 ns
                            C-style cascade using 'goto' - Pass - average iteration duration: 3.463 ns

ok
- Sum of average durations: 161.584 ns
- Overall average duration: 4.488 ns

NEW OPTIMISATIONS
------------------

Case node compilation and timing test
-------------------------------------
            Byte domain, entirely covered; equal polling - Pass - average iteration duration: 5.908 ns
           Byte domain, entirely covered; first weighted - Pass - average iteration duration: 5.704 ns
            Byte domain, entirely covered; last weighted - Pass - average iteration duration: 5.733 ns
     Byte domain, almost entirely covered; equal polling - Pass - average iteration duration: 5.821 ns
    Byte domain, almost entirely covered; first weighted - Pass - average iteration duration: 5.995 ns
     Byte domain, almost entirely covered; last weighted - Pass - average iteration duration: 5.588 ns
     Single entry with default value; 1/256 match chance - Pass - average iteration duration: 2.969 ns
       Single entry with default value; 75% match chance - Pass - average iteration duration: 2.939 ns
       Single entry with default value; 25% match chance - Pass - average iteration duration: 2.910 ns
        Single entry with else block; 1/256 match chance - Pass - average iteration duration: 2.881 ns
          Single entry with else block; 75% match chance - Pass - average iteration duration: 3.085 ns
          Single entry with else block; 25% match chance - Pass - average iteration duration: 3.260 ns
                     Single entry of "0:" and else block - Pass - average iteration duration: 2.969 ns
                Single entry of "-1:" with default value - Pass - average iteration duration: 2.939 ns
                    Single entry of "-4:" and else block - Pass - average iteration duration: 2.969 ns
                   Single entry of "0..5" and else block - Pass - average iteration duration: 3.201 ns
                  Single entry of "0..50" and else block - Pass - average iteration duration: 2.794 ns
                   Single entry of "1..5" and else block - Pass - average iteration duration: 2.969 ns
                  Single entry of "1..50" and else block - Pass - average iteration duration: 3.027 ns
                  Single entry of "-1..5" and else block - Pass - average iteration duration: 3.085 ns
                 Single entry of "-1..50" and else block - Pass - average iteration duration: 3.085 ns
      Two labels, one with extreme spread, equal polling - Pass - average iteration duration: 2.678 ns
    Two labels, one with extreme spread, 50% else chance - Pass - average iteration duration: 3.347 ns
   Two labels, sparse values, equal polling across range - Pass - average iteration duration: 2.939 ns
             Two labels, sparse values, always triggered - Pass - average iteration duration: 3.085 ns
         Domain of 1024, 12 sparse labels, equal polling - Pass - average iteration duration: 6.228 ns
  Domain of 1024, 12 sparse labels, 75% particular match - Pass - average iteration duration: 4.133 ns
    Domain of 1024, 12 sparse labels, 75% midpoint match - Pass - average iteration duration: 6.781 ns
         Domain of 1024, 39 sparse labels, equal polling - Pass - average iteration duration: 10.303 ns
  Domain of 1024, 39 sparse labels, 75% particular match - Pass - average iteration duration: 4.802 ns
    Domain of 1024, 39 sparse labels, 75% midpoint match - Pass - average iteration duration: 7.451 ns
         Domain of 1024, 71 sparse labels, equal polling - Pass - average iteration duration: 6.083 ns
  Domain of 1024, 71 sparse labels, 75% particular match - Pass - average iteration duration: 5.268 ns
    Domain of 1024, 71 sparse labels, 75% midpoint match - Pass - average iteration duration: 4.715 ns
                            Linear list depends on input - Pass - average iteration duration: 2.590 ns
                            C-style cascade using 'goto' - Pass - average iteration duration: 3.056 ns

ok
- Sum of average durations: 153.290 ns
- Overall average duration: 4.258 ns
bcase-sample-run.txt (7,870 bytes)   

J. Gareth Moreton

2019-03-08 15:50

developer  

more-case-improvements.patch (22,575 bytes)   
Index: compiler/ncgset.pas
===================================================================
--- compiler/ncgset.pas	(revision 41633)
+++ compiler/ncgset.pas	(working copy)
@@ -62,6 +62,7 @@
           procedure pass_generate_code;override;
 
         protected
+          min_result_def: TConstExprInt;
           with_sign : boolean;
           opsize : tdef;
           jmp_gt,jmp_lt,jmp_le : topcmp;
@@ -90,6 +91,8 @@
 
          procedure genjmptreeentry(p : pcaselabel;parentvalue : TConstExprInt); virtual;
          procedure genjmptree(root : pcaselabel); virtual;
+
+         procedure ProcessBlockLabel(Block: PCaseBlock); dynamic;
        end;
 
 
@@ -658,7 +661,7 @@
            if assigned(t^.less) then
              genitem(t^.less);
            { do we need to test the first value? }
-           if first and (t^._low>get_min_value(left.resultdef)) then
+           if first and (t^._low>min_result_def) then
              hlcg.a_cmp_const_reg_label(current_asmdata.CurrAsmList,opsize,jmp_lt,tcgint(t^._low.svalue),hregister,elselabel);
            if t^._low=t^._high then
              begin
@@ -679,8 +682,8 @@
                 { ELSE-label                                }
                 if first then
                   begin
-                     { have we to ajust the first value ? }
-                     if (t^._low>get_min_value(left.resultdef)) or (get_min_value(left.resultdef)<>0) then
+                     { do we have to adjust the first value ? }
+                     if (t^._low>min_result_def) or (min_result_def<>0) then
                        gensub(tcgint(t^._low.svalue));
                   end
                 else
@@ -752,6 +755,7 @@
       var
          last : TConstExprInt;
          lastwasrange: boolean;
+         lastBlockID: LongInt;
 
       procedure genitem(t : pcaselabel);
 
@@ -1032,15 +1036,33 @@
                 last:=t^._high;
                 lastwasrange := true;
              end;
+
+           LastBlockID := t^.blockid;
+
            if assigned(t^.greater) then
              genitem(t^.greater);
         end;
 
       begin
+         lastBlockID := -1;
          last:=0;
          lastwasrange:=false;
          genitem(hp);
          hlcg.a_jmp_always(current_asmdata.CurrAsmList,elselabel);
+
+         { The last block referenced by the jumps.  By inserting its
+           code first, we can possibly eliminate one of the jumps
+           completely via the peephole optimiser. [Kit] }
+         if (lastBlockID <> -1) and
+           (cs_opt_level1 in current_settings.optimizerswitches) and
+           { Don't do it if it's not going to help, since this may
+             cause other jumps to take on a longer form due to the
+             increased distance }
+           not (cbShortcut in PCaseBlock(Blocks[LastBlockID])^.Flags) then
+           begin
+             Include(PCaseBlock(Blocks[LastBlockID])^.Flags, cbNoAlign);
+             ProcessBlockLabel(PCaseBlock(Blocks[LastBlockID]));
+           end;
       end;
 
 
@@ -1145,6 +1167,33 @@
         genjmptreeentry(root,root^._high+10);
       end;
 
+    procedure tcgcasenode.ProcessBlockLabel(Block: PCaseBlock);
+      begin
+        with Block^ do
+        begin
+          { If cbShortcut is set, then the block label has been shortcut to
+            point elsewhere, so there's no need to implement it, while
+            cbProcessed indicates that the code for this block has already
+            been generated. }
+          if (Flags * [cbShortcut, cbProcessed]) = [] then
+            begin
+              if not (cbNoAlign in Flags) then
+                current_asmdata.CurrAsmList.concat(cai_align.create(current_settings.alignment.jumpalign));
+
+              cg.a_label(current_asmdata.CurrAsmList,blocklabel);
+
+              secondpass(statement);
+              { don't come back to case line }
+              current_filepos:=current_asmdata.CurrAsmList.getlasttaifilepos^;
+{$ifdef OLDREGVARS}
+              load_all_regvars(current_asmdata.CurrAsmList);
+{$endif OLDREGVARS}
+              hlcg.a_jmp_always(current_asmdata.CurrAsmList,endlabel);
+            end;
+          Include(Flags, cbProcessed);
+        end;
+      end;
+
     procedure tcgcasenode.pass_generate_code;
 
       { Combines "case_count_labels" and "case_true_count" }
@@ -1182,7 +1231,12 @@
 
          for i:=0 to blocks.count-1 do
            with pcaseblock(blocks[i])^ do
-             shortcut := GetBranchLabel(statement, blocklabel);
+             begin
+               if GetBranchLabel(statement, blocklabel) then
+                 Flags := [cbShortcut]
+               else
+                 Flags := [];
+             end;
 
          with_sign:=is_signed(left.resultdef);
          if with_sign then
@@ -1202,8 +1256,14 @@
          if (left.expectloc=LOC_JUMP)<>
             (left.location.loc=LOC_JUMP) then
            internalerror(2006050501);
+
          { determines the size of the operand }
          opsize:=left.resultdef;
+
+         { Value is frequently sought, so store it to reduce unnecessary
+           subroutine calls and object access }
+         min_result_def := get_min_value(opsize);
+
          { copy the case expression to a register }
          hlcg.location_force_reg(current_asmdata.CurrAsmList,left.location,left.resultdef,opsize,false);
 {$if not defined(cpu64bitalu)}
@@ -1319,23 +1379,8 @@
            end;
 
          { generate the instruction blocks }
-         for i:=0 to blocks.count-1 do with pcaseblock(blocks[i])^ do
-           begin
-             { If the labels are not equal, then the block label has been shortcut to point elsewhere,
-               so there's no need to implement it }
-             if not shortcut then
-               begin
-                 current_asmdata.CurrAsmList.concat(cai_align.create(current_settings.alignment.jumpalign));
-                 cg.a_label(current_asmdata.CurrAsmList,blocklabel);
-                 secondpass(statement);
-                 { don't come back to case line }
-                 current_filepos:=current_asmdata.CurrAsmList.getlasttaifilepos^;
-{$ifdef OLDREGVARS}
-                 load_all_regvars(current_asmdata.CurrAsmList);
-{$endif OLDREGVARS}
-                 hlcg.a_jmp_always(current_asmdata.CurrAsmList,endlabel);
-               end;
-           end;
+         for i:=0 to blocks.count-1 do
+           ProcessBlockLabel(pcaseblock(blocks[i]));
 
          { ...and the else block }
          if not ShortcutElse then
@@ -1357,8 +1402,11 @@
          hlcg.a_label(current_asmdata.CurrAsmList,endlabel);
 
          { Reset labels }
-         for i:=0 to blocks.count-1 do
-           pcaseblock(blocks[i])^.blocklabel:=nil;
+         for i:=0 to blocks.count-1 do with pcaseblock(blocks[i])^ do
+           begin
+             blocklabel := nil;
+             Flags := [];
+           end;
          flowcontrol := oldflowcontrol + (flowcontrol - [fc_inflowcontrol]);
       end;
 
Index: compiler/nset.pas
===================================================================
--- compiler/nset.pas	(revision 41633)
+++ compiler/nset.pas	(working copy)
@@ -57,20 +57,34 @@
             );
        end;
 
+       TCaseBlockFlag = (cbShortcut, cbProcessed, cbNoAlign);
+       TCaseBlockFlags = set of TCaseBlockFlag;
+
        pcaseblock = ^tcaseblock;
        tcaseblock = record
           { label (only used in pass_generate_code) }
           blocklabel : tasmlabel;
 
-          { shortcut - set to true if blocklabel isn't actually unique to the
-            case block due to one of the following conditions:
-            - if the node contains a jump, then the label is set to that jump's destination,
-            - if the node is empty, the label is set to the end label. }
-          shortcut: Boolean;
-
           statementlabel : tlabelnode;
           { instructions }
           statement  : tnode;
+
+          { Flags:
+             cbShortcut:
+               Set if blocklabel isn't actually unique to the case block due
+               to one of the following conditions:
+               - if the node contains a jump, then the label is set to that jump's destination,
+               - if the node is empty, the label is set to the end label.
+
+             cbProcessed:
+               If set, this label has already been passed through the
+               "ProcessBlockLabel" method, thus preventing duplicate code from
+               being generated if it is passed into "ProcessBlockLabel" again.
+
+             cbNoAlign:
+               If set, an alignment hint is not inserted before the code block.
+          }
+          Flags: TCaseBlockFlags;
        end;
 
        tsetelementnode = class(tbinarynode)
Index: compiler/x86/nx86set.pas
===================================================================
--- compiler/x86/nx86set.pas	(revision 41633)
+++ compiler/x86/nx86set.pas	(working copy)
@@ -220,55 +220,131 @@
         first : boolean;
         lastrange : boolean;
         last : TConstExprInt;
-        cond_lt,cond_le : tresflags;
+        cond_lt,cond_le,cond_le_Opposite: TResFlags;
         opcgsize: tcgsize;
+        Countdown: TCgInt;
 
-        procedure genitem(t : pcaselabel);
+        function genitem(t : pcaselabel): Boolean;
           var
-             range, gap: aint;
+            range, gap, offset: aint;
+            Lbl: TAsmLabel;
+            ID: LongInt;
+            LowCond: Boolean;
           begin
-             if assigned(t^.less) then
-               genitem(t^.less);
-             { need we to test the first value }
-             if first and (t^._low>get_min_value(left.resultdef)) then
+            Result := False;
+            if assigned(t^.less) then
+              genitem(t^.less);
+
+            Dec(Countdown);
+
+            ID := t^.blockid;
+            Lbl := blocklabel(t^.blockid);
+            gap := aint(t^._low.svalue - last.svalue);
+            range := aint(t^._high.svalue - t^._low.svalue);
+
+             if range = 0 then
                begin
-                 cg.a_cmp_const_reg_label(current_asmdata.CurrAsmList,opcgsize,jmp_lt,aint(t^._low.svalue),hregister,elselabel);
-               end;
-             if t^._low=t^._high then
-               begin
-                  if t^._low-last=0 then
-                    cg.a_cmp_const_reg_label(current_asmdata.CurrAsmList, opcgsize, OC_EQ,0,hregister,blocklabel(t^.blockid))
-                  else
-                    begin
-                      cg.a_op_const_reg(current_asmdata.CurrAsmList, OP_SUB, opcgsize, aint(t^._low.svalue-last.svalue), hregister);
-                      cg.a_jmp_flags(current_asmdata.CurrAsmList,F_E,blocklabel(t^.blockid));
-                    end;
-                  last:=t^._low;
-                  lastrange:=false;
+
+                 if (Countdown = 0) and not (cbShortcut in PCaseBlock(Blocks[ID])^.Flags) then
+                   { Never true on optimisation level 0 }
+                   begin
+                     { If we insert the code for the last branch first, we
+                       can eliminate one of the jumps and alignment hints }
+                     cg.a_cmp_const_reg_label(current_asmdata.CurrAsmList, opcgsize, OC_NE, aint(t^._low.svalue-last.svalue), hregister, elselabel);
+                     Include(PCaseBlock(Blocks[ID])^.Flags, cbNoAlign);
+                     ProcessBlockLabel(PCaseBlock(Blocks[ID]));
+
+                     { Jump to else effectively handled }
+                     Result := True;
+                   end
+                 else
+                   begin
+                     if gap = 0 then { This is only possible if t^.low and last are both zero }
+                       begin
+                         { Note: a_op_const_reg will automatically remove "sub 0,%reg",
+                           so we have to use "cmp 0,%reg" here - however this is
+                           equivalent to "test %reg,%reg" even with the sign flag
+                           being correctly set in the result, so use that to save the
+                           peephole optimiser from having to do it. }
+                         emit_reg_reg(A_TEST, TCGSize2OpSize[opcgsize], hregister, hregister);
+                         if (t^._low > min_result_def) then
+                           cg.a_jmp_flags(current_asmdata.CurrAsmList, cond_lt, elselabel);
+
+                         cg.a_jmp_flags(current_asmdata.CurrAsmList, F_E, Lbl);
+                       end
+                     else
+                        begin
+                          { we need to use A_SUB, if jmp_le uses the carry flags
+                            because A_DEC does not set the correct flags, therefor
+                            using a_op_const_reg(OP_SUB) is not possible }
+                          if (jmp_le in [OC_AE,OC_BE]) and (gap = 1) then
+                            emit_const_reg(A_SUB,TCGSize2OpSize[opcgsize], gap, hregister)
+                          else
+                            cg.a_op_const_reg(current_asmdata.CurrAsmList, OP_SUB, opcgsize, gap, hregister);
+
+                          { If we're near the end of the case block, the CPU keeping
+                            track of the additional branches becomes cost-prohibitive
+                            compared to just doing some additional comparisons. [Kit] }
+                          if ((gap > 1) and (Countdown >= 4)) or
+                            (first and (t^._low > min_result_def))
+                          then
+                            cg.a_jmp_flags(current_asmdata.CurrAsmList, cond_lt, elselabel);
+
+                          cg.a_jmp_flags(current_asmdata.CurrAsmList,F_E, Lbl);
+                        end;
+                      last:=t^._low;
+                      lastrange:=false;
+                   end;
                end
              else
                begin
-                  range := aint(t^._high.svalue - t^._low.svalue);
+                 { need we to test the first value }
+
                   { it begins with the smallest label, if the value }
                   { is even smaller then jump immediately to the    }
                   { ELSE-label                                }
                   if first then
                     begin
-                       { have we to ajust the first value ? }
-                       if (t^._low>get_min_value(left.resultdef)) or (get_min_value(left.resultdef)<>0) then
-                         cg.a_op_const_reg(current_asmdata.CurrAsmList, OP_SUB, opcgsize, aint(t^._low.svalue), hregister);
+                      offset := range;
+
+                      { have we to ajust the first value ? }
+                      LowCond := (t^._low > min_result_def); { This condition can get expensive to check multiple times }
+                      if LowCond or (min_result_def <> 0) then
+                        begin
+                          if LowCond then
+                            begin
+                              { gap is equal to t^_.low, but it's faster to access }
+                              if gap = 0 then
+                                { "sub 0,%reg" gets stripped by a_op_const_reg, so use "test %reg,%reg" }
+                                emit_reg_reg(A_TEST, TCGSize2OpSize[opcgsize], hregister, hregister)
+                              else
+                                cg.a_op_const_reg(current_asmdata.CurrAsmList, OP_SUB, opcgsize, aint(t^._low.svalue), hregister);
+
+                              cg.a_jmp_flags(current_asmdata.CurrAsmList,cond_lt,elselabel);
+                            end
+                          else
+                            Inc(offset, gap);
+                        end;
+
+                      { we need to use A_SUB, if jmp_le uses the carry flags
+                      because A_DEC does not set the correct flags, therefor
+                      using a_op_const_reg(OP_SUB) is not possible }
+                      if (jmp_le in [OC_AE,OC_BE]) and (offset = 1) then
+                        emit_const_reg(A_SUB,TCGSize2OpSize[opcgsize], offset, hregister)
+                      else
+                        cg.a_op_const_reg(current_asmdata.CurrAsmList, OP_SUB, opcgsize, offset, hregister);
+
                     end
                   else
                     begin
-                      gap := aint(t^._low.svalue - last.svalue);
                       { if there is no unused label between the last and the }
                       { present label then the lower limit can be checked    }
                       { immediately. else check the range in between:       }
 
-                      { we need to use A_SUB, if cond_lt uses the carry flags
+                      { we need to use A_SUB, if jmp_lt uses the carry flags
                         because A_DEC does not set the correct flags, therefor
                         using a_op_const_reg(OP_SUB) is not possible }
-                      if (gap = 1) and (cond_lt in [F_C,F_NC,F_A,F_AE,F_B,F_BE]) then
+                      if (gap = 1) and (jmp_lt in [OC_A,OC_AE,OC_B,OC_BE]) then
                         emit_const_reg(A_SUB, TCGSize2OpSize[opcgsize], gap, hregister)
                       else
                         cg.a_op_const_reg(current_asmdata.CurrAsmList, OP_SUB, opcgsize, gap, hregister);
@@ -275,38 +351,65 @@
                       { no jump necessary here if the new range starts at
                         at the value following the previous one           }
                       if (gap <> 1) or
-                         (not lastrange) then
+                         (not lastrange and (Countdown < 3)) then
+                        { The "Countdown < 3" condition covers the fact that non-range
+                          branches stop having conditional jumps to else by this point.
+                          It has to equal 4 - 1 = 3 because 4 belongs to the previous
+                          branch, and doing "Countdown < 4" will cause an off-by-one
+                          error (in this case, it isn't fatal - it just inserts an
+                          unnecessary jump). [Kit] }
                         cg.a_jmp_flags(current_asmdata.CurrAsmList,cond_lt,elselabel);
+
+                        { we need to use A_SUB, if jmp_le uses the carry flags
+                        because A_DEC does not set the correct flags, therefor
+                        using a_op_const_reg(OP_SUB) is not possible }
+                        if (jmp_le in [OC_AE,OC_BE]) and (range = 1) then
+                          emit_const_reg(A_SUB,TCGSize2OpSize[opcgsize], range, hregister)
+                        else
+                          cg.a_op_const_reg(current_asmdata.CurrAsmList, OP_SUB, opcgsize, range, hregister);
+
                     end;
-                  { we need to use A_SUB, if cond_le uses the carry flags
-                    because A_DEC does not set the correct flags, therefor
-                    using a_op_const_reg(OP_SUB) is not possible }
-                  if (cond_le in [F_C,F_NC,F_A,F_AE,F_B,F_BE]) and (range = 1) then
-                    emit_const_reg(A_SUB,TCGSize2OpSize[opcgsize], range, hregister)
+
+                  if (Countdown = 0) and not (cbShortcut in PCaseBlock(Blocks[ID])^.Flags) then
+                    { Never true on optimisation level 0 }
+                    begin
+                      { If we insert the code for the last branch first, we
+                        can eliminate one of the jumps and alignment hints }
+
+                      cg.a_jmp_flags(current_asmdata.CurrAsmList, cond_le_Opposite, elselabel);
+                      Include(PCaseBlock(Blocks[ID])^.Flags, cbNoAlign);
+                      ProcessBlockLabel(PCaseBlock(Blocks[ID]));
+
+                      { The final jump to else has been effectively handled }
+                      Result := True;
+                    end
                   else
-                    cg.a_op_const_reg(current_asmdata.CurrAsmList, OP_SUB, opcgsize, range, hregister);
+                    cg.a_jmp_flags(current_asmdata.CurrAsmList,cond_le, Lbl);
 
-                  cg.a_jmp_flags(current_asmdata.CurrAsmList,cond_le,blocklabel(t^.blockid));
                   last:=t^._high;
                   lastrange:=true;
                end;
              first:=false;
+
              if assigned(t^.greater) then
-               genitem(t^.greater);
+               Result := genitem(t^.greater) or Result;
           end;
 
         begin
            opcgsize:=def_cgsize(opsize);
-           if with_sign then
+           if jmp_lt = OC_LT then
              begin
-                cond_lt:=F_L;
-                cond_le:=F_LE;
+               cond_lt := F_L;
+               cond_le := F_LE;
+               cond_le_Opposite := F_G;
              end
            else
-              begin
-                cond_lt:=F_B;
-                cond_le:=F_BE;
+             begin
+               cond_lt := F_B;
+               cond_le := F_BE;
+               cond_le_Opposite := F_A;
              end;
+
            { do we need to generate cmps? }
 {$ifdef i8086}
            if (with_sign and (min_label<0)) or (opcgsize in [OS_32, OS_S32]) then
@@ -318,10 +421,15 @@
              begin
                 if (labelcnt>1) or not(cs_opt_level1 in current_settings.optimizerswitches) then
                   begin
+                    Countdown := LabelCnt; { Will be zero on level 0 optimisations and then immediately go negative }
                     last:=0;
                     lastrange:=false;
                     first:=true;
-                    genitem(hp);
+
+                    { If Result is true, then the genitem stack has handled the
+                      jump to else }
+                    if not genitem(hp) then
+                      cg.a_jmp_always(current_asmdata.CurrAsmList, elselabel);
                   end
                 else
                   begin
@@ -333,8 +441,9 @@
                         cg.a_op_const_reg(current_asmdata.CurrAsmList, OP_SUB, opcgsize, tcgint(hp^._low.svalue), hregister);
                         cg.a_cmp_const_reg_label(current_asmdata.CurrAsmList, opcgsize, OC_BE, tcgint(hp^._high.svalue - hp^._low.svalue), hregister,blocklabel(hp^.blockid));
                       end;
+
+                    cg.a_jmp_always(current_asmdata.CurrAsmList,elselabel);
                   end;
-                cg.a_jmp_always(current_asmdata.CurrAsmList,elselabel);
              end;
         end;
 
more-case-improvements.patch (22,575 bytes)   

J. Gareth Moreton

2019-03-08 15:52

developer   ~0114713

Fixed some code that caused errors in cross-compilation - everything should build fine now.

Regression tests have also been run and have passed.

ravi dion

2021-04-14 16:58

reporter   ~0130376

Why no add?

J. Gareth Moreton

2021-04-15 03:53

developer   ~0130386

I'm not sure. I might have been overzealous in compiler efficiencyt or badly explained something ("the last 4") or overlooked something.

Issue History

Date Modified Username Field Change
2019-01-13 06:06 J. Gareth Moreton New Issue
2019-01-13 06:06 J. Gareth Moreton File Added: more-case-improvements.patch
2019-01-13 06:06 J. Gareth Moreton File Added: bcase-sample-run.txt
2019-01-13 06:07 J. Gareth Moreton Severity minor => tweak
2019-01-13 06:07 J. Gareth Moreton Steps to Reproduce Updated View Revisions
2019-01-13 06:44 J. Gareth Moreton Tag Attached: 64-bit
2019-01-13 06:44 J. Gareth Moreton Tag Attached: compiler
2019-01-13 06:44 J. Gareth Moreton Tag Attached: node
2019-01-13 06:44 J. Gareth Moreton Tag Attached: optimization
2019-01-13 06:44 J. Gareth Moreton Tag Attached: patch
2019-01-13 06:44 J. Gareth Moreton Tag Attached: x86
2019-01-13 06:44 J. Gareth Moreton Tag Attached: x86_64-win64
2019-01-13 07:33 J. Gareth Moreton Steps to Reproduce Updated View Revisions
2019-01-14 22:39 J. Gareth Moreton Relationship added child of 0034762
2019-03-08 15:50 J. Gareth Moreton File Deleted: more-case-improvements.patch
2019-03-08 15:50 J. Gareth Moreton File Added: more-case-improvements.patch
2019-03-08 15:52 J. Gareth Moreton Note Added: 0114713
2021-04-14 16:58 ravi dion Note Added: 0130376
2021-04-15 03:53 J. Gareth Moreton Note Added: 0130386