View Issue Details

IDProjectCategoryView StatusLast Update
0034762FPCCompilerpublic2019-02-24 12:09
ReporterJ. Gareth Moreton Assigned ToFlorian  
PrioritylowSeverityminorReproducibilityN/A
Status resolvedResolutionfixed 
Platformx86_64-win64OSMicrosoft Windows 
Product Version3.3.1 
Target Version3.3.1Fixed in Version3.3.1 
Summary0034762: [Patch] Speed improvement in case blocks
DescriptionThis patch improves the compiler where "case" statements are concerne, using jump tables more often and creating more efficient machine code in some situations. The list of changes can be found under "Additional Information".
Steps To ReproduceApply patch, compile supplied test program to confirm correctness and to compare timings against the program when compiled against the trunk.
Additional Information- If a case block only contains one branch (not including the else block), the initial range check is removed, since this becomes wasted effort.
- If the else block is empty, the else label is set to the end label - though this doesn't decrease the code size, it takes a bit of strain off the peephole optimizer.
- On -O2 and above, some node analysis is now done on the branch labels. Most of the time this just redirects it to the end label for empty blocks, but if the block contains a goto statement, it will redirect it to its destination instead, thus increasing performance by not having multiple jumps (this won't get picked up by the peephole optimiser if the label addresses are in a jump table).
- Some checks now use what I call the 'true count' rather than the 'label count'. The true count includes each individual value in a range - for example, 0..2 counts as 3. This increases the chance that a jump table will be utilised in situations where it is more efficient than a linear list.
- For jump tables, if the case block almost covers the entire range (32 entries or fewer from full coverage), the initial range check is removed and the gaps included in the jump table (pointing to the else label).
Tags64-bit, compiler, node, patch, x86, x86_64-win64
Fixed in Revision40676
FPCOldBugId0
FPCTarget
Attached Files

Relationships

parent of 0034779 closedMarco van de Voort FPC trunk cannot build Lazarus any more 
parent of 0034783 closed [Test] New tests for bcase.pp 
parent of 0034859 new [Patch] Speed boosts to linear list case blocks 
related to 0034776 closed char by char copying of string broken 
related to 0034818 closed out of memory when compiling with -Os1 
Not all the children of this issue are yet resolved or closed.

Activities

J. Gareth Moreton

2018-12-26 01:21

developer  

case-improvement.patch (25,250 bytes)   
Index: compiler/ncgset.pas
===================================================================
--- compiler/ncgset.pas	(revision 40548)
+++ compiler/ncgset.pas	(working copy)
@@ -73,7 +73,14 @@
           jumptable_no_range : boolean;
           { has the implementation jumptable support }
           min_label : tconstexprint;
+          { Number of labels }
+          labelcnt: TCgInt;
+          { Number of individual values checked, counting each value in a range
+            individually (e.g. 0..2 counts as 3). }
+          TrueCount: TCgInt;
 
+          function GetBranchLabel(Block: TNode; out _Label: TAsmLabel): Boolean;
+
           function  blocklabel(id:longint):tasmlabel;
           procedure optimizevalues(var max_linear_list:aint;var max_dist:aword);virtual;
           function  has_jumptable : boolean;virtual;
@@ -90,9 +97,9 @@
 
     uses
       verbose,
-      symconst,symdef,defutil,
+      symconst,symdef,symsym,defutil,
       pass_2,tgobj,
-      ncon,
+      nbas,ncon,ncgflw,
       ncgutil,hlcgobj;
 
 
@@ -524,6 +531,79 @@
                             TCGCASENODE
 *****************************************************************************}
 
+
+    { Analyse the nodes following the else label - if empty, change to end label }
+    function tcgcasenode.GetBranchLabel(Block: TNode; out _Label: TAsmLabel): Boolean;
+      var
+        LabelSym: TLabelSym;
+      begin
+        Result := True;
+
+        if not Assigned(Block) then
+          begin
+            { Block doesn't exist / is empty }
+            _Label := endlabel;
+            Exit;
+          end;
+
+        { These optimisations aren't particularly debugger friendly }
+        if not (cs_opt_level2 in current_settings.optimizerswitches) then
+          begin
+            Result := False;
+            current_asmdata.getjumplabel(_Label);
+            Exit;
+          end;
+
+        while Assigned(Block) do
+          begin
+            case Block.nodetype of
+              nothingn:
+                begin
+                  _Label := endlabel;
+                  Exit;
+                end;
+              goton:
+                begin
+                  LabelSym := TCGGotoNode(Block).labelsym;
+                  if not Assigned(LabelSym) then
+                    InternalError(2018121131);
+
+                  _Label := TCGLabelNode(TCGGotoNode(Block).labelnode).getasmlabel;
+                  if Assigned(_Label) then
+                    { Keep tabs on the fact that an actual 'goto' was used }
+                    Include(flowcontrol,fc_gotolabel)
+                  else
+                    Break;
+                  Exit;
+                end;
+              blockn:
+                begin
+                  Block := TBlockNode(Block).Left;
+                  Continue;
+                end;
+              statementn:
+                begin
+                  { If the right node is assigned, then it's a compound block
+                    that can't be simplified, so fall through, set Result to
+                    False and make a new label }
+
+                  if Assigned(TStatementNode(Block).right) then
+                    Break;
+
+                  Block := TStatementNode(Block).Left;
+                  Continue;
+                end;
+            end;
+
+            Break;
+          end;
+
+        { Create unique label }
+        Result := False;
+        current_asmdata.getjumplabel(_Label);
+      end;
+
+
     function tcgcasenode.blocklabel(id:longint):tasmlabel;
       begin
         if not assigned(blocks[id]) then
@@ -642,7 +722,7 @@
                   opsize:=newdef;
                 end;
               last:=0;
-              first:=true;
+              first:=(labelcnt > 1); { Can greatly simplify the range checks if there's only one label }
               scratch_reg:=hlcg.getintregister(current_asmdata.CurrAsmList,opsize);
               genitem(hp);
               hlcg.a_jmp_always(current_asmdata.CurrAsmList,elselabel);
@@ -1043,15 +1123,28 @@
       end;
 
     procedure tcgcasenode.pass_generate_code;
+
+      { Combines "case_count_labels" and "case_true_count" }
+      procedure CountBoth(p : pcaselabel);
+        begin
+          Inc(labelcnt);
+          Inc(TrueCount, (p^._high.svalue - p^._low.svalue) + 1);
+          if assigned(p^.less) then
+            CountBoth(p^.less);
+          if assigned(p^.greater) then
+            CountBoth(p^.greater);
+        end;
+
       var
          oldflowcontrol: tflowcontrol;
          i : longint;
-         dist,distv,
+         dist : aword;
+         distv,
          lv,hv,
          max_label: tconstexprint;
-         labelcnt : tcgint;
          max_linear_list : aint;
          max_dist : aword;
+         ShortcutElse: Boolean;
       begin
          location_reset(location,LOC_VOID,OS_NO);
 
@@ -1058,10 +1151,15 @@
          oldflowcontrol := flowcontrol;
          include(flowcontrol,fc_inflowcontrol);
          { Allocate labels }
+
          current_asmdata.getjumplabel(endlabel);
-         current_asmdata.getjumplabel(elselabel);
+
+         { Do some optimisation to deal with empty else blocks }
+         ShortcutElse := GetBranchLabel(elseblock, elselabel);
+
          for i:=0 to blocks.count-1 do
-           current_asmdata.getjumplabel(pcaseblock(blocks[i])^.blocklabel);
+           with pcaseblock(blocks[i])^ do
+             shortcut := GetBranchLabel(statement, blocklabel);
 
          with_sign:=is_signed(left.resultdef);
          if with_sign then
@@ -1118,8 +1216,13 @@
                    { moreover can the size only be appro- }
                    { ximated as it is not known if rel8,  }
                    { rel16 or rel32 jumps are used   }
-                   max_label:=case_get_max(labels);
-                   labelcnt:=case_count_labels(labels);
+                   labelcnt := 0;
+                   TrueCount := 0;
+
+                   CountBoth(labels);
+
+                   max_label := case_get_max(labels);
+
                    { can we omit the range check of the jump table ? }
                    getrange(left.resultdef,lv,hv);
                    jumptable_no_range:=(lv=min_label) and (hv=max_label);
@@ -1128,7 +1231,7 @@
                    if distv>=0 then
                      dist:=distv.uvalue
                    else
-                     dist:=-distv.svalue;
+                     dist:=aword(-distv.svalue);
 
                    { optimize for size ? }
                    if cs_opt_size in current_settings.optimizerswitches  then
@@ -1137,8 +1240,8 @@
                           (min_label>=int64(low(aint))) and
                           (max_label<=high(aint)) and
                           not((labelcnt<=2) or
-                              ((max_label-min_label)<0) or
-                              ((max_label-min_label)>3*labelcnt)) then
+                              (distv.svalue<0) or
+                              (dist>3*TrueCount)) then
                          begin
                            { if the labels less or more a continuum then }
                            genjumptable(labels,min_label.svalue,max_label.svalue);
@@ -1151,7 +1254,7 @@
                      end
                    else
                      begin
-                        max_dist:=4*labelcnt;
+                        max_dist := 4 * TrueCount;
                         if jumptable_no_range then
                           max_linear_list:=4
                         else
@@ -1187,26 +1290,37 @@
            end;
 
          { generate the instruction blocks }
-         for i:=0 to blocks.count-1 do
+         for i:=0 to blocks.count-1 do with pcaseblock(blocks[i])^ do
            begin
-              current_asmdata.CurrAsmList.concat(cai_align.create(current_settings.alignment.jumpalign));
-              cg.a_label(current_asmdata.CurrAsmList,pcaseblock(blocks[i])^.blocklabel);
-              secondpass(pcaseblock(blocks[i])^.statement);
-              { don't come back to case line }
-              current_filepos:=current_asmdata.CurrAsmList.getlasttaifilepos^;
+             { If the labels are not equal, then the block label has been shortcut to point elsewhere,
+               so there's no need to implement it }
+             if not shortcut then
+               begin
+                 current_asmdata.CurrAsmList.concat(cai_align.create(current_settings.alignment.jumpalign));
+                 cg.a_label(current_asmdata.CurrAsmList,blocklabel);
+                 secondpass(statement);
+                 { don't come back to case line }
+                 current_filepos:=current_asmdata.CurrAsmList.getlasttaifilepos^;
 {$ifdef OLDREGVARS}
-              load_all_regvars(current_asmdata.CurrAsmList);
+                 load_all_regvars(current_asmdata.CurrAsmList);
 {$endif OLDREGVARS}
-              hlcg.a_jmp_always(current_asmdata.CurrAsmList,endlabel);
+                 hlcg.a_jmp_always(current_asmdata.CurrAsmList,endlabel);
+               end;
            end;
-         current_asmdata.CurrAsmList.concat(cai_align.create(current_settings.alignment.jumpalign));
+
          { ...and the else block }
-         hlcg.a_label(current_asmdata.CurrAsmList,elselabel);
-         if assigned(elseblock) then
+         if not ShortcutElse then
            begin
-              secondpass(elseblock);
+             current_asmdata.CurrAsmList.concat(cai_align.create(current_settings.alignment.jumpalign));
+             hlcg.a_label(current_asmdata.CurrAsmList,elselabel);
+           end;
+
+         if Assigned(elseblock) then
+           begin
+
+             secondpass(elseblock);
 {$ifdef OLDREGVARS}
-              load_all_regvars(current_asmdata.CurrAsmList);
+             load_all_regvars(current_asmdata.CurrAsmList);
 {$endif OLDREGVARS}
            end;
 
Index: compiler/nset.pas
===================================================================
--- compiler/nset.pas	(revision 40548)
+++ compiler/nset.pas	(working copy)
@@ -62,6 +62,12 @@
           { label (only used in pass_generate_code) }
           blocklabel : tasmlabel;
 
+          { shortcut - set to true if blocklabel isn't actually unique to the
+            case block due to one of the following conditions:
+            - if the node contains a jump, then the label is set to that jump's destination,
+            - if the node is empty, the label is set to the end label. }
+          shortcut: Boolean;
+
           statementlabel : tlabelnode;
           { instructions }
           statement  : tnode;
@@ -121,6 +127,9 @@
 
     { counts the labels }
     function case_count_labels(root : pcaselabel) : longint;
+    { Returns the true count in a case block, which includes each individual
+      value in a range (e.g. "0..2" counts as 3) }
+    function case_true_count(root : pcaselabel) : longint;
     { searches the highest label }
     function case_get_max(root : pcaselabel) : tconstexprint;
     { searches the lowest label }
@@ -439,6 +448,29 @@
       end;
 
 
+    { Returns the true count in a case block, which includes each individual
+      value in a range (e.g. "0..2" counts as 3) }
+    function case_true_count(root : pcaselabel) : longint;
+      var
+         _l : longint;
+
+      procedure count(p : pcaselabel);
+        begin
+           inc(_l, (p^._high.svalue - p^._low.svalue) + 1);
+           if assigned(p^.less) then
+             count(p^.less);
+           if assigned(p^.greater) then
+             count(p^.greater);
+        end;
+
+      begin
+        _l:=0;
+        count(root);
+        case_true_count:=_l;
+      end;
+
+
+
     function case_get_max(root : pcaselabel) : tconstexprint;
       var
          hp : pcaselabel;
Index: compiler/x86/nx86set.pas
===================================================================
--- compiler/x86/nx86set.pas	(revision 40548)
+++ compiler/x86/nx86set.pas	(working copy)
@@ -47,7 +47,7 @@
     uses
       systems,
       verbose,globals,
-      symconst,symdef,defutil,
+      symconst,symdef,defutil,cutils,
       aasmbase,aasmtai,aasmdata,aasmcpu,
       cgbase,pass_2,tgobj,
       ncon,
@@ -76,7 +76,13 @@
         opcgsize: tcgsize;
         jumpreg: tregister;
         labeltyp: taiconst_type;
+        AlmostExhaustive: Boolean;
+        lv, hv: TConstExprInt;
+        ExhaustiveLimit, Range, x, oldmin : aint;
 
+      const
+        ExhaustiveLimitBase = 32;
+
         procedure genitem(list:TAsmList;t : pcaselabel);
           var
             i : aint;
@@ -83,6 +89,7 @@
           begin
             if assigned(t^.less) then
               genitem(list,t^.less);
+
             { fill possible hole }
             i:=last.svalue+1;
             while i<=t^._low.svalue-1 do
@@ -106,16 +113,44 @@
         { This generates near pointers on i8086 }
         labeltyp:=aitconst_ptr;
         opcgsize:=def_cgsize(opsize);
+
+        AlmostExhaustive := False;
+
         if not(jumptable_no_range) then
           begin
-             { a <= x <= b <-> unsigned(x-a) <= (b-a) }
-             cg.a_op_const_reg(current_asmdata.CurrAsmList,OP_SUB,opcgsize,aint(min_),hregister);
-             { case expr greater than max_ => goto elselabel }
-             cg.a_cmp_const_reg_label(current_asmdata.CurrAsmList,opcgsize,OC_A,aint(max_)-aint(min_),hregister,elselabel);
-             min_:=0;
-             { do not sign extend when we load the index register, as we applied an offset above }
-             opcgsize:=tcgsize2unsigned[opcgsize];
+
+            getrange(left.resultdef,lv,hv);
+            Range := aint(max_)-aint(min_);
+
+            if (cs_opt_size in current_settings.optimizerswitches) then
+              { Limit size of jump tables for small enumerations so they have
+                to be at least two-thirds full before being considered for the
+                "almost exhaustive" treatment }
+              ExhaustiveLimit := min(ExhaustiveLimitBase, TrueCount shl 1)
+            else
+              ExhaustiveLimit := ExhaustiveLimitBase;
+
+            { If true, then this indicates that almost every possible value of x is covered by
+              a label.  As such, it's more cost-efficient to remove the initial range check and
+              instead insert the remaining values into the jump table, pointing at elselabel. [Kit] }
+            if ((hv - lv) - Range <= ExhaustiveLimit) then
+              begin
+                oldmin := min_;
+                min_ := lv.svalue;
+                AlmostExhaustive := True;
+              end
+            else
+              begin
+                { a <= x <= b <-> unsigned(x-a) <= (b-a) }
+                cg.a_op_const_reg(current_asmdata.CurrAsmList,OP_SUB,opcgsize,aint(min_),hregister);
+                { case expr greater than max_ => goto elselabel }
+                cg.a_cmp_const_reg_label(current_asmdata.CurrAsmList,opcgsize,OC_A,aint(max_)-aint(min_),hregister,elselabel);
+                min_:=0;
+                { do not sign extend when we load the index register, as we applied an offset above }
+                opcgsize:=tcgsize2unsigned[opcgsize];
+              end;
           end;
+
         current_asmdata.getglobaldatalabel(table);
         { make it a 32bit register }
         indexreg:=cg.makeregsize(current_asmdata.CurrAsmList,hregister,OS_INT);
@@ -148,7 +183,32 @@
           jtlist:=current_procinfo.aktlocaldata;
         new_section(jtlist,sec_rodata,current_procinfo.procdef.mangledname,sizeof(aint));
         jtlist.concat(Tai_label.Create(table));
-        genitem(jtlist,hp);
+
+        if AlmostExhaustive then
+          begin
+            { Fill the table with the values below _min }
+            x := lv.svalue;
+            while x < oldmin do
+              begin
+                jtlist.concat(Tai_const.Create_type_sym(labeltyp, elselabel));
+                Inc(x);
+              end;
+
+            genitem(jtlist,hp);
+
+            { Fill the table with the values above _max }
+            { Subtracting one from hv and not adding 1 to max averts the risk of an overflow }
+            x := max_;
+            hv := hv - 1;
+            while x <= hv.svalue do
+              begin
+                jtlist.concat(Tai_const.Create_type_sym(labeltyp, elselabel));
+                Inc(x);
+              end;
+
+          end
+        else
+          genitem(jtlist,hp)
       end;
 
 
@@ -251,7 +311,7 @@
              begin
                 last:=0;
                 lastrange:=false;
-                first:=true;
+                first:=(labelcnt > 1); { Can greatly simplify the range checks if there's only one label }
                 genitem(hp);
                 cg.a_jmp_always(current_asmdata.CurrAsmList,elselabel);
              end;
Index: compiler/x86_64/nx64set.pas
===================================================================
--- compiler/x86_64/nx64set.pas	(revision 40548)
+++ compiler/x86_64/nx64set.pas	(working copy)
@@ -26,6 +26,7 @@
 interface
 
     uses
+      constexp,
       globtype,
       nset,nx86set;
 
@@ -39,15 +40,21 @@
 implementation
 
     uses
-      systems,
-      verbose,globals,constexp,
-      defutil,
-      aasmbase,aasmtai,aasmdata,
+      systems,cpuinfo,
+      verbose,globals,
+      defutil,cutils,
+      aasmbase,aasmtai,aasmdata,aasmcpu,
       cgbase,
       cpubase,procinfo,
-      cga,cgutils,cgobj;
+      cga,cgutils,cgobj,cgx86;
 
 
+    function CreateCMOVInstr(cond : TAsmCond; size: TOpSize; reg1, reg2: TRegister): taicpu;
+      begin
+        Result := taicpu.op_reg_reg(A_CMOVcc, size, reg1, reg2);
+        Result.condition := cond;
+      end;
+
 {*****************************************************************************
                             TX8664CASENODE
 *****************************************************************************}
@@ -55,6 +62,7 @@
     procedure tx8664casenode.optimizevalues(var max_linear_list:aint;var max_dist:aword);
       begin
         inc(max_linear_list,9);
+        max_dist := min(max_dist,2048); { Don't allow jump tables to get too large }
       end;
 
 
@@ -66,32 +74,39 @@
         tablelabel: TAsmLabel;
         basereg,indexreg,jumpreg: TRegister;
         href: TReference;
+        jtlist: TAsmList;
         opcgsize: tcgsize;
         sectype: TAsmSectiontype;
         jtitemconsttype: taiconst_type;
+        AlmostExhaustive: Boolean;
+        lv, hv: TConstExprInt;
+        ExhaustiveLimit, Range, x, oldmin : aint;
 
-      procedure genitem(list:TAsmList;t : pcaselabel);
+      const
+        ExhaustiveLimitBase = 32;
+
+      procedure genitem(t : pcaselabel);
         var
           i : aint;
         begin
           if assigned(t^.less) then
-            genitem(list,t^.less);
+            genitem(t^.less);
           { fill possible hole }
           i:=last.svalue+1;
           while i<=t^._low.svalue-1 do
             begin
-              list.concat(Tai_const.Create_rel_sym(jtitemconsttype,tablelabel,elselabel));
+              jtlist.concat(Tai_const.Create_rel_sym(jtitemconsttype,tablelabel,elselabel));
               inc(i);
             end;
           i:=t^._low.svalue;
           while i<=t^._high.svalue do
             begin
-              list.concat(Tai_const.Create_rel_sym(jtitemconsttype,tablelabel,blocklabel(t^.blockid)));
+              jtlist.concat(Tai_const.Create_rel_sym(jtitemconsttype,tablelabel,blocklabel(t^.blockid)));
               inc(i);
             end;
           last:=t^._high;
           if assigned(t^.greater) then
-            genitem(list,t^.greater);
+            genitem(t^.greater);
         end;
 
       begin
@@ -101,38 +116,67 @@
           { see https://gmplib.org/list-archives/gmp-bugs/2012-December/002836.html }
           jtitemconsttype:=aitconst_darwin_dwarf_delta32;
 
+        jtlist := current_asmdata.CurrAsmList;
         last:=min_;
         opcgsize:=def_cgsize(opsize);
+
+        AlmostExhaustive := False;
+        oldmin := min_;
+
         if not(jumptable_no_range) then
           begin
-             { a <= x <= b <-> unsigned(x-a) <= (b-a) }
-             cg.a_op_const_reg(current_asmdata.CurrAsmList,OP_SUB,opcgsize,aint(min_),hregister);
-             { case expr greater than max_ => goto elselabel }
-             cg.a_cmp_const_reg_label(current_asmdata.CurrAsmList,opcgsize,OC_A,aint(max_)-aint(min_),hregister,elselabel);
-             min_:=0;
-             { do not sign extend when we load the index register, as we applied an offset above }
-             opcgsize:=tcgsize2unsigned[opcgsize];
+
+            getrange(left.resultdef,lv,hv);
+            Range := aint(max_)-aint(min_);
+
+            if (cs_opt_size in current_settings.optimizerswitches) then
+              { Limit size of jump tables for small enumerations so they have
+                to be at least two-thirds full before being considered for the
+                "almost exhaustive" treatment }
+              ExhaustiveLimit := min(ExhaustiveLimitBase, TrueCount shl 1)
+            else
+              ExhaustiveLimit := ExhaustiveLimitBase;
+
+            { If true, then this indicates that almost every possible value of x is covered by
+              a label.  As such, it's more cost-efficient to remove the initial range check and
+              instead insert the remaining values into the jump table, pointing at elselabel. [Kit] }
+            if ((hv - lv) - Range <= ExhaustiveLimit) then
+              begin
+                oldmin := min_;
+                min_ := lv.svalue;
+                AlmostExhaustive := True;
+              end
+            else
+              begin
+                { a <= x <= b <-> unsigned(x-a) <= (b-a) }
+                cg.a_op_const_reg(jtlist,OP_SUB,opcgsize,aint(min_),hregister);
+                { case expr greater than max_ => goto elselabel }
+                cg.a_cmp_const_reg_label(jtlist,opcgsize,OC_A,Range,hregister,elselabel);
+                min_:=0;
+                { do not sign extend when we load the index register, as we applied an offset above }
+                opcgsize:=tcgsize2unsigned[opcgsize];
+              end;
           end;
 
         { local label in order to avoid using GOT }
         current_asmdata.getlabel(tablelabel,alt_data);
-        indexreg:=cg.makeregsize(current_asmdata.CurrAsmList,hregister,OS_ADDR);
-        cg.a_load_reg_reg(current_asmdata.CurrAsmList,opcgsize,OS_ADDR,hregister,indexreg);
+        indexreg:=cg.makeregsize(jtlist,hregister,OS_ADDR);
+        cg.a_load_reg_reg(jtlist,opcgsize,OS_ADDR,hregister,indexreg);
         { load table address }
         reference_reset_symbol(href,tablelabel,0,4,[]);
-        basereg:=cg.getaddressregister(current_asmdata.CurrAsmList);
-        cg.a_loadaddr_ref_reg(current_asmdata.CurrAsmList,href,basereg);
+        basereg:=cg.getaddressregister(jtlist);
+        cg.a_loadaddr_ref_reg(jtlist,href,basereg);
         { load table slot, 32-bit sign extended }
         reference_reset_base(href,basereg,-aint(min_)*4,ctempposinvalid,4,[]);
         href.index:=indexreg;
         href.scalefactor:=4;
-        jumpreg:=cg.getaddressregister(current_asmdata.CurrAsmList);
-        cg.a_load_ref_reg(current_asmdata.CurrAsmList,OS_S32,OS_ADDR,href,jumpreg);
+        jumpreg:=cg.getaddressregister(jtlist);
+        cg.a_load_ref_reg(jtlist,OS_S32,OS_ADDR,href,jumpreg);
         { add table address }
         reference_reset_base(href,basereg,0,ctempposinvalid,sizeof(pint),[]);
         href.index:=jumpreg;
         href.scalefactor:=1;
-        cg.a_loadaddr_ref_reg(current_asmdata.CurrAsmList,href,jumpreg);
+        cg.a_loadaddr_ref_reg(jtlist,href,jumpreg);
         { and finally jump }
         emit_reg(A_JMP,S_NO,jumpreg);
         { generate jump table }
@@ -151,9 +195,36 @@
             is inserted right after the routine, it will become part of the
             same subsection that contains the routine's code }
           sectype:=sec_code;
-        new_section(current_procinfo.aktlocaldata,sectype,current_procinfo.procdef.mangledname,4);
-        current_procinfo.aktlocaldata.concat(Tai_label.Create(tablelabel));
-        genitem(current_procinfo.aktlocaldata,hp);
+
+        jtlist := current_procinfo.aktlocaldata;
+        new_section(jtlist,sectype,current_procinfo.procdef.mangledname,4);
+        jtlist.concat(Tai_label.Create(tablelabel));
+
+        if AlmostExhaustive then
+          begin
+            { Fill the table with the values below _min }
+            x := lv.svalue;
+            while x < oldmin do
+              begin
+                jtlist.concat(Tai_const.Create_rel_sym(jtitemconsttype,tablelabel,elselabel));
+                Inc(x);
+              end;
+
+            genitem(hp);
+
+            { Fill the table with the values above _max }
+            { Subtracting one from hv and not adding 1 to max_ averts the risk of an overflow }
+            x := max_;
+            hv := hv - 1;
+            while x <= hv.svalue do
+              begin
+                jtlist.concat(Tai_const.Create_rel_sym(jtitemconsttype,tablelabel,elselabel));
+                Inc(x);
+              end;
+
+          end
+        else
+          genitem(hp);
       end;
 
 begin
case-improvement.patch (25,250 bytes)   

J. Gareth Moreton

2018-12-26 01:22

developer  

CaseBranchTest.pp (78,315 bytes)

J. Gareth Moreton

2018-12-26 01:23

developer   ~0112882

Last edited: 2018-12-26 01:44

View 3 revisions

Sample test running of CaseBranchTest on x86_64-win64 (the improvements generally only affect the top 6 tests):

Trunk:

Case node compilation and timing test
-------------------------------------
            Byte domain, entirely covered; equal polling - Pass - average iteration duration: 12.966 ns
           Byte domain, entirely covered; first weighted - Pass - average iteration duration: 10.012 ns
            Byte domain, entirely covered; last weighted - Pass - average iteration duration: 19.281 ns
     Byte domain, almost entirely covered; equal polling - Pass - average iteration duration: 13.264 ns
    Byte domain, almost entirely covered; first weighted - Pass - average iteration duration: 10.252 ns
     Byte domain, almost entirely covered; last weighted - Pass - average iteration duration: 15.374 ns
     Single entry with default value; 1/256 match chance - Pass - average iteration duration: 4.235 ns
       Single entry with default value; 75% match chance - Pass - average iteration duration: 4.889 ns
       Single entry with default value; 25% match chance - Pass - average iteration duration: 4.824 ns
        Single entry with else block; 1/256 match chance - Pass - average iteration duration: 4.206 ns
          Single entry with else block; 75% match chance - Pass - average iteration duration: 4.497 ns
          Single entry with else block; 25% match chance - Pass - average iteration duration: 4.533 ns
      Two labels, one with extreme spread, equal polling - Pass - average iteration duration: 4.286 ns
    Two labels, one with extreme spread, 50% else chance - Pass - average iteration duration: 4.176 ns
   Two labels, sparse values, equal polling across range - Pass - average iteration duration: 3.543 ns
             Two labels, sparse values, always triggered - Pass - average iteration duration: 3.900 ns
         Domain of 1024, 12 sparse labels, equal polling - Pass - average iteration duration: 10.550 ns
  Domain of 1024, 12 sparse labels, 75% particular match - Pass - average iteration duration: 6.345 ns
    Domain of 1024, 12 sparse labels, 75% midpoint match - Pass - average iteration duration: 11.445 ns
         Domain of 1024, 39 sparse labels, equal polling - Pass - average iteration duration: 17.819 ns
  Domain of 1024, 39 sparse labels, 75% particular match - Pass - average iteration duration: 8.433 ns
    Domain of 1024, 39 sparse labels, 75% midpoint match - Pass - average iteration duration: 12.667 ns
         Domain of 1024, 71 sparse labels, equal polling - Pass - average iteration duration: 9.713 ns
  Domain of 1024, 71 sparse labels, 75% particular match - Pass - average iteration duration: 7.538 ns
    Domain of 1024, 71 sparse labels, 75% midpoint match - Pass - average iteration duration: 5.784 ns


Patch:

            Byte domain, entirely covered; equal polling - Pass - average iteration duration: 7.836 ns
           Byte domain, entirely covered; first weighted - Pass - average iteration duration: 8.760 ns
            Byte domain, entirely covered; last weighted - Pass - average iteration duration: 8.673 ns
     Byte domain, almost entirely covered; equal polling - Pass - average iteration duration: 7.480 ns
    Byte domain, almost entirely covered; first weighted - Pass - average iteration duration: 9.059 ns
     Byte domain, almost entirely covered; last weighted - Pass - average iteration duration: 9.117 ns
     Single entry with default value; 1/256 match chance - Pass - average iteration duration: 4.206 ns
       Single entry with default value; 75% match chance - Pass - average iteration duration: 4.882 ns
       Single entry with default value; 25% match chance - Pass - average iteration duration: 4.831 ns
        Single entry with else block; 1/256 match chance - Pass - average iteration duration: 3.900 ns
          Single entry with else block; 75% match chance - Pass - average iteration duration: 4.533 ns
          Single entry with else block; 25% match chance - Pass - average iteration duration: 4.526 ns
      Two labels, one with extreme spread, equal polling - Pass - average iteration duration: 4.227 ns
    Two labels, one with extreme spread, 50% else chance - Pass - average iteration duration: 4.235 ns
   Two labels, sparse values, equal polling across range - Pass - average iteration duration: 3.303 ns
             Two labels, sparse values, always triggered - Pass - average iteration duration: 4.176 ns
         Domain of 1024, 12 sparse labels, equal polling - Pass - average iteration duration: 10.790 ns
  Domain of 1024, 12 sparse labels, 75% particular match - Pass - average iteration duration: 6.316 ns
    Domain of 1024, 12 sparse labels, 75% midpoint match - Pass - average iteration duration: 11.772 ns
         Domain of 1024, 39 sparse labels, equal polling - Pass - average iteration duration: 18.095 ns
  Domain of 1024, 39 sparse labels, 75% particular match - Pass - average iteration duration: 8.404 ns
    Domain of 1024, 39 sparse labels, 75% midpoint match - Pass - average iteration duration: 12.900 ns
         Domain of 1024, 71 sparse labels, equal polling - Pass - average iteration duration: 8.731 ns
  Domain of 1024, 71 sparse labels, 75% particular match - Pass - average iteration duration: 7.181 ns
    Domain of 1024, 71 sparse labels, 75% midpoint match - Pass - average iteration duration: 6.017 ns

J. Gareth Moreton

2018-12-26 01:24

developer   ~0112883

To give an example as to where these improvements take effect, the "taicpu.calcsize" method in "/x86/aasmcpu.pas" has a large case block that analyses the encoding of assembler commands - before, this only became a linear list because while there are loads of entries, the "labelcnt" value is still under 64 because a number of values are combined into a range. This is now changed to an efficient jump table, and furthermore, it is almost exhaustive because the input type is a Char, and only a handful of values redirect to the else block (which raises an internal error). As a result, the jump table is changed to contain all 256 possible values (1KB in size), with addresses to the else block where appropriate.

J. Gareth Moreton

2018-12-26 01:43

developer   ~0112884

Many of the changes are cross-platform, but the "almost exhaustive" checks are currently only available for i386 and x86_64, since jump tables are not cross-platform by themselves.

The CreateCMOVInstr() function that appears in the patch is an accidental leftover from my experimental binary search code and can be safely removed.

Florian

2018-12-27 15:09

administrator   ~0112911

Here:

Old:

Domain of 1024, 39 sparse labels, equal polling - Pass - average iteration duration: 3.725 ns

New:

Domain of 1024, 39 sparse labels, equal polling - Pass - average iteration duration: 7.218 ns

(i7-4770)

It does not generate a jump table anymore while it should in my opinion.

J. Gareth Moreton

2018-12-27 18:01

developer   ~0112916

I'll have a check on that one. Thanks Florian.

Florian

2018-12-27 18:33

administrator   ~0112917

Thanks, I comitted it (slighly modified). I think the "regression" mentioned shouldn't use a jump table, the case is simply too large.

Issue History

Date Modified Username Field Change
2018-12-26 01:21 J. Gareth Moreton New Issue
2018-12-26 01:21 J. Gareth Moreton File Added: case-improvement.patch
2018-12-26 01:22 J. Gareth Moreton File Added: CaseBranchTest.pp
2018-12-26 01:23 J. Gareth Moreton Note Added: 0112882
2018-12-26 01:23 J. Gareth Moreton Note Edited: 0112882 View Revisions
2018-12-26 01:24 J. Gareth Moreton Note Added: 0112883
2018-12-26 01:37 J. Gareth Moreton Tag Attached: 64-bit
2018-12-26 01:37 J. Gareth Moreton Tag Attached: compiler
2018-12-26 01:37 J. Gareth Moreton Tag Attached: node
2018-12-26 01:37 J. Gareth Moreton Tag Attached: patch
2018-12-26 01:37 J. Gareth Moreton Tag Attached: x86
2018-12-26 01:37 J. Gareth Moreton Tag Attached: x86_64-win64
2018-12-26 01:43 J. Gareth Moreton Note Added: 0112884
2018-12-26 01:44 J. Gareth Moreton Note Edited: 0112882 View Revisions
2018-12-26 01:47 J. Gareth Moreton Priority normal => low
2018-12-27 15:09 Florian Note Added: 0112911
2018-12-27 18:01 J. Gareth Moreton Note Added: 0112916
2018-12-27 18:33 Florian Fixed in Revision => 40676
2018-12-27 18:33 Florian Note Added: 0112917
2018-12-27 18:33 Florian Status new => resolved
2018-12-27 18:33 Florian Fixed in Version => 3.3.1
2018-12-27 18:33 Florian Resolution open => fixed
2018-12-27 18:33 Florian Assigned To => Florian
2018-12-30 03:03 J. Gareth Moreton Relationship added parent of 0034779
2018-12-30 23:02 J. Gareth Moreton Relationship added parent of 0034783
2018-12-31 09:22 Jonas Maebe Relationship added related to 0034776
2019-01-14 20:45 Florian Relationship added related to 0034818
2019-01-14 21:39 J. Gareth Moreton Relationship added parent of 0034859