View Issue Details

IDProjectCategoryView StatusLast Update
0035479FPCCompilerpublic2019-11-04 11:49
ReporterJ. Gareth MoretonAssigned To 
PrioritylowSeveritytrivialReproducibilityN/A
Status newResolutionopen 
Platformi386 and x86_64OSMIcrosoft WindowsOS Version10 Professional
Product Version3.3.1Product Build 
Target VersionFixed in Version 
Summary0035479: [Refactor] Case node efficiency boost
DescriptionThis patch refactors some code related to jump table generation by removing some unnecessary variable initialisation and storing the lower and upper bounds of the domain (e.g. Low(Int64) and High(Int64)) so it does not have to call "getrange" multiple times. It also uses local variables for these values as much as possible in order to minimise memory fetches and cache misses (as well as increasing the chance of these values being stored in registers).
Steps To ReproduceCompile a test program, observing the time taken, then apply patch and repeat the operation, observing the time taken and confirming that the compiled binaries are byte-for-byte identical.
Additional InformationTimings for compiling Lazarus with -B -s options. It was difficult to get consistent results on my laptop, but I've done my best with 7 iterations apiece.

Case node refactor:

[85.289] 1298541 lines compiled, 85.3 sec
[85.574] 1298541 lines compiled, 85.6 sec
[82.023] 1298541 lines compiled, 82.0 sec
[81.059] 1298541 lines compiled, 81.1 sec
[82.406] 1298541 lines compiled, 82.4 sec
[81.391] 1298541 lines compiled, 81.4 sec
[80.035] 1298541 lines compiled, 80.0 sec

Mean: 82.540
Median: 82.023

Control (Trunk)

[89.441] 1298541 lines compiled, 89.4 sec
[84.145] 1298541 lines compiled, 84.1 sec
[84.832] 1298541 lines compiled, 84.8 sec
[92.023] 1298541 lines compiled, 92.0 sec
[88.996] 1298541 lines compiled, 89.0 sec
[86.074] 1298541 lines compiled, 86.1 sec
[85.480] 1298541 lines compiled, 85.5 sec

Mean: 87.284
Median: 86.074
Tagscompiler, patch, refactor, x86
Fixed in Revision
FPCOldBugId
FPCTarget-
Attached Files
  • case-refactoring.patch (8,577 bytes)
    Index: compiler/ncgset.pas
    ===================================================================
    --- compiler/ncgset.pas	(revision 42117)
    +++ compiler/ncgset.pas	(working copy)
    @@ -573,10 +573,8 @@
                       Exit;
                     end;
                   blockn:
    -                begin
    -                  Block := TBlockNode(Block).Left;
    -                  Continue;
    -                end;
    +                Block := TBlockNode(Block).Left;
    +
                   statementn:
                     begin
                       { If the right node is assigned, then it's a compound block
    @@ -587,13 +585,11 @@
                         Break;
     
                       Block := TStatementNode(Block).Left;
    -                  Continue;
                     end;
                   else
    -                ;
    +                { Anything it doesn't recognise, assume it can't be simplified }
    +                Break;
                 end;
    -
    -            Break;
               end;
     
             { Create unique label }
    @@ -1148,7 +1144,6 @@
              i : longint;
              dist : aword;
              distv,
    -         lv,hv,
              max_label: tconstexprint;
              max_linear_list : int64;
              max_dist : qword;
    @@ -1228,8 +1223,7 @@
                        max_label := case_get_max(labels);
     
                        { can we omit the range check of the jump table ? }
    -                   getrange(left.resultdef,lv,hv);
    -                   jumptable_no_range:=(lv=min_label) and (hv=max_label);
    +                   jumptable_no_range := (LowerDomain = min_label) and (UpperDomain = max_label);
     
                        distv:=max_label-min_label;
                        if distv>=0 then
    Index: compiler/nset.pas
    ===================================================================
    --- compiler/nset.pas	(revision 42117)
    +++ compiler/nset.pas	(working copy)
    @@ -136,6 +136,9 @@
               property labelcoverage: qword read getlabelcoverage;
              protected
               flabels    : pcaselabel;
    +
    +          { The smallest and largest values of the base type }
    +          LowerDomain, UpperDomain: TConstExprInt;
              public
               property labels: pcaselabel read flabels;
            end;
    @@ -686,9 +689,15 @@
               typecheckpass(elseblock);
     
             if not codegenerror and
    -           is_ordinal(left.resultdef) then
    -          checkordinalcoverage;
    +          (left.resultdef.typ in [orddef, enumdef, arraydef, undefineddef]) then
    +          begin
    +            { Get the upper and lower domain }
    +            getrange(left.resultdef, LowerDomain, UpperDomain);
     
    +            if is_ordinal(left.resultdef) then
    +              checkordinalcoverage;
    +          end;
    +
             resultdef:=voidtype;
           end;
     
    @@ -980,6 +989,8 @@
              n.fcountsuptodate:=fcountsuptodate;
              n.flabelcnt:=flabelcnt;
              n.flabelcoverage:=flabelcoverage;
    +         n.LowerDomain := LowerDomain;
    +         n.UpperDomain := UpperDomain;
              dogetcopy:=n;
           end;
     
    @@ -1163,11 +1174,10 @@
             end;
     
           var
    -        lv, hv, typcount: tconstexprint;
    +        typcount: QWord;
           begin
             { Check label type coverage for enumerations and small types }
    -        getrange(left.resultdef,lv,hv);
    -        typcount:=hv-lv;
    +        typcount := (UpperDomain - LowerDomain).uvalue;
             if not assigned(elseblock) then
               begin
                 { unless cs_check_all_case_coverage is set, only check for enums, booleans and
    Index: compiler/x86/nx86set.pas
    ===================================================================
    --- compiler/x86/nx86set.pas	(revision 42117)
    +++ compiler/x86/nx86set.pas	(working copy)
    @@ -78,7 +78,7 @@
             labeltyp: taiconst_type;
             AlmostExhaustive: Boolean;
             lv, hv: TConstExprInt;
    -        ExhaustiveLimit, Range, x, oldmin : int64;
    +        ExhaustiveLimit, Range, x, MaxRange, oldmin : int64;
     
           const
             ExhaustiveLimitBase = 32;
    @@ -109,9 +109,6 @@
               end;
     
           begin
    -        lv:=0;
    -        hv:=0;
    -        oldmin:=0;
             last:=min_;
             { This generates near pointers on i8086 }
             labeltyp:=aitconst_ptr;
    @@ -121,8 +118,10 @@
     
             if not(jumptable_no_range) then
               begin
    +            { Pass these fields into local variables to make reading them more efficient }
    +            lv := LowerDomain;
    +            hv := UpperDomain;
     
    -            getrange(left.resultdef,lv,hv);
                 Range := aint(max_)-aint(min_);
     
                 if (cs_opt_size in current_settings.optimizerswitches) then
    @@ -147,7 +146,7 @@
                     { a <= x <= b <-> unsigned(x-a) <= (b-a) }
                     cg.a_op_const_reg(current_asmdata.CurrAsmList,OP_SUB,opcgsize,aint(min_),hregister);
                     { case expr greater than max_ => goto elselabel }
    -                cg.a_cmp_const_reg_label(current_asmdata.CurrAsmList,opcgsize,OC_A,aint(max_)-aint(min_),hregister,elselabel);
    +                cg.a_cmp_const_reg_label(current_asmdata.CurrAsmList,opcgsize,OC_A,Range,hregister,elselabel);
                     min_:=0;
                     { do not sign extend when we load the index register, as we applied an offset above }
                     opcgsize:=tcgsize2unsigned[opcgsize];
    @@ -201,11 +200,22 @@
                 genitem(jtlist,hp);
     
                 { Fill the table with the values above _max }
    -            { Subtracting one from hv and not adding 1 to max averts the risk of an overflow }
    -            x := max_;
    -            hv := hv - 1;
    -            while x <= hv.svalue do
    +            if not hv.signed and (hv.uvalue > High(Int64)) then
                   begin
    +                { Handle the situation where hv.svalue will overflow by
    +                  offsetting max_ and hv to the high end of Int64 }
    +                x := max_ - Int64(hv.uvalue - QWord($8000000000000000));
    +                MaxRange := $7FFFFFFFFFFFFFFF;
    +              end
    +            else
    +              begin
    +                { Subtracting one from hv and not adding 1 to max averts the risk of an overflow }
    +                x := max_;
    +                MaxRange := hv.svalue - 1;
    +              end;
    +
    +            while x <= MaxRange do
    +              begin
                     jtlist.concat(Tai_const.Create_type_sym(labeltyp, elselabel));
                     Inc(x);
                   end;
    Index: compiler/x86_64/nx64set.pas
    ===================================================================
    --- compiler/x86_64/nx64set.pas	(revision 42117)
    +++ compiler/x86_64/nx64set.pas	(working copy)
    @@ -73,7 +73,7 @@
             jtitemconsttype: taiconst_type;
             AlmostExhaustive: Boolean;
             lv, hv: TConstExprInt;
    -        ExhaustiveLimit, Range, x, oldmin : aint;
    +        ExhaustiveLimit, Range, x, MaxRange, oldmin : aint;
     
           const
             ExhaustiveLimitBase = 32;
    @@ -103,8 +103,6 @@
             end;
     
           begin
    -        lv:=0;
    -        hv:=0;
             if not(target_info.system in systems_darwin) then
               jtitemconsttype:=aitconst_32bit
             else
    @@ -116,12 +114,13 @@
             opcgsize:=def_cgsize(opsize);
     
             AlmostExhaustive := False;
    -        oldmin := min_;
     
             if not(jumptable_no_range) then
               begin
    +            { Pass these fields into local variables to make reading them more efficient }
    +            lv := LowerDomain;
    +            hv := UpperDomain;
     
    -            getrange(left.resultdef,lv,hv);
                 Range := aint(max_)-aint(min_);
     
                 if (cs_opt_size in current_settings.optimizerswitches) then
    @@ -208,11 +207,22 @@
                 genitem(hp);
     
                 { Fill the table with the values above _max }
    -            { Subtracting one from hv and not adding 1 to max_ averts the risk of an overflow }
    -            x := max_;
    -            hv := hv - 1;
    -            while x <= hv.svalue do
    +            if not hv.signed and (hv.uvalue > High(Int64)) then
                   begin
    +                { Handle the situation where hv.svalue will overflow by
    +                  offsetting max_ and hv to the high end of Int64 }
    +                x := max_ - Int64(hv.uvalue - QWord($8000000000000000));
    +                MaxRange := $7FFFFFFFFFFFFFFF;
    +              end
    +            else
    +              begin
    +                { Subtracting one from hv and not adding 1 to max averts the risk of an overflow }
    +                x := max_;
    +                MaxRange := hv.svalue - 1;
    +              end;
    +
    +            while x <= MaxRange do
    +              begin
                     jtlist.concat(Tai_const.Create_rel_sym(jtitemconsttype,tablelabel,elselabel));
                     Inc(x);
                   end;
    
    case-refactoring.patch (8,577 bytes)

Activities

J. Gareth Moreton

2019-05-03 21:16

developer   ~0115984

Added mean and median compilation times to additional notes.

J. Gareth Moreton

2019-05-25 05:51

developer   ~0116403

Updated patch to work with latest trunk version. Also slightly tidied up the case branch in GetBranchLabel().

Note that I wasn't able to optimise the getrange() call that appears in the new "checkordinalcoverage" method because LowerDomain and UpperDomain are not always set.

J. Gareth Moreton

2019-05-25 19:18

developer   ~0116414

Last edited: 2019-05-25 19:33

View 2 revisions

Managed to optimise the code so getrange() is only called once during the type check, after which the domain values are obtained and reused later on. Got some new metrics (although I forgot the -s option, and it's hard to get anything consistent since my computer adds so many other factors to the compilation time):

Refactor

[80.891] 1299818 lines compiled, 80.9 sec, 9496208 bytes code, 799028 bytes data
[78.500] 1299818 lines compiled, 78.5 sec, 9496208 bytes code, 799028 bytes data
[78.305] 1299818 lines compiled, 78.3 sec, 9496208 bytes code, 799028 bytes data
[86.680] 1299818 lines compiled, 86.7 sec, 9496208 bytes code, 799028 bytes data
[88.719] 1299818 lines compiled, 88.7 sec, 9496208 bytes code, 799028 bytes data

Mean: 82.619
Median: 80.891


Control (Trunk)

[86.578] 1299818 lines compiled, 86.6 sec, 9496208 bytes code, 799028 bytes data
[88.238] 1299818 lines compiled, 88.2 sec, 9496208 bytes code, 799028 bytes data
[88.773] 1299818 lines compiled, 88.8 sec, 9496208 bytes code, 799028 bytes data
[88.008] 1299818 lines compiled, 88.0 sec, 9496208 bytes code, 799028 bytes data
[88.547] 1299818 lines compiled, 88.5 sec, 9496208 bytes code, 799028 bytes data

Mean: 88.029
Median: 88.238

----

Confirmed that lazarus.exe is no different, with or without the patch. It just compiles faster!



case-refactoring.patch (8,577 bytes)
Index: compiler/ncgset.pas
===================================================================
--- compiler/ncgset.pas	(revision 42117)
+++ compiler/ncgset.pas	(working copy)
@@ -573,10 +573,8 @@
                   Exit;
                 end;
               blockn:
-                begin
-                  Block := TBlockNode(Block).Left;
-                  Continue;
-                end;
+                Block := TBlockNode(Block).Left;
+
               statementn:
                 begin
                   { If the right node is assigned, then it's a compound block
@@ -587,13 +585,11 @@
                     Break;
 
                   Block := TStatementNode(Block).Left;
-                  Continue;
                 end;
               else
-                ;
+                { Anything it doesn't recognise, assume it can't be simplified }
+                Break;
             end;
-
-            Break;
           end;
 
         { Create unique label }
@@ -1148,7 +1144,6 @@
          i : longint;
          dist : aword;
          distv,
-         lv,hv,
          max_label: tconstexprint;
          max_linear_list : int64;
          max_dist : qword;
@@ -1228,8 +1223,7 @@
                    max_label := case_get_max(labels);
 
                    { can we omit the range check of the jump table ? }
-                   getrange(left.resultdef,lv,hv);
-                   jumptable_no_range:=(lv=min_label) and (hv=max_label);
+                   jumptable_no_range := (LowerDomain = min_label) and (UpperDomain = max_label);
 
                    distv:=max_label-min_label;
                    if distv>=0 then
Index: compiler/nset.pas
===================================================================
--- compiler/nset.pas	(revision 42117)
+++ compiler/nset.pas	(working copy)
@@ -136,6 +136,9 @@
           property labelcoverage: qword read getlabelcoverage;
          protected
           flabels    : pcaselabel;
+
+          { The smallest and largest values of the base type }
+          LowerDomain, UpperDomain: TConstExprInt;
          public
           property labels: pcaselabel read flabels;
        end;
@@ -686,9 +689,15 @@
           typecheckpass(elseblock);
 
         if not codegenerror and
-           is_ordinal(left.resultdef) then
-          checkordinalcoverage;
+          (left.resultdef.typ in [orddef, enumdef, arraydef, undefineddef]) then
+          begin
+            { Get the upper and lower domain }
+            getrange(left.resultdef, LowerDomain, UpperDomain);
 
+            if is_ordinal(left.resultdef) then
+              checkordinalcoverage;
+          end;
+
         resultdef:=voidtype;
       end;
 
@@ -980,6 +989,8 @@
          n.fcountsuptodate:=fcountsuptodate;
          n.flabelcnt:=flabelcnt;
          n.flabelcoverage:=flabelcoverage;
+         n.LowerDomain := LowerDomain;
+         n.UpperDomain := UpperDomain;
          dogetcopy:=n;
       end;
 
@@ -1163,11 +1174,10 @@
         end;
 
       var
-        lv, hv, typcount: tconstexprint;
+        typcount: QWord;
       begin
         { Check label type coverage for enumerations and small types }
-        getrange(left.resultdef,lv,hv);
-        typcount:=hv-lv;
+        typcount := (UpperDomain - LowerDomain).uvalue;
         if not assigned(elseblock) then
           begin
             { unless cs_check_all_case_coverage is set, only check for enums, booleans and
Index: compiler/x86/nx86set.pas
===================================================================
--- compiler/x86/nx86set.pas	(revision 42117)
+++ compiler/x86/nx86set.pas	(working copy)
@@ -78,7 +78,7 @@
         labeltyp: taiconst_type;
         AlmostExhaustive: Boolean;
         lv, hv: TConstExprInt;
-        ExhaustiveLimit, Range, x, oldmin : int64;
+        ExhaustiveLimit, Range, x, MaxRange, oldmin : int64;
 
       const
         ExhaustiveLimitBase = 32;
@@ -109,9 +109,6 @@
           end;
 
       begin
-        lv:=0;
-        hv:=0;
-        oldmin:=0;
         last:=min_;
         { This generates near pointers on i8086 }
         labeltyp:=aitconst_ptr;
@@ -121,8 +118,10 @@
 
         if not(jumptable_no_range) then
           begin
+            { Pass these fields into local variables to make reading them more efficient }
+            lv := LowerDomain;
+            hv := UpperDomain;
 
-            getrange(left.resultdef,lv,hv);
             Range := aint(max_)-aint(min_);
 
             if (cs_opt_size in current_settings.optimizerswitches) then
@@ -147,7 +146,7 @@
                 { a <= x <= b <-> unsigned(x-a) <= (b-a) }
                 cg.a_op_const_reg(current_asmdata.CurrAsmList,OP_SUB,opcgsize,aint(min_),hregister);
                 { case expr greater than max_ => goto elselabel }
-                cg.a_cmp_const_reg_label(current_asmdata.CurrAsmList,opcgsize,OC_A,aint(max_)-aint(min_),hregister,elselabel);
+                cg.a_cmp_const_reg_label(current_asmdata.CurrAsmList,opcgsize,OC_A,Range,hregister,elselabel);
                 min_:=0;
                 { do not sign extend when we load the index register, as we applied an offset above }
                 opcgsize:=tcgsize2unsigned[opcgsize];
@@ -201,11 +200,22 @@
             genitem(jtlist,hp);
 
             { Fill the table with the values above _max }
-            { Subtracting one from hv and not adding 1 to max averts the risk of an overflow }
-            x := max_;
-            hv := hv - 1;
-            while x <= hv.svalue do
+            if not hv.signed and (hv.uvalue > High(Int64)) then
               begin
+                { Handle the situation where hv.svalue will overflow by
+                  offsetting max_ and hv to the high end of Int64 }
+                x := max_ - Int64(hv.uvalue - QWord($8000000000000000));
+                MaxRange := $7FFFFFFFFFFFFFFF;
+              end
+            else
+              begin
+                { Subtracting one from hv and not adding 1 to max averts the risk of an overflow }
+                x := max_;
+                MaxRange := hv.svalue - 1;
+              end;
+
+            while x <= MaxRange do
+              begin
                 jtlist.concat(Tai_const.Create_type_sym(labeltyp, elselabel));
                 Inc(x);
               end;
Index: compiler/x86_64/nx64set.pas
===================================================================
--- compiler/x86_64/nx64set.pas	(revision 42117)
+++ compiler/x86_64/nx64set.pas	(working copy)
@@ -73,7 +73,7 @@
         jtitemconsttype: taiconst_type;
         AlmostExhaustive: Boolean;
         lv, hv: TConstExprInt;
-        ExhaustiveLimit, Range, x, oldmin : aint;
+        ExhaustiveLimit, Range, x, MaxRange, oldmin : aint;
 
       const
         ExhaustiveLimitBase = 32;
@@ -103,8 +103,6 @@
         end;
 
       begin
-        lv:=0;
-        hv:=0;
         if not(target_info.system in systems_darwin) then
           jtitemconsttype:=aitconst_32bit
         else
@@ -116,12 +114,13 @@
         opcgsize:=def_cgsize(opsize);
 
         AlmostExhaustive := False;
-        oldmin := min_;
 
         if not(jumptable_no_range) then
           begin
+            { Pass these fields into local variables to make reading them more efficient }
+            lv := LowerDomain;
+            hv := UpperDomain;
 
-            getrange(left.resultdef,lv,hv);
             Range := aint(max_)-aint(min_);
 
             if (cs_opt_size in current_settings.optimizerswitches) then
@@ -208,11 +207,22 @@
             genitem(hp);
 
             { Fill the table with the values above _max }
-            { Subtracting one from hv and not adding 1 to max_ averts the risk of an overflow }
-            x := max_;
-            hv := hv - 1;
-            while x <= hv.svalue do
+            if not hv.signed and (hv.uvalue > High(Int64)) then
               begin
+                { Handle the situation where hv.svalue will overflow by
+                  offsetting max_ and hv to the high end of Int64 }
+                x := max_ - Int64(hv.uvalue - QWord($8000000000000000));
+                MaxRange := $7FFFFFFFFFFFFFFF;
+              end
+            else
+              begin
+                { Subtracting one from hv and not adding 1 to max averts the risk of an overflow }
+                x := max_;
+                MaxRange := hv.svalue - 1;
+              end;
+
+            while x <= MaxRange do
+              begin
                 jtlist.concat(Tai_const.Create_rel_sym(jtitemconsttype,tablelabel,elselabel));
                 Inc(x);
               end;
case-refactoring.patch (8,577 bytes)

J. Gareth Moreton

2019-05-25 19:28

developer   ~0116415

Not that it should matter, but the refactoring also shrunk ppcx64.exe by 512 bytes.

Florian

2019-11-03 23:15

administrator   ~0119031

I tried to reproduce the values after adapting the patch to trunk. However, for me it makes no difference?

J. Gareth Moreton

2019-11-04 11:49

developer   ~0119044

I'll take a look. At least it isn't worse!

Issue History

Date Modified Username Field Change
2019-05-03 19:18 J. Gareth Moreton New Issue
2019-05-03 19:18 J. Gareth Moreton File Added: case-refactoring.patch
2019-05-03 19:18 J. Gareth Moreton Tag Attached: patch
2019-05-03 19:18 J. Gareth Moreton Tag Attached: compiler
2019-05-03 19:18 J. Gareth Moreton Tag Attached: refactor
2019-05-03 19:19 J. Gareth Moreton Tag Attached: x86
2019-05-03 19:19 J. Gareth Moreton Priority normal => low
2019-05-03 19:19 J. Gareth Moreton Severity minor => trivial
2019-05-03 19:19 J. Gareth Moreton FPCTarget => -
2019-05-03 21:16 J. Gareth Moreton Additional Information Updated View Revisions
2019-05-03 21:16 J. Gareth Moreton Note Added: 0115984
2019-05-25 05:49 J. Gareth Moreton File Deleted: case-refactoring.patch
2019-05-25 05:51 J. Gareth Moreton File Added: case-refactoring.patch
2019-05-25 05:51 J. Gareth Moreton Note Added: 0116403
2019-05-25 19:15 J. Gareth Moreton File Deleted: case-refactoring.patch
2019-05-25 19:18 J. Gareth Moreton File Added: case-refactoring.patch
2019-05-25 19:18 J. Gareth Moreton Note Added: 0116414
2019-05-25 19:28 J. Gareth Moreton Note Added: 0116415
2019-05-25 19:33 J. Gareth Moreton Note Edited: 0116414 View Revisions
2019-11-03 23:15 Florian Note Added: 0119031
2019-11-04 11:49 J. Gareth Moreton Note Added: 0119044