View Issue Details

IDProjectCategoryView StatusLast Update
0034272FPCCompilerpublic2019-12-26 22:21
ReporterJulian Puhl Assigned ToFlorian  
PrioritynormalSeverityminorReproducibilityN/A
Status resolvedResolutionfixed 
Product Version3.3.1 
Summary0034272: [Patch]Better node count heuristic for inlining code
DescriptionThe current heuristic for inlining tries to calculate the complexity based on the nth-root using the current inline level. It does not take any actual code length into account. This is bad as the number of allowed nodes shrinks drastically the further the inline level gets. If you have small functions which call other small functions you will very soon get not any inlines at all despite having only a few lines of code.

This patch tries to get a better estimate of the actual node count by summing all the nodes up until a limit is reached. The limit can be set with {$MAXINLINELENGTH x} where x is the maximum number of nodes. The default is currently 600 but ideally this is chosen depending on the target cpu.

If the patch is going to be accepted the new compiler directive should probably be changed from a global to a local directive with am additional command line parameter. I was not sure if it is worth the effort so for now I chose the simplest implementation. I can modify it if wanted.

Also since this is my first compiler patch there might be things which are not done the way the team usually handles them.

The patch was created against revision 39750.
TagsNo tags attached.
Fixed in Revision39763
FPCOldBugId
FPCTarget-
Attached Files

Activities

Julian Puhl

2018-09-13 13:59

reporter  

inlineheuristic.patch (7,408 bytes)   
Index: compiler/globals.pas
===================================================================
--- compiler/globals.pas	(revision 39750)
+++ compiler/globals.pas	(working copy)
@@ -54,8 +54,8 @@
          [m_delphi,m_class,m_objpas,m_result,m_string_pchar,
           m_pointer_2_procedure,m_autoderef,m_tp_procvar,m_initfinal,m_default_ansistring,
           m_out,m_default_para,m_duplicate_names,m_hintdirective,
-          m_property,m_default_inline,m_except,m_advanced_records,
-          m_array_operators];
+          m_property,m_default_inline,m_except,m_advanced_records,
+          m_array_operators];
        delphiunicodemodeswitches = delphimodeswitches + [m_systemcodepage,m_default_unicodestring];
        fpcmodeswitches =
          [m_fpc,m_string_pchar,m_nested_comment,m_repeat_forward,
@@ -324,6 +324,7 @@
        peflags : longint;
        minstacksize,
        maxstacksize,
+       maxinlinelength,
        imagebase     : puint;
        UseDeffileForExports    : boolean;
        UseDeffileForExportsSetExplicitly : boolean;
@@ -1522,6 +1523,7 @@
         do_make:=true;
         compile_level:=0;
         codegenerror:=false;
+        maxinlinelength:=600;
 
         { Output }
         OutputFileName:='';
Index: compiler/ncal.pas
===================================================================
--- compiler/ncal.pas	(revision 39750)
+++ compiler/ncal.pas	(working copy)
@@ -305,6 +305,8 @@
     const
       { track current inlining depth }
       inlinelevel : longint = 0;
+	  { track current inlining node count }
+	  inline_node_count : longint = 0;
 
 implementation
 
@@ -4153,54 +4155,61 @@
       var
         st   : tsymtable;
         para : tcallparanode;
+        current_node_count : dword;
       begin
         { Can we inline the procedure? }
         if (po_inline in procdefinition.procoptions) and
            (procdefinition.typ=procdef) and
-           tprocdef(procdefinition).has_inlininginfo and
-           {  Prevent too deep inlining recursion and code bloat by inlining
+           tprocdef(procdefinition).has_inlininginfo
+           {  Prevent too deep inlining recursion and code bloat by inlining.
 
-              The actual formuala is
-                                inlinelevel+1  /-------
-                  node count <  -------------\/  10000
+              To get an estimate about the inlined code length all nodes which are traversed are summed up until a limit is reached. After that no
+              further inlining is done. Note that the sum is not necessarily the actual code length in the end but rather overestimates it slightly.
 
-              This allows exponential grow of the code only to a certain limit.
-
               Remarks
                - The current approach calculates the inlining level top down, so outer call nodes (nodes closer to the leaf) might not be inlined
                  if the max. complexity is reached. This is done because it makes the implementation easier and because
                  there might be situations were it is more beneficial to inline inner nodes and do the calls to the outer nodes
                  if the outer nodes are in a seldomly used code path
-               - The code avoids to use functions from the math unit
            }
-           (node_count(tprocdef(procdefinition).inlininginfo^.code)<round(exp((1.0/(inlinelevel+1))*ln(10000)))) then
-          begin
-            include(callnodeflags,cnf_do_inline);
-            { Check if we can inline the procedure when it references proc/var that
-              are not in the globally available }
-            st:=procdefinition.owner;
-            while (st.symtabletype in [ObjectSymtable,recordsymtable]) do
-              st:=st.defowner.owner;
-            if (pi_uses_static_symtable in tprocdef(procdefinition).inlininginfo^.flags) and
-               (st.symtabletype=globalsymtable) and
-               (not st.iscurrentunit) then
-              begin
-                Comment(V_lineinfo+V_Debug,'Not inlining "'+tprocdef(procdefinition).procsym.realname+'", references static symtable');
-                exclude(callnodeflags,cnf_do_inline);
-              end;
-            para:=tcallparanode(parameters);
-            while assigned(para) do
-              begin
-                if not para.can_be_inlined then
-                  begin
-                    Comment(V_lineinfo+V_Debug,'Not inlining "'+tprocdef(procdefinition).procsym.realname+
-                      '", invocation parameter contains an unsafe/unsupported construct');
-                    exclude(callnodeflags,cnf_do_inline);
-                    break;
-                  end;
-                para:=tcallparanode(para.nextpara);
-              end;
-          end;
+            then
+          begin	
+	      current_node_count := node_count(tprocdef(procdefinition).inlininginfo^.code);
+	      if inlinelevel = 0 then
+		      inline_node_count := 0;
+	      inc(inline_node_count,current_node_count);
+	      if inline_node_count < maxinlinelength then
+	       begin
+		      include(callnodeflags,cnf_do_inline);
+		      { Check if we can inline the procedure when it references proc/var that
+			are not in the globally available }
+		      st:=procdefinition.owner;
+		      while (st.symtabletype in [ObjectSymtable,recordsymtable]) do
+			st:=st.defowner.owner;
+		      if (pi_uses_static_symtable in tprocdef(procdefinition).inlininginfo^.flags) and
+			 (st.symtabletype=globalsymtable) and
+			 (not st.iscurrentunit) then
+			begin
+			      Comment(V_lineinfo+V_Debug,'Not inlining "'+tprocdef(procdefinition).procsym.realname+'", references static symtable');
+			      exclude(callnodeflags,cnf_do_inline);
+			end;
+		      para:=tcallparanode(parameters);
+		      while assigned(para) do
+			begin
+			      if not para.can_be_inlined then
+				begin
+				      Comment(V_lineinfo+V_Debug,'Not inlining "'+tprocdef(procdefinition).procsym.realname+
+					'", invocation parameter contains an unsafe/unsupported construct');
+				      exclude(callnodeflags,cnf_do_inline);
+				      break;
+				end;
+			      para:=tcallparanode(para.nextpara);
+			end;
+	       end
+	      else
+	       Comment(V_lineinfo+V_Debug,'Not inlining "'+tprocdef(procdefinition).procsym.realname+'", node count too large:'
+               +tostr(inline_node_count)+', inlinelevel: '+tostr(inlinelevel));
+          end
       end;
 
 
Index: compiler/scandir.pas
===================================================================
--- compiler/scandir.pas	(revision 39750)
+++ compiler/scandir.pas	(working copy)
@@ -836,6 +836,12 @@
         MaxStackSizeSetExplicity:=true;
       end;
 
+    procedure dir_maxinlinelength;
+      begin
+        current_scanner.skipspace;
+        maxinlinelength:=current_scanner.readval;
+      end;
+
     procedure dir_memory;
       var
         l : longint;
@@ -1951,6 +1957,7 @@
         AddDirective('MACRO',directive_all, @dir_macro);
         AddDirective('MAXFPUREGISTERS',directive_all, @dir_maxfpuregisters);
         AddDirective('MAXSTACKSIZE',directive_all, @dir_maxstacksize);
+        AddDirective('MAXINLINELENGTH',directive_all, @dir_maxinlinelength);
         AddDirective('MEMORY',directive_all, @dir_memory);
         AddDirective('MESSAGE',directive_all, @dir_message);
         AddDirective('MINENUMSIZE',directive_all, @dir_packenum);
inlineheuristic.patch (7,408 bytes)   

Florian

2018-09-16 17:10

administrator   ~0110798

I am not convinced of the patch because the behaviour is not "symmetric", i.e. later calls to inlined subroutines in one inlining block are less likely inlined. If the current heuristics is a problem, I propose to adjust the calculation formula or make even its behaviour configurable e.g. by dividing the "root power" (is this the right expression?) by a configurable variable.

Small other remarks about the patch: please do not use tabs and follow the indention style of the surrounding code.

Julian Puhl

2018-09-16 18:09

reporter   ~0110804

I was relying on Lazarus for the indentions, but will make sure they are correct next time.

The suggested heuristic is not the best solution, but rather tries to improve the old one. I was assuming it is better to have more inlined code than none. Since you want a limit on inlined code size you have to make a decision somehow. I would say ideally you inline first all the smaller routines and then work you way upwards to the larger ones until a code size limit is reached, but that requires more than one pass.

With the current formula you have no idea about the overall picture since you are only taking the node count of the current call into account. In my opinion it is impossible to draw any conclusions about the overall inlined code length this way.

Florian

2018-09-16 19:20

administrator   ~0110808

Well, this is probably a matter of taste. I think, if subroutines are marked as inline, the compiler should try to inline them and not worry too much about code size but only avoid huge compilation time or compilation failures for somehow realistic code.

Julian Puhl

2018-09-16 20:22

reporter   ~0110813

Last edited: 2018-09-16 21:12

View 3 revisions

Actually I am thinking more in terms of architecture (x86 in this case but probably applies to others as well). Since calls are expensive, you want to eliminate as many as possible of them. But at the same time too large code size also may hurt performance. So you have to find a good compromise. There are also other factors as well, e.g. how often a code part gets executed, but we don't have that information available.

The motivation for the patch was also a real world problem, because we were wondering why even small getters and setters wouldn't get inlined at some places in our programs while in other parts it worked.

Florian

2018-09-16 21:16

administrator   ~0110816

Well, we have no information how often each code path is used, so the only information we have is: the user wants the compiler to inline if possible by any means. The only other constraint we have is: prevent that the compiler gets into trouble with real world code.

About your real world problem: yes, I understand it and I think the solution is just to adapt the current formula. I did this in 39763.

Julian Puhl

2018-09-17 00:51

reporter   ~0110819

I tested with a very large project and most of the time even more is inlined now. So it is looking good. Thanks for that :)

There is one separate issue I noticed: If a function, which is not declared in the interface of a unit, is used within an inline function this inline function is not inlined at all. The debug message is the one with "[...]references static symtable". Which does not make much sense to me. If you declare it in the interface everything works as expected. But you don't always want a function to be visible outside of a unit so this is not a nice workaround.

I can create a separate bug report for that with more details if you want.

Sven Barth

2018-09-21 17:04

manager   ~0110930

This is by design as functions that are declared in the implementation section are not available outside the unit (even when using an "external 'mangledname'"; the linker will complain here). "inline" doesn't change anything there.

Erik Rößiger

2018-09-21 21:44

reporter   ~0110944

Last edited: 2018-09-21 21:54

View 4 revisions

May I ask what thoughts stand behind this decision? At compile time, there should be no difference between functions in Implementation and those made public in Interface, as far as I can imagine. And thus, it should be possible to inline them.
The only place I cannot see inlining work is when you have virtual methods.

Sven Barth

2018-09-22 22:22

manager   ~0110960

The compiler and linker can optimize such local code differently/better when it knows that no other compilation unit can access it.

Issue History

Date Modified Username Field Change
2018-09-13 13:59 Julian Puhl New Issue
2018-09-13 13:59 Julian Puhl File Added: inlineheuristic.patch
2018-09-16 17:10 Florian Note Added: 0110798
2018-09-16 18:09 Julian Puhl Note Added: 0110804
2018-09-16 19:20 Florian Note Added: 0110808
2018-09-16 20:22 Julian Puhl Note Added: 0110813
2018-09-16 21:00 Julian Puhl Note Edited: 0110813 View Revisions
2018-09-16 21:12 Julian Puhl Note Edited: 0110813 View Revisions
2018-09-16 21:16 Florian Note Added: 0110816
2018-09-17 00:51 Julian Puhl Note Added: 0110819
2018-09-21 17:04 Sven Barth Note Added: 0110930
2018-09-21 21:44 Erik Rößiger Note Added: 0110944
2018-09-21 21:45 Erik Rößiger Note Edited: 0110944 View Revisions
2018-09-21 21:53 Erik Rößiger Note Edited: 0110944 View Revisions
2018-09-21 21:54 Erik Rößiger Note Edited: 0110944 View Revisions
2018-09-22 22:22 Sven Barth Note Added: 0110960
2019-12-26 22:21 Florian Assigned To => Florian
2019-12-26 22:21 Florian Status new => resolved
2019-12-26 22:21 Florian Resolution open => fixed
2019-12-26 22:21 Florian Fixed in Revision => 39763
2019-12-26 22:21 Florian FPCTarget => -