View Issue Details

IDProjectCategoryView StatusLast Update
0038527FPCCompilerpublic2021-02-23 01:22
Reporterrunewalsh Assigned ToJ. Gareth Moreton  
PrioritynormalSeverityminorReproducibilityalways
Status resolvedResolutionfixed 
Platformi386-win32, x86_64-win64 
Product Version3.3.1 
Summary0038527: Dreadful, mind-chilling CSE failure
DescriptionCode below breaks under -O2 or higher: function F returns just 4*n. -OoNOCSE makes it work.
Steps To Reproducefunction F(n: SizeUint): SizeUint;
begin
    result := 4 * n + 4 * n;
end;

begin
    writeln('Reference F(5): ', 4 * 5 + 4 * 5);
    writeln(' Actual F(5): ', F(5));
end.
TagsNo tags attached.
Fixed in Revision48792
FPCOldBugId
FPCTarget-
Attached Files

Activities

Pierre Muller

2021-02-22 00:04

developer   ~0129080

Bug report confirmed!
  This is indeed a really enormous error :-(

  Hopefully Gareth will be able to fix this rapidly...

Pierre

J. Gareth Moreton

2021-02-22 00:14

developer   ~0129081

I'll get on it.

J. Gareth Moreton

2021-02-22 01:52

developer   ~0129082

Last edited: 2021-02-22 02:14

View 5 revisions

I confess when I first saw the title, I thought it was one of Si Nicholson's spam bug reports about the Pascal language in general and how they want it to be more C-based.

That aside, it doesn't look like CSE itself is at fault, but some that manages temporary references. When compiling with DEBUG_NODE_XML enabled, the following branches in the "firstpass" tree are generated:

<statementn pos="4,1">
   <blockn resultdef="$void" pos="4,1" flags="nf_pass1_done" complexity="0">
      <statementn pos="4,1">
         <nothingn resultdef="$void" pos="4,1" flags="nf_pass1_done" complexity="0" />
      </statementn>
      <statementn pos="5,17">
         <tempcreaten resultdef="$void" pos="5,17" flags="nf_pass1_done" complexity="0" id="$7C46BA70">
            <typedef>QWord</typedef>
            <tempflags>ti_may_be_in_reg,ti_const,ti_cleanup_only</tempflags>
            <temptype>tt_persistent</temptype>
            <size>8</size>
            <tempinit>
               <assignn resultdef="$void" pos="4,1" flags="nf_pass1_done" complexity="3">
                  <temprefn resultdef="QWord" pos="4,1" flags="nf_pass1_done,nf_write" complexity="1" id="$7C46BA70">
                     <typedef>QWord</typedef>
                     <tempflags>ti_may_be_in_reg,ti_const,ti_cleanup_only</tempflags>
                     <temptype>tt_persistent</temptype>
                  </temprefn>
                  <shln resultdef="QWord" pos="5,17" flags="nf_pass1_done" complexity="1">
                     <loadn resultdef="QWord" pos="5,19" flags="nf_pass1_done" complexity="0">
                        <symbol>N</symbol>
                     </loadn>
                     <ordconstn resultdef="QWord" pos="5,15" flags="nf_pass1_done" complexity="0" rangecheck="FALSE">
                        <value>2</value>
                     </ordconstn>
                  </shln>
               </assignn>
            </tempinit>
         </tempcreaten>
      </statementn>
   </blockn>
</statementn>
<statementn pos="4,1">
   <assignn resultdef="$void" pos="5,5" flags="nf_pass1_done" complexity="1">
      <typeconvn resultdef="QWord" pos="5,5" flags="nf_pass1_done,nf_absolute" complexity="0" convtype="tc_equal">
         <loadn resultdef="QWord" pos="5,12" flags="nf_pass1_done,nf_write" complexity="0">
            <symbol>result</symbol>
         </loadn>
      </typeconvn>
      <addn resultdef="QWord" pos="5,21" flags="nf_pass1_done" complexity="1">
         <temprefn resultdef="QWord" pos="5,17" flags="nf_pass1_done" complexity="1" id="$7C46BA70">
            <typedef>QWord</typedef>
            <tempflags>ti_may_be_in_reg,ti_const,ti_cleanup_only</tempflags>
            <temptype>tt_persistent</temptype>
         </temprefn>
         <temprefn resultdef="QWord" pos="5,25" flags="nf_pass1_done" complexity="1" id="$7C46BA70">
            <typedef>QWord</typedef>
            <tempflags>ti_may_be_in_reg,ti_const,ti_cleanup_only</tempflags>
            <temptype>tt_persistent</temptype>
         </temprefn>
      </addn>
   </assignn>
</statementn>

From those nodes alone, everything should work as expected: "N shl 2" (n * 4) is stored in a temporary store (they're unnamed, so the only way they can be referenced is via their pointer address... $7C46BA70 in this case). The answer is then added with itself and the result stored in Result.

HOWEVER...

I've just discovered that if the LeaLea2Lea peephole optimisation is disabled (alternatively, see what happens if you try to build the above code sample with "-OoNOPEEPHOLE"), the code compiles correctly. Looks like another bug that only rears its ugly head in a rare number of cases.

It looks like LeaLea2Lea doesn't optimise the following pair of assembly instructions properly:

leaq (,%rcx,4),%rax
leaq (%rax,%rax),%rax

(without LeaLea2Lea, the second instruction is converted into "addq %rax,%rax" by the Lea2AddBase optimisation)

I don't recall writing this optimisation, but I'll see what I can figure out.

J. Gareth Moreton

2021-02-22 12:29

developer   ~0129095

Okay, I've made a patch that reworks the LeaLea2Lea optimisation and hpoefully fixes the bug (admittedly by adding a brand new optimisation!). There's room for improvement, I'm sure.

Once I finish running the tests and confirm everything is good, I'll upload it.

J. Gareth Moreton

2021-02-22 22:49

developer   ~0129110

Tests on Windows pass without incident when run with -O4.
lealea2lea-rework.patch (11,518 bytes)   
Index: compiler/x86/aoptx86.pas
===================================================================
--- compiler/x86/aoptx86.pas	(revision 48789)
+++ compiler/x86/aoptx86.pas	(working copy)
@@ -3360,67 +3360,164 @@
         if (taicpu(p).oper[1]^.reg <> NR_STACK_POINTER_REG) and
           GetNextInstructionUsingReg(p,hp1,taicpu(p).oper[1]^.reg) then
           begin
-            { changes
-                lea offset1(regX), reg1
-                lea offset2(reg1), reg1
-                to
-                lea offset1+offset2(regX), reg1 }
-
+            { Check common LEA/LEA conditions }
             if MatchInstruction(hp1,A_LEA,[taicpu(p).opsize]) and
-              MatchOperand(taicpu(p).oper[1]^,taicpu(hp1).oper[1]^) and
-              (taicpu(p).oper[0]^.ref^.relsymbol=nil) and
-              (taicpu(p).oper[0]^.ref^.segment=NR_NO) and
-              (taicpu(p).oper[0]^.ref^.symbol=nil) and
-              (((taicpu(hp1).oper[0]^.ref^.base=taicpu(p).oper[1]^.reg) and
-                (taicpu(p).oper[0]^.ref^.scalefactor <= 1) and
-                (taicpu(p).oper[0]^.ref^.index=NR_NO) and
-                (taicpu(p).oper[0]^.ref^.index=taicpu(hp1).oper[0]^.ref^.index) and
-                (taicpu(p).oper[0]^.ref^.scalefactor=taicpu(hp1).oper[0]^.ref^.scalefactor)
-               ) or
-               ((taicpu(hp1).oper[0]^.ref^.index=taicpu(p).oper[1]^.reg) and
-                (taicpu(p).oper[0]^.ref^.index=NR_NO)
-               ) or
-               ((taicpu(hp1).oper[0]^.ref^.base=taicpu(p).oper[1]^.reg) and
-                (taicpu(hp1).oper[0]^.ref^.scalefactor <= 1) and
-                ((taicpu(p).oper[0]^.ref^.base=NR_NO) or
-                 ((taicpu(p).oper[0]^.ref^.base=taicpu(p).oper[0]^.ref^.base) and
-                  (taicpu(p).oper[0]^.ref^.index=NR_NO)
-                 )
-                ) and
-                not(RegUsedBetween(taicpu(p).oper[0]^.ref^.index,p,hp1)))
-              ) and
-              not(RegUsedBetween(taicpu(p).oper[0]^.ref^.base,p,hp1)) and
-              (taicpu(p).oper[0]^.ref^.relsymbol=taicpu(hp1).oper[0]^.ref^.relsymbol) and
-              (taicpu(p).oper[0]^.ref^.segment=taicpu(hp1).oper[0]^.ref^.segment) and
-              (taicpu(p).oper[0]^.ref^.symbol=taicpu(hp1).oper[0]^.ref^.symbol) then
+              (taicpu(p).oper[1]^.reg = taicpu(hp1).oper[1]^.reg) and
+              (taicpu(p).oper[0]^.ref^.relsymbol = nil) and
+              (taicpu(p).oper[0]^.ref^.segment = NR_NO) and
+              (taicpu(p).oper[0]^.ref^.symbol = nil) and
+              (taicpu(hp1).oper[0]^.ref^.relsymbol = nil) and
+              (taicpu(hp1).oper[0]^.ref^.segment = NR_NO) and
+              (taicpu(hp1).oper[0]^.ref^.symbol = nil) and
+              (
+                (taicpu(p).oper[0]^.ref^.base = NR_NO) or { Don't call RegUsedBetween unnecessarily }
+                not(RegUsedBetween(taicpu(p).oper[0]^.ref^.base,p,hp1))
+              ) then
               begin
-                DebugMsg(SPeepholeOptimization + 'LeaLea2Lea done',p);
-                if taicpu(hp1).oper[0]^.ref^.index=taicpu(p).oper[1]^.reg then
+                { changes
+                    lea (regX,scale), reg1
+                    lea offset(reg1,reg1), reg1
+                    to
+                    lea offset(regX,scale*2), reg1
+
+                  and
+                    lea (regX,scale1), reg1
+                    lea offset(reg1,scale2), reg1
+                    to
+                    lea offset(regX,scale1*scale2), reg1
+
+                  ... so long as the final scale does not exceed 8
+
+                  (Similarly, allow the first instruction to be "lea (regX,regX),reg1")
+                  }
+                if (taicpu(p).oper[0]^.ref^.offset = 0) and
+                  (taicpu(hp1).oper[0]^.ref^.index = taicpu(p).oper[1]^.reg) and
+                  (
+                    (
+                      (taicpu(p).oper[0]^.ref^.base = NR_NO)
+                    ) or (
+                      (taicpu(p).oper[0]^.ref^.scalefactor <= 1) and
+                      (
+                        (taicpu(p).oper[0]^.ref^.base = taicpu(p).oper[0]^.ref^.index) and
+                        not(RegUsedBetween(taicpu(p).oper[0]^.ref^.index, p, hp1))
+                      )
+                    )
+                  ) and (
+                    (
+                      { lea (reg1,scale2), reg1 variant }
+                      (taicpu(hp1).oper[0]^.ref^.base = NR_NO) and
+                      (
+                        (
+                          (taicpu(p).oper[0]^.ref^.base = NR_NO) and
+                          (taicpu(hp1).oper[0]^.ref^.scalefactor * taicpu(p).oper[0]^.ref^.scalefactor <= 8)
+                        ) or (
+                          { lea (regX,regX), reg1 variant }
+                          (taicpu(p).oper[0]^.ref^.base <> NR_NO) and
+                          (taicpu(hp1).oper[0]^.ref^.scalefactor <= 4)
+                        )
+                      )
+                    ) or (
+                      { lea (reg1,reg1), reg1 variant }
+                      (taicpu(hp1).oper[0]^.ref^.base = taicpu(p).oper[1]^.reg) and
+                      (taicpu(hp1).oper[0]^.ref^.scalefactor <= 1)
+                    )
+                  ) then
                   begin
-                    taicpu(hp1).oper[0]^.ref^.index:=taicpu(p).oper[0]^.ref^.base;
-                    inc(taicpu(hp1).oper[0]^.ref^.offset,taicpu(p).oper[0]^.ref^.offset*max(taicpu(hp1).oper[0]^.ref^.scalefactor,1));
-                    { if the register is used as index and base, we have to increase for base as well
-                      and adapt base }
-                    if taicpu(hp1).oper[0]^.ref^.base=taicpu(p).oper[1]^.reg then
+                    DebugMsg(SPeepholeOptimization + 'LeaLea2Lea 2 done',p);
+
+                    { Make everything homogeneous to make calculations easier }
+                    if (taicpu(p).oper[0]^.ref^.base <> NR_NO) then
                       begin
-                        taicpu(hp1).oper[0]^.ref^.base:=taicpu(p).oper[0]^.ref^.base;
-                        inc(taicpu(hp1).oper[0]^.ref^.offset,taicpu(p).oper[0]^.ref^.offset);
+                        if taicpu(p).oper[0]^.ref^.index <> NR_NO then
+                          { Convert lea (regX,regX),reg1 to lea (regX,2),reg1 }
+                          taicpu(p).oper[0]^.ref^.scalefactor := 2
+                        else
+                          taicpu(p).oper[0]^.ref^.index := taicpu(p).oper[0]^.ref^.base;
+
+                        taicpu(p).oper[0]^.ref^.base := NR_NO;
                       end;
+
+                    if (taicpu(hp1).oper[0]^.ref^.base = NR_NO) then
+                      begin
+                        { Just to prevent miscalculations }
+                        if (taicpu(hp1).oper[0]^.ref^.scalefactor = 0) then
+                          taicpu(hp1).oper[0]^.ref^.scalefactor := taicpu(p).oper[0]^.ref^.scalefactor
+                        else
+                          taicpu(hp1).oper[0]^.ref^.scalefactor := taicpu(hp1).oper[0]^.ref^.scalefactor * taicpu(p).oper[0]^.ref^.scalefactor;
+                      end
+                    else
+                      begin
+                        taicpu(hp1).oper[0]^.ref^.base := NR_NO;
+                        taicpu(hp1).oper[0]^.ref^.scalefactor := taicpu(p).oper[0]^.ref^.scalefactor * 2;
+                      end;
+
+                    taicpu(hp1).oper[0]^.ref^.index := taicpu(p).oper[0]^.ref^.index;
+                    RemoveCurrentP(p);
+                    result:=true;
+                    exit;
                   end
-                else
+
+                { changes
+                    lea offset1(regX), reg1
+                    lea offset2(reg1), reg1
+                    to
+                    lea offset1+offset2(regX), reg1 }
+                else if
+                  (
+                    (taicpu(hp1).oper[0]^.ref^.index = taicpu(p).oper[1]^.reg) and
+                    (taicpu(p).oper[0]^.ref^.index = NR_NO)
+                  ) or (
+                    (taicpu(hp1).oper[0]^.ref^.base = taicpu(p).oper[1]^.reg) and
+                    (taicpu(hp1).oper[0]^.ref^.scalefactor <= 1) and
+                    (
+                      (
+                        (taicpu(p).oper[0]^.ref^.index = NR_NO) or
+                        (taicpu(p).oper[0]^.ref^.base = NR_NO)
+                      ) or (
+                        (taicpu(p).oper[0]^.ref^.scalefactor <= 1) and
+                        (
+                          (taicpu(p).oper[0]^.ref^.index = NR_NO) or
+                          (
+                            (taicpu(p).oper[0]^.ref^.index = taicpu(p).oper[0]^.ref^.base) and
+                            (
+                              (taicpu(hp1).oper[0]^.ref^.index = NR_NO) or
+                              (taicpu(hp1).oper[0]^.ref^.base = NR_NO)
+                            )
+                          )
+                        )
+                      )
+                    )
+                  ) then
                   begin
-                    inc(taicpu(hp1).oper[0]^.ref^.offset,taicpu(p).oper[0]^.ref^.offset);
-                    taicpu(hp1).oper[0]^.ref^.base:=taicpu(p).oper[0]^.ref^.base;
+                    DebugMsg(SPeepholeOptimization + 'LeaLea2Lea 1 done',p);
+
+                    if taicpu(hp1).oper[0]^.ref^.index=taicpu(p).oper[1]^.reg then
+                      begin
+                        taicpu(hp1).oper[0]^.ref^.index:=taicpu(p).oper[0]^.ref^.base;
+                        inc(taicpu(hp1).oper[0]^.ref^.offset,taicpu(p).oper[0]^.ref^.offset*max(taicpu(hp1).oper[0]^.ref^.scalefactor,1));
+                        { if the register is used as index and base, we have to increase for base as well
+                          and adapt base }
+                        if taicpu(hp1).oper[0]^.ref^.base=taicpu(p).oper[1]^.reg then
+                          begin
+                            taicpu(hp1).oper[0]^.ref^.base:=taicpu(p).oper[0]^.ref^.base;
+                            inc(taicpu(hp1).oper[0]^.ref^.offset,taicpu(p).oper[0]^.ref^.offset);
+                          end;
+                      end
+                    else
+                      begin
+                        inc(taicpu(hp1).oper[0]^.ref^.offset,taicpu(p).oper[0]^.ref^.offset);
+                        taicpu(hp1).oper[0]^.ref^.base:=taicpu(p).oper[0]^.ref^.base;
+                      end;
+                    if taicpu(p).oper[0]^.ref^.index<>NR_NO then
+                      begin
+                        taicpu(hp1).oper[0]^.ref^.base:=taicpu(hp1).oper[0]^.ref^.index;
+                        taicpu(hp1).oper[0]^.ref^.index:=taicpu(p).oper[0]^.ref^.index;
+                        taicpu(hp1).oper[0]^.ref^.scalefactor:=taicpu(p).oper[0]^.ref^.scalefactor;
+                      end;
+                    RemoveCurrentP(p);
+                    result:=true;
+                    exit;
                   end;
-                if taicpu(p).oper[0]^.ref^.index<>NR_NO then
-                  begin
-                    taicpu(hp1).oper[0]^.ref^.base:=taicpu(hp1).oper[0]^.ref^.index;
-                    taicpu(hp1).oper[0]^.ref^.index:=taicpu(p).oper[0]^.ref^.index;
-                    taicpu(hp1).oper[0]^.ref^.scalefactor:=taicpu(p).oper[0]^.ref^.scalefactor;
-                  end;
-                RemoveCurrentP(p);
-                result:=true;
-                exit;
               end;
 
             { Change:
lealea2lea-rework.patch (11,518 bytes)   

Pierre Muller

2021-02-23 00:18

developer   ~0129111

  I applied the patch provided by Gareth, after testing on x86_64-linux.

  Could you please test if the ug is fixed,
and close the bug report if it is the case.

  Thanks to Gareth,

Pierre

runewalsh

2021-02-23 01:22

reporter   ~0129112

Yay, it works, thank you, I was really scared.

Issue History

Date Modified Username Field Change
2021-02-21 23:50 runewalsh New Issue
2021-02-22 00:04 Pierre Muller Note Added: 0129080
2021-02-22 00:14 J. Gareth Moreton Assigned To => J. Gareth Moreton
2021-02-22 00:14 J. Gareth Moreton Status new => assigned
2021-02-22 00:14 J. Gareth Moreton Note Added: 0129081
2021-02-22 01:52 J. Gareth Moreton Note Added: 0129082
2021-02-22 01:53 J. Gareth Moreton Note Edited: 0129082 View Revisions
2021-02-22 01:53 J. Gareth Moreton Note Edited: 0129082 View Revisions
2021-02-22 01:54 J. Gareth Moreton Note Edited: 0129082 View Revisions
2021-02-22 02:14 J. Gareth Moreton Note Edited: 0129082 View Revisions
2021-02-22 12:29 J. Gareth Moreton Note Added: 0129095
2021-02-22 22:49 J. Gareth Moreton Note Added: 0129110
2021-02-22 22:49 J. Gareth Moreton File Added: lealea2lea-rework.patch
2021-02-22 22:49 J. Gareth Moreton Assigned To J. Gareth Moreton => Pierre Muller
2021-02-22 22:49 J. Gareth Moreton Status assigned => feedback
2021-02-22 22:49 J. Gareth Moreton FPCTarget => -
2021-02-23 00:18 Pierre Muller Assigned To Pierre Muller => J. Gareth Moreton
2021-02-23 00:18 Pierre Muller Status feedback => resolved
2021-02-23 00:18 Pierre Muller Resolution open => fixed
2021-02-23 00:18 Pierre Muller Fixed in Revision => 48792
2021-02-23 00:18 Pierre Muller Note Added: 0129111
2021-02-23 01:22 runewalsh Note Added: 0129112