View Issue Details

IDProjectCategoryView StatusLast Update
0038703FPCCompilerpublic2021-04-06 17:54
ReporterC Western Assigned ToFlorian  
PrioritynormalSeverityminorReproducibilityalways
Status resolvedResolutionfixed 
Platformi386OSlinux 
Product Version3.3.1 
Fixed in Version3.3.1 
Summary0038703: Bad code generation
DescriptionThe attached gives inconsitent results with the current SVN:
linux i386 only:
$ fpc -Pi386 -Cr -MObjfpc project1.lpr
Free Pascal Compiler version 3.3.1 [2021/04/04] for i386
Copyright (c) 1993-2021 by Florian Klaempfl and others
Target OS: Linux for i386
Compiling project1.lpr
Linking project1
50 lines compiled, 0.1 sec, 155632 bytes code, 41796 bytes data
$ ./project1
 5.4000000000000000E+001


$ fpc -Pi386 -O4 -Cr -MObjfpc project1.lpr
Free Pascal Compiler version 3.3.1 [2021/04/04] for i386
Copyright (c) 1993-2021 by Florian Klaempfl and others
Target OS: Linux for i386
Compiling project1.lpr
Linking project1
50 lines compiled, 0.1 sec, 155568 bytes code, 41796 bytes data
$ ./project1
 0.0000000000000000E+000

TagsNo tags attached.
Fixed in Revision49128
FPCOldBugId
FPCTarget-
Attached Files

Activities

C Western

2021-04-04 15:17

reporter  

project1.lpr (864 bytes)   
program project1;

const
  MaxLoopDepth = 4;
type
  TES = record
    LoopDepth: Integer;
    Sums: array [1..MaxLoopDepth] of Double;
  end;
  PES = ^TES;
  TE = class
    ThreadStates: array of PES;
  end;

  TSF = class
  public
    function NI(Evaluator: TE; var a:array of Double): Double; virtual;
  end;

var
  E: TE;
  ES: TES;
  D: Double;
  SF: TSF;

threadvar
  ThreadIndex: Integer;

function TSF.NI(Evaluator: TE; var a: array of Double): Double;
begin
  with Evaluator.ThreadStates[ThreadIndex-1]^ do begin
    Sums[LoopDepth] := Sums[LoopDepth] + a[0];
    Result := Sums[LoopDepth];
  end;
end;

begin
  ThreadIndex := 2;
  SF := TSF.Create;
  E := TE.Create;
  SetLength(E.ThreadStates,2);
  E.ThreadStates[1] := @ES;
  ES.LoopDepth := 1;
  ES.Sums[1] := 0;
  D := 27;
  SF.NI(E, D);
  SF.NI(E, D);
  WriteLn(ES.Sums[1]); { should write 54 }
end.

project1.lpr (864 bytes)   

runewalsh

2021-04-05 10:48

reporter   ~0130103

Last edited: 2021-04-05 10:52

View 2 revisions

I'm afraid that with -O4 it is by design.
-O4 enables -OoUNCERTAIN, and the documentation about -OoUNCERTAIN notes that:

“If uncertain optimizations are enabled, the CSE algorithm assumes that
— If something is written to a local/global register or a procedure/function parameter, this value doesn’t overwrite the value to which a pointer points.
— If something is written to memory pointed to by a pointer variable, this value doesn’t overwrite the value of a local/global variable or a procedure/function parameter.”

If so, either use -O3 or allocate all of your structures dynamically. Presently, your ‘ES’ variable is static and so its static members, and thus they fall under the CSE's notion of global-variables-not-to-be-pointed-by.

I think this UNCERTAIN thing is a distant counterpart of what GCC's strict aliasing is. This is good for performance, as the compiler can more often be sure that the variable was not overwritten between two points in code, so will re-read them less. But not for excessive pointer lovers.

Do-wan Kim

2021-04-05 10:59

reporter   ~0130104

It may peephole optimization bug.
It works with {$Optimization noPEEPHOLE}.

C Western

2021-04-05 13:03

reporter   ~0130106

My understanding of OoUNCERTAIN is that it ignores the possibility of updating a given variable by two different routes, but I don't think that applies here as the problem function TSF.NI only uses one route to access any given variable. If it accessed ES directly that would be asking for trouble.

Do-wan Kim

2021-04-05 14:21

reporter   ~0130108

Last edited: 2021-04-05 14:22

View 3 revisions

It caused MovMovMov2MovMov 1 peephole optimization don't fully check register usage after optimization point.
Sums[LoopDepth] + a[0] store to invalid memory.
38703_aoptx86.pas.patch (787 bytes)   
Index: compiler/x86/aoptx86.pas
===================================================================
--- compiler/x86/aoptx86.pas	(revision 49122)
+++ compiler/x86/aoptx86.pas	(working copy)
@@ -2475,6 +2475,7 @@
                       MatchOpType(taicpu(hp2),top_ref,top_reg) and
                       RefsEqual(taicpu(hp2).oper[0]^.ref^, taicpu(hp1).oper[1]^.ref^)  then
                       if not RegInRef(taicpu(hp2).oper[1]^.reg,taicpu(hp2).oper[0]^.ref^) and
+                         GetLastInstruction(hp1,hp3) and not RegUsedBetween(taicpu(hp1).oper[0]^.reg,hp2,hp3) and // 38703
                          not(RegUsedAfterInstruction(taicpu(p).oper[1]^.reg,hp1,tmpUsedRegs)) then
                         {   mov mem1, %reg1
                             mov %reg1, mem2
38703_aoptx86.pas.patch (787 bytes)   

J. Gareth Moreton

2021-04-05 19:32

developer   ~0130111

If logic serves me correctly, a better one might be to add "not RegInRef(taicpu(p).oper[1]^.reg,taicpu(hp2).oper[0]^.ref^)" instead, since the only way mem2 could become invalid in this optimisation is if it contains %reg1, in which case I might have a possible alternative. Stand by.

J. Gareth Moreton

2021-04-05 21:20

developer   ~0130115

Can't reproduce the issue under i386-win32, but that's to be expected sometimes. I'm setting up a test on my virtual machine. In the meantime, how does this patch perform?

C Western

2021-04-05 22:12

reporter   ~0130119

Thanks for looking at this. Sadly. assuming I have applied the patch correctly the result does not look immediately promising.

J. Gareth Moreton

2021-04-05 23:58

developer   ~0130122

Okay - I'll try to get i386-linux working for myself and examine the assembly language. Saying that, if you can provide an assembly dump of your project1 (-al option will probably be best), I'll be most grateful.

-O4 does have uncertain optimisations, but it shouldn't cause outright bad code.

Do-wan Kim

2021-04-06 00:42

reporter   ~0130124

I found buggy code generation under i386-win32. Range checking code makes this bug to appear.
38703.png (68,915 bytes)   
38703.png (68,915 bytes)   

J. Gareth Moreton

2021-04-06 01:58

developer   ~0130126

That's weird and interesting. Note it might be a fault with MovMov2Mov 3 as well because %ebx gets replaced with %edx.

J. Gareth Moreton

2021-04-06 04:11

developer   ~0130128

Last edited: 2021-04-06 04:11

View 2 revisions

I think I found your bug. It was in "MovMov2Mov 3" - it didn't mark the target register as in use when it was placed in the first MOV instruction, and this then caused MovMovMov2MovMov 1 to be incorrectly applied.

P.S. I managed to reproduce it on i386-win32... I just forgot the -Cr option. Oops!
i38703-kit.patch (602 bytes)   
Index: compiler/x86/aoptx86.pas
===================================================================
--- compiler/x86/aoptx86.pas	(revision 49124)
+++ compiler/x86/aoptx86.pas	(working copy)
@@ -2380,6 +2380,7 @@
 
                                  mov mem, %reg"
                             }
+                            AllocRegBetween(taicpu(hp1).oper[1]^.reg,p,hp1,usedregs);
                             taicpu(p).loadreg(1, taicpu(hp1).oper[1]^.reg);
                             DebugMsg(SPeepholeOptimization + 'MovMov2Mov 3 done',p);
                             RemoveInstruction(hp1);
i38703-kit.patch (602 bytes)   

Do-wan Kim

2021-04-06 06:30

reporter   ~0130133

Your patch works good under i386 ubuntu and windows :)

C Western

2021-04-06 09:56

reporter   ~0130134

I can confirm the patch works for me, on the test and my full program - thank you.

Florian

2021-04-06 17:54

administrator   ~0130138

Thanks, applied.

Issue History

Date Modified Username Field Change
2021-04-04 15:17 C Western New Issue
2021-04-04 15:17 C Western File Added: project1.lpr
2021-04-05 10:48 runewalsh Note Added: 0130103
2021-04-05 10:52 runewalsh Note Edited: 0130103 View Revisions
2021-04-05 10:59 Do-wan Kim Note Added: 0130104
2021-04-05 13:03 C Western Note Added: 0130106
2021-04-05 14:21 Do-wan Kim Note Added: 0130108
2021-04-05 14:21 Do-wan Kim File Added: 38703_aoptx86.pas.patch
2021-04-05 14:22 Do-wan Kim Note Edited: 0130108 View Revisions
2021-04-05 14:22 Do-wan Kim Note Edited: 0130108 View Revisions
2021-04-05 19:32 J. Gareth Moreton Note Added: 0130111
2021-04-05 21:20 J. Gareth Moreton Note Added: 0130115
2021-04-05 21:20 J. Gareth Moreton File Added: i38703-kit.patch
2021-04-05 22:12 C Western Note Added: 0130119
2021-04-05 23:58 J. Gareth Moreton Note Added: 0130122
2021-04-06 00:42 Do-wan Kim Note Added: 0130124
2021-04-06 00:42 Do-wan Kim File Added: 38703.png
2021-04-06 01:58 J. Gareth Moreton Note Added: 0130126
2021-04-06 04:09 J. Gareth Moreton File Deleted: i38703-kit.patch
2021-04-06 04:11 J. Gareth Moreton Note Added: 0130128
2021-04-06 04:11 J. Gareth Moreton File Added: i38703-kit.patch
2021-04-06 04:11 J. Gareth Moreton Note Edited: 0130128 View Revisions
2021-04-06 06:30 Do-wan Kim Note Added: 0130133
2021-04-06 09:56 C Western Note Added: 0130134
2021-04-06 17:54 Florian Assigned To => Florian
2021-04-06 17:54 Florian Status new => resolved
2021-04-06 17:54 Florian Resolution open => fixed
2021-04-06 17:54 Florian Fixed in Version => 3.3.1
2021-04-06 17:54 Florian Fixed in Revision => 49128
2021-04-06 17:54 Florian FPCTarget => -
2021-04-06 17:54 Florian Note Added: 0130138