View Issue Details

IDProjectCategoryView StatusLast Update
0038129FPCCompilerpublic2020-12-12 00:09
Reporterrunewalsh Assigned ToFlorian  
Status feedbackResolutionopen 
Summary0038129: Wrong code generated on x86-64
DescriptionCode below should write "#######... 70%", but depending on compiler settings it writes garbage or even crashes.
I have this happening on x86-64-win64 with optimization levels greater than -O1.

Bug vanishes after replacing
pChar(result)[i] :=
with conventional
result[1 + i] :=

or after removing the second condition part, i. e.
(i = int32(divs) - 1) and (progress >= 1)

or after replacing any of 32-bit variables with 64-bit equivalent

so I can't even come close to guessing what's going on here.
Steps To Reproducefunction Bar(const progress: single; divs: uint32): string;
    BarSym: array[boolean] of char = ('.', '#');
    i: int32;
    SetLength(result, divs);
    for i := 0 to int32(divs) - 1 do
        pChar(result)[i] := BarSym[(progress >= (0.75 + i) / divs) or (i = int32(divs) - 1) and (progress >= 1)];

    s: string;

    writeln(Bar(0.7, 10) + ' 70%');
Tagsaarch64, compiler, x86_64
Fixed in Revision
Attached Files


related to 0037527 new X86_64 code optimization 
related to 0038053 assignedJ. Gareth Moreton AArch64 -O2 bug not seen with -O1 
related to 0038055 resolved Compiling Lazarus for AArch64 with -O3 creates non-functional code 



2020-11-25 02:06

reporter   ~0127170

*var s: string is unnecessary, also {$longstrings+} implied, of course.

Thaddy de Koning

2020-11-25 09:16

reporter   ~0127172

Why do you use / instead of div? Your result is not a float, but an integer type.

J. Gareth Moreton

2020-11-25 09:43

developer   ~0127173

Last edited: 2020-11-25 09:46

View 3 revisions

Ignoring complaints about the quality of the code, the fact that it produces a bad binary is a pretty major bug. I'll take a look.

(Also, Thaddy... the result is Boolean, not an integer type. The floating-point division is part of the right side of a comparison)


2020-11-25 17:38

reporter   ~0127179

Last edited: 2020-11-26 12:01

View 2 revisions

What's wrong about the quality of the code T_________T Also, this is just an intermediate version which I could not reduce further (neither I tried hard enough), for example, the second condition part is superfluous and can be removed outright, but hey, remember that the bug vanishes then.

J. Gareth Moreton

2020-11-25 20:43

developer   ~0127183

Last edited: 2020-11-25 20:46

View 2 revisions

Confirmed that -O2 produces different (and incorrect) output when compared to -O1 and -O-.

Specifying -O2 -OoNOPEEPHOLE still produces bad code, indicating that the bug is not in the peephole optimiser and is higher in the chain.

(P.S. There's nothing wrong with your code - it was more of a jab at Thaddy)

Thaddy de Koning

2020-11-25 20:50

reporter   ~0127185

well, a boolean is not a float. The division code is wrong.

J. Gareth Moreton

2020-11-25 21:01

developer   ~0127186

"(progress >= (0.75 + i) / divs)" is a Boolean. It's a comparison of two floats.


2020-11-25 21:13

reporter   ~0127188

I draw a progress bar that displays floating-point ‘progress’ ∈ [0.0; 1.0] using ‘divs’ characters. For ‘divs’ = 5, i-th character (i ∈ [0; 4]) is drawn as ‘#’ if ‘progress’ is at least, respectively, 0.15, 0.35, 0.55, 0.75 or 0.95, and as ‘.’ otherwise. These breakpoints are calculated as (i + 0.75) / divs.

J. Gareth Moreton

2020-11-25 21:24

developer   ~0127189

Sorry for all the negative discussion on your code. While personally I try to use integers rather than floats (and using the compiler optimisation that converts an integer divison by a constant into a multiplication), there's nothing wrong with your code. I have reproduced the compiler bug though.

As a general rule of thumb, the quality of one's code or the techniques used shouldn't be up for discussion. If it causes the compiler to glitch out like it does here, it is a compiler bug through and through.


2020-11-25 23:06

reporter   ~0127191

Well, you're right. (Or at least I could convert “/ divs” at the right to “* divs” at the left.)

Sorry for off-topic, but I myself became interested.
I have microbenchmarked various operations over various types (on x64):
“Independent” version does R := A op B, where R, A and B are global variables.
“Dependent” version does R += A op B and is somewhat more realistic because we are going to use the result right away.
For the measurement, ‘Benchmark’ exponentially searches ‘iterations’ value that will take enough time (1/6 sec).

For me, both “independent” and “dependent” floating point divisions are slightly under 2 times slower than integer multiplications, so they aren't that scary, provided you have hardware support for floating point.

J. Gareth Moreton

2020-11-26 00:44

developer   ~0127195

Though I don't fully understand the specifics yet, floating-point division is generally much faster than integer division, which is why the latter is convertred into a multiplication where possible. And indeed, multiplication is faster than division overall in most cases.

In the meantime, I've found the cause of the strange output, but not the reason why:

# [10] for i := 0 to int32(divs) - 1 do
    leal -1(%esi),%eax
    testl %eax,%eax
    jnge .Lj6
    movl $-1,%ecx
    .p2align 4,,10
    .p2align 3
# Peephole Optimization: LeaMov2Lea done
# Peephole Optimization: Lea2Add done
    addl $1,%ecx
# [11] pChar(result)[i] := BarSym[(progress >= (0.75 + i) / divs) or (i = int32(divs) - 1) and (progress >= 1)];
    cvtsi2ssl %ecx,%xmm0
    addss _$BREAKP$_Ld1(%rip),%xmm0
# Peephole Optimization: MovAnd2Mov 1 done
    movl %esi,%edx
    cvtsi2ssq %rdx,%xmm1
    divss %xmm1,%xmm0
    comiss %xmm6,%xmm0
    jp .Lj12
    jbe .Lj10
# Peephole Optimization: Mov2Nop 5 done
# Peephole Optimization: %edx = %esi; changed to minimise pipeline stall (MovXXX2MovXXX)
    movslq %esi,%rdx
    subq $1,%rdx
    movslq %ecx,%r8
    cmpq %r8,%rdx
    jne .Lj13
    comiss _$BREAKP$_Ld2(%rip),%xmm6
    jp .Lj13
    jnae .Lj13
# Peephole Optimization: movq $1,%r9 -> movl $1,%r9d (immediate can be represented with just 32 bits)
    movl $1,%r9d
    jmp .Lj18
    xorq %r9,%r9
    movq (%rbx),%rdx
    testq %rdx,%rdx
    jne .Lj19
    leaq FPC_EMPTYCHAR(%rip),%rdx
    movb (%r10,%r9,1),%r9b
    movb %r9b,(%rdx,%r8,1) <<<------ r8 is not initailised
    cmpl %eax,%ecx
    jnge .Lj7
# [12] end;

The counter variable i is stored in %ecx, which is then sign-extended into %r8 just after .Lj12, but this snippet of code is not entered if the left-hand side of the "or" operation (the floating-point comparison) is True. If %r8 happens to be zero, the program won't crash but the string contains garbage between index 0 and the first ".". If %r8 is something else entirely, then you'll most likely get an access violation.

Under -O1, the counter is stored on the stack, but gets sign-extended into a register which is then used in place of %r8 in the write instruction.

(Under -O1)
addl $1,-28(%rbp)
... (some code that checks if Result is a null string)
movslq -28(%rbp),%rdx
cvtsi2ssl -28(%rbp),%xmm0

(Under -O2)
addl $1,%ecx
cvtsi2ssl %ecx,%xmm0

J. Gareth Moreton

2020-11-26 05:24

developer   ~0127196

Testing other platforms... problem doesn't appear to occur under i386-win32.


2020-11-26 07:08

reporter   ~0127197

It seems this related to 0037527.

J. Gareth Moreton

2020-11-28 15:39

developer   ~0127238

Last edited: 2020-11-28 15:40

View 2 revisions

BREAKTHROUGH! This issue is not isolated to x86_64-win64. I have just reproduced it on aarch64-linux.

Under -O1:

#######... 70%

Under -O2:

Runtime error 216 at $000000000040026C

J. Gareth Moreton

2020-11-28 16:29

developer   ~0127240

Last edited: 2020-11-28 16:31

View 3 revisions

And to prove it's the same problem...

// [11] pChar(result)[i] := BarSym[(progress >= (0.75 + i) / divs) or (i = int32(divs) - 1) and (progress >= 1)];
    fmov s0, 7.5000000000000000E-001
    scvtf s1,w2
    fadd s0,s0,s1
    ucvtf s1,w20
    fdiv s0,s0,s1
    fcmpe s8,s0 .Lj10 <-- if first condition is True and control flow jumps to .Lj10, x1 is left uninitialised.
    mov w1,w20
    sxtw x1,w1
    sub x3,x1,1
    sxtw x1,w2
    cmp x3,x1 .Lj12
    fmov s0, 1.0000000000000000E+000
    fcmpe s8,s0 .Lj10
    b .Lj12
    movz x5,1
    b .Lj16
    movz x5,0
    ldr x4,[x19]
    cbnz x4,.Lj17
    adrp x4,:got:FPC_EMPTYCHAR
    ldr x4,[x4, :got_lo12:FPC_EMPTYCHAR]
    ldrb w3,[x3, x5]
    strb w3,[x4, x1] <-- x1 not initlalised. Runtime error 216 / SIGSEGV is triggered.

J. Gareth Moreton

2020-11-29 00:47

developer   ~0127250

Okay, so I modified the code temporarily to print out the node tree after the first pass and just before the second pass, and currently everything looks correct with the temporary stores - it's created and destroyed in the correct places, so an incorrect node tree can be ruled out So it looks like the second pass is incorrect somewhere.

I think I might upgrade my DEBUG_NODE_XML feature to contain the second tree alongside the initial code tree that's currently dumped, because it's a useful debugging feature.

J. Gareth Moreton

2020-11-29 05:14

developer   ~0127252

So the issue is to do with tcgtempcreatenode and tcgtemprefnode. In some situations, the temporary storage is only initialised at the first reference. In the above code example, the temporary storage contains an implicitly-typecast 'i' and its first reference is only conditionally executed because it is the right-side argument of an 'or' operator. Node processing use a kind of reverse Polish notation for assignments... the expressions are calculated first, hence why these are the first references to 'i', and then the index in the target variable.

I'm still looking at a solution, sometimes a tcgtemprefnode can appear before its tcgtempcreatenode and moving the initialisation code to tcgtempcreatenode pass_generate_code can sometimes raise an internal error.

J. Gareth Moreton

2020-11-29 05:15

developer   ~0127253

(Assigned to myself because this problem seems to manifest in multiple places and is a compiler logic error)

J. Gareth Moreton

2020-11-29 18:23

developer   ~0127259

This is actually difficult to fix fundamentally. As I discovered the hard way, I can't just "uplift" the temp initialisations to outside the "or" block, because the left-hand side may be testing if the right-hand side can even be executed safely. e.g. if Assigned(p) and (p^ = etc.). Uplifting the temp reference here will cause a null pointer exception.

I'll keep at it though.

J. Gareth Moreton

2020-11-30 02:28

developer   ~0127266

It seems the source of the bug is in the Common Subexpression Elimination (CSE) optimisation. I comment out "do_optcse(code);" in "generate_code" (or specify -OoNOCSE) and the machine code is no longer bad. I'm going to try a few things.

J. Gareth Moreton

2020-11-30 08:06

developer   ~0127267

It looks like this glitch was present in 3.2.0 as well - might need double-checking by another. I've got a patch working - just finishing the test runs, then I'll submit.

J. Gareth Moreton

2020-11-30 13:29

developer   ~0127268

So I found a fix, even if it's a little band-aidy. The compiler now forces a new reference when dealing with an assignment to an array-like variable. It may make CSE less efficient in places, and there is probably room for improvement in the future, but hopefully this resolves the issue. Give it a test run and see if it lives up to the standard.

I'm still trying to get the aarch64-linux tests complete - having some configuration difficulties.
cse-init-fix.patch (2,196 bytes)   
Index: compiler/optcse.pas
--- compiler/optcse.pas	(revision 47639)
+++ compiler/optcse.pas	(working copy)
@@ -64,6 +64,15 @@
     function searchsubdomain(var n:tnode; arg: pointer) : foreachnoderesult;
+        { When writing to an array, the index can be uninitialised if a
+          conditional value that depends on the index is assigned to it }
+        if (n.nodetype = vecn) and (nf_callunique in n.flags) then
+          begin
+            pboolean(arg)^:=false;
+            result:=fen_norecurse_true;
+            Exit;
+          end;
         if (n.nodetype in cseinvariant) or
           ((n.nodetype=inlinen) and
            (tinlinenode(n).inlinenumber in [in_length_x,in_assigned_x,in_sqr_real,in_sqrt_real,in_sin_real,in_cos_real,in_abs_long,
Index: tests/webtbs/tw38129.pp
--- tests/webtbs/tw38129.pp	(nonexistent)
+++ tests/webtbs/tw38129.pp	(working copy)
@@ -0,0 +1,43 @@
+{ %OPT=-O2 }
+program tw38129;
+{$mode objfpc}
+{$WARN 5094 off : Function result variable of a managed type does not seem to be initialized}
+function WriteBits(const Val: Single; Divs: Integer): ansistring;
+  const
+    ProgChars: array[Boolean] of Char = (#192, #219);
+  var
+    I: Integer;
+  begin
+    SetLength(Result, Divs);
+    for I := 0 to Divs - 1 do
+      PAnsiChar(Result)[I] := ProgChars[(Val >= (0.75 + I) / Divs) or (I = Divs - 1) and (Val >= 1)];
+  end;
+  Expected: array[0..4] of ansistring = (
+    #192#192#192#192,
+    #219#192#192#192,
+    #219#219#192#192,
+    #219#219#219#192,
+    #219#219#219#219
+  );
+  X, Y: Integer;
+  CurrentBits: ansistring;
+  for X := Low(Expected) to High(Expected) do
+  begin
+    CurrentBits := WriteBits(X * 0.25, 4);
+    for Y := 1 to Length(CurrentBits) do
+      if CurrentBits[Y] <> Expected[X][Y] then
+      begin
+        WriteLn('FAIL on X = ', X, ', Y = ', Y, '; expected ', Expected[X][Y], ', got ', CurrentBits[X]);
+        Halt(1);
+      end;
+  end;
+  WriteLn('ok');
cse-init-fix.patch (2,196 bytes)   

J. Gareth Moreton

2020-12-01 01:41

developer   ~0127284

aarch64-linux tests have passed (thanks Sven for helping me solve the technical issues in running them).

Sven Barth

2020-12-01 09:22

manager   ~0127286

You're welcome. :)

Chris Rorden

2020-12-01 15:28

reporter   ~0127299

I tested this patch on the trunk lazarus and FPC on the Apple M1 Aarch64

Issue 38053 persists. The command
 fpc -O2 fsz.pas; ls -l dipso002.tif; ./fsz dipso002.tif
fails, but one can succeed with:
  fpc -O2 -OoNOREGVAR fsz.pas; ls -l dipso002.tif; ./fsz dipso002.tif

Issue 38055 changes slightly:

Building Lazarus with
 -O2 -g- -Xs
builds an executable that runs.

Building Lazarus with
 -O3 -g- -Xs
Does not compile:
 Error: /Users/chrisrorden/src/lazarus/components/virtualtreeview/lib/aarch64-darwin-cocoa/laz.virtualtrees.s:12674:9: error: invalid operand for instruction

Building Lazarus with
Causes an executable to be created that crashes at launch.

J. Gareth Moreton

2020-12-01 20:33

developer   ~0127307

Guess my job is not yet over then!

J. Gareth Moreton

2020-12-11 18:24

developer   ~0127543

Issue is still present as of r47749 (patch not yet applied).


2020-12-11 18:31

administrator   ~0127544

THe patch is not correct and hides only the real problem. I don't have a solution yet, but I'll see if I can look into it the next days.

J. Gareth Moreton

2020-12-12 00:09

developer   ~0127549

I will openly admit that the patch was nothing more than a band-aid, and I unfortunately don't know WHY the temp allocator gets confused in this situation.

Issue History

Date Modified Username Field Change
2020-11-25 02:02 runewalsh New Issue
2020-11-25 02:06 runewalsh Note Added: 0127170
2020-11-25 09:16 Thaddy de Koning Note Added: 0127172
2020-11-25 09:43 J. Gareth Moreton Note Added: 0127173
2020-11-25 09:43 J. Gareth Moreton Assigned To => J. Gareth Moreton
2020-11-25 09:43 J. Gareth Moreton Status new => assigned
2020-11-25 09:43 J. Gareth Moreton Note Edited: 0127173 View Revisions
2020-11-25 09:44 J. Gareth Moreton Priority normal => high
2020-11-25 09:44 J. Gareth Moreton Severity minor => major
2020-11-25 09:44 J. Gareth Moreton FPCTarget => -
2020-11-25 09:46 J. Gareth Moreton Note Edited: 0127173 View Revisions
2020-11-25 17:38 runewalsh Note Added: 0127179
2020-11-25 20:43 J. Gareth Moreton Status assigned => confirmed
2020-11-25 20:43 J. Gareth Moreton Note Added: 0127183
2020-11-25 20:46 J. Gareth Moreton Note Edited: 0127183 View Revisions
2020-11-25 20:50 Thaddy de Koning Note Added: 0127185
2020-11-25 21:01 J. Gareth Moreton Note Added: 0127186
2020-11-25 21:13 runewalsh Note Added: 0127188
2020-11-25 21:24 J. Gareth Moreton Note Added: 0127189
2020-11-25 23:06 runewalsh Note Added: 0127191
2020-11-26 00:44 J. Gareth Moreton Note Added: 0127195
2020-11-26 05:24 J. Gareth Moreton Note Added: 0127196
2020-11-26 07:08 avk Note Added: 0127197
2020-11-26 12:01 runewalsh Note Edited: 0127179 View Revisions
2020-11-26 16:15 J. Gareth Moreton Relationship added related to 0037527
2020-11-28 15:39 J. Gareth Moreton Note Added: 0127238
2020-11-28 15:40 J. Gareth Moreton Note Edited: 0127238 View Revisions
2020-11-28 16:00 J. Gareth Moreton Relationship added related to 0038053
2020-11-28 16:00 J. Gareth Moreton Relationship added related to 0038055
2020-11-28 16:29 J. Gareth Moreton Note Added: 0127240
2020-11-28 16:30 J. Gareth Moreton Note Edited: 0127240 View Revisions
2020-11-28 16:31 J. Gareth Moreton Note Edited: 0127240 View Revisions
2020-11-28 16:32 J. Gareth Moreton Priority high => urgent
2020-11-28 16:32 J. Gareth Moreton Severity major => crash
2020-11-28 16:33 J. Gareth Moreton Tag Attached: compiler
2020-11-28 16:33 J. Gareth Moreton Tag Attached: aarch64
2020-11-28 16:33 J. Gareth Moreton Tag Attached: x86_64
2020-11-29 00:47 J. Gareth Moreton Note Added: 0127250
2020-11-29 05:14 J. Gareth Moreton Note Added: 0127252
2020-11-29 05:15 J. Gareth Moreton Status confirmed => assigned
2020-11-29 05:15 J. Gareth Moreton Note Added: 0127253
2020-11-29 18:23 J. Gareth Moreton Note Added: 0127259
2020-11-30 02:28 J. Gareth Moreton Note Added: 0127266
2020-11-30 08:06 J. Gareth Moreton Note Added: 0127267
2020-11-30 13:29 J. Gareth Moreton Note Added: 0127268
2020-11-30 13:29 J. Gareth Moreton File Added: cse-init-fix.patch
2020-11-30 13:29 J. Gareth Moreton Status assigned => feedback
2020-11-30 13:29 J. Gareth Moreton Assigned To J. Gareth Moreton => Florian
2020-12-01 01:41 J. Gareth Moreton Note Added: 0127284
2020-12-01 09:22 Sven Barth Note Added: 0127286
2020-12-01 15:28 Chris Rorden Note Added: 0127299
2020-12-01 20:33 J. Gareth Moreton Note Added: 0127307
2020-12-11 18:24 J. Gareth Moreton Note Added: 0127543
2020-12-11 18:31 Florian Note Added: 0127544
2020-12-12 00:09 J. Gareth Moreton Note Added: 0127549