View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0038129 | FPC | Compiler | public | 2020-11-25 02:02 | 2020-12-12 00:09 |
Reporter | runewalsh | Assigned To | Florian | ||
Priority | urgent | Severity | crash | Reproducibility | sometimes |
Status | feedback | Resolution | open | ||
Platform | x86-64-win64 | ||||
Summary | 0038129: Wrong code generated on x86-64 | ||||
Description | Code below should write "#######... 70%", but depending on compiler settings it writes garbage or even crashes. I have this happening on x86-64-win64 with optimization levels greater than -O1. Bug vanishes after replacing pChar(result)[i] := with conventional result[1 + i] := or after removing the second condition part, i. e. (i = int32(divs) - 1) and (progress >= 1) or after replacing any of 32-bit variables with 64-bit equivalent so I can't even come close to guessing what's going on here. | ||||
Steps To Reproduce | function Bar(const progress: single; divs: uint32): string; const BarSym: array[boolean] of char = ('.', '#'); var i: int32; begin SetLength(result, divs); for i := 0 to int32(divs) - 1 do pChar(result)[i] := BarSym[(progress >= (0.75 + i) / divs) or (i = int32(divs) - 1) and (progress >= 1)]; end; var s: string; begin writeln(Bar(0.7, 10) + ' 70%'); end. | ||||
Tags | aarch64, compiler, x86_64 | ||||
Fixed in Revision | |||||
FPCOldBugId | |||||
FPCTarget | - | ||||
Attached Files |
|
related to | 0037527 | new | X86_64 code optimization | |
related to | 0038053 | assigned | J. Gareth Moreton | AArch64 -O2 bug not seen with -O1 |
related to | 0038055 | resolved | Compiling Lazarus for AArch64 with -O3 creates non-functional code |
|
*var s: string is unnecessary, also {$longstrings+} implied, of course. |
|
Why do you use / instead of div? Your result is not a float, but an integer type. |
|
Ignoring complaints about the quality of the code, the fact that it produces a bad binary is a pretty major bug. I'll take a look. (Also, Thaddy... the result is Boolean, not an integer type. The floating-point division is part of the right side of a comparison) |
|
What's wrong about the quality of the code T_________T Also, this is just an intermediate version which I could not reduce further (neither I tried hard enough), for example, the second condition part is superfluous and can be removed outright, but hey, remember that the bug vanishes then. |
|
Confirmed that -O2 produces different (and incorrect) output when compared to -O1 and -O-. Specifying -O2 -OoNOPEEPHOLE still produces bad code, indicating that the bug is not in the peephole optimiser and is higher in the chain. (P.S. There's nothing wrong with your code - it was more of a jab at Thaddy) |
|
well, a boolean is not a float. The division code is wrong. |
|
"(progress >= (0.75 + i) / divs)" is a Boolean. It's a comparison of two floats. |
|
I draw a progress bar that displays floating-point ‘progress’ ∈ [0.0; 1.0] using ‘divs’ characters. For ‘divs’ = 5, i-th character (i ∈ [0; 4]) is drawn as ‘#’ if ‘progress’ is at least, respectively, 0.15, 0.35, 0.55, 0.75 or 0.95, and as ‘.’ otherwise. These breakpoints are calculated as (i + 0.75) / divs. |
|
Sorry for all the negative discussion on your code. While personally I try to use integers rather than floats (and using the compiler optimisation that converts an integer divison by a constant into a multiplication), there's nothing wrong with your code. I have reproduced the compiler bug though. As a general rule of thumb, the quality of one's code or the techniques used shouldn't be up for discussion. If it causes the compiler to glitch out like it does here, it is a compiler bug through and through. |
|
Well, you're right. (Or at least I could convert “/ divs” at the right to “* divs” at the left.) Sorry for off-topic, but I myself became interested. I have microbenchmarked various operations over various types (on x64): https://i.imgur.com/d1tKIZR.png. “Independent” version does R := A op B, where R, A and B are global variables. “Dependent” version does R += A op B and is somewhat more realistic because we are going to use the result right away. For the measurement, ‘Benchmark’ exponentially searches ‘iterations’ value that will take enough time (1/6 sec). For me, both “independent” and “dependent” floating point divisions are slightly under 2 times slower than integer multiplications, so they aren't that scary, provided you have hardware support for floating point. |
|
Though I don't fully understand the specifics yet, floating-point division is generally much faster than integer division, which is why the latter is convertred into a multiplication where possible. And indeed, multiplication is faster than division overall in most cases. In the meantime, I've found the cause of the strange output, but not the reason why: # [10] for i := 0 to int32(divs) - 1 do leal -1(%esi),%eax testl %eax,%eax jnge .Lj6 movl $-1,%ecx .p2align 4,,10 .p2align 3 .Lj7: # Peephole Optimization: LeaMov2Lea done # Peephole Optimization: Lea2Add done addl $1,%ecx # [11] pChar(result)[i] := BarSym[(progress >= (0.75 + i) / divs) or (i = int32(divs) - 1) and (progress >= 1)]; cvtsi2ssl %ecx,%xmm0 addss _$BREAKP$_Ld1(%rip),%xmm0 # Peephole Optimization: MovAnd2Mov 1 done movl %esi,%edx cvtsi2ssq %rdx,%xmm1 divss %xmm1,%xmm0 comiss %xmm6,%xmm0 jp .Lj12 jbe .Lj10 .Lj12: # Peephole Optimization: Mov2Nop 5 done # Peephole Optimization: %edx = %esi; changed to minimise pipeline stall (MovXXX2MovXXX) movslq %esi,%rdx subq $1,%rdx movslq %ecx,%r8 cmpq %r8,%rdx jne .Lj13 comiss _$BREAKP$_Ld2(%rip),%xmm6 jp .Lj13 jnae .Lj13 .Lj10: # Peephole Optimization: movq $1,%r9 -> movl $1,%r9d (immediate can be represented with just 32 bits) movl $1,%r9d jmp .Lj18 .Lj13: xorq %r9,%r9 .Lj18: movq (%rbx),%rdx testq %rdx,%rdx jne .Lj19 leaq FPC_EMPTYCHAR(%rip),%rdx .Lj19: leaq TC_$P$BREAKP$_$BAR$SINGLE$LONGWORD$$ANSISTRING_$$_BARSYM(%rip),%r10 movb (%r10,%r9,1),%r9b movb %r9b,(%rdx,%r8,1) <<<------ r8 is not initailised cmpl %eax,%ecx jnge .Lj7 .Lj6: # [12] end; The counter variable i is stored in %ecx, which is then sign-extended into %r8 just after .Lj12, but this snippet of code is not entered if the left-hand side of the "or" operation (the floating-point comparison) is True. If %r8 happens to be zero, the program won't crash but the string contains garbage between index 0 and the first ".". If %r8 is something else entirely, then you'll most likely get an access violation. Under -O1, the counter is stored on the stack, but gets sign-extended into a register which is then used in place of %r8 in the write instruction. (Under -O1) addl $1,-28(%rbp) ... (some code that checks if Result is a null string) movslq -28(%rbp),%rdx cvtsi2ssl -28(%rbp),%xmm0 (Under -O2) addl $1,%ecx cvtsi2ssl %ecx,%xmm0 |
|
Testing other platforms... problem doesn't appear to occur under i386-win32. |
|
It seems this related to 0037527. |
|
BREAKTHROUGH! This issue is not isolated to x86_64-win64. I have just reproduced it on aarch64-linux. Under -O1: #######... 70% Under -O2: Runtime error 216 at $000000000040026C $000000000040026C $0000000000400168 |
|
And to prove it's the same problem... ... .Ll4: // [11] pChar(result)[i] := BarSym[(progress >= (0.75 + i) / divs) or (i = int32(divs) - 1) and (progress >= 1)]; fmov s0, 7.5000000000000000E-001 scvtf s1,w2 fadd s0,s0,s1 ucvtf s1,w20 fdiv s0,s0,s1 fcmpe s8,s0 b.ge .Lj10 <-- if first condition is True and control flow jumps to .Lj10, x1 is left uninitialised. .Lj11: mov w1,w20 sxtw x1,w1 sub x3,x1,1 sxtw x1,w2 cmp x3,x1 b.ne .Lj12 fmov s0, 1.0000000000000000E+000 fcmpe s8,s0 b.ge .Lj10 .Lj14: b .Lj12 .Lj10: movz x5,1 b .Lj16 .Lj12: movz x5,0 .Lj16: adrp x3,:got:TC_$P$BREAKP$_$BAR$SINGLE$LONGWORD$$ANSISTRING_$$_BARSYM ldr x3,[x3, :got_lo12:TC_$P$BREAKP$_$BAR$SINGLE$LONGWORD$$ANSISTRING_$$_BARSYM] ldr x4,[x19] cbnz x4,.Lj17 adrp x4,:got:FPC_EMPTYCHAR ldr x4,[x4, :got_lo12:FPC_EMPTYCHAR] .Lj17: ldrb w3,[x3, x5] strb w3,[x4, x1] <-- x1 not initlalised. Runtime error 216 / SIGSEGV is triggered. |
|
Okay, so I modified the code temporarily to print out the node tree after the first pass and just before the second pass, and currently everything looks correct with the temporary stores - it's created and destroyed in the correct places, so an incorrect node tree can be ruled out So it looks like the second pass is incorrect somewhere. I think I might upgrade my DEBUG_NODE_XML feature to contain the second tree alongside the initial code tree that's currently dumped, because it's a useful debugging feature. |
|
So the issue is to do with tcgtempcreatenode and tcgtemprefnode. In some situations, the temporary storage is only initialised at the first reference. In the above code example, the temporary storage contains an implicitly-typecast 'i' and its first reference is only conditionally executed because it is the right-side argument of an 'or' operator. Node processing use a kind of reverse Polish notation for assignments... the expressions are calculated first, hence why these are the first references to 'i', and then the index in the target variable. I'm still looking at a solution, sometimes a tcgtemprefnode can appear before its tcgtempcreatenode and moving the initialisation code to tcgtempcreatenode pass_generate_code can sometimes raise an internal error. |
|
(Assigned to myself because this problem seems to manifest in multiple places and is a compiler logic error) |
|
This is actually difficult to fix fundamentally. As I discovered the hard way, I can't just "uplift" the temp initialisations to outside the "or" block, because the left-hand side may be testing if the right-hand side can even be executed safely. e.g. if Assigned(p) and (p^ = etc.). Uplifting the temp reference here will cause a null pointer exception. I'll keep at it though. |
|
It seems the source of the bug is in the Common Subexpression Elimination (CSE) optimisation. I comment out "do_optcse(code);" in "generate_code" (or specify -OoNOCSE) and the machine code is no longer bad. I'm going to try a few things. |
|
It looks like this glitch was present in 3.2.0 as well - might need double-checking by another. I've got a patch working - just finishing the test runs, then I'll submit. |
|
So I found a fix, even if it's a little band-aidy. The compiler now forces a new reference when dealing with an assignment to an array-like variable. It may make CSE less efficient in places, and there is probably room for improvement in the future, but hopefully this resolves the issue. Give it a test run and see if it lives up to the standard. I'm still trying to get the aarch64-linux tests complete - having some configuration difficulties. cse-init-fix.patch (2,196 bytes)
Index: compiler/optcse.pas =================================================================== --- compiler/optcse.pas (revision 47639) +++ compiler/optcse.pas (working copy) @@ -64,6 +64,15 @@ function searchsubdomain(var n:tnode; arg: pointer) : foreachnoderesult; begin + { When writing to an array, the index can be uninitialised if a + conditional value that depends on the index is assigned to it } + if (n.nodetype = vecn) and (nf_callunique in n.flags) then + begin + pboolean(arg)^:=false; + result:=fen_norecurse_true; + Exit; + end; + if (n.nodetype in cseinvariant) or ((n.nodetype=inlinen) and (tinlinenode(n).inlinenumber in [in_length_x,in_assigned_x,in_sqr_real,in_sqrt_real,in_sin_real,in_cos_real,in_abs_long, Index: tests/webtbs/tw38129.pp =================================================================== --- tests/webtbs/tw38129.pp (nonexistent) +++ tests/webtbs/tw38129.pp (working copy) @@ -0,0 +1,43 @@ +{ %OPT=-O2 } +program tw38129; +{$mode objfpc} +{$h+} +{$WARN 5094 off : Function result variable of a managed type does not seem to be initialized} + +function WriteBits(const Val: Single; Divs: Integer): ansistring; + const + ProgChars: array[Boolean] of Char = (#192, #219); + var + I: Integer; + begin + SetLength(Result, Divs); + for I := 0 to Divs - 1 do + PAnsiChar(Result)[I] := ProgChars[(Val >= (0.75 + I) / Divs) or (I = Divs - 1) and (Val >= 1)]; + end; + +const + Expected: array[0..4] of ansistring = ( + #192#192#192#192, + #219#192#192#192, + #219#219#192#192, + #219#219#219#192, + #219#219#219#219 + ); + +var + X, Y: Integer; + CurrentBits: ansistring; +begin + for X := Low(Expected) to High(Expected) do + begin + CurrentBits := WriteBits(X * 0.25, 4); + for Y := 1 to Length(CurrentBits) do + if CurrentBits[Y] <> Expected[X][Y] then + begin + WriteLn('FAIL on X = ', X, ', Y = ', Y, '; expected ', Expected[X][Y], ', got ', CurrentBits[X]); + Halt(1); + end; + end; + WriteLn('ok'); +end. + |
|
aarch64-linux tests have passed (thanks Sven for helping me solve the technical issues in running them). |
|
You're welcome. :) |
|
I tested this patch on the trunk lazarus and FPC on the Apple M1 Aarch64 Issue 38053 persists. The command fpc -O2 fsz.pas; ls -l dipso002.tif; ./fsz dipso002.tif fails, but one can succeed with: fpc -O2 -OoNOREGVAR fsz.pas; ls -l dipso002.tif; ./fsz dipso002.tif Issue 38055 changes slightly: Building Lazarus with -O2 -g- -Xs builds an executable that runs. Building Lazarus with -O3 -g- -Xs Does not compile: Error: /Users/chrisrorden/src/lazarus/components/virtualtreeview/lib/aarch64-darwin-cocoa/laz.virtualtrees.s:12674:9: error: invalid operand for instruction Building Lazarus with -O3 -OoNOCONSTPROP -OoNODFA -g- -Xs Causes an executable to be created that crashes at launch. |
|
Guess my job is not yet over then! |
|
Issue is still present as of r47749 (patch not yet applied). |
|
THe patch is not correct and hides only the real problem. I don't have a solution yet, but I'll see if I can look into it the next days. |
|
I will openly admit that the patch was nothing more than a band-aid, and I unfortunately don't know WHY the temp allocator gets confused in this situation. |
Date Modified | Username | Field | Change |
---|---|---|---|
2020-11-25 02:02 | runewalsh | New Issue | |
2020-11-25 02:06 | runewalsh | Note Added: 0127170 | |
2020-11-25 09:16 | Thaddy de Koning | Note Added: 0127172 | |
2020-11-25 09:43 | J. Gareth Moreton | Note Added: 0127173 | |
2020-11-25 09:43 | J. Gareth Moreton | Assigned To | => J. Gareth Moreton |
2020-11-25 09:43 | J. Gareth Moreton | Status | new => assigned |
2020-11-25 09:43 | J. Gareth Moreton | Note Edited: 0127173 | View Revisions |
2020-11-25 09:44 | J. Gareth Moreton | Priority | normal => high |
2020-11-25 09:44 | J. Gareth Moreton | Severity | minor => major |
2020-11-25 09:44 | J. Gareth Moreton | FPCTarget | => - |
2020-11-25 09:46 | J. Gareth Moreton | Note Edited: 0127173 | View Revisions |
2020-11-25 17:38 | runewalsh | Note Added: 0127179 | |
2020-11-25 20:43 | J. Gareth Moreton | Status | assigned => confirmed |
2020-11-25 20:43 | J. Gareth Moreton | Note Added: 0127183 | |
2020-11-25 20:46 | J. Gareth Moreton | Note Edited: 0127183 | View Revisions |
2020-11-25 20:50 | Thaddy de Koning | Note Added: 0127185 | |
2020-11-25 21:01 | J. Gareth Moreton | Note Added: 0127186 | |
2020-11-25 21:13 | runewalsh | Note Added: 0127188 | |
2020-11-25 21:24 | J. Gareth Moreton | Note Added: 0127189 | |
2020-11-25 23:06 | runewalsh | Note Added: 0127191 | |
2020-11-26 00:44 | J. Gareth Moreton | Note Added: 0127195 | |
2020-11-26 05:24 | J. Gareth Moreton | Note Added: 0127196 | |
2020-11-26 07:08 | avk | Note Added: 0127197 | |
2020-11-26 12:01 | runewalsh | Note Edited: 0127179 | View Revisions |
2020-11-26 16:15 | J. Gareth Moreton | Relationship added | related to 0037527 |
2020-11-28 15:39 | J. Gareth Moreton | Note Added: 0127238 | |
2020-11-28 15:40 | J. Gareth Moreton | Note Edited: 0127238 | View Revisions |
2020-11-28 16:00 | J. Gareth Moreton | Relationship added | related to 0038053 |
2020-11-28 16:00 | J. Gareth Moreton | Relationship added | related to 0038055 |
2020-11-28 16:29 | J. Gareth Moreton | Note Added: 0127240 | |
2020-11-28 16:30 | J. Gareth Moreton | Note Edited: 0127240 | View Revisions |
2020-11-28 16:31 | J. Gareth Moreton | Note Edited: 0127240 | View Revisions |
2020-11-28 16:32 | J. Gareth Moreton | Priority | high => urgent |
2020-11-28 16:32 | J. Gareth Moreton | Severity | major => crash |
2020-11-28 16:33 | J. Gareth Moreton | Tag Attached: compiler | |
2020-11-28 16:33 | J. Gareth Moreton | Tag Attached: aarch64 | |
2020-11-28 16:33 | J. Gareth Moreton | Tag Attached: x86_64 | |
2020-11-29 00:47 | J. Gareth Moreton | Note Added: 0127250 | |
2020-11-29 05:14 | J. Gareth Moreton | Note Added: 0127252 | |
2020-11-29 05:15 | J. Gareth Moreton | Status | confirmed => assigned |
2020-11-29 05:15 | J. Gareth Moreton | Note Added: 0127253 | |
2020-11-29 18:23 | J. Gareth Moreton | Note Added: 0127259 | |
2020-11-30 02:28 | J. Gareth Moreton | Note Added: 0127266 | |
2020-11-30 08:06 | J. Gareth Moreton | Note Added: 0127267 | |
2020-11-30 13:29 | J. Gareth Moreton | Note Added: 0127268 | |
2020-11-30 13:29 | J. Gareth Moreton | File Added: cse-init-fix.patch | |
2020-11-30 13:29 | J. Gareth Moreton | Status | assigned => feedback |
2020-11-30 13:29 | J. Gareth Moreton | Assigned To | J. Gareth Moreton => Florian |
2020-12-01 01:41 | J. Gareth Moreton | Note Added: 0127284 | |
2020-12-01 09:22 | Sven Barth | Note Added: 0127286 | |
2020-12-01 15:28 | Chris Rorden | Note Added: 0127299 | |
2020-12-01 20:33 | J. Gareth Moreton | Note Added: 0127307 | |
2020-12-11 18:24 | J. Gareth Moreton | Note Added: 0127543 | |
2020-12-11 18:31 | Florian | Note Added: 0127544 | |
2020-12-12 00:09 | J. Gareth Moreton | Note Added: 0127549 |