View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0038334 | FPC | Compiler | public | 2021-01-08 19:43 | 2021-01-11 06:08 |
Reporter | J. Gareth Moreton | Assigned To | Florian | ||
Priority | normal | Severity | minor | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Platform | i386 | OS | Microsoft Windows | ||
Product Version | 3.3.1 | ||||
Fixed in Version | 3.3.1 | ||||
Summary | 0038334: [Patch] [x86] JccAdd2SetccAdd optimisation is faulty | ||||
Description | The JccAdd2SetccAdd, introduced in revision 48086 (and present as of 48115), sometimes produces bad code. | ||||
Steps To Reproduce | Build the following problem under i386-win32 with the -O3 option: program test48086; {$mode objfpc}{$H+} function IsFontNameXLogicalFontDesc(const LongFontName: string): boolean; var MinusCnt, p: integer; begin MinusCnt:=0; for p:=1 to length(LongFontName) do if LongFontName[p]='-' then inc(MinusCnt); Result:=(MinusCnt=14); end; var myfont:string; begin myfont:='Myfont--------------'; if IsFontNameXLogicalFontDesc(myfont) then writeln('NO ERROR'); end. Observe a run-time error upon execution of the generated EXE file. | ||||
Additional Information | In the assembly dump (generated with -alr): ... .Lj8: addl $1,%ebx # [8] if LongFontName[p]='-' then inc(MinusCnt); cmpb $45,-1(%eax,%ebx,1) # Peephole Optimization: JccAdd2SetccAdd sete %al movzbl %al,%eax # Register eflags released addl %eax,%ecx .Lj12: cmpl %edx,%ebx jnge .Lj8 # Register eflags,ebx,edx,eax released .Lj7: ... "JccAdd2SetccAdd" replaces "jne .Lj12; addl $1,%ecx" with "sete %al; movzbl %al,%eax; addl %eax,%ecx". The problem here is that it selects a register that is still in use, namely %al/%eax. If control flow returns to .Lj8 at the "jnge .Lj8" instruction, the "cmpl" instruction that follows attempts to dereference an address that uses %eax and now contains an invalid value (either 0 or 1).. - in the sample program, %eax contained an address, while %ebx is being used as an index. | ||||
Tags | compiler, i386, optimization, patch, x86_64 | ||||
Fixed in Revision | 48124 | ||||
FPCOldBugId | |||||
FPCTarget | - | ||||
Attached Files |
|
|
As a couple of side-notes with the optimisation itself... I would possibly recommend attempting to use "xorl %regl,%regl; cmp (etc); sete %regb" instead of "cmp (etc); sete %regb; movzbl %regb;%regl", since this takes fewer bytes to encode and removes the pipeline stall between SETcc and MOVZX. Secondly, the reference count on the label that the original jump was pointing to (.Lj12 in the sample) is not being decremented. Correctly decrementing it will cause the label to be stripped entirely. |
|
This patch seems to fix the problem. There were some side-effects going on, I think. It still doesn't feel quite right under i386, because I'm not sure why you have to explicitly specify the registers (and EBX is not usually a volatile register, so I've removed it to play safe for now). I'll seek to improve this optimisation in the future. |
|
Just to note, I also decremented the label reference in the patch, usually allowing the removal of the label completely. This permits at least one additional optimisation in the RTL: sysutils.s - Before: ... # Peephole Optimization: JccAdd2SetccAdd sete %al movzbl %al,%eax subl %eax,%esi .Lj4032: ; <-- Dead Label testl %esi,%esi je .Lj4033 ... After: ... # Peephole Optimization: JccAdd2SetccAdd sete %al movzbl %al,%eax subl %eax,%esi ; <- sub/test gets optimised to just sub je .Lj4033 ... |
|
Patch works good. i386-win32 lazarus IDE runs without exception. |
|
I cannot see how the patch solves the problem? Actually, if I use a ppc386 compiled by 3.2.0 everything works, if I use one compiled by a 3.3.1 it doesn't. So it looks like more that the patch works around the problem? |
|
Yeah, I confess that I am not entirely sure why the problem is fundamentally fixed. There's some kind of side-effect going on somewhere with one of the functions, or another subtle compiler bug that's corrupting the register tracking. Then again, besides the label reference decrement, I argue that my patch provides a speed boost because calls to functions such as get_volatile_registers_int shouldn't change within the for-loop, so calling them beforehand and storing the result provides a performance gain. I'll see if I can debug the fault though. |
|
Found the problem... it WAS my fault after all! A completely different optimisation caused this one to get messed up! 0038339 |
|
I'd still like to submit this patch (slightly modified) as a minor performance boost to this optimisation. jcc-add-opt-rework.patch (2,607 bytes)
Index: compiler/x86/aoptx86.pas =================================================================== --- compiler/x86/aoptx86.pas (revision 48117) +++ compiler/x86/aoptx86.pas (working copy) @@ -5693,6 +5693,7 @@ symbol: TAsmSymbol; reg: tsuperregister; regavailable: Boolean; + {$ifdef x86_64}VolatileRegSet,{$endif x86_64} TrackedRegSet: TRegSet; begin result:=false; symbol:=nil; @@ -5798,19 +5799,30 @@ begin TransferUsedRegs(TmpUsedRegs); UpdateUsedRegs(TmpUsedRegs, tai(p.next)); + { Don't call UpdateUsedRegs(TmpUsedRegs, tai(hp1.next)) + in case the register used by hp1 gets deallocated. } + TrackedRegSet := TmpUsedRegs[R_INTREGISTER].GetUsedRegs; +{$ifdef x86_64} + VolatileRegSet := paramanager.get_volatile_registers_int(current_procinfo.procdef.proccalloption); +{$endif x86_64} { search for an available register which is volatile } regavailable:=false; for reg in tcpuregisterset do begin - if (reg in paramanager.get_volatile_registers_int(current_procinfo.procdef.proccalloption)) and - not(reg in TmpUsedRegs[R_INTREGISTER].GetUsedRegs) and - not(RegInInstruction(newreg(R_INTREGISTER,reg,R_SUBL),hp1)) -{$ifdef i386} - and (reg in [RS_EAX,RS_EBX,RS_ECX,RS_EDX]) -{$endif i386} - then + if +{$ifdef x86_64} + (reg in VolatileRegSet) and +{$else x86_64} + (reg in [RS_EAX,RS_ECX,RS_EDX]) and +{$endif x86_64} + not(reg in TrackedRegSet) then begin + { If the register appears in hp1, it will be marked + as in use, even if it's deallocated right + afterwards because we didn't call + UpdateUsedRegs(TmpUsedRegs, tai(hp1.next). } + regavailable:=true; break; end; @@ -5818,6 +5830,7 @@ if regavailable then begin + TAsmLabel(symbol).decrefs; Taicpu(p).clearop(0); Taicpu(p).ops:=1; Taicpu(p).is_jmp:=false; |
|
The patch is not really correct as e.g. there are calling conventions where ebx is volatile. The test is only about testing if the register can be used as 8 Bit register. And the speed boost can be discussed controversal: the code path is taken very seldomly but it increases stack size and thus cache pollution. |
|
Hmmm, that's a good point. Now I understand why i386 (and i8086) only asked for AL, BL, CL or DL - the reason escaped me initially. Though i8086 doesn't use x86's pass 2, I used ($ifndef x86_64} instead of {$ifdef i386} to ensure that platform was covered should it use x86's pass 2 one day. |
|
I would argue in that case, just as a very small performance boost, that "(reg in [RS_EAX,RS_EBX,RS_ECX,RS_EDX])" be the first condition rather than the last one, since it's very quick to check and doesn't involve calling "paramanager.get_volatile_registers_int" and "GetUsedRegs" for ESI, EDI, EBP and ESP. |
Date Modified | Username | Field | Change |
---|---|---|---|
2021-01-08 19:43 | J. Gareth Moreton | New Issue | |
2021-01-08 19:44 | J. Gareth Moreton | Relationship added | related to 0038294 |
2021-01-08 19:49 | J. Gareth Moreton | Note Added: 0128176 | |
2021-01-08 19:50 | J. Gareth Moreton | Note Edited: 0128176 | View Revisions |
2021-01-08 19:51 | J. Gareth Moreton | Note Edited: 0128176 | View Revisions |
2021-01-08 19:51 | J. Gareth Moreton | Note Edited: 0128176 | View Revisions |
2021-01-08 19:52 | J. Gareth Moreton | Note Edited: 0128176 | View Revisions |
2021-01-09 05:56 | J. Gareth Moreton | Note Added: 0128188 | |
2021-01-09 05:56 | J. Gareth Moreton | File Added: jcc-add-opt-fix.patch | |
2021-01-09 05:56 | J. Gareth Moreton | Summary | [x86] JccAdd2SetccAdd optimisation is faulty => [Patch] [x86] JccAdd2SetccAdd optimisation is faulty |
2021-01-09 05:56 | J. Gareth Moreton | FPCTarget | => - |
2021-01-09 06:23 | J. Gareth Moreton | Note Added: 0128189 | |
2021-01-09 06:31 | Do-wan Kim | Note Added: 0128190 | |
2021-01-09 18:52 | Florian | Note Added: 0128210 | |
2021-01-09 20:45 | J. Gareth Moreton | Note Added: 0128216 | |
2021-01-09 20:46 | J. Gareth Moreton | Note Edited: 0128216 | View Revisions |
2021-01-09 20:46 | J. Gareth Moreton | Note Edited: 0128216 | View Revisions |
2021-01-09 20:46 | J. Gareth Moreton | Note Edited: 0128216 | View Revisions |
2021-01-10 00:28 | J. Gareth Moreton | Relationship added | related to 0038339 |
2021-01-10 00:29 | J. Gareth Moreton | Note Added: 0128230 | |
2021-01-10 00:55 | J. Gareth Moreton | File Deleted: jcc-add-opt-fix.patch | |
2021-01-10 00:55 | J. Gareth Moreton | Note Added: 0128232 | |
2021-01-10 00:55 | J. Gareth Moreton | File Added: jcc-add-opt-rework.patch | |
2021-01-10 04:02 | J. Gareth Moreton | Tag Attached: patch | |
2021-01-10 04:02 | J. Gareth Moreton | Tag Attached: compiler | |
2021-01-10 04:02 | J. Gareth Moreton | Tag Attached: optimization | |
2021-01-10 04:02 | J. Gareth Moreton | Tag Attached: i386 | |
2021-01-10 04:02 | J. Gareth Moreton | Tag Attached: x86_64 | |
2021-01-10 11:24 | Florian | Assigned To | => Florian |
2021-01-10 11:24 | Florian | Status | new => resolved |
2021-01-10 11:24 | Florian | Resolution | open => fixed |
2021-01-10 11:24 | Florian | Fixed in Version | => 3.3.1 |
2021-01-10 11:24 | Florian | Fixed in Revision | => 48124 |
2021-01-10 16:40 | Florian | Note Added: 0128245 | |
2021-01-10 17:08 | J. Gareth Moreton | Note Added: 0128247 | |
2021-01-10 17:09 | J. Gareth Moreton | Note Edited: 0128247 | View Revisions |
2021-01-10 17:15 | J. Gareth Moreton | Note Edited: 0128247 | View Revisions |
2021-01-10 17:18 | J. Gareth Moreton | Note Added: 0128248 | |
2021-01-11 06:07 | J. Gareth Moreton | Relationship added | child of 0038343 |
2021-01-11 06:08 | J. Gareth Moreton | Relationship deleted | child of 0038343 |
2021-01-11 06:08 | J. Gareth Moreton | Relationship added | parent of 0038343 |