View Issue Details

IDProjectCategoryView StatusLast Update
0038334FPCCompilerpublic2021-01-11 06:08
ReporterJ. Gareth Moreton Assigned ToFlorian  
PrioritynormalSeverityminorReproducibilityalways
Status resolvedResolutionfixed 
Platformi386OSMicrosoft Windows 
Product Version3.3.1 
Fixed in Version3.3.1 
Summary0038334: [Patch] [x86] JccAdd2SetccAdd optimisation is faulty
DescriptionThe JccAdd2SetccAdd, introduced in revision 48086 (and present as of 48115), sometimes produces bad code.
Steps To ReproduceBuild the following problem under i386-win32 with the -O3 option:

program test48086;
{$mode objfpc}{$H+}
function IsFontNameXLogicalFontDesc(const LongFontName: string): boolean;
var MinusCnt, p: integer;
begin
  MinusCnt:=0;
  for p:=1 to length(LongFontName) do
    if LongFontName[p]='-' then inc(MinusCnt);
  Result:=(MinusCnt=14);
end;
var
myfont:string;
begin
 myfont:='Myfont--------------';
 if IsFontNameXLogicalFontDesc(myfont) then
  writeln('NO ERROR');
end.

Observe a run-time error upon execution of the generated EXE file.
Additional InformationIn the assembly dump (generated with -alr):

    ...
.Lj8:
    addl $1,%ebx
# [8] if LongFontName[p]='-' then inc(MinusCnt);
    cmpb $45,-1(%eax,%ebx,1)
# Peephole Optimization: JccAdd2SetccAdd
    sete %al
    movzbl %al,%eax
    # Register eflags released
    addl %eax,%ecx
.Lj12:
    cmpl %edx,%ebx
    jnge .Lj8
    # Register eflags,ebx,edx,eax released
.Lj7:
    ...

"JccAdd2SetccAdd" replaces "jne .Lj12; addl $1,%ecx" with "sete %al; movzbl %al,%eax; addl %eax,%ecx". The problem here is that it selects a register that is still in use, namely %al/%eax. If control flow returns to .Lj8 at the "jnge .Lj8" instruction, the "cmpl" instruction that follows attempts to dereference an address that uses %eax and now contains an invalid value (either 0 or 1).. - in the sample program, %eax contained an address, while %ebx is being used as an index.
Tagscompiler, i386, optimization, patch, x86_64
Fixed in Revision48124
FPCOldBugId
FPCTarget-
Attached Files

Relationships

related to 0038339 resolvedFlorian [Patch] MovzxCmp2CmpMovzx oversight 
parent of 0038343 resolvedFlorian [Patch] OptPass2Jcc Refactor 
related to 0038294 resolvedFlorian [Patch] Advanced MOVZX optimisations 

Activities

J. Gareth Moreton

2021-01-08 19:49

developer   ~0128176

Last edited: 2021-01-08 19:52

View 5 revisions

As a couple of side-notes with the optimisation itself...

I would possibly recommend attempting to use "xorl %regl,%regl; cmp (etc); sete %regb" instead of "cmp (etc); sete %regb; movzbl %regb;%regl", since this takes fewer bytes to encode and removes the pipeline stall between SETcc and MOVZX. Secondly, the reference count on the label that the original jump was pointing to (.Lj12 in the sample) is not being decremented. Correctly decrementing it will cause the label to be stripped entirely.

J. Gareth Moreton

2021-01-09 05:56

developer   ~0128188

This patch seems to fix the problem. There were some side-effects going on, I think.

It still doesn't feel quite right under i386, because I'm not sure why you have to explicitly specify the registers (and EBX is not usually a volatile register, so I've removed it to play safe for now). I'll seek to improve this optimisation in the future.

J. Gareth Moreton

2021-01-09 06:23

developer   ~0128189

Just to note, I also decremented the label reference in the patch, usually allowing the removal of the label completely. This permits at least one additional optimisation in the RTL:

sysutils.s - Before:

    ...
# Peephole Optimization: JccAdd2SetccAdd
    sete %al
    movzbl %al,%eax
    subl %eax,%esi
.Lj4032: ; <-- Dead Label
    testl %esi,%esi
    je .Lj4033
    ...

After:

    ...
# Peephole Optimization: JccAdd2SetccAdd
    sete %al
    movzbl %al,%eax
    subl %eax,%esi ; <- sub/test gets optimised to just sub
    je .Lj4033
    ...

Do-wan Kim

2021-01-09 06:31

reporter   ~0128190

Patch works good. i386-win32 lazarus IDE runs without exception.

Florian

2021-01-09 18:52

administrator   ~0128210

I cannot see how the patch solves the problem? Actually, if I use a ppc386 compiled by 3.2.0 everything works, if I use one compiled by a 3.3.1 it doesn't. So it looks like more that the patch works around the problem?

J. Gareth Moreton

2021-01-09 20:45

developer   ~0128216

Last edited: 2021-01-09 20:46

View 4 revisions

Yeah, I confess that I am not entirely sure why the problem is fundamentally fixed. There's some kind of side-effect going on somewhere with one of the functions, or another subtle compiler bug that's corrupting the register tracking. Then again, besides the label reference decrement, I argue that my patch provides a speed boost because calls to functions such as get_volatile_registers_int shouldn't change within the for-loop, so calling them beforehand and storing the result provides a performance gain.

I'll see if I can debug the fault though.

J. Gareth Moreton

2021-01-10 00:29

developer   ~0128230

Found the problem... it WAS my fault after all! A completely different optimisation caused this one to get messed up! 0038339

J. Gareth Moreton

2021-01-10 00:55

developer   ~0128232

I'd still like to submit this patch (slightly modified) as a minor performance boost to this optimisation.
jcc-add-opt-rework.patch (2,607 bytes)   
Index: compiler/x86/aoptx86.pas
===================================================================
--- compiler/x86/aoptx86.pas	(revision 48117)
+++ compiler/x86/aoptx86.pas	(working copy)
@@ -5693,6 +5693,7 @@
         symbol: TAsmSymbol;
         reg: tsuperregister;
         regavailable: Boolean;
+        {$ifdef x86_64}VolatileRegSet,{$endif x86_64} TrackedRegSet: TRegSet;
       begin
         result:=false;
         symbol:=nil;
@@ -5798,19 +5799,30 @@
                   begin
                     TransferUsedRegs(TmpUsedRegs);
                     UpdateUsedRegs(TmpUsedRegs, tai(p.next));
+                    { Don't call UpdateUsedRegs(TmpUsedRegs, tai(hp1.next))
+                      in case the register used by hp1 gets deallocated. }
 
+                    TrackedRegSet := TmpUsedRegs[R_INTREGISTER].GetUsedRegs;
+{$ifdef x86_64}
+                    VolatileRegSet := paramanager.get_volatile_registers_int(current_procinfo.procdef.proccalloption);
+{$endif x86_64}
                     { search for an available register which is volatile }
                     regavailable:=false;
                     for reg in tcpuregisterset do
                       begin
-                        if (reg in paramanager.get_volatile_registers_int(current_procinfo.procdef.proccalloption)) and
-                          not(reg in TmpUsedRegs[R_INTREGISTER].GetUsedRegs) and
-                          not(RegInInstruction(newreg(R_INTREGISTER,reg,R_SUBL),hp1))
-{$ifdef i386}
-                          and (reg in [RS_EAX,RS_EBX,RS_ECX,RS_EDX])
-{$endif i386}
-                          then
+                        if
+{$ifdef x86_64}
+                          (reg in VolatileRegSet) and
+{$else x86_64}
+                          (reg in [RS_EAX,RS_ECX,RS_EDX]) and
+{$endif x86_64}
+                          not(reg in TrackedRegSet) then
                           begin
+                            { If the register appears in hp1, it will be marked
+                              as in use, even if it's deallocated right
+                              afterwards because we didn't call
+                              UpdateUsedRegs(TmpUsedRegs, tai(hp1.next). }
+
                             regavailable:=true;
                             break;
                           end;
@@ -5818,6 +5830,7 @@
 
                     if regavailable then
                       begin
+                        TAsmLabel(symbol).decrefs;
                         Taicpu(p).clearop(0);
                         Taicpu(p).ops:=1;
                         Taicpu(p).is_jmp:=false;
jcc-add-opt-rework.patch (2,607 bytes)   

Florian

2021-01-10 16:40

administrator   ~0128245

The patch is not really correct as e.g. there are calling conventions where ebx is volatile. The test is only about testing if the register can be used as 8 Bit register.

And the speed boost can be discussed controversal: the code path is taken very seldomly but it increases stack size and thus cache pollution.

J. Gareth Moreton

2021-01-10 17:08

developer   ~0128247

Last edited: 2021-01-10 17:15

View 3 revisions

Hmmm, that's a good point. Now I understand why i386 (and i8086) only asked for AL, BL, CL or DL - the reason escaped me initially.

Though i8086 doesn't use x86's pass 2, I used ($ifndef x86_64} instead of {$ifdef i386} to ensure that platform was covered should it use x86's pass 2 one day.

J. Gareth Moreton

2021-01-10 17:18

developer   ~0128248

I would argue in that case, just as a very small performance boost, that "(reg in [RS_EAX,RS_EBX,RS_ECX,RS_EDX])" be the first condition rather than the last one, since it's very quick to check and doesn't involve calling "paramanager.get_volatile_registers_int" and "GetUsedRegs" for ESI, EDI, EBP and ESP.

Issue History

Date Modified Username Field Change
2021-01-08 19:43 J. Gareth Moreton New Issue
2021-01-08 19:44 J. Gareth Moreton Relationship added related to 0038294
2021-01-08 19:49 J. Gareth Moreton Note Added: 0128176
2021-01-08 19:50 J. Gareth Moreton Note Edited: 0128176 View Revisions
2021-01-08 19:51 J. Gareth Moreton Note Edited: 0128176 View Revisions
2021-01-08 19:51 J. Gareth Moreton Note Edited: 0128176 View Revisions
2021-01-08 19:52 J. Gareth Moreton Note Edited: 0128176 View Revisions
2021-01-09 05:56 J. Gareth Moreton Note Added: 0128188
2021-01-09 05:56 J. Gareth Moreton File Added: jcc-add-opt-fix.patch
2021-01-09 05:56 J. Gareth Moreton Summary [x86] JccAdd2SetccAdd optimisation is faulty => [Patch] [x86] JccAdd2SetccAdd optimisation is faulty
2021-01-09 05:56 J. Gareth Moreton FPCTarget => -
2021-01-09 06:23 J. Gareth Moreton Note Added: 0128189
2021-01-09 06:31 Do-wan Kim Note Added: 0128190
2021-01-09 18:52 Florian Note Added: 0128210
2021-01-09 20:45 J. Gareth Moreton Note Added: 0128216
2021-01-09 20:46 J. Gareth Moreton Note Edited: 0128216 View Revisions
2021-01-09 20:46 J. Gareth Moreton Note Edited: 0128216 View Revisions
2021-01-09 20:46 J. Gareth Moreton Note Edited: 0128216 View Revisions
2021-01-10 00:28 J. Gareth Moreton Relationship added related to 0038339
2021-01-10 00:29 J. Gareth Moreton Note Added: 0128230
2021-01-10 00:55 J. Gareth Moreton File Deleted: jcc-add-opt-fix.patch
2021-01-10 00:55 J. Gareth Moreton Note Added: 0128232
2021-01-10 00:55 J. Gareth Moreton File Added: jcc-add-opt-rework.patch
2021-01-10 04:02 J. Gareth Moreton Tag Attached: patch
2021-01-10 04:02 J. Gareth Moreton Tag Attached: compiler
2021-01-10 04:02 J. Gareth Moreton Tag Attached: optimization
2021-01-10 04:02 J. Gareth Moreton Tag Attached: i386
2021-01-10 04:02 J. Gareth Moreton Tag Attached: x86_64
2021-01-10 11:24 Florian Assigned To => Florian
2021-01-10 11:24 Florian Status new => resolved
2021-01-10 11:24 Florian Resolution open => fixed
2021-01-10 11:24 Florian Fixed in Version => 3.3.1
2021-01-10 11:24 Florian Fixed in Revision => 48124
2021-01-10 16:40 Florian Note Added: 0128245
2021-01-10 17:08 J. Gareth Moreton Note Added: 0128247
2021-01-10 17:09 J. Gareth Moreton Note Edited: 0128247 View Revisions
2021-01-10 17:15 J. Gareth Moreton Note Edited: 0128247 View Revisions
2021-01-10 17:18 J. Gareth Moreton Note Added: 0128248
2021-01-11 06:07 J. Gareth Moreton Relationship added child of 0038343
2021-01-11 06:08 J. Gareth Moreton Relationship deleted child of 0038343
2021-01-11 06:08 J. Gareth Moreton Relationship added parent of 0038343