View Issue Details

IDProjectCategoryView StatusLast Update
0036271FPCCompilerpublic2019-11-24 15:12
ReporterJ. Gareth MoretonAssigned ToFlorian 
PrioritynormalSeverityminorReproducibilityN/A
Status resolvedResolutionfixed 
PlatformCross-platformOSMicrosoft WindowsOS Version10 Professional
Product Version3.3.1Product Buildr43323 
Target VersionFixed in Version3.3.1 
Summary0036271: [Patch] Jump optimisations in code generator
DescriptionAs part of a rewrite for the x86-64 optimiser overhaul, here are some largely cross-platform optimisations to jumps and jump chains. The attached PDF file (even though that itself could use a revision) explains the individual changes and optimisations, but overall it seeks to improve the quality of conditional and unconditional jumps in generated code.

Note that "jump-optimizations.patch" requires "jump-optimizations-condition_in.patch" to work, due to the introduction of the new "condition_in" function that requires platform-specific implementations.
Steps To ReproduceApply patches and compile. Confirm correct functionality of compiled binaries and no regressions in test suite.

(Non-Intel platforms will need extensive testing)
Additional InformationThough not intended, there seems to be a major improvement in compile time under x86_64-win64. This is possibly due to label stripping and other optimisations that remove entries from the linked list of instructions etc, thereby reducing the time it takes for subsequent passes to analyse a procedure.

Time to compile (not link) Lazarus under trunk:

[72.883] 1304078 lines compiled, 72.9 sec
[74.453] 1304078 lines compiled, 74.5 sec
[82.164] 1304078 lines compiled, 82.2 sec

Time to compile (not link) Lazarus with jump optimisations:

[66.648] 1304078 lines compiled, 66.6 sec
[65.609] 1304078 lines compiled, 65.6 sec
[64.695] 1304078 lines compiled, 64.7 sec

----

The compiled and linked Lazarus binary is smaller as well due to the additional stripping of unnecessary alignment hints and finding new optimisations etc:

20,110,336 (trunk)
20,092,416 (jump optimisations)

----

The compilation of Lazarus source file "lazarus\components\codetools\basiccodetools.pas" shows a wide range of improvements and is a good showcase for the many jump optimisations - for example:

Trunk:

...
.Lj2729:
    movslq %r8d,%r9
    subq %r9,%rdx
    leaq 1(%rdx),%r9
    cmpl %r9d,%r11d
    jge .Lj2732
    .p2align 4,,10
    .p2align 3
    movl %r11d,%r9d
.Lj2732:
# Peephole Optimization: MovTestJxx2MovTestJxx done
    movq %rcx,%rdx
    testq %rcx,%rcx
...

Jump optimisations;

.Lj2729:
    movslq %r8d,%r9
    subq %r9,%rdx
    leaq 1(%rdx),%r9
    cmpl %r9d,%r11d
    cmovngel %r11d,%r9d
# Peephole Optimization: MovTestJxx2MovTestJxx done
    movq %rcx,%rdx
    testq %rcx,%rcx
...

(Note that label .Lj2732 has been removed completely)
Tagscompiler, optimization, optimizations, patch
Fixed in Revision43439, 43440, 43441
FPCOldBugId
FPCTarget-
Attached Files
  • Jump Optimisation Specs.pdf (117,538 bytes)
  • jump-optimizations-condition_in.patch (16,912 bytes)
    Index: compiler/aarch64/cpubase.pas
    ===================================================================
    --- compiler/aarch64/cpubase.pas	(revision 43323)
    +++ compiler/aarch64/cpubase.pas	(working copy)
    @@ -321,6 +321,9 @@
         function inverse_cond(const c: TAsmCond): TAsmCond; {$ifdef USEINLINE}inline;{$endif USEINLINE}
         function conditions_equal(const c1, c2: TAsmCond): boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
     
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +
         procedure shifterop_reset(var so : tshifterop); {$ifdef USEINLINE}inline;{$endif USEINLINE}
     
         function dwarf_reg(r:tregister):shortint;
    @@ -488,6 +491,26 @@
           end;
     
     
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +      begin
    +        Result := (c = C_None) or conditions_equal(Subset, c);
    +
    +        { Please update as necessary. [Kit] }
    +        if not Result then
    +          case Subset of
    +            C_EQ:
    +              Result := (c in [C_GE, C_LE]);
    +            C_LT:
    +              Result := (c in [C_LE]);
    +            C_GT:
    +              Result := (c in [C_GE]);
    +            else
    +              Result := False;
    +          end;
    +      end;
    +
    +
         function dwarf_reg(r:tregister):shortint;
           begin
             result:=regdwarf_table[findreg_by_number(r)];
    Index: compiler/arm/cpubase.pas
    ===================================================================
    --- compiler/arm/cpubase.pas	(revision 43323)
    +++ compiler/arm/cpubase.pas	(working copy)
    @@ -365,6 +365,9 @@
         function inverse_cond(const c: TAsmCond): TAsmCond; {$ifdef USEINLINE}inline;{$endif USEINLINE}
         function conditions_equal(const c1, c2: TAsmCond): boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
     
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +
         procedure shifterop_reset(var so : tshifterop); {$ifdef USEINLINE}inline;{$endif USEINLINE}
         function is_pc(const r : tregister) : boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
     
    @@ -540,6 +543,26 @@
           end;
     
     
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +      begin
    +        Result := (c = C_None) or conditions_equal(Subset, c);
    +
    +        { Please update as necessary. [Kit] }
    +        if not Result then
    +          case Subset of
    +            C_EQ:
    +              Result := (c in [C_GE, C_LE]);
    +            C_LT:
    +              Result := (c in [C_LE]);
    +            C_GT:
    +              Result := (c in [C_GE]);
    +            else
    +              Result := False;
    +          end;
    +      end;
    +
    +
         function is_shifter_const(d : aint;var imm_shift : byte) : boolean;
           var
              i : longint;
    Index: compiler/avr/cpubase.pas
    ===================================================================
    --- compiler/avr/cpubase.pas	(revision 43323)
    +++ compiler/avr/cpubase.pas	(working copy)
    @@ -300,6 +300,9 @@
         function inverse_cond(const c: TAsmCond): TAsmCond; {$ifdef USEINLINE}inline;{$endif USEINLINE}
         function conditions_equal(const c1, c2: TAsmCond): boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
     
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +
         function dwarf_reg(r:tregister):byte;
         function dwarf_reg_no_error(r:tregister):shortint;
         function eh_return_data_regno(nr: longint): longint;
    @@ -410,6 +413,24 @@
           end;
     
     
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +      begin
    +        Result := (c = C_None) or conditions_equal(Subset, c);
    +
    +        { Please update as necessary. [Kit] }
    +        if not Result then
    +          case Subset of
    +            C_EQ:
    +              Result := (c in [C_GE]);
    +            C_LT:
    +              Result := (c in [C_NE]);
    +            else
    +              Result := False;
    +          end;
    +      end;
    +
    +
         function rotl(d : dword;b : byte) : dword;
           begin
              result:=(d shr (32-b)) or (d shl b);
    Index: compiler/m68k/cpubase.pas
    ===================================================================
    --- compiler/m68k/cpubase.pas	(revision 43323)
    +++ compiler/m68k/cpubase.pas	(working copy)
    @@ -364,6 +364,10 @@
     
         function inverse_cond(const c: TAsmCond): TAsmCond; {$ifdef USEINLINE}inline;{$endif USEINLINE}
         function conditions_equal(const c1, c2: TAsmCond): boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
    +
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +
         function dwarf_reg(r:tregister):shortint;
         function dwarf_reg_no_error(r:tregister):shortint;
         function eh_return_data_regno(nr: longint): longint;
    @@ -592,6 +596,26 @@
         function conditions_equal(const c1, c2: TAsmCond): boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
           begin
             result := c1 = c2;
    +    end;
    +
    +
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +      begin
    +        Result := (c = C_None) or conditions_equal(Subset, c);
    +
    +        { Please update as necessary. [Kit] }
    +        if not Result then
    +          case Subset of
    +            C_EQ:
    +              Result := (c in [C_GE, C_LE]);
    +            C_LT:
    +              Result := (c in [C_LE]);
    +            C_GT:
    +              Result := (c in [C_GE]);
    +            else
    +              Result := False;
    +          end;
           end;
     
     
    Index: compiler/mips/cpubase.pas
    ===================================================================
    --- compiler/mips/cpubase.pas	(revision 43323)
    +++ compiler/mips/cpubase.pas	(working copy)
    @@ -259,6 +259,9 @@
         function inverse_cond(const c: TAsmCond): TAsmCond; {$ifdef USEINLINE}inline;{$endif USEINLINE}
         function conditions_equal(const c1, c2: TAsmCond): boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
     
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +
         { Returns the tcgsize corresponding with the size of reg.}
         function reg_cgsize(const reg: tregister) : tcgsize;
         function cgsize2subreg(regtype: tregistertype; s:tcgsize):tsubregister;
    @@ -367,6 +370,30 @@
           end;
     
     
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +      begin
    +        Result := (c = C_None) or conditions_equal(Subset, c);
    +
    +        { Please update as necessary. [Kit] }
    +        if not Result then
    +          case Subset of
    +            C_EQ:
    +              Result := (c in [C_GE, C_LE, C_GEU, C_LEU]);
    +            C_LT:
    +              Result := (c in [C_LE]);
    +            C_LTU:
    +              Result := (c in [C_LEU]);
    +            C_GT:
    +              Result := (c in [C_GE]);
    +            C_GTU:
    +              Result := (c in [C_GEU]);
    +            else
    +              Result := False;
    +          end;
    +      end;
    +
    +
         function std_regnum_search(const s:string):Tregister;
           begin
             result:=regnumber_table[findreg_by_name_table(s,std_regname_table,std_regname_index)];
    Index: compiler/powerpc64/cpubase.pas
    ===================================================================
    --- compiler/powerpc64/cpubase.pas	(revision 43323)
    +++ compiler/powerpc64/cpubase.pas	(working copy)
    @@ -392,9 +392,12 @@
     function std_regname(r: Tregister): string;
     function is_condreg(r: tregister): boolean;
     
    -function inverse_cond(const c: TAsmCond): Tasmcond;
    -{$IFDEF USEINLINE}inline;{$ENDIF USEINLINE}
    +function inverse_cond(const c: TAsmCond): TAsmCond; {$IFDEF USEINLINE}inline;{$ENDIF USEINLINE}
     function conditions_equal(const c1, c2: TAsmCond): boolean;
    +
    + { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +function condition_in(const Subset, c: TAsmCond): Boolean;
    +
     function dwarf_reg(r:tregister):shortint;
     function dwarf_reg_no_error(r:tregister):shortint;
     function eh_return_data_regno(nr: longint): longint;
    @@ -472,6 +475,15 @@
         (c1.crbit = c2.crbit));
     end;
     
    +{ Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +function condition_in(const Subset, c: TAsmCond): Boolean;
    +  begin
    +    Result := (c.cond = C_None) or conditions_equal(Subset, c);
    +
    +    { TODO: Can a PowerPC programmer please update this procedure to
    +      actually detect subsets? Thanks. [Kit] }
    +  end;
    +
     function flags_to_cond(const f: TResFlags): TAsmCond;
     const
       flag_2_cond: array[F_EQ..F_SO] of TAsmCondFlag =
    Index: compiler/powerpc/cpubase.pas
    ===================================================================
    --- compiler/powerpc/cpubase.pas	(revision 43323)
    +++ compiler/powerpc/cpubase.pas	(working copy)
    @@ -395,6 +395,10 @@
     
         function inverse_cond(const c: TAsmCond): Tasmcond; {$ifdef USEINLINE}inline;{$endif USEINLINE}
         function conditions_equal(const c1, c2: TAsmCond): boolean;
    +
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +
         function dwarf_reg(r:tregister):shortint;
         function dwarf_reg_no_error(r:tregister):shortint;
         function eh_return_data_regno(nr: longint): longint;
    @@ -474,6 +478,16 @@
           end;
     
     
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +      begin
    +        Result := (c.cond = C_None) or conditions_equal(Subset, c);
    +
    +        { TODO: Can a PowerPC programmer please update this procedure to
    +          actually detect subsets? Thanks. [Kit] }
    +      end;
    +
    +
         function flags_to_cond(const f: TResFlags) : TAsmCond;
           const
             flag_2_cond: array[F_EQ..F_SO] of TAsmCondFlag =
    Index: compiler/riscv32/cpubase.pas
    ===================================================================
    --- compiler/riscv32/cpubase.pas	(revision 43323)
    +++ compiler/riscv32/cpubase.pas	(working copy)
    @@ -333,6 +333,9 @@
     
         function conditions_equal(const c1,c2: TAsmCond): boolean;
     
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +
     implementation
     
         uses
    @@ -459,4 +462,20 @@
             result:=c1=c2;
           end;
     
    +
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +      begin
    +        Result := (c = C_None) or conditions_equal(Subset, c);
    +
    +        if not Result then
    +          case Subset of
    +            C_EQ:
    +              Result := (c in [C_GE, C_GEU]);
    +            else
    +              Result := False;
    +          end;
    +      end;
    +
    +
     end.
    Index: compiler/riscv64/cpubase.pas
    ===================================================================
    --- compiler/riscv64/cpubase.pas	(revision 43323)
    +++ compiler/riscv64/cpubase.pas	(working copy)
    @@ -348,6 +348,9 @@
     
         function conditions_equal(const c1,c2: TAsmCond): boolean;
     
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +
     implementation
     
         uses
    @@ -474,5 +477,20 @@
             result:=c1=c2;
           end;
     
    +
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +      begin
    +        Result := (c = C_None) or conditions_equal(Subset, c);
    +
    +        if not Result then
    +          case Subset of
    +            C_EQ:
    +              Result := (c in [C_GE, C_GEU]);
    +            else
    +              Result := False;
    +          end;
    +      end;
    +
     end.
     
    Index: compiler/sparcgen/cpubase.pas
    ===================================================================
    --- compiler/sparcgen/cpubase.pas	(revision 43323)
    +++ compiler/sparcgen/cpubase.pas	(working copy)
    @@ -335,6 +335,9 @@
         function inverse_cond(const c: TAsmCond): TAsmCond; {$ifdef USEINLINE}inline;{$endif USEINLINE}
         function conditions_equal(const c1, c2: TAsmCond): boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
     
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +
         function  flags_to_cond(const f: TResFlags) : TAsmCond;
         function cgsize2subreg(regtype: tregistertype; s:Tcgsize):Tsubregister;
         function reg_cgsize(const reg: tregister): tcgsize;
    @@ -522,6 +525,33 @@
           end;
     
     
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +      begin
    +        Result := (c = C_None) or conditions_equal(Subset, c);
    +
    +        { TODO: Can a SparcGEN programmer please update this procedure to
    +          detect all subsets? Thanks. [Kit] }
    +        if not Result then
    +          case Subset of
    +            C_A:
    +              Result := (c in [C_A,  C_AE]);
    +            C_B:
    +              Result := (c in [C_B,  C_BE]);
    +            C_E:
    +              Result := (c in [C_AE, C_BE]);
    +            C_FE:
    +              Result := (c in [C_FLE,C_FGE]);
    +            C_FL:
    +              Result := (c in [C_FLE]);
    +            C_FG:
    +              Result := (c in [C_FGE]);
    +            else
    +              Result := False;
    +          end;
    +      end;
    +
    +
         function dwarf_reg(r:tregister):shortint;
           begin
             result:=regdwarf_table[findreg_by_number(r)];
    Index: compiler/x86/cpubase.pas
    ===================================================================
    --- compiler/x86/cpubase.pas	(revision 43323)
    +++ compiler/x86/cpubase.pas	(working copy)
    @@ -353,6 +353,9 @@
         function inverse_cond(const c: TAsmCond): TAsmCond; {$ifdef USEINLINE}inline;{$endif USEINLINE}
         function conditions_equal(const c1, c2: TAsmCond): boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
     
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +
         { checks whether two segment registers are normally equal in the current memory model }
         function segment_regs_equal(r1,r2:tregister):boolean;
     
    @@ -666,6 +669,49 @@
           end;
     
     
    +    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
    +    function condition_in(const Subset, c: TAsmCond): Boolean;
    +      begin
    +        Result := (c = C_None) or conditions_equal(Subset, c);
    +        if not Result then
    +          case Subset of
    +            C_A,  C_NBE:
    +              Result := (c in [C_A,  C_AE, C_NB, C_NBE]);
    +            C_AE, C_NB:
    +              Result := (c in [C_AE, C_NB]);
    +            C_B,  C_NAE:
    +              Result := (c in [C_B,  C_BE, C_C,  C_NA, C_NAE]);
    +            C_BE, C_NA:
    +              Result := (c in [C_BE, C_NA]);
    +            C_C:
    +              { C_B  / C_NAE: CF = 1
    +                C_BE / C_NA:  CF = 1 or ZF = 1 }
    +              Result := (c in [C_B,  C_BE, C_NA, C_NAE]);
    +            C_E,  C_Z:
    +              Result := (c in [C_AE, C_BE, C_E,  C_NA, C_NB, C_NG, C_NL]);
    +            C_G,  C_NLE:
    +              Result := (c in [C_G,  C_GE, C_NL, C_NLE]);
    +            C_GE, C_NL:
    +              Result := (c in [C_GE, C_NL]);
    +            C_L,  C_NGE:
    +              Result := (c in [C_L,  C_LE, C_NG, C_NGE]);
    +            C_LE, C_NG:
    +              Result := (c in [C_LE, C_NG]);
    +            C_NC:
    +              { C_A  / C_NBE: CF = 0 and ZF = 0; not a subset because ZF has to be zero as well
    +                C_AE / C_NB:  CF = 0 }
    +              Result := (c in [C_AE, C_NB]);
    +            C_NE, C_NZ:
    +              Result := (c in [C_NE, C_NZ, C_A,  C_B,  C_NAE,C_NBE,C_L,  C_G,  C_NLE,C_NGE]);
    +            C_NP, C_PO:
    +              Result := (c in [C_NP, C_PO]);
    +            C_P,  C_PE:
    +              Result := (c in [C_P,  C_PE]);
    +            else
    +              Result := False;
    +          end;
    +      end;
    +
         function dwarf_reg(r:tregister):shortint;
           begin
             result:=regdwarf_table[findreg_by_number(r)];
    
  • jump-optimizations-x86-specific-v3.patch (8,568 bytes)
    Index: compiler/x86/aoptx86.pas
    ===================================================================
    --- compiler/x86/aoptx86.pas	(revision 43323)
    +++ compiler/x86/aoptx86.pas	(working copy)
    @@ -3119,49 +3119,44 @@
                       end;
                   end;
     
    -            if ((hp1.typ = ait_label) and (symbol = tai_label(hp1).labsym))
    -                or ((hp1.typ = ait_align) and GetNextInstruction(hp1, hp2) and (hp2.typ = ait_label) and (symbol = tai_label(hp2).labsym)) then
    +          { Detect the following:
    +              jmp<cond>     @Lbl1
    +              jmp           @Lbl2
    +              ...
    +            @Lbl1:
    +              ret
    +
    +            Change to:
    +
    +              jmp<inv_cond> @Lbl2
    +              ret
    +          }
    +            if MatchInstruction(hp1, A_JMP, []) then
                   begin
    -                { If Jcc is immediately followed by the label that it's supposed to jump to, remove it }
    -                DebugMsg(SPeepholeOptimization + 'Removed conditional jump whose destination was immediately after it', p);
    -                UpdateUsedRegs(hp1);
    -
    -                TAsmLabel(symbol).decrefs;
    -                { if the label refs. reach zero, remove any alignment before the label }
    -                if (hp1.typ = ait_align) then
    +                hp2 := getlabelwithsym(TAsmLabel(symbol));
    +                if Assigned(hp2) and SkipLabels(hp2,hp2) and
    +                  MatchInstruction(hp2,A_RET,[S_NO]) then
                       begin
    -                    UpdateUsedRegs(hp2);
    -                    if (TAsmLabel(symbol).getrefs = 0) then
    -                    begin
    -                      asml.Remove(hp1);
    -                      hp1.Free;
    -                    end;
    -                    hp1 := hp2; { Set hp1 to the label }
    -                  end;
    +                    taicpu(p).condition := inverse_cond(taicpu(p).condition);
     
    -                asml.remove(p);
    -                p.free;
    +                    { Change label address to that of the unconditional jump }
    +                    taicpu(p).loadoper(0, taicpu(hp1).oper[0]^);
     
    -                if (TAsmLabel(symbol).getrefs = 0) then
    -                  begin
    -                    GetNextInstruction(hp1, p); { Instruction following the label }
    -                    asml.remove(hp1);
    -                    hp1.free;
    -
    -                    UpdateUsedRegs(p);
    -                    Result := True;
    -                  end
    -                else
    -                  begin
    -                    { We don't need to set the result to True because we know hp1
    -                      is a label and won't trigger any optimisation routines. [Kit] }
    -                    p := hp1;
    +                    TAsmLabel(symbol).DecRefs;
    +                    taicpu(hp1).opcode := A_RET;
    +                    taicpu(hp1).is_jmp := false;
    +                    taicpu(hp1).ops := taicpu(hp2).ops;
    +                    case taicpu(hp2).ops of
    +                      0:
    +                        taicpu(hp1).clearop(0);
    +                      1:
    +                        taicpu(hp1).loadconst(0,taicpu(hp2).oper[0]^.val);
    +                      else
    +                        internalerror(2016041302);
    +                    end;
                       end;
    -
    -                Exit;
                   end;
               end;
    -
     {$ifndef i8086}
             if CPUX86_HAS_CMOV in cpu_capabilities[current_settings.cputype] then
               begin
    @@ -3216,24 +3211,29 @@
                               else
                                 hp2 := hp1;
     
    -                          if not Assigned(hp2) then
    -                            InternalError(2018062910);
    +                          { Remember what the first hp2 is in case there's multiple aligns and labels to get rid of }
    +                          hp3 := hp2;
    +                          repeat
    +                            if not Assigned(hp2) then
    +                              InternalError(2018062910);
     
    -                          if (hp2.typ <> ait_label) then
    -                            begin
    -                              { There's something other than CMOVs here.  Move the original jump
    -                                to right before this point, then break out.
    -
    -                                Originally this was part of the above internal error, but it got
    -                                triggered on the bootstrapping process sometimes. Investigate. [Kit] }
    -                              asml.remove(p);
    -                              asml.insertbefore(p, hp2);
    -                              DebugMsg('Jcc/CMOVcc drop-out', p);
    -                              UpdateUsedRegs(p);
    -                              Result := True;
    -                              Exit;
    +                            case hp2.typ of
    +                              ait_label:
    +                                { What we expected - break out of the loop (it won't be a dead label at the top of
    +                                  a cluster because that was optimised at an earlier stage) }
    +                                Break;
    +                              ait_align:
    +                                { Go to the next entry until a label is found (may be multiple aligns before it) }
    +                                begin
    +                                  hp2 := tai(hp2.Next);
    +                                  Continue;
    +                                end;
    +                              else
    +                                InternalError(2018062911);
                                 end;
     
    +                          until False;
    +
                               { Now we can safely decrement the reference count }
                               tasmlabel(symbol).decrefs;
     
    @@ -3245,10 +3245,7 @@
     
                               { Remove the label if this is its final reference }
                               if (tasmlabel(symbol).getrefs=0) then
    -                            begin
    -                              asml.remove(hp2);
    -                              hp2.free;
    -                            end;
    +                            StripLabelFast(hp3);
     
                               if Assigned(p) then
                                 begin
    @@ -3342,17 +3339,8 @@
                                     hp1.free;
     
                                     { Remove label xxx (it will have a ref of zero due to the initial check }
    -                                if (hp4.typ = ait_align) then
    -                                  begin
    -                                    { Account for alignment as well }
    -                                    GetNextInstruction(hp4, hp1);
    -                                    asml.remove(hp1);
    -                                    hp1.free;
    -                                  end;
    +                                StripLabelFast(hp4);
     
    -                                asml.remove(hp4);
    -                                hp4.free;
    -
                                     { Now we can safely decrement it }
                                     tasmlabel(symbol).decrefs;
     
    @@ -3362,24 +3350,13 @@
                                     asml.remove(hp2);
                                     hp2.free;
     
    -                                { Remove label yyy (and the optional alignment) if its reference will fall to zero }
    -                                if tasmlabel(symbol).getrefs = 1 then
    -                                  begin
    -                                    if (hp3.typ = ait_align) then
    -                                      begin
    -                                        { Account for alignment as well }
    -                                        GetNextInstruction(hp3, hp1);
    -                                        asml.remove(hp1);
    -                                        hp1.free;
    -                                      end;
    +                                { As before, now we can safely decrement it }
    +                                tasmlabel(symbol).decrefs;
     
    -                                    asml.remove(hp3);
    -                                    hp3.free;
    +                                { Remove label yyy (and the optional alignment) if its reference falls to zero }
    +                                if tasmlabel(symbol).getrefs = 0 then
    +                                  StripLabelFast(hp3);
     
    -                                    { As before, now we can safely decrement it }
    -                                    tasmlabel(symbol).decrefs;
    -                                  end;
    -
                                     if Assigned(p) then
                                       begin
                                         UpdateUsedRegs(p);
    
  • jump-optimizations-v3.patch (52,731 bytes)
    Index: compiler/aoptobj.pas
    ===================================================================
    --- compiler/aoptobj.pas	(revision 43323)
    +++ compiler/aoptobj.pas	(working copy)
    @@ -248,8 +248,9 @@
             { label numbers                                                  }
             LabelInfo: PLabelInfo;
     
    -        { Start and end of the block that is currently being optimized }
    -        BlockStart, BlockEnd: Tai;
    +        { Start and end of the block that is currently being optimized, and
    +          a selected start point after the start of the block }
    +        BlockStart, BlockEnd, StartPoint: Tai;
     
             DFA: TAOptDFA;
     
    @@ -269,6 +270,12 @@
             Procedure ClearUsedRegs;
             Procedure UpdateUsedRegs(p : Tai);
             class procedure UpdateUsedRegs(var Regs: TAllUsedRegs; p: Tai);
    +
    +        { If UpdateUsedRegsAndOptimize has read ahead, the result is one before
    +          the next valid entry (so "p.Next" returns what's expected).  If no
    +          reading ahead happened, then the result is equal to p. }
    +        function UpdateUsedRegsAndOptimize(p : Tai): Tai;
    +
             Function CopyUsedRegs(var dest : TAllUsedRegs) : boolean;
             procedure RestoreUsedRegs(const Regs : TAllUsedRegs);
             procedure TransferUsedRegs(var dest: TAllUsedRegs);
    @@ -363,6 +370,34 @@
             function PeepHoleOptPass2Cpu(var p: tai): boolean; virtual;
             function PostPeepHoleOptsCpu(var p: tai): boolean; virtual;
     
    +        { Output debug message to console - null function if EXTDEBUG is not defined }
    +        class procedure DebugWrite(Message: string); static; inline;
    +
    +        { Removes all instructions between an unconditional jump and the next label }
    +        procedure RemoveDeadCodeAfterJump(p: taicpu);
    +
    +        { If hp is a label, strip it if its reference count is zero.  Repeat until
    +          a non-label is found, or a label with a non-zero reference count.
    +          True is returned if something was stripped }
    +        function StripDeadLabels(hp: tai; var NextValid: tai): Boolean;
    +
    +        { Strips a label and any aligns that appear before it (if hp points to
    +          them rather than the label).  Only call this procedure on a label that
    +          you already know is no longer referenced }
    +        procedure StripLabelFast(hp: tai); {$ifdef USEINLINE}inline;{$endif USEINLINE}
    +
    +        { Checks and removes "jmp @@lbl; @lbl". Returns True if the jump was removed }
    +        function CollapseZeroDistJump(var p: tai; hp1: tai; ThisLabel: TAsmLabel): Boolean;
    +
    +        { If a group of labels are clustered, change the jump to point to the last one that is still referenced }
    +        function CollapseLabelCluster(jump: tai; var lbltai: tai): TAsmLabel;
    +{$ifndef JVM}
    +        function OptimizeConditionalJump(CJLabel: TAsmLabel; var p: tai; hp1: tai; var stoploop: Boolean): Boolean;
    +{$endif JVM}
    +
    +        { Jump/label optimisation entry method }
    +        function DoJumpOptimizations(var p: tai; var stoploop: Boolean): Boolean;
    +
             { insert debug comments about which registers are read and written by
               each instruction. Useful for debugging the InstructionLoadsFromReg and
               other similar functions. }
    @@ -910,6 +945,82 @@
             end;
     
     
    +      { If UpdateUsedRegsAndOptimize has read ahead, the result is one before
    +        the next valid entry (so "p.Next" returns what's expected).  If no
    +        reading ahead happened, then the result is equal to p. }
    +      function TAOptObj.UpdateUsedRegsAndOptimize(p : Tai): Tai;
    +        var
    +          NotFirst: Boolean;
    +        begin
    +          { this code is based on TUsedRegs.Update to avoid multiple passes through the asmlist,
    +            the code is duplicated here }
    +
    +          Result := p;
    +          if (p.typ in [ait_instruction, ait_label]) then
    +            begin
    +              if (p.next <> BlockEnd) and (tai(p.next).typ <> ait_instruction) then
    +                begin
    +                  { Advance one, otherwise the routine exits immediately and wastes time }
    +                  p := tai(p.Next);
    +                  NotFirst := True;
    +                end
    +              else
    +                { If the next entry is an instruction, nothing will be updated or
    +                  optimised here, so exit now to save time }
    +                Exit;
    +            end
    +          else
    +            NotFirst := False;
    +
    +          repeat
    +            while assigned(p) and
    +                  ((p.typ in (SkipInstr + [ait_align, ait_label] - [ait_RegAlloc])) or
    +                   ((p.typ = ait_marker) and
    +                    (tai_Marker(p).Kind in [mark_AsmBlockEnd,mark_NoLineInfoStart,mark_NoLineInfoEnd]))) do
    +                 begin
    +                   { Here's the optimise part }
    +                   if (p.typ in [ait_align, ait_label]) then
    +                     begin
    +                       if StripDeadLabels(p, p) then
    +                         begin
    +                           { Note, if the first instruction is stripped and is
    +                             the only one that gets removed, Result will now
    +                             contain a dangling pointer, so compensate for this. }
    +                           if not NotFirst then
    +                             Result := tai(p.Previous);
    +
    +                           Continue;
    +                         end;
    +
    +                       if ((p.typ = ait_label) and not labelCanBeSkipped(tai_label(p))) then
    +                         Break;
    +                     end;
    +
    +                   Result := p;
    +                   p := tai(p.next);
    +                 end;
    +            while assigned(p) and
    +                  (p.typ=ait_RegAlloc) Do
    +              begin
    +                case tai_regalloc(p).ratype of
    +                  ra_alloc :
    +                    Include(UsedRegs[getregtype(tai_regalloc(p).reg)].UsedRegs, getsupreg(tai_regalloc(p).reg));
    +                  ra_dealloc :
    +                    Exclude(UsedRegs[getregtype(tai_regalloc(p).reg)].UsedRegs, getsupreg(tai_regalloc(p).reg));
    +                  else
    +                    { Do nothing };
    +                end;
    +                Result := p;
    +                p := tai(p.next);
    +              end;
    +            NotFirst := True;
    +          until not(assigned(p)) or
    +                (not(p.typ in SkipInstr + [ait_align]) and
    +                 not((p.typ = ait_label) and
    +                     labelCanBeSkipped(tai_label(p))));
    +        end;
    +
    +
           procedure TAOptObj.UpdateUsedRegs(p : Tai);
             begin
               { this code is based on TUsedRegs.Update to avoid multiple passes through the asmlist,
    @@ -1346,18 +1457,34 @@
           end;
     
     
    -    function FindAnyLabel(hp: tai; var l: tasmlabel): Boolean;
    +    function FindLiveLabel(hp: tai; var l: tasmlabel): Boolean;
    +      var
    +        next: tai;
           begin
    -        FindAnyLabel := false;
    -        while assigned(hp.next) and
    -              (tai(hp.next).typ in (SkipInstr+[ait_align])) Do
    -          hp := tai(hp.next);
    -        if assigned(hp.next) and
    -           (tai(hp.next).typ = ait_label) then
    +        FindLiveLabel := false;
    +
    +        while True do
               begin
    -            FindAnyLabel := true;
    -            l := tai_label(hp.next).labsym;
    -          end
    +            while assigned(hp.next) and
    +                  (tai(hp.next).typ in (SkipInstr+[ait_align])) Do
    +              hp := tai(hp.next);
    +
    +            next := tai(hp.next);
    +            if assigned(next) and
    +              (tai(next).typ = ait_label) then
    +              begin
    +                l := tai_label(next).labsym;
    +                if not l.is_used then
    +                  begin
    +                    { Unsafe label }
    +                    hp := next;
    +                    Continue;
    +                  end;
    +
    +                FindLiveLabel := true;
    +              end;
    +            Exit;
    +          end;
           end;
     
     
    @@ -1385,11 +1512,9 @@
     {$if defined(arm) or defined(aarch64)}
               (hp.condition=c_None) and
     {$endif arm or aarch64}
    -{$if defined(riscv32) or defined(riscv64)}          
               (hp.ops>0) and
    +{$if defined(riscv32) or defined(riscv64)}
               (hp.oper[0]^.reg=NR_X0) and
    -{$else riscv}
    -          (hp.ops>0) and
     {$endif riscv}
               (JumpTargetOp(hp)^.typ = top_ref) and
               (JumpTargetOp(hp)^.ref^.symbol is TAsmLabel);
    @@ -1424,10 +1549,592 @@
           end;
     
     
    +    { Output debug message to console - null function if EXTDEBUG is not defined }
    +    class procedure TAOptObj.DebugWrite(Message: string); inline;
    +      begin
    +{$ifdef EXTDEBUG}
    +        WriteLn(Message);
    +{$else EXTDEBUG}
    +        { Do nothing }
    +{$endif EXTDEBUG}
    +      end;
    +
    +    { Removes all instructions between an unconditional jump and the next label }
    +    procedure TAOptObj.RemoveDeadCodeAfterJump(p: taicpu);
    +      var
    +        hp1, hp2: tai;
    +      begin
    +        { the following code removes all code between a jmp and the next label,
    +          because it can never be executed
    +        }
    +        while GetNextInstruction(p, hp1) and
    +              (hp1 <> BlockEnd) and
    +              (hp1.typ <> ait_label)
    +{$ifdef JVM}
    +              and (hp1.typ <> ait_jcatch)
    +{$endif}
    +              do
    +          if not(hp1.typ in ([ait_label]+skipinstr)) then
    +            begin
    +              if (hp1.typ = ait_instruction) and
    +                 taicpu(hp1).is_jmp and
    +                 (JumpTargetOp(taicpu(hp1))^.typ = top_ref) and
    +                 (JumpTargetOp(taicpu(hp1))^.ref^.symbol is TAsmLabel) then
    +                 TAsmLabel(JumpTargetOp(taicpu(hp1))^.ref^.symbol).decrefs;
    +              { don't kill start/end of assembler block,
    +                no-line-info-start/end etc }
    +              if (hp1.typ <> ait_marker) then
    +                begin
    +{$ifdef cpudelayslot}
    +                  if (hp1.typ=ait_instruction) and (taicpu(hp1).is_jmp) then
    +                    RemoveDelaySlot(hp1);
    +{$endif cpudelayslot}
    +                  if (hp1.typ = ait_align) then
    +                    begin
    +                      { Only remove the align if a label doesn't immediately follow }
    +                      if GetNextInstruction(hp1, hp2) and (hp2.typ = ait_label) then
    +                        { The label is unskippable }
    +                        Exit;
    +                    end;
    +                  asml.remove(hp1);
    +                  hp1.free;
    +                end
    +              else
    +                p:=taicpu(hp1);
    +            end
    +          else
    +            Break;
    +      end;
    +
    +    { If hp is a label, strip it if its reference count is zero.  Repeat until
    +      a non-label is found, or a label with a non-zero reference count.
    +      True is returned if something was stripped }
    +    function TAOptObj.StripDeadLabels(hp: tai; var NextValid: tai): Boolean;
    +      var
    +        tmp, tmpNext: tai;
    +        hp1: tai;
    +        CurrentAlign: tai;
    +      begin
    +        CurrentAlign := nil;
    +        Result := False;
    +        hp1 := hp;
    +        NextValid := hp;
    +
    +        { Stop if hp is an instruction, for example }
    +        while (hp1 <> BlockEnd) and (hp1.typ in [ait_label,ait_align]) do
    +          begin
    +            case hp1.typ of
    +              ait_label:
    +                begin
    +                  with tai_label(hp1).labsym do
    +                    if is_used or (bind <> AB_LOCAL) or (labeltype <> alt_jump) then
    +                      begin
    +                        { Valid label }
    +                        if Result then
    +                          NextValid := hp1;
    +
    +                        DebugWrite('JUMP DEBUG: Last label in cluster:' + tostr(labelnr));
    +
    +                        Exit;
    +                      end;
    +
    +                  DebugWrite('JUMP DEBUG: Removed label ' + tostr(TAsmLabel(tai_label(hp1).labsym).labelnr));
    +
    +                  { Set tmp to the next valid entry }
    +                  tmp := tai(hp1.Next);
    +                  { Remove label }
    +                  AsmL.Remove(hp1);
    +                  hp1.Free;
    +
    +                  hp1 := tmp;
    +
    +                  Result := True;
    +                  Continue;
    +                end;
    +              { Also remove the align if it comes before an unused label }
    +              ait_align:
    +                begin
    +                  tmp := tai(hp1.Next);
    +                  if tmp = BlockEnd then
    +                    { End of block }
    +                    Exit;
    +
    +                  if (cs_debuginfo in current_settings.moduleswitches) or
    +                     (cs_use_lineinfo in current_settings.globalswitches) then
    +                     { Don't remove aligns if debuginfo is present }
    +                    begin
    +                      if (tmp.typ in [ait_label,ait_align]) then
    +                        begin
    +                          hp1 := tmp;
    +                          Continue;
    +                        end
    +                      else
    +                        Break;
    +                    end;
    +
    +                  repeat
    +
    +                    case tmp.typ of
    +                      ait_align: { Merge the aligns if permissible }
    +                        begin
    +                          { Check the maxbytes field though, since this may result in the
    +                            alignment being ignored }
    +                          if ((tai_align_abstract(hp1).maxbytes = 0) and (tai_align_abstract(tmp).maxbytes = 0)) or
    +                            { If a maxbytes field is present, only merge if the aligns have the same granularity }
    +                            ((tai_align_abstract(hp1).aligntype = tai_align_abstract(tmp).aligntype)) then
    +                            begin
    +                              with tai_align_abstract(hp1) do
    +                                begin
    +                                  aligntype := max(aligntype, tai_align_abstract(tmp).aligntype);
    +                                  maxbytes := max(maxbytes, tai_align_abstract(tmp).maxbytes);
    +                                  fillsize := max(fillsize, tai_align_abstract(tmp).fillsize);
    +                                  use_op := use_op or tai_align_abstract(tmp).use_op;
    +
    +                                  if use_op and (tai_align_abstract(tmp).fillop <> 0) then
    +                                    fillop := tai_align_abstract(tmp).fillop;
    +                                end;
    +
    +                              tmpNext := tai(tmp.Next);
    +                              AsmL.Remove(tmp);
    +                              tmp.Free;
    +                              Result := True;
    +                              tmp := tmpNext;
    +                            end
    +                          else
    +                            tmp := tai(tmp.Next);
    +
    +                          Continue;
    +                        end;
    +                      ait_label:
    +                        begin
    +                          { Signal that we can possibly delete this align entry }
    +                          CurrentAlign := hp1;
    +
    +                          repeat
    +                            with tai_label(tmp).labsym do
    +                              if is_used or (bind <> AB_LOCAL) or (labeltype <> alt_jump) then
    +                                begin
    +                                  { Valid label }
    +                                  if Result then
    +                                    NextValid := tmp;
    +
    +                                  DebugWrite('JUMP DEBUG: Last label in cluster:' + tostr(labelnr));
    +
    +                                  Exit;
    +                                end;
    +
    +                            DebugWrite('JUMP DEBUG: Removed label ' + tostr(TAsmLabel(tai_label(tmp).labsym).labelnr));
    +
    +                            { Remove label }
    +                            tmpNext := tai(tmp.Next);
    +                            AsmL.Remove(tmp);
    +                            tmp.Free;
    +                            Result := True;
    +                            tmp := tmpNext;
    +
    +                            { Loop here for a minor performance gain }
    +                          until (tmp = BlockEnd) or (tmp.typ <> ait_label);
    +
    +                          { Re-evaluate the align and see what follows }
    +                          Continue;
    +                        end
    +                      else
    +                        begin
    +                          { Set hp1 to the instruction after the align, because the
    +                            align might get deleted later and hence set NextValid
    +                            to a dangling pointer. [Kit] }
    +                          hp1 := tmp;
    +                          Break;
    +                        end;
    +                    end;
    +                  until (tmp = BlockEnd);
    +
    +                  { Break out of the outer loop if the above Break is called }
    +                  if (hp1 = tmp) then
    +                    Break;
    +                end
    +              else
    +                Break;
    +            end;
    +            hp1 := tai(hp1.Next);
    +          end;
    +
    +        { hp1 will be the next valid entry }
    +        NextValid := hp1;
    +
    +        { Remove the alignment field (but only if the next valid entry is not a live label) }
    +        while Assigned(CurrentAlign) and (CurrentAlign.typ = ait_align) do
    +          begin
    +            DebugWrite('JUMP DEBUG: Alignment field removed');
    +
    +            tmp := tai(CurrentAlign.next);
    +
    +            AsmL.Remove(CurrentAlign);
    +            CurrentAlign.Free;
    +
    +            CurrentAlign := tmp;
    +          end;
    +      end;
    +
    +
    +    { Strips a label and any aligns that appear before it (if hp points to
    +      them rather than the label).  Only call this procedure on a label that
    +      you already know is no longer referenced }
    +    procedure TAOptObj.StripLabelFast(hp: tai); {$ifdef USEINLINE}inline;{$endif USEINLINE}
    +      var
    +        tmp: tai;
    +      begin
    +        repeat
    +          case hp.typ of
    +            ait_align:
    +              begin
    +                tmp := tai(hp.Next);
    +                asml.Remove(hp);
    +                hp.Free;
    +                hp := tmp;
    +                { Control flow will now return to 'repeat' }
    +              end;
    +            ait_label:
    +              begin
    +{$ifdef EXTDEBUG}
    +                { When not in debug mode, deleting a live label will cause an
    +                  access violation later on. [Kit] }
    +                if tai_label(hp).labsym.getrefs <> 0 then
    +                  InternalError(2019110802);
    +{$endif EXTDEBUG}
    +                asml.Remove(hp);
    +                hp.Free;
    +                Exit;
    +              end;
    +            else
    +              InternalError(2019110801);
    +          end;
    +        until False;
    +      end;
    +
    +    { If a group of labels are clustered, change the jump to point to the last one
    +      that is still referenced }
    +    function TAOptObj.CollapseLabelCluster(jump: tai; var lbltai: tai): TAsmLabel;
    +      var
    +        LastLabel: TAsmLabel;
    +        hp2: tai;
    +      begin
    +        Result := tai_label(lbltai).labsym;
    +        LastLabel := Result;
    +        hp2 := tai(lbltai.next);
    +
    +        while (hp2 <> BlockEnd) and (hp2.typ in SkipInstr + [ait_align, ait_label]) do
    +          begin
    +
    +            if (hp2.typ = ait_label) and
    +              (tai_label(hp2).labsym.is_used) and
    +              (tai_label(hp2).labsym.labeltype = alt_jump) then
    +              LastLabel := tai_label(hp2).labsym;
    +
    +            hp2 := tai(hp2.next);
    +          end;
    +
    +        if (Result <> LastLabel) then
    +          begin
    +            Result.decrefs;
    +            JumpTargetOp(taicpu(jump))^.ref^.symbol := LastLabel;
    +            LastLabel.increfs;
    +            Result := LastLabel;
    +            lbltai := hp2;
    +          end;
    +      end;
    +
    +{$ifndef JVM}
    +    function TAOptObj.OptimizeConditionalJump(CJLabel: TAsmLabel; var p: tai; hp1: tai; var stoploop: Boolean): Boolean;
    +      var
    +        hp2: tai;
    +        NCJLabel: TAsmLabel;
    +      begin
    +        Result := False;
    +        while (hp1 <> BlockEnd) do
    +          begin
    +            StripDeadLabels(hp1, hp1);
    +
    +            if (hp1 <> BlockEnd) and
    +              (tai(hp1).typ=ait_instruction) and
    +              IsJumpToLabel(taicpu(hp1)) then
    +              begin
    +                NCJLabel := TAsmLabel(JumpTargetOp(taicpu(hp1))^.ref^.symbol);
    +
    +                if IsJumpToLabelUncond(taicpu(hp1)) then
    +                  begin
    +                    { Do it now to get it out of the way and to aid optimisations
    +                      later on in this method }
    +                    RemoveDeadCodeAfterJump(taicpu(hp1));
    +
    +                    { GetNextInstruction could be factored out, but hp2 might be
    +                      different after "RemoveDeadCodeAfterJump" }
    +                    GetNextInstruction(hp1, hp2);
    +
    +                    { Check for:
    +                        jmp<cond> @Lbl
    +                        jmp       @Lbl
    +                    }
    +                    if (CJLabel = NCJLabel) then
    +                      begin
    +                        DebugWrite('JUMP DEBUG: Short-circuited conditional jump');
    +                        { Both jumps go to the same label }
    +                        CJLabel.decrefs;
    +{$ifdef cpudelayslot}
    +                        RemoveDelaySlot(p);
    +{$endif cpudelayslot}
    +                        UpdateUsedRegs(tai(p.Next));
    +                        AsmL.Remove(p);
    +                        p.Free;
    +                        p := hp1;
    +
    +                        Result := True;
    +                        Exit;
    +                      end;
    +
    +                    if FindLabel(CJLabel, hp2) then
    +                      begin
    +                        { change the following jumps:
    +                            jmp<cond> CJLabel         jmp<inv_cond> NCJLabel
    +                            jmp       NCJLabel >>>    <code>
    +                          CJLabel:                  NCJLabel:
    +                            <code>
    +                          NCJLabel:
    +                        }
    +{$if defined(arm) or defined(aarch64)}
    +                        if (taicpu(p).condition<>C_None)
    +{$if defined(aarch64)}
    +                        { can't have conditional branches to
    +                          global labels on AArch64, because the
    +                          offset may become too big }
    +                        and (NCJLabel.bind=AB_LOCAL)
    +{$endif aarch64}
    +                        then
    +                          begin
    +{$endif arm or aarch64}
    +                            DebugWrite('JUMP DEBUG: Conditional jump inversion');
    +
    +                            taicpu(p).condition:=inverse_cond(taicpu(p).condition);
    +                            CJLabel.decrefs;
    +
    +                            CJLabel := NCJLabel;
    +                            JumpTargetOp(taicpu(p))^.ref^.symbol := NCJLabel;
    +
    +                            { when freeing hp1, the reference count
    +                              isn't decreased, so don't increase }
    +{$ifdef cpudelayslot}
    +                            RemoveDelaySlot(hp1);
    +{$endif cpudelayslot}
    +                            asml.remove(hp1);
    +                            hp1.free;
    +
    +                            hp1 := tai(p.Next);
    +
    +                            { Attempt another iteration in case more jumps follow }
    +                            Continue;
    +{$if defined(arm) or defined(aarch64)}
    +                          end;
    +{$endif arm or aarch64}
    +                      end
    +                    else if Assigned(hp2) and CollapseZeroDistJump(hp1, hp2, NCJLabel) then
    +                      begin
    +                        { Attempt another iteration in case more jumps follow }
    +                        Continue;
    +                      end;
    +                  end
    +                else
    +                  begin
    +                    GetNextInstruction(hp1, hp2);
    +
    +                    { Check for:
    +                        jmp<cond1>    @Lbl1
    +                        jmp<cond2>    @Lbl2
    +
    +                        Remove 2nd jump if conditions are equal or cond2 is fully covered by cond1
    +                    }
    +
    +                    if condition_in(taicpu(hp1).condition, taicpu(p).condition) then
    +                      begin
    +                        DebugWrite('JUMP DEBUG: Dominated conditional jump');
    +
    +                        NCJLabel.decrefs;
    +                        AsmL.Remove(hp1);
    +                        hp1.Free;
    +                        hp1 := tai(p.Next);
    +
    +                        { Flag another pass in case @Lbl2 appeared earlier in the procedure and is now a dead label }
    +                        stoploop := False;
    +                        { Attempt another iteration in case more jumps follow }
    +                        Continue;
    +                      end;
    +
    +                    { Check for:
    +                        jmp<cond1>  @Lbl1
    +                        jmp<cond2>  @Lbl2
    +
    +                      And inv(cond2) is a subset of cond1 (e.g. je followed by jne, or jae followed by jbe) )
    +                    }
    +                    if condition_in(inverse_cond(taicpu(hp1).condition), taicpu(p).condition) then
    +                      begin
    +                        { If @lbl1 immediately follows jmp<cond2>, we can remove
    +                          the first jump completely }
    +                        if FindLabel(CJLabel, hp2) then
    +                          begin
    +                            DebugWrite('JUMP DEBUG: jmp<cond> before jmp<inv_cond> - removed first jump');
    +
    +                            CJLabel.decrefs;
    +{$ifdef cpudelayslot}
    +                            RemoveDelaySlot(p);
    +{$endif cpudelayslot}
    +                            UpdateUsedRegs(tai(p.Next));
    +                            AsmL.Remove(p);
    +                            p.Free;
    +                            p := hp1;
    +
    +                            Result := True;
    +                            Exit;
    +
    +{$if not defined(avr) and not defined(riscv32) and not defined(riscv64)}
    +                          end
    +                        else
    +                          { NOTE: There is currently no watertight, cross-platform way to create
    +                            an unconditional jump without access to the cg object.  If anyone can
    +                            improve this particular optimisation to work on AVR and RISC-V,
    +                            please do. [Kit]}
    +                          begin
    +                            { Since cond1 is a subset of inv(cond2), jmp<cond2> will always branch if
    +                              jmp<cond1> does not, so change jmp<cond2> to an unconditional jump. }
    +
    +                            DebugWrite('JUMP DEBUG: jmp<cond> before jmp<inv_cond> - made second jump unconditional');
    +
    +{$if defined(powerpc) or defined(powerpc64)}
    +                            taicpu(hp2).condition.cond := C_None;
    +                            taicpu(hp2).condition.simple := True;
    +{$else powerpc}
    +                            taicpu(hp2).condition := C_None;
    +{$endif powerpc}
    +                            taicpu(hp2).opcode := aopt_uncondjmp;
    +
    +                            { NOTE: Changing the jump to unconditional won't open up new opportunities
    +                              for GetFinalDestination on earlier jumps because there's no live label
    +                              between the two jump instructions, so setting 'stoploop' to False only
    +                              wastes time. [Kit] }
    +
    +                            { See if more optimisations are possible }
    +                            Continue;
    +{$endif}
    +                          end;
    +                      end;
    +                  end;
    +            end;
    +
    +            if GetFinalDestination(taicpu(p),0) then
    +              stoploop := False;
    +
    +            Exit;
    +          end;
    +
    +      end;
    +{$endif JVM}
    +
    +    function TAOptObj.CollapseZeroDistJump(var p: tai; hp1: tai; ThisLabel: TAsmLabel): Boolean;
    +      var
    +        tmp: tai;
    +      begin
    +        Result := False;
    +
    +        { remove jumps to labels coming right after them }
    +        if FindLabel(ThisLabel, hp1) and
    +            { Cannot remove the first instruction }
    +            (p<>StartPoint) then
    +          begin
    +            ThisLabel.decrefs;
    +
    +            tmp := tai(p.Next); { Might be an align before the label }
    +{$ifdef cpudelayslot}
    +            RemoveDelaySlot(p);
    +{$endif cpudelayslot}
    +            asml.remove(p);
    +            p.free;
    +
    +            StripDeadLabels(tmp, p);
    +
    +            if p.typ <> ait_instruction then
    +              GetNextInstruction(UpdateUsedRegsAndOptimize(p), p);
    +
    +            Result := True;
    +          end;
    +
    +    end;
    +
    +
    +    function TAOptObj.DoJumpOptimizations(var p: tai; var stoploop: Boolean): Boolean;
    +      var
    +        hp1, hp2: tai;
    +        ThisLabel: TAsmLabel;
    +        ThisPassResult: Boolean;
    +      begin
    +        Result := False;
    +        if (p.typ <> ait_instruction) or not IsJumpToLabel(taicpu(p)) then
    +          Exit;
    +
    +        repeat
    +          ThisPassResult := False;
    +
    +          if GetNextInstruction(p, hp1) and (hp1 <> BlockEnd) then
    +            begin
    +              SkipEntryExitMarker(hp1,hp1);
    +              if (hp1 = BlockEnd) then
    +                Exit;
    +
    +              ThisLabel := TAsmLabel(JumpTargetOp(taicpu(p))^.ref^.symbol);
    +
    +              hp2 := getlabelwithsym(ThisLabel);
    +
    +              { getlabelwithsym returning nil occurs if a label is in a
    +                different block (e.g. on the other side of an asm...end pair). }
    +              if Assigned(hp2) then
    +                begin
    +                  { If there are multiple labels in a row, change the destination to the last one
    +                    in order to aid optimisation later }
    +                  ThisLabel := CollapseLabelCluster(p, hp2);
    +
    +                  if CollapseZeroDistJump(p, hp1, ThisLabel) then
    +                    begin
    +                      stoploop := False;
    +                      Result := True;
    +                      Exit;
    +                    end;
    +
    +                  if IsJumpToLabelUncond(taicpu(p)) then
    +                    begin
    +                      { Remove unreachable code between the jump and the next label }
    +                      RemoveDeadCodeAfterJump(taicpu(p));
    +
    +                      ThisPassResult := GetFinalDestination(taicpu(p), 0);
    +
    +                      if ThisPassResult then
    +                        { Might have caused some earlier labels to become dead }
    +                        stoploop := False;
    +                    end
    +{$ifndef JVM}
    +                  else if (taicpu(p).opcode = aopt_condjmp) then
    +                    ThisPassResult := OptimizeConditionalJump(ThisLabel, p, hp1, stoploop)
    +{$endif JVM}
    +                    ;
    +                end;
    +
    +            end;
    +
    +          Result := Result or ThisPassResult;
    +        until not (ThisPassResult and (p.typ = ait_instruction) and IsJumpToLabel(taicpu(p)));
    +
    +      end;
    +
    +
         function TAOptObj.GetFinalDestination(hp: taicpu; level: longint): boolean;
           {traces sucessive jumps to their final destination and sets it, e.g.
    -       je l1                je l3
    -       <code>               <code>
    +       je l1                je l3       <code>               <code>
            l1:       becomes    l1:
            je l2                je l3
            <code>               <code>
    @@ -1434,100 +2141,158 @@
            l2:                  l2:
            jmp l3               jmp l3
     
    -       the level parameter denotes how deeep we have already followed the jump,
    +       the level parameter denotes how deep we have already followed the jump,
            to avoid endless loops with constructs such as "l5: ; jmp l5"           }
     
           var p1: tai;
    -          {$if not defined(MIPS) and not defined(riscv64) and not defined(riscv32) and not defined(JVM)}
               p2: tai;
    -          l: tasmlabel;
    -          {$endif}
    +{$if not defined(MIPS) and not defined(riscv64) and not defined(riscv32) and not defined(JVM)}
    +          p3: tai;
    +{$endif}
    +          ThisLabel, l: tasmlabel;
     
           begin
    -        GetfinalDestination := false;
    +        GetFinalDestination := false;
             if level > 20 then
               exit;
    -        p1 := getlabelwithsym(tasmlabel(JumpTargetOp(hp)^.ref^.symbol));
    +
    +        ThisLabel := TAsmLabel(JumpTargetOp(hp)^.ref^.symbol);
    +        p1 := getlabelwithsym(ThisLabel);
             if assigned(p1) then
               begin
                 SkipLabels(p1,p1);
    -            if (tai(p1).typ = ait_instruction) and
    +            if (p1.typ = ait_instruction) and
                    (taicpu(p1).is_jmp) then
    -              if { the next instruction after the label where the jump hp arrives}
    -                 { is unconditional or of the same type as hp, so continue       }
    -                 IsJumpToLabelUncond(taicpu(p1))
    +              begin
    +                p2 := tai(p1.Next);
    +
    +                { Collapse any zero distance jumps we stumble across }
    +                while (p1<>StartPoint) and CollapseZeroDistJump(p1, p2, TAsmLabel(JumpTargetOp(taicpu(p1))^.ref^.symbol)) do
    +                  begin
    +                    { Note: Cannot remove the first instruction }
    +                    if (p1.typ = ait_label) then
    +                      SkipLabels(p1, p1);
    +
    +                    if not Assigned(p1) then
    +                      { No more valid commands }
    +                      Exit;
    +
    +                    { Check to see that we are actually still at a jump }
    +                    if not ((tai(p1).typ = ait_instruction) and (taicpu(p1).is_jmp)) then
    +                      begin
    +                        { Required to ensure recursion works properly, but to also
    +                          return false if a jump isn't modified. [Kit] }
    +                        if level > 0 then GetFinalDestination := True;
    +                        Exit;
    +                      end;
    +
    +                    p2 := tai(p1.Next);
    +                    if p2 = BlockEnd then
    +                      Exit;
    +                  end;
    +
     {$if not defined(MIPS) and not defined(riscv64) and not defined(riscv32) and not defined(JVM)}
    -{ for MIPS, it isn't enough to check the condition; first operands must be same, too. }
    -                 or
    -                 conditions_equal(taicpu(p1).condition,hp.condition) or
    +                p3 := p2;
    +{$endif not MIPS and not RV64 and not RV32 and not JVM}
     
    -                 { the next instruction after the label where the jump hp arrives
    -                   is the opposite of hp (so this one is never taken), but after
    -                   that one there is a branch that will be taken, so perform a
    -                   little hack: set p1 equal to this instruction (that's what the
    -                   last SkipLabels is for, only works with short bool evaluation)}
    -                 (conditions_equal(taicpu(p1).condition,inverse_cond(hp.condition)) and
    -                  SkipLabels(p1,p2) and
    -                  (p2.typ = ait_instruction) and
    -                  (taicpu(p2).is_jmp) and
    -                   (IsJumpToLabelUncond(taicpu(p2)) or
    -                   (conditions_equal(taicpu(p2).condition,hp.condition))) and
    -                  SkipLabels(p1,p1))
    +                if { the next instruction after the label where the jump hp arrives}
    +                   { is unconditional or of the same type as hp, so continue       }
    +                   IsJumpToLabelUncond(taicpu(p1))
    +
    +                   { TODO: For anyone with experience with MIPS or RISC-V, please add support for tracing
    +                     conditional jumps. [Kit] }
    +
    +{$if not defined(MIPS) and not defined(riscv64) and not defined(riscv32) and not defined(JVM)}
    +  { for MIPS, it isn't enough to check the condition; first operands must be same, too. }
    +                   or
    +                   condition_in(hp.condition, taicpu(p1).condition) or
    +
    +                   { the next instruction after the label where the jump hp arrives
    +                     is the opposite of hp (so this one is never taken), but after
    +                     that one there is a branch that will be taken, so perform a
    +                     little hack: set p1 equal to this instruction }
    +                   (condition_in(hp.condition, inverse_cond(taicpu(p1).condition)) and
    +                     SkipLabels(p3,p2) and
    +                     (p2.typ = ait_instruction) and
    +                     (taicpu(p2).is_jmp) and
    +                       (IsJumpToLabelUncond(taicpu(p2)) or
    +                       (condition_in(hp.condition, taicpu(p2).condition))
    +                     ) and
    +                     SetAndTest(p2,p1)
    +                   )
     {$endif not MIPS and not RV64 and not RV32 and not JVM}
    -                 then
    -                begin
    -                  { quick check for loops of the form "l5: ; jmp l5 }
    -                  if (tasmlabel(JumpTargetOp(taicpu(p1))^.ref^.symbol).labelnr =
    -                       tasmlabel(JumpTargetOp(hp)^.ref^.symbol).labelnr) then
    -                    exit;
    -                  if not GetFinalDestination(taicpu(p1),succ(level)) then
    -                    exit;
    +                   then
    +                  begin
    +                    { quick check for loops of the form "l5: ; jmp l5" }
    +                    if (TAsmLabel(JumpTargetOp(taicpu(p1))^.ref^.symbol).labelnr = ThisLabel.labelnr) then
    +                      exit;
    +                    if not GetFinalDestination(taicpu(p1),succ(level)) then
    +                      exit;
    +
    +                    { NOTE: Do not move this before the "l5: ; jmp l5" check,
    +                      because GetFinalDestination may change the destination
    +                      label of p1. [Kit] }
    +
    +                    l := tasmlabel(JumpTargetOp(taicpu(p1))^.ref^.symbol);
    +
     {$if defined(aarch64)}
    -                  { can't have conditional branches to
    -                    global labels on AArch64, because the
    -                    offset may become too big }
    -                  if not(taicpu(hp).condition in [C_None,C_AL,C_NV]) and
    -                     (tasmlabel(JumpTargetOp(taicpu(p1))^.ref^.symbol).bind<>AB_LOCAL) then
    -                    exit;
    +                    { can't have conditional branches to
    +                      global labels on AArch64, because the
    +                      offset may become too big }
    +                    if not(taicpu(hp).condition in [C_None,C_AL,C_NV]) and
    +                       (l.bind<>AB_LOCAL) then
    +                      exit;
     {$endif aarch64}
    -                  tasmlabel(JumpTargetOp(hp)^.ref^.symbol).decrefs;
    -                  JumpTargetOp(hp)^.ref^.symbol:=JumpTargetOp(taicpu(p1))^.ref^.symbol;
    -                  tasmlabel(JumpTargetOp(hp)^.ref^.symbol).increfs;
    -                end
    +                    ThisLabel.decrefs;
    +                    JumpTargetOp(hp)^.ref^.symbol:=l;
    +                    l.increfs;
    +                    GetFinalDestination := True;
    +                    Exit;
    +                  end
     {$if not defined(MIPS) and not defined(riscv64) and not defined(riscv32) and not defined(JVM)}
    -              else
    -                if conditions_equal(taicpu(p1).condition,inverse_cond(hp.condition)) then
    -                  if not FindAnyLabel(p1,l) then
    +                else
    +                  if condition_in(inverse_cond(hp.condition), taicpu(p1).condition) then
                         begin
    -      {$ifdef finaldestdebug}
    -                      insertllitem(asml,p1,p1.next,tai_comment.Create(
    -                        strpnew('previous label inserted'))));
    -      {$endif finaldestdebug}
    -                      current_asmdata.getjumplabel(l);
    -                      insertllitem(p1,p1.next,tai_label.Create(l));
    -                      tasmlabel(JumpTargetOp(hp)^.ref^.symbol).decrefs;
    -                      JumpTargetOp(hp)^.ref^.symbol := l;
    -                      l.increfs;
    -      {               this won't work, since the new label isn't in the labeltable }
    -      {               so it will fail the rangecheck. Labeltable should become a   }
    -      {               hashtable to support this:                                   }
    -      {               GetFinalDestination(asml, hp);                               }
    -                    end
    -                  else
    -                    begin
    -      {$ifdef finaldestdebug}
    -                      insertllitem(asml,p1,p1.next,tai_comment.Create(
    -                        strpnew('next label reused'))));
    -      {$endif finaldestdebug}
    -                      l.increfs;
    -                      tasmlabel(JumpTargetOp(hp)^.ref^.symbol).decrefs;
    -                      JumpTargetOp(hp)^.ref^.symbol := l;
    -                      if not GetFinalDestination(hp,succ(level)) then
    -                        exit;
    +                      if not FindLiveLabel(p1,l) then
    +                        begin
    +{$ifdef finaldestdebug}
    +                          insertllitem(asml,p1,p1.next,tai_comment.Create(
    +                            strpnew('previous label inserted'))));
    +{$endif finaldestdebug}
    +                          current_asmdata.getjumplabel(l);
    +                          insertllitem(p1,p1.next,tai_label.Create(l));
    +
    +                          ThisLabel.decrefs;
    +                          JumpTargetOp(hp)^.ref^.symbol := l;
    +                          l.increfs;
    +                          GetFinalDestination := True;
    +          {               this won't work, since the new label isn't in the labeltable }
    +          {               so it will fail the rangecheck. Labeltable should become a   }
    +          {               hashtable to support this:                                   }
    +          {               GetFinalDestination(asml, hp);                               }
    +                        end
    +                      else
    +                        begin
    +{$ifdef finaldestdebug}
    +                          insertllitem(asml,p1,p1.next,tai_comment.Create(
    +                            strpnew('next label reused'))));
    +{$endif finaldestdebug}
    +                          l.increfs;
    +                          ThisLabel.decrefs;
    +                          JumpTargetOp(hp)^.ref^.symbol := l;
    +                          if not GetFinalDestination(hp,succ(level)) then
    +                            exit;
    +                        end;
    +                      GetFinalDestination := True;
    +                      Exit;
                         end;
     {$endif not MIPS and not RV64 and not RV32 and not JVM}
    +              end;
               end;
    -        GetFinalDestination := true;
    +
    +        { Required to ensure recursion works properly, but to also
    +          return false if a jump isn't modified. [Kit] }
    +        if level > 0 then GetFinalDestination := True;
           end;
     
     
    @@ -1554,14 +2319,20 @@
         procedure TAOptObj.PeepHoleOptPass1;
           var
             p,hp1,hp2,hp3 : tai;
    -        stoploop:boolean;
    +        stoploop, FirstInstruction: boolean;
           begin
    +        StartPoint := BlockStart;
    +
             repeat
               stoploop:=true;
    -          p := BlockStart;
    +          p := StartPoint;
    +          FirstInstruction := True;
               ClearUsedRegs;
    -          while (p <> BlockEnd) Do
    +
    +          while Assigned(p) and (p <> BlockEnd) Do
                 begin
    +              prefetch(p.Next);
    +
                   { I'am not sure why this is done, UsedRegs should reflect the register usage before the instruction
                     If an instruction needs the information of this, it can easily create a TempUsedRegs (FK)
                   UpdateUsedRegs(tai(p.next));
    @@ -1570,155 +2341,39 @@
                   if p.Typ=ait_instruction then
                     InsertLLItem(tai(p.Previous),p,tai_comment.create(strpnew(GetAllocationString(UsedRegs))));
       {$endif DEBUG_OPTALLOC}
    +
    +              { Handle Jmp Optimizations first }
    +              if DoJumpOptimizations(p, stoploop) then
    +                begin
    +                  UpdateUsedRegs(p);
    +                  if FirstInstruction then
    +                    { Update StartPoint, since the old p was removed;
    +                      don't set FirstInstruction to False though, as
    +                      the new p might get removed too. }
    +                    StartPoint := p;
    +
    +                  if (p.typ = ait_instruction) and IsJumpToLabel(taicpu(p)) then
    +                    Continue;
    +                end;
    +
                   if PeepHoleOptPass1Cpu(p) then
                     begin
                       stoploop:=false;
                       UpdateUsedRegs(p);
    +
    +                  if FirstInstruction then
    +                    { Update StartPoint, since the old p was modified;
    +                      don't set FirstInstruction to False though, as
    +                      the new p might get modified too. }
    +                    StartPoint := p;
    +
                       continue;
                     end;
    -              case p.Typ Of
    -                ait_instruction:
    -                  begin
    -                    { Handle Jmp Optimizations }
    -                    if taicpu(p).is_jmp then
    -                      begin
    -                        { the following if-block removes all code between a jmp and the next label,
    -                          because it can never be executed
    -                        }
    -                        if IsJumpToLabelUncond(taicpu(p)) then
    -                          begin
    -                            hp2:=p;
    -                            while GetNextInstruction(hp2, hp1) and
    -                                  (hp1.typ <> ait_label)
    -{$ifdef JVM}
    -                                  and (hp1.typ <> ait_jcatch)
    -{$endif}
    -                                  do
    -                              if not(hp1.typ in ([ait_label]+skipinstr)) then
    -                                begin
    -                                  if (hp1.typ = ait_instruction) and
    -                                     taicpu(hp1).is_jmp and
    -                                     (JumpTargetOp(taicpu(hp1))^.typ = top_ref) and
    -                                     (JumpTargetOp(taicpu(hp1))^.ref^.symbol is TAsmLabel) then
    -                                     TAsmLabel(JumpTargetOp(taicpu(hp1))^.ref^.symbol).decrefs;
    -                                  { don't kill start/end of assembler block,
    -                                    no-line-info-start/end, cfi end, etc }
    -                                  if not(hp1.typ in [ait_align,ait_marker]) and
    -                                     ((hp1.typ<>ait_cfi) or
    -                                      (tai_cfi_base(hp1).cfityp<>cfi_endproc)) then
    -                                    begin
    -{$ifdef cpudelayslot}
    -                                      if (hp1.typ=ait_instruction) and (taicpu(hp1).is_jmp) then
    -                                        RemoveDelaySlot(hp1);
    -{$endif cpudelayslot}
    -                                      asml.remove(hp1);
    -                                      hp1.free;
    -                                      stoploop:=false;
    -                                    end
    -                                  else
    -                                    hp2:=hp1;
    -                                end
    -                              else break;
    -                            end;
    -                        if GetNextInstruction(p, hp1) then
    -                          begin
    -                            SkipEntryExitMarker(hp1,hp1);
    -                            { remove unconditional jumps to a label coming right after them }
    -                            if IsJumpToLabelUncond(taicpu(p)) and
    -                              FindLabel(tasmlabel(JumpTargetOp(taicpu(p))^.ref^.symbol), hp1) and
    -          { TODO: FIXME removing the first instruction fails}
    -                                (p<>blockstart) then
    -                              begin
    -                                tasmlabel(JumpTargetOp(taicpu(p))^.ref^.symbol).decrefs;
    -{$ifdef cpudelayslot}
    -                                RemoveDelaySlot(p);
    -{$endif cpudelayslot}
    -                                hp2:=tai(hp1.next);
    -                                asml.remove(p);
    -                                p.free;
    -                                p:=hp2;
    -                                stoploop:=false;
    -                                continue;
    -                              end
    -                            else if assigned(hp1) then
    -                              begin
    -                                { change the following jumps:
    -                                    jmp<cond> lab_1         jmp<cond_inverted> lab_2
    -                                    jmp       lab_2  >>>    <code>
    -                                  lab_1:                  lab_2:
    -                                    <code>
    -                                  lab_2:
    -                                }
    -                                hp3:=hp1;
    -                                if hp1.typ = ait_label then
    -                                  SkipLabels(hp1,hp1);
    -                                if (tai(hp1).typ=ait_instruction) and
    -                                  IsJumpToLabelUncond(taicpu(hp1)) and
    -                                  GetNextInstruction(hp1, hp2) and
    -                                  IsJumpToLabel(taicpu(p)) and
    -                                  FindLabel(tasmlabel(JumpTargetOp(taicpu(p))^.ref^.symbol), hp2) and
    -                                  { prevent removal of infinite loop from
    -                                        jmp<cond> lab_1
    -                                      lab_2:
    -                                        jmp lab2
    -                                      ..
    -                                      lab_1:
    -                                  }
    -                                  not FindLabel(tasmlabel(JumpTargetOp(taicpu(hp1))^.ref^.symbol), hp3) then
    -                                  begin
    -                                    if (taicpu(p).opcode=aopt_condjmp)
    -{$if defined(arm) or defined(aarch64)}
    -                                      and (taicpu(p).condition<>C_None)
    -{$endif arm or aarch64}
    -{$if defined(aarch64)}
    -                                      { can't have conditional branches to
    -                                        global labels on AArch64, because the
    -                                        offset may become too big }
    -                                      and (tasmlabel(JumpTargetOp(taicpu(hp1))^.ref^.symbol).bind=AB_LOCAL)
    -{$endif aarch64}
    -                                    then
    -                                      begin
    -                                        taicpu(p).condition:=inverse_cond(taicpu(p).condition);
    -                                        tai_label(hp2).labsym.decrefs;
    -                                        JumpTargetOp(taicpu(p))^.ref^.symbol:=JumpTargetOp(taicpu(hp1))^.ref^.symbol;
    -                                        { when freeing hp1, the reference count
    -                                          isn't decreased, so don't increase
     
    -                                         taicpu(p).oper[0]^.ref^.symbol.increfs;
    -                                        }
    -{$ifdef cpudelayslot}
    -                                        RemoveDelaySlot(hp1);
    -{$endif cpudelayslot}
    -                                        asml.remove(hp1);
    -                                        hp1.free;
    -                                        stoploop:=false;
    -                                        GetFinalDestination(taicpu(p),0);
    -                                      end
    -                                    else
    -                                      begin
    -                                        GetFinalDestination(taicpu(p),0);
    -                                        p:=tai(p.next);
    -                                        continue;
    -                                      end;
    -                                  end
    -                                else if IsJumpToLabel(taicpu(p)) then
    -                                  GetFinalDestination(taicpu(p),0);
    -                              end;
    -                          end;
    -                      end
    -                    else
    -                    { All other optimizes }
    -                      begin
    -                      end; { if is_jmp }
    -                  end;
    -                else
    -                  ;
    -              end;
    +              FirstInstruction := False;
                   if assigned(p) then
    -                begin
    -                  UpdateUsedRegs(p);
    -                  p:=tai(p.next);
    -                end;
    +                p := tai(UpdateUsedRegsAndOptimize(p).Next);
    +
                 end;
             until stoploop or not(cs_opt_level3 in current_settings.optimizerswitches);
           end;
    @@ -1735,10 +2390,7 @@
                 if PeepHoleOptPass2Cpu(p) then
                   continue;
                 if assigned(p) then
    -              begin
    -                UpdateUsedRegs(p);
    -                p:=tai(p.next);
    -              end;
    +              p := tai(UpdateUsedRegsAndOptimize(p).Next);
               end;
           end;
     
    

Relationships

related to 0033909 closedFlorian New Jcc optimization makes error 
parent of 0036295 resolvedFlorian [Refactor] OptPass2Jcc clean-up 
parent of 0036299 resolvedFlorian [Patch] Bug fix for jump optimisations under debug mode 
related to 0036352 resolvedFlorian [Refactor] Clean-up of i386 peephole optimizer 
child of 0034628 closedJ. Gareth Moreton [Patch / Refactor] x86_64 optimizer overhaul 

Activities

J. Gareth Moreton

2019-11-06 15:01

developer  

Jump Optimisation Specs.pdf (117,538 bytes)

J. Gareth Moreton

2019-11-06 15:21

developer   ~0119101

Some of the "condition_in" implementations will need improvements from programmers more familiar with the specific platforms.

Florian

2019-11-06 18:14

administrator   ~0119105

Some remarks (without testing the code yet):

- if a function is called FindAnyLabel, it should find any label and not skip some
- the compiler sources normally do not indent ifdefs
- heavy ifdefing for debug messages should be avoided, introduce a separate subroutine for this if really needed
- collapsing the align statements in general is not desired, the combination
    .p2align 4,,10
    .p2align 3
is emitted on purpose: too big alignment code is avoided, but everything is aligned at least to 8 byte boundares
- inlining functions/subroutines with more than a few lines normally makes no sense
- indent comments like the surrounding code

J. Gareth Moreton

2019-11-06 19:07

developer   ~0119107

I should have probably emphasised the point in the PDF file. The alignment merging is mainly a speed boost. I am aware that ".p2align 4,,10; .p2align 3" is a deliberate combination, but in the linked list, it's only a single entry. When two such adjacent aligns are found, they are merged, taking the maximum values of each.

I'll have to double-check what FindAnyLabel does, since it's an existing function if I remember correctly.

In regards to inlining routines with more than one line of code, I'm fairly sure the comments explain why... the function is only called in one place, so it effectively just pastes in the code where the lone call is.

In the meantime, I'll clean up the patches as you mention.

Florian

2019-11-06 19:47

administrator   ~0119110

- Merging the two alignments is plainly wrong. It does not result in the same alignment.
- Inlining a long routine only because it is used only once is no reason: later it might get used more than once, shorter routines might result in better code etc.
- Yes, before FindAnyLabel found any label as the name says, now it ignores those which are not used

J. Gareth Moreton

2019-11-06 21:22

developer   ~0119112

".p2align 4,,10; .p2align 3" are encoded as a single linked list entry. Normally merging two of these entries just results in one of these alignment pairs in the final assembly. The two .p2align's are not merged directly. However, if I'm right in thinking, is the alignment ignored if it would require more than 'maxbytes' to align the code? In that case I can see my mistake and will make adjustments.

I just know that the routine is used only once and its contents separated from the caller (instead of merged inside it) for reasons of maintainability. I know for a fact that encoding a call instead of inlining the code is nothing short of a loss in both execution speed and code size. At present the compiler is not smart enough to inline routines that are only used once, and I doubt it will be able to without whole-program optimisation at the very least. I would argue that the 'inline' directive can be removed if the routine is used more than once. I don't understand the hostility and 'that's no excuse to use inline'.

The reason why "FindAnyLabel" now skips over unused labels is because an unused label is now flagged for deletion, so it's not considered safe to land on it. Given that additional check, is it permissible for the function to be renamed to "FindLiveLabel"?

J. Gareth Moreton

2019-11-06 22:41

developer   ~0119113

This might be because I'm only fully testing things on the fly on i386-win32 and x86_64-win64, but being more conservative with the align merges caused some interesting side-effects... 1) some more unnecessary aligns were removed, and 2) I actually ended up triggering "Jcc/CMOVcc drop-out" from 0033909 ... it also answers the question as to why the Jcc->CMOV optimisation sometimes failed... it would stop on an alignment field instead of a label and not properly handle it.

I've removed the inline hints if just to avoid a fight. It just seemed like a sound optimisation because the function is only called from one spot, and that's an instance where the code, regardless of size, can be inlined without any loss because it won't be duplicated.

J. Gareth Moreton

2019-11-06 23:13

developer   ~0119115

Patches updated. I couldn't remove all of the DEBUG_JUMP ifdefs because I had quite a few temporary variables in one function that served to make the debug messages more informative, but keeping them in when debugging is not enabled is just wasted cycles.

jump-optimizations-condition_in.patch (16,912 bytes)
Index: compiler/aarch64/cpubase.pas
===================================================================
--- compiler/aarch64/cpubase.pas	(revision 43323)
+++ compiler/aarch64/cpubase.pas	(working copy)
@@ -321,6 +321,9 @@
     function inverse_cond(const c: TAsmCond): TAsmCond; {$ifdef USEINLINE}inline;{$endif USEINLINE}
     function conditions_equal(const c1, c2: TAsmCond): boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
 
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+
     procedure shifterop_reset(var so : tshifterop); {$ifdef USEINLINE}inline;{$endif USEINLINE}
 
     function dwarf_reg(r:tregister):shortint;
@@ -488,6 +491,26 @@
       end;
 
 
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+      begin
+        Result := (c = C_None) or conditions_equal(Subset, c);
+
+        { Please update as necessary. [Kit] }
+        if not Result then
+          case Subset of
+            C_EQ:
+              Result := (c in [C_GE, C_LE]);
+            C_LT:
+              Result := (c in [C_LE]);
+            C_GT:
+              Result := (c in [C_GE]);
+            else
+              Result := False;
+          end;
+      end;
+
+
     function dwarf_reg(r:tregister):shortint;
       begin
         result:=regdwarf_table[findreg_by_number(r)];
Index: compiler/arm/cpubase.pas
===================================================================
--- compiler/arm/cpubase.pas	(revision 43323)
+++ compiler/arm/cpubase.pas	(working copy)
@@ -365,6 +365,9 @@
     function inverse_cond(const c: TAsmCond): TAsmCond; {$ifdef USEINLINE}inline;{$endif USEINLINE}
     function conditions_equal(const c1, c2: TAsmCond): boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
 
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+
     procedure shifterop_reset(var so : tshifterop); {$ifdef USEINLINE}inline;{$endif USEINLINE}
     function is_pc(const r : tregister) : boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
 
@@ -540,6 +543,26 @@
       end;
 
 
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+      begin
+        Result := (c = C_None) or conditions_equal(Subset, c);
+
+        { Please update as necessary. [Kit] }
+        if not Result then
+          case Subset of
+            C_EQ:
+              Result := (c in [C_GE, C_LE]);
+            C_LT:
+              Result := (c in [C_LE]);
+            C_GT:
+              Result := (c in [C_GE]);
+            else
+              Result := False;
+          end;
+      end;
+
+
     function is_shifter_const(d : aint;var imm_shift : byte) : boolean;
       var
          i : longint;
Index: compiler/avr/cpubase.pas
===================================================================
--- compiler/avr/cpubase.pas	(revision 43323)
+++ compiler/avr/cpubase.pas	(working copy)
@@ -300,6 +300,9 @@
     function inverse_cond(const c: TAsmCond): TAsmCond; {$ifdef USEINLINE}inline;{$endif USEINLINE}
     function conditions_equal(const c1, c2: TAsmCond): boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
 
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+
     function dwarf_reg(r:tregister):byte;
     function dwarf_reg_no_error(r:tregister):shortint;
     function eh_return_data_regno(nr: longint): longint;
@@ -410,6 +413,24 @@
       end;
 
 
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+      begin
+        Result := (c = C_None) or conditions_equal(Subset, c);
+
+        { Please update as necessary. [Kit] }
+        if not Result then
+          case Subset of
+            C_EQ:
+              Result := (c in [C_GE]);
+            C_LT:
+              Result := (c in [C_NE]);
+            else
+              Result := False;
+          end;
+      end;
+
+
     function rotl(d : dword;b : byte) : dword;
       begin
          result:=(d shr (32-b)) or (d shl b);
Index: compiler/m68k/cpubase.pas
===================================================================
--- compiler/m68k/cpubase.pas	(revision 43323)
+++ compiler/m68k/cpubase.pas	(working copy)
@@ -364,6 +364,10 @@
 
     function inverse_cond(const c: TAsmCond): TAsmCond; {$ifdef USEINLINE}inline;{$endif USEINLINE}
     function conditions_equal(const c1, c2: TAsmCond): boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
+
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+
     function dwarf_reg(r:tregister):shortint;
     function dwarf_reg_no_error(r:tregister):shortint;
     function eh_return_data_regno(nr: longint): longint;
@@ -592,6 +596,26 @@
     function conditions_equal(const c1, c2: TAsmCond): boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
       begin
         result := c1 = c2;
+    end;
+
+
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+      begin
+        Result := (c = C_None) or conditions_equal(Subset, c);
+
+        { Please update as necessary. [Kit] }
+        if not Result then
+          case Subset of
+            C_EQ:
+              Result := (c in [C_GE, C_LE]);
+            C_LT:
+              Result := (c in [C_LE]);
+            C_GT:
+              Result := (c in [C_GE]);
+            else
+              Result := False;
+          end;
       end;
 
 
Index: compiler/mips/cpubase.pas
===================================================================
--- compiler/mips/cpubase.pas	(revision 43323)
+++ compiler/mips/cpubase.pas	(working copy)
@@ -259,6 +259,9 @@
     function inverse_cond(const c: TAsmCond): TAsmCond; {$ifdef USEINLINE}inline;{$endif USEINLINE}
     function conditions_equal(const c1, c2: TAsmCond): boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
 
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+
     { Returns the tcgsize corresponding with the size of reg.}
     function reg_cgsize(const reg: tregister) : tcgsize;
     function cgsize2subreg(regtype: tregistertype; s:tcgsize):tsubregister;
@@ -367,6 +370,30 @@
       end;
 
 
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+      begin
+        Result := (c = C_None) or conditions_equal(Subset, c);
+
+        { Please update as necessary. [Kit] }
+        if not Result then
+          case Subset of
+            C_EQ:
+              Result := (c in [C_GE, C_LE, C_GEU, C_LEU]);
+            C_LT:
+              Result := (c in [C_LE]);
+            C_LTU:
+              Result := (c in [C_LEU]);
+            C_GT:
+              Result := (c in [C_GE]);
+            C_GTU:
+              Result := (c in [C_GEU]);
+            else
+              Result := False;
+          end;
+      end;
+
+
     function std_regnum_search(const s:string):Tregister;
       begin
         result:=regnumber_table[findreg_by_name_table(s,std_regname_table,std_regname_index)];
Index: compiler/powerpc64/cpubase.pas
===================================================================
--- compiler/powerpc64/cpubase.pas	(revision 43323)
+++ compiler/powerpc64/cpubase.pas	(working copy)
@@ -392,9 +392,12 @@
 function std_regname(r: Tregister): string;
 function is_condreg(r: tregister): boolean;
 
-function inverse_cond(const c: TAsmCond): Tasmcond;
-{$IFDEF USEINLINE}inline;{$ENDIF USEINLINE}
+function inverse_cond(const c: TAsmCond): TAsmCond; {$IFDEF USEINLINE}inline;{$ENDIF USEINLINE}
 function conditions_equal(const c1, c2: TAsmCond): boolean;
+
+ { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+function condition_in(const Subset, c: TAsmCond): Boolean;
+
 function dwarf_reg(r:tregister):shortint;
 function dwarf_reg_no_error(r:tregister):shortint;
 function eh_return_data_regno(nr: longint): longint;
@@ -472,6 +475,15 @@
     (c1.crbit = c2.crbit));
 end;
 
+{ Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+function condition_in(const Subset, c: TAsmCond): Boolean;
+  begin
+    Result := (c.cond = C_None) or conditions_equal(Subset, c);
+
+    { TODO: Can a PowerPC programmer please update this procedure to
+      actually detect subsets? Thanks. [Kit] }
+  end;
+
 function flags_to_cond(const f: TResFlags): TAsmCond;
 const
   flag_2_cond: array[F_EQ..F_SO] of TAsmCondFlag =
Index: compiler/powerpc/cpubase.pas
===================================================================
--- compiler/powerpc/cpubase.pas	(revision 43323)
+++ compiler/powerpc/cpubase.pas	(working copy)
@@ -395,6 +395,10 @@
 
     function inverse_cond(const c: TAsmCond): Tasmcond; {$ifdef USEINLINE}inline;{$endif USEINLINE}
     function conditions_equal(const c1, c2: TAsmCond): boolean;
+
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+
     function dwarf_reg(r:tregister):shortint;
     function dwarf_reg_no_error(r:tregister):shortint;
     function eh_return_data_regno(nr: longint): longint;
@@ -474,6 +478,16 @@
       end;
 
 
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+      begin
+        Result := (c.cond = C_None) or conditions_equal(Subset, c);
+
+        { TODO: Can a PowerPC programmer please update this procedure to
+          actually detect subsets? Thanks. [Kit] }
+      end;
+
+
     function flags_to_cond(const f: TResFlags) : TAsmCond;
       const
         flag_2_cond: array[F_EQ..F_SO] of TAsmCondFlag =
Index: compiler/riscv32/cpubase.pas
===================================================================
--- compiler/riscv32/cpubase.pas	(revision 43323)
+++ compiler/riscv32/cpubase.pas	(working copy)
@@ -333,6 +333,9 @@
 
     function conditions_equal(const c1,c2: TAsmCond): boolean;
 
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+
 implementation
 
     uses
@@ -459,4 +462,20 @@
         result:=c1=c2;
       end;
 
+
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+      begin
+        Result := (c = C_None) or conditions_equal(Subset, c);
+
+        if not Result then
+          case Subset of
+            C_EQ:
+              Result := (c in [C_GE, C_GEU]);
+            else
+              Result := False;
+          end;
+      end;
+
+
 end.
Index: compiler/riscv64/cpubase.pas
===================================================================
--- compiler/riscv64/cpubase.pas	(revision 43323)
+++ compiler/riscv64/cpubase.pas	(working copy)
@@ -348,6 +348,9 @@
 
     function conditions_equal(const c1,c2: TAsmCond): boolean;
 
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+
 implementation
 
     uses
@@ -474,5 +477,20 @@
         result:=c1=c2;
       end;
 
+
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+      begin
+        Result := (c = C_None) or conditions_equal(Subset, c);
+
+        if not Result then
+          case Subset of
+            C_EQ:
+              Result := (c in [C_GE, C_GEU]);
+            else
+              Result := False;
+          end;
+      end;
+
 end.
 
Index: compiler/sparcgen/cpubase.pas
===================================================================
--- compiler/sparcgen/cpubase.pas	(revision 43323)
+++ compiler/sparcgen/cpubase.pas	(working copy)
@@ -335,6 +335,9 @@
     function inverse_cond(const c: TAsmCond): TAsmCond; {$ifdef USEINLINE}inline;{$endif USEINLINE}
     function conditions_equal(const c1, c2: TAsmCond): boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
 
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+
     function  flags_to_cond(const f: TResFlags) : TAsmCond;
     function cgsize2subreg(regtype: tregistertype; s:Tcgsize):Tsubregister;
     function reg_cgsize(const reg: tregister): tcgsize;
@@ -522,6 +525,33 @@
       end;
 
 
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+      begin
+        Result := (c = C_None) or conditions_equal(Subset, c);
+
+        { TODO: Can a SparcGEN programmer please update this procedure to
+          detect all subsets? Thanks. [Kit] }
+        if not Result then
+          case Subset of
+            C_A:
+              Result := (c in [C_A,  C_AE]);
+            C_B:
+              Result := (c in [C_B,  C_BE]);
+            C_E:
+              Result := (c in [C_AE, C_BE]);
+            C_FE:
+              Result := (c in [C_FLE,C_FGE]);
+            C_FL:
+              Result := (c in [C_FLE]);
+            C_FG:
+              Result := (c in [C_FGE]);
+            else
+              Result := False;
+          end;
+      end;
+
+
     function dwarf_reg(r:tregister):shortint;
       begin
         result:=regdwarf_table[findreg_by_number(r)];
Index: compiler/x86/cpubase.pas
===================================================================
--- compiler/x86/cpubase.pas	(revision 43323)
+++ compiler/x86/cpubase.pas	(working copy)
@@ -353,6 +353,9 @@
     function inverse_cond(const c: TAsmCond): TAsmCond; {$ifdef USEINLINE}inline;{$endif USEINLINE}
     function conditions_equal(const c1, c2: TAsmCond): boolean; {$ifdef USEINLINE}inline;{$endif USEINLINE}
 
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+
     { checks whether two segment registers are normally equal in the current memory model }
     function segment_regs_equal(r1,r2:tregister):boolean;
 
@@ -666,6 +669,49 @@
       end;
 
 
+    { Checks if Subset is a subset of c (e.g. "less than" is a subset of "less than or equal" }
+    function condition_in(const Subset, c: TAsmCond): Boolean;
+      begin
+        Result := (c = C_None) or conditions_equal(Subset, c);
+        if not Result then
+          case Subset of
+            C_A,  C_NBE:
+              Result := (c in [C_A,  C_AE, C_NB, C_NBE]);
+            C_AE, C_NB:
+              Result := (c in [C_AE, C_NB]);
+            C_B,  C_NAE:
+              Result := (c in [C_B,  C_BE, C_C,  C_NA, C_NAE]);
+            C_BE, C_NA:
+              Result := (c in [C_BE, C_NA]);
+            C_C:
+              { C_B  / C_NAE: CF = 1
+                C_BE / C_NA:  CF = 1 or ZF = 1 }
+              Result := (c in [C_B,  C_BE, C_NA, C_NAE]);
+            C_E,  C_Z:
+              Result := (c in [C_AE, C_BE, C_E,  C_NA, C_NB, C_NG, C_NL]);
+            C_G,  C_NLE:
+              Result := (c in [C_G,  C_GE, C_NL, C_NLE]);
+            C_GE, C_NL:
+              Result := (c in [C_GE, C_NL]);
+            C_L,  C_NGE:
+              Result := (c in [C_L,  C_LE, C_NG, C_NGE]);
+            C_LE, C_NG:
+              Result := (c in [C_LE, C_NG]);
+            C_NC:
+              { C_A  / C_NBE: CF = 0 and ZF = 0; not a subset because ZF has to be zero as well
+                C_AE / C_NB:  CF = 0 }
+              Result := (c in [C_AE, C_NB]);
+            C_NE, C_NZ:
+              Result := (c in [C_NE, C_NZ, C_A,  C_B,  C_NAE,C_NBE,C_L,  C_G,  C_NLE,C_NGE]);
+            C_NP, C_PO:
+              Result := (c in [C_NP, C_PO]);
+            C_P,  C_PE:
+              Result := (c in [C_P,  C_PE]);
+            else
+              Result := False;
+          end;
+      end;
+
     function dwarf_reg(r:tregister):shortint;
       begin
         result:=regdwarf_table[findreg_by_number(r)];

J. Gareth Moreton

2019-11-06 23:53

developer   ~0119116

Last edited: 2019-11-06 23:56

View 4 revisions

I should re-emphasise... under x86 platforms, a ".p2align 4,,10; .p2align 3" pair is encapsulated under a single tai_align entry that is only resolved when it comes to writing the object file, so merging adjacent tai_align entries is beneficial, because otherwise you end up with two pairs of alignment hints (.p2align 4,,10; .p2align 3; .p2align 4,,10; .p2align 3), which will ALWAYS align to 16 bytes because the first pair will, at least, align to an 8-byte boundary, and then the second pair will align it to a 16-byte boundary if the first pair doesn't.

Nevertheless, the merging code now only merges adjacent alignment hints under the following conditions:
- Both of the "maxbytes" fields are zero.
- If either of the "maxbytes" fields are non-zero, then only if the alignment sizes match.

There is additional code at the end of the relevant routine that will remove clusters of alignment hints if all the labels they are associated with have become dead labels, whereas before, it only removed the latest align (under the assumption that they had been merged before).

Note that speed gains in the compiler seem to not be as good as they were before.

J. Gareth Moreton

2019-11-07 00:37

developer   ~0119119

Speed gains are still reasonably good actually:

[78.188] 1304078 lines compiled, 78.2 sec (Trunk)
[69.641] 1304078 lines compiled, 69.6 sec (Jump optimisations)

I'm just experimenting with an additional change to OptPass2Jcc, although I won't put it into the patch just yet because it's kind of a step beyond simple jump optimisations. I'm looking at replacing the "Jcc/CMOVcc drop-out" code with an internal error, which was my original intention on 0033909 until I found out that it was triggered during cross-compilation sometimes. However, I think I know why it's triggered now and have made a catch for that particular set-up (it's when it finds alignment fields instead of a label).

J. Gareth Moreton

2019-11-07 04:15

developer   ~0119121

Last edited: 2019-11-07 04:16

View 2 revisions

I still think using "inline" for single-use functions is fine so long as you explain why in a comment. If we keep denying things because it 'may' be called a second time in the future or it 'may' cause problems in a very unlikely edge-case, what hope is there in improving the compiler? We talk about not doing micro-optimisations because we should improve the compiler instead, but who is improving it?

To talk about the inlined functions... if they may be called a second time in future, the programmer will know about the inline directive because they should be studying the function beforehand instead of using it blindly, just like you'd check the method to see if it's virtual or static, for example. The compiler isn't exactly a well-documented API, and anyone using a function blindly is a fool.

Let me say this... I have absolutely no faith in the compiler producing optimal code, and never will. If I need something done fast, like a graphics engine or a mathematical or scientific algorithm, I hand-code the critical parts in assembly language. I can't migrate to C++ because the Microsoft x86_64 compiler forbids inline assembler, and unfortunately C++, being an old language like Object Pascal, doesn't interface with modern CPU features like vectored registers too well. __m128 is offically an opaque type and manipulating individual fields is a mess, yet seems to be the only thing that can be used to interface with the MM registers fully. We can't be stuck in the past though.

Now it happens I love Object Pascal, but the only dialects out there really are Delphi and FPC - Delphi is expensive, and FPC is a slow, stagnant and bloated mess. I can only offer improvements up to a point, but there's always something to critique, like "it will break assembly language" even though using assembly language to communicate with a third-party API is going to be risky and somewhat unsupported anyway. And then you also say "the function being used once is no reason" to use "inline". Why not? Because someone else might call the function again? Who and where? What's to stop them from removing "inline" in that case? When I use "inline", I tell the compiler 'I know what I'm doing'. If you're so against "inline", especially with your auto-inline proposal, remove the directive. It's your tool after all.

J. Gareth Moreton

2019-11-08 19:57

developer   ~0119169

Last edited: 2019-11-08 20:05

View 2 revisions

Updated the patches to be cleaner and to remove code duplication. I added a new function called "StripLabelFast", which removes a label and its associated alignment hint if appropriate, which is faster than StripDeadLabels because you only pass in the relevant list entry and it doesn't return anything, but it also doesn't check to see if the label is dead or not (comments warn about this) - currently it's only called from OptPass2Jcc (hence the x86-specific one now requires the main jump optimizations patch to work).

It completely escaped me that the speed-up in the compiler is due to the jump optimisations being applied to itself rather than instructions and labels being trimmed sooner in the Peephole Optimizer.



jump-optimizations-x86-specific-v3.patch (8,568 bytes)
Index: compiler/x86/aoptx86.pas
===================================================================
--- compiler/x86/aoptx86.pas	(revision 43323)
+++ compiler/x86/aoptx86.pas	(working copy)
@@ -3119,49 +3119,44 @@
                   end;
               end;
 
-            if ((hp1.typ = ait_label) and (symbol = tai_label(hp1).labsym))
-                or ((hp1.typ = ait_align) and GetNextInstruction(hp1, hp2) and (hp2.typ = ait_label) and (symbol = tai_label(hp2).labsym)) then
+          { Detect the following:
+              jmp<cond>     @Lbl1
+              jmp           @Lbl2
+              ...
+            @Lbl1:
+              ret
+
+            Change to:
+
+              jmp<inv_cond> @Lbl2
+              ret
+          }
+            if MatchInstruction(hp1, A_JMP, []) then
               begin
-                { If Jcc is immediately followed by the label that it's supposed to jump to, remove it }
-                DebugMsg(SPeepholeOptimization + 'Removed conditional jump whose destination was immediately after it', p);
-                UpdateUsedRegs(hp1);
-
-                TAsmLabel(symbol).decrefs;
-                { if the label refs. reach zero, remove any alignment before the label }
-                if (hp1.typ = ait_align) then
+                hp2 := getlabelwithsym(TAsmLabel(symbol));
+                if Assigned(hp2) and SkipLabels(hp2,hp2) and
+                  MatchInstruction(hp2,A_RET,[S_NO]) then
                   begin
-                    UpdateUsedRegs(hp2);
-                    if (TAsmLabel(symbol).getrefs = 0) then
-                    begin
-                      asml.Remove(hp1);
-                      hp1.Free;
-                    end;
-                    hp1 := hp2; { Set hp1 to the label }
-                  end;
+                    taicpu(p).condition := inverse_cond(taicpu(p).condition);
 
-                asml.remove(p);
-                p.free;
+                    { Change label address to that of the unconditional jump }
+                    taicpu(p).loadoper(0, taicpu(hp1).oper[0]^);
 
-                if (TAsmLabel(symbol).getrefs = 0) then
-                  begin
-                    GetNextInstruction(hp1, p); { Instruction following the label }
-                    asml.remove(hp1);
-                    hp1.free;
-
-                    UpdateUsedRegs(p);
-                    Result := True;
-                  end
-                else
-                  begin
-                    { We don't need to set the result to True because we know hp1
-                      is a label and won't trigger any optimisation routines. [Kit] }
-                    p := hp1;
+                    TAsmLabel(symbol).DecRefs;
+                    taicpu(hp1).opcode := A_RET;
+                    taicpu(hp1).is_jmp := false;
+                    taicpu(hp1).ops := taicpu(hp2).ops;
+                    case taicpu(hp2).ops of
+                      0:
+                        taicpu(hp1).clearop(0);
+                      1:
+                        taicpu(hp1).loadconst(0,taicpu(hp2).oper[0]^.val);
+                      else
+                        internalerror(2016041302);
+                    end;
                   end;
-
-                Exit;
               end;
           end;
-
 {$ifndef i8086}
         if CPUX86_HAS_CMOV in cpu_capabilities[current_settings.cputype] then
           begin
@@ -3216,24 +3211,29 @@
                           else
                             hp2 := hp1;
 
-                          if not Assigned(hp2) then
-                            InternalError(2018062910);
+                          { Remember what the first hp2 is in case there's multiple aligns and labels to get rid of }
+                          hp3 := hp2;
+                          repeat
+                            if not Assigned(hp2) then
+                              InternalError(2018062910);
 
-                          if (hp2.typ <> ait_label) then
-                            begin
-                              { There's something other than CMOVs here.  Move the original jump
-                                to right before this point, then break out.
-
-                                Originally this was part of the above internal error, but it got
-                                triggered on the bootstrapping process sometimes. Investigate. [Kit] }
-                              asml.remove(p);
-                              asml.insertbefore(p, hp2);
-                              DebugMsg('Jcc/CMOVcc drop-out', p);
-                              UpdateUsedRegs(p);
-                              Result := True;
-                              Exit;
+                            case hp2.typ of
+                              ait_label:
+                                { What we expected - break out of the loop (it won't be a dead label at the top of
+                                  a cluster because that was optimised at an earlier stage) }
+                                Break;
+                              ait_align:
+                                { Go to the next entry until a label is found (may be multiple aligns before it) }
+                                begin
+                                  hp2 := tai(hp2.Next);
+                                  Continue;
+                                end;
+                              else
+                                InternalError(2018062911);
                             end;
 
+                          until False;
+
                           { Now we can safely decrement the reference count }
                           tasmlabel(symbol).decrefs;
 
@@ -3245,10 +3245,7 @@
 
                           { Remove the label if this is its final reference }
                           if (tasmlabel(symbol).getrefs=0) then
-                            begin
-                              asml.remove(hp2);
-                              hp2.free;
-                            end;
+                            StripLabelFast(hp3);
 
                           if Assigned(p) then
                             begin
@@ -3342,17 +3339,8 @@
                                 hp1.free;
 
                                 { Remove label xxx (it will have a ref of zero due to the initial check }
-                                if (hp4.typ = ait_align) then
-                                  begin
-                                    { Account for alignment as well }
-                                    GetNextInstruction(hp4, hp1);
-                                    asml.remove(hp1);
-                                    hp1.free;
-                                  end;
+                                StripLabelFast(hp4);
 
-                                asml.remove(hp4);
-                                hp4.free;
-
                                 { Now we can safely decrement it }
                                 tasmlabel(symbol).decrefs;
 
@@ -3362,24 +3350,13 @@
                                 asml.remove(hp2);
                                 hp2.free;
 
-                                { Remove label yyy (and the optional alignment) if its reference will fall to zero }
-                                if tasmlabel(symbol).getrefs = 1 then
-                                  begin
-                                    if (hp3.typ = ait_align) then
-                                      begin
-                                        { Account for alignment as well }
-                                        GetNextInstruction(hp3, hp1);
-                                        asml.remove(hp1);
-                                        hp1.free;
-                                      end;
+                                { As before, now we can safely decrement it }
+                                tasmlabel(symbol).decrefs;
 
-                                    asml.remove(hp3);
-                                    hp3.free;
+                                { Remove label yyy (and the optional alignment) if its reference falls to zero }
+                                if tasmlabel(symbol).getrefs = 0 then
+                                  StripLabelFast(hp3);
 
-                                    { As before, now we can safely decrement it }
-                                    tasmlabel(symbol).decrefs;
-                                  end;
-
                                 if Assigned(p) then
                                   begin
                                     UpdateUsedRegs(p);
jump-optimizations-v3.patch (52,731 bytes)
Index: compiler/aoptobj.pas
===================================================================
--- compiler/aoptobj.pas	(revision 43323)
+++ compiler/aoptobj.pas	(working copy)
@@ -248,8 +248,9 @@
         { label numbers                                                  }
         LabelInfo: PLabelInfo;
 
-        { Start and end of the block that is currently being optimized }
-        BlockStart, BlockEnd: Tai;
+        { Start and end of the block that is currently being optimized, and
+          a selected start point after the start of the block }
+        BlockStart, BlockEnd, StartPoint: Tai;
 
         DFA: TAOptDFA;
 
@@ -269,6 +270,12 @@
         Procedure ClearUsedRegs;
         Procedure UpdateUsedRegs(p : Tai);
         class procedure UpdateUsedRegs(var Regs: TAllUsedRegs; p: Tai);
+
+        { If UpdateUsedRegsAndOptimize has read ahead, the result is one before
+          the next valid entry (so "p.Next" returns what's expected).  If no
+          reading ahead happened, then the result is equal to p. }
+        function UpdateUsedRegsAndOptimize(p : Tai): Tai;
+
         Function CopyUsedRegs(var dest : TAllUsedRegs) : boolean;
         procedure RestoreUsedRegs(const Regs : TAllUsedRegs);
         procedure TransferUsedRegs(var dest: TAllUsedRegs);
@@ -363,6 +370,34 @@
         function PeepHoleOptPass2Cpu(var p: tai): boolean; virtual;
         function PostPeepHoleOptsCpu(var p: tai): boolean; virtual;
 
+        { Output debug message to console - null function if EXTDEBUG is not defined }
+        class procedure DebugWrite(Message: string); static; inline;
+
+        { Removes all instructions between an unconditional jump and the next label }
+        procedure RemoveDeadCodeAfterJump(p: taicpu);
+
+        { If hp is a label, strip it if its reference count is zero.  Repeat until
+          a non-label is found, or a label with a non-zero reference count.
+          True is returned if something was stripped }
+        function StripDeadLabels(hp: tai; var NextValid: tai): Boolean;
+
+        { Strips a label and any aligns that appear before it (if hp points to
+          them rather than the label).  Only call this procedure on a label that
+          you already know is no longer referenced }
+        procedure StripLabelFast(hp: tai); {$ifdef USEINLINE}inline;{$endif USEINLINE}
+
+        { Checks and removes "jmp @@lbl; @lbl". Returns True if the jump was removed }
+        function CollapseZeroDistJump(var p: tai; hp1: tai; ThisLabel: TAsmLabel): Boolean;
+
+        { If a group of labels are clustered, change the jump to point to the last one that is still referenced }
+        function CollapseLabelCluster(jump: tai; var lbltai: tai): TAsmLabel;
+{$ifndef JVM}
+        function OptimizeConditionalJump(CJLabel: TAsmLabel; var p: tai; hp1: tai; var stoploop: Boolean): Boolean;
+{$endif JVM}
+
+        { Jump/label optimisation entry method }
+        function DoJumpOptimizations(var p: tai; var stoploop: Boolean): Boolean;
+
         { insert debug comments about which registers are read and written by
           each instruction. Useful for debugging the InstructionLoadsFromReg and
           other similar functions. }
@@ -910,6 +945,82 @@
         end;
 
 
+      { If UpdateUsedRegsAndOptimize has read ahead, the result is one before
+        the next valid entry (so "p.Next" returns what's expected).  If no
+        reading ahead happened, then the result is equal to p. }
+      function TAOptObj.UpdateUsedRegsAndOptimize(p : Tai): Tai;
+        var
+          NotFirst: Boolean;
+        begin
+          { this code is based on TUsedRegs.Update to avoid multiple passes through the asmlist,
+            the code is duplicated here }
+
+          Result := p;
+          if (p.typ in [ait_instruction, ait_label]) then
+            begin
+              if (p.next <> BlockEnd) and (tai(p.next).typ <> ait_instruction) then
+                begin
+                  { Advance one, otherwise the routine exits immediately and wastes time }
+                  p := tai(p.Next);
+                  NotFirst := True;
+                end
+              else
+                { If the next entry is an instruction, nothing will be updated or
+                  optimised here, so exit now to save time }
+                Exit;
+            end
+          else
+            NotFirst := False;
+
+          repeat
+            while assigned(p) and
+                  ((p.typ in (SkipInstr + [ait_align, ait_label] - [ait_RegAlloc])) or
+                   ((p.typ = ait_marker) and
+                    (tai_Marker(p).Kind in [mark_AsmBlockEnd,mark_NoLineInfoStart,mark_NoLineInfoEnd]))) do
+                 begin
+                   { Here's the optimise part }
+                   if (p.typ in [ait_align, ait_label]) then
+                     begin
+                       if StripDeadLabels(p, p) then
+                         begin
+                           { Note, if the first instruction is stripped and is
+                             the only one that gets removed, Result will now
+                             contain a dangling pointer, so compensate for this. }
+                           if not NotFirst then
+                             Result := tai(p.Previous);
+
+                           Continue;
+                         end;
+
+                       if ((p.typ = ait_label) and not labelCanBeSkipped(tai_label(p))) then
+                         Break;
+                     end;
+
+                   Result := p;
+                   p := tai(p.next);
+                 end;
+            while assigned(p) and
+                  (p.typ=ait_RegAlloc) Do
+              begin
+                case tai_regalloc(p).ratype of
+                  ra_alloc :
+                    Include(UsedRegs[getregtype(tai_regalloc(p).reg)].UsedRegs, getsupreg(tai_regalloc(p).reg));
+                  ra_dealloc :
+                    Exclude(UsedRegs[getregtype(tai_regalloc(p).reg)].UsedRegs, getsupreg(tai_regalloc(p).reg));
+                  else
+                    { Do nothing };
+                end;
+                Result := p;
+                p := tai(p.next);
+              end;
+            NotFirst := True;
+          until not(assigned(p)) or
+                (not(p.typ in SkipInstr + [ait_align]) and
+                 not((p.typ = ait_label) and
+                     labelCanBeSkipped(tai_label(p))));
+        end;
+
+
       procedure TAOptObj.UpdateUsedRegs(p : Tai);
         begin
           { this code is based on TUsedRegs.Update to avoid multiple passes through the asmlist,
@@ -1346,18 +1457,34 @@
       end;
 
 
-    function FindAnyLabel(hp: tai; var l: tasmlabel): Boolean;
+    function FindLiveLabel(hp: tai; var l: tasmlabel): Boolean;
+      var
+        next: tai;
       begin
-        FindAnyLabel := false;
-        while assigned(hp.next) and
-              (tai(hp.next).typ in (SkipInstr+[ait_align])) Do
-          hp := tai(hp.next);
-        if assigned(hp.next) and
-           (tai(hp.next).typ = ait_label) then
+        FindLiveLabel := false;
+
+        while True do
           begin
-            FindAnyLabel := true;
-            l := tai_label(hp.next).labsym;
-          end
+            while assigned(hp.next) and
+                  (tai(hp.next).typ in (SkipInstr+[ait_align])) Do
+              hp := tai(hp.next);
+
+            next := tai(hp.next);
+            if assigned(next) and
+              (tai(next).typ = ait_label) then
+              begin
+                l := tai_label(next).labsym;
+                if not l.is_used then
+                  begin
+                    { Unsafe label }
+                    hp := next;
+                    Continue;
+                  end;
+
+                FindLiveLabel := true;
+              end;
+            Exit;
+          end;
       end;
 
 
@@ -1385,11 +1512,9 @@
 {$if defined(arm) or defined(aarch64)}
           (hp.condition=c_None) and
 {$endif arm or aarch64}
-{$if defined(riscv32) or defined(riscv64)}          
           (hp.ops>0) and
+{$if defined(riscv32) or defined(riscv64)}
           (hp.oper[0]^.reg=NR_X0) and
-{$else riscv}
-          (hp.ops>0) and
 {$endif riscv}
           (JumpTargetOp(hp)^.typ = top_ref) and
           (JumpTargetOp(hp)^.ref^.symbol is TAsmLabel);
@@ -1424,10 +1549,592 @@
       end;
 
 
+    { Output debug message to console - null function if EXTDEBUG is not defined }
+    class procedure TAOptObj.DebugWrite(Message: string); inline;
+      begin
+{$ifdef EXTDEBUG}
+        WriteLn(Message);
+{$else EXTDEBUG}
+        { Do nothing }
+{$endif EXTDEBUG}
+      end;
+
+    { Removes all instructions between an unconditional jump and the next label }
+    procedure TAOptObj.RemoveDeadCodeAfterJump(p: taicpu);
+      var
+        hp1, hp2: tai;
+      begin
+        { the following code removes all code between a jmp and the next label,
+          because it can never be executed
+        }
+        while GetNextInstruction(p, hp1) and
+              (hp1 <> BlockEnd) and
+              (hp1.typ <> ait_label)
+{$ifdef JVM}
+              and (hp1.typ <> ait_jcatch)
+{$endif}
+              do
+          if not(hp1.typ in ([ait_label]+skipinstr)) then
+            begin
+              if (hp1.typ = ait_instruction) and
+                 taicpu(hp1).is_jmp and
+                 (JumpTargetOp(taicpu(hp1))^.typ = top_ref) and
+                 (JumpTargetOp(taicpu(hp1))^.ref^.symbol is TAsmLabel) then
+                 TAsmLabel(JumpTargetOp(taicpu(hp1))^.ref^.symbol).decrefs;
+              { don't kill start/end of assembler block,
+                no-line-info-start/end etc }
+              if (hp1.typ <> ait_marker) then
+                begin
+{$ifdef cpudelayslot}
+                  if (hp1.typ=ait_instruction) and (taicpu(hp1).is_jmp) then
+                    RemoveDelaySlot(hp1);
+{$endif cpudelayslot}
+                  if (hp1.typ = ait_align) then
+                    begin
+                      { Only remove the align if a label doesn't immediately follow }
+                      if GetNextInstruction(hp1, hp2) and (hp2.typ = ait_label) then
+                        { The label is unskippable }
+                        Exit;
+                    end;
+                  asml.remove(hp1);
+                  hp1.free;
+                end
+              else
+                p:=taicpu(hp1);
+            end
+          else
+            Break;
+      end;
+
+    { If hp is a label, strip it if its reference count is zero.  Repeat until
+      a non-label is found, or a label with a non-zero reference count.
+      True is returned if something was stripped }
+    function TAOptObj.StripDeadLabels(hp: tai; var NextValid: tai): Boolean;
+      var
+        tmp, tmpNext: tai;
+        hp1: tai;
+        CurrentAlign: tai;
+      begin
+        CurrentAlign := nil;
+        Result := False;
+        hp1 := hp;
+        NextValid := hp;
+
+        { Stop if hp is an instruction, for example }
+        while (hp1 <> BlockEnd) and (hp1.typ in [ait_label,ait_align]) do
+          begin
+            case hp1.typ of
+              ait_label:
+                begin
+                  with tai_label(hp1).labsym do
+                    if is_used or (bind <> AB_LOCAL) or (labeltype <> alt_jump) then
+                      begin
+                        { Valid label }
+                        if Result then
+                          NextValid := hp1;
+
+                        DebugWrite('JUMP DEBUG: Last label in cluster:' + tostr(labelnr));
+
+                        Exit;
+                      end;
+
+                  DebugWrite('JUMP DEBUG: Removed label ' + tostr(TAsmLabel(tai_label(hp1).labsym).labelnr));
+
+                  { Set tmp to the next valid entry }
+                  tmp := tai(hp1.Next);
+                  { Remove label }
+                  AsmL.Remove(hp1);
+                  hp1.Free;
+
+                  hp1 := tmp;
+
+                  Result := True;
+                  Continue;
+                end;
+              { Also remove the align if it comes before an unused label }
+              ait_align:
+                begin
+                  tmp := tai(hp1.Next);
+                  if tmp = BlockEnd then
+                    { End of block }
+                    Exit;
+
+                  if (cs_debuginfo in current_settings.moduleswitches) or
+                     (cs_use_lineinfo in current_settings.globalswitches) then
+                     { Don't remove aligns if debuginfo is present }
+                    begin
+                      if (tmp.typ in [ait_label,ait_align]) then
+                        begin
+                          hp1 := tmp;
+                          Continue;
+                        end
+                      else
+                        Break;
+                    end;
+
+                  repeat
+
+                    case tmp.typ of
+                      ait_align: { Merge the aligns if permissible }
+                        begin
+                          { Check the maxbytes field though, since this may result in the
+                            alignment being ignored }
+                          if ((tai_align_abstract(hp1).maxbytes = 0) and (tai_align_abstract(tmp).maxbytes = 0)) or
+                            { If a maxbytes field is present, only merge if the aligns have the same granularity }
+                            ((tai_align_abstract(hp1).aligntype = tai_align_abstract(tmp).aligntype)) then
+                            begin
+                              with tai_align_abstract(hp1) do
+                                begin
+                                  aligntype := max(aligntype, tai_align_abstract(tmp).aligntype);
+                                  maxbytes := max(maxbytes, tai_align_abstract(tmp).maxbytes);
+                                  fillsize := max(fillsize, tai_align_abstract(tmp).fillsize);
+                                  use_op := use_op or tai_align_abstract(tmp).use_op;
+
+                                  if use_op and (tai_align_abstract(tmp).fillop <> 0) then
+                                    fillop := tai_align_abstract(tmp).fillop;
+                                end;
+
+                              tmpNext := tai(tmp.Next);
+                              AsmL.Remove(tmp);
+                              tmp.Free;
+                              Result := True;
+                              tmp := tmpNext;
+                            end
+                          else
+                            tmp := tai(tmp.Next);
+
+                          Continue;
+                        end;
+                      ait_label:
+                        begin
+                          { Signal that we can possibly delete this align entry }
+                          CurrentAlign := hp1;
+
+                          repeat
+                            with tai_label(tmp).labsym do
+                              if is_used or (bind <> AB_LOCAL) or (labeltype <> alt_jump) then
+                                begin
+                                  { Valid label }
+                                  if Result then
+                                    NextValid := tmp;
+
+                                  DebugWrite('JUMP DEBUG: Last label in cluster:' + tostr(labelnr));
+
+                                  Exit;
+                                end;
+
+                            DebugWrite('JUMP DEBUG: Removed label ' + tostr(TAsmLabel(tai_label(tmp).labsym).labelnr));
+
+                            { Remove label }
+                            tmpNext := tai(tmp.Next);
+                            AsmL.Remove(tmp);
+                            tmp.Free;
+                            Result := True;
+                            tmp := tmpNext;
+
+                            { Loop here for a minor performance gain }
+                          until (tmp = BlockEnd) or (tmp.typ <> ait_label);
+
+                          { Re-evaluate the align and see what follows }
+                          Continue;
+                        end
+                      else
+                        begin
+                          { Set hp1 to the instruction after the align, because the
+                            align might get deleted later and hence set NextValid
+                            to a dangling pointer. [Kit] }
+                          hp1 := tmp;
+                          Break;
+                        end;
+                    end;
+                  until (tmp = BlockEnd);
+
+                  { Break out of the outer loop if the above Break is called }
+                  if (hp1 = tmp) then
+                    Break;
+                end
+              else
+                Break;
+            end;
+            hp1 := tai(hp1.Next);
+          end;
+
+        { hp1 will be the next valid entry }
+        NextValid := hp1;
+
+        { Remove the alignment field (but only if the next valid entry is not a live label) }
+        while Assigned(CurrentAlign) and (CurrentAlign.typ = ait_align) do
+          begin
+            DebugWrite('JUMP DEBUG: Alignment field removed');
+
+            tmp := tai(CurrentAlign.next);
+
+            AsmL.Remove(CurrentAlign);
+            CurrentAlign.Free;
+
+            CurrentAlign := tmp;
+          end;
+      end;
+
+
+    { Strips a label and any aligns that appear before it (if hp points to
+      them rather than the label).  Only call this procedure on a label that
+      you already know is no longer referenced }
+    procedure TAOptObj.StripLabelFast(hp: tai); {$ifdef USEINLINE}inline;{$endif USEINLINE}
+      var
+        tmp: tai;
+      begin
+        repeat
+          case hp.typ of
+            ait_align:
+              begin
+                tmp := tai(hp.Next);
+                asml.Remove(hp);
+                hp.Free;
+                hp := tmp;
+                { Control flow will now return to 'repeat' }
+              end;
+            ait_label:
+              begin
+{$ifdef EXTDEBUG}
+                { When not in debug mode, deleting a live label will cause an
+                  access violation later on. [Kit] }
+                if tai_label(hp).labsym.getrefs <> 0 then
+                  InternalError(2019110802);
+{$endif EXTDEBUG}
+                asml.Remove(hp);
+                hp.Free;
+                Exit;
+              end;
+            else
+              InternalError(2019110801);
+          end;
+        until False;
+      end;
+
+    { If a group of labels are clustered, change the jump to point to the last one
+      that is still referenced }
+    function TAOptObj.CollapseLabelCluster(jump: tai; var lbltai: tai): TAsmLabel;
+      var
+        LastLabel: TAsmLabel;
+        hp2: tai;
+      begin
+        Result := tai_label(lbltai).labsym;
+        LastLabel := Result;
+        hp2 := tai(lbltai.next);
+
+        while (hp2 <> BlockEnd) and (hp2.typ in SkipInstr + [ait_align, ait_label]) do
+          begin
+
+            if (hp2.typ = ait_label) and
+              (tai_label(hp2).labsym.is_used) and
+              (tai_label(hp2).labsym.labeltype = alt_jump) then
+              LastLabel := tai_label(hp2).labsym;
+
+            hp2 := tai(hp2.next);
+          end;
+
+        if (Result <> LastLabel) then
+          begin
+            Result.decrefs;
+            JumpTargetOp(taicpu(jump))^.ref^.symbol := LastLabel;
+            LastLabel.increfs;
+            Result := LastLabel;
+            lbltai := hp2;
+          end;
+      end;
+
+{$ifndef JVM}
+    function TAOptObj.OptimizeConditionalJump(CJLabel: TAsmLabel; var p: tai; hp1: tai; var stoploop: Boolean): Boolean;
+      var
+        hp2: tai;
+        NCJLabel: TAsmLabel;
+      begin
+        Result := False;
+        while (hp1 <> BlockEnd) do
+          begin
+            StripDeadLabels(hp1, hp1);
+
+            if (hp1 <> BlockEnd) and
+              (tai(hp1).typ=ait_instruction) and
+              IsJumpToLabel(taicpu(hp1)) then
+              begin
+                NCJLabel := TAsmLabel(JumpTargetOp(taicpu(hp1))^.ref^.symbol);
+
+                if IsJumpToLabelUncond(taicpu(hp1)) then
+                  begin
+                    { Do it now to get it out of the way and to aid optimisations
+                      later on in this method }
+                    RemoveDeadCodeAfterJump(taicpu(hp1));
+
+                    { GetNextInstruction could be factored out, but hp2 might be
+                      different after "RemoveDeadCodeAfterJump" }
+                    GetNextInstruction(hp1, hp2);
+
+                    { Check for:
+                        jmp<cond> @Lbl
+                        jmp       @Lbl
+                    }
+                    if (CJLabel = NCJLabel) then
+                      begin
+                        DebugWrite('JUMP DEBUG: Short-circuited conditional jump');
+                        { Both jumps go to the same label }
+                        CJLabel.decrefs;
+{$ifdef cpudelayslot}
+                        RemoveDelaySlot(p);
+{$endif cpudelayslot}
+                        UpdateUsedRegs(tai(p.Next));
+                        AsmL.Remove(p);
+                        p.Free;
+                        p := hp1;
+
+                        Result := True;
+                        Exit;
+                      end;
+
+                    if FindLabel(CJLabel, hp2) then
+                      begin
+                        { change the following jumps:
+                            jmp<cond> CJLabel         jmp<inv_cond> NCJLabel
+                            jmp       NCJLabel >>>    <code>
+                          CJLabel:                  NCJLabel:
+                            <code>
+                          NCJLabel:
+                        }
+{$if defined(arm) or defined(aarch64)}
+                        if (taicpu(p).condition<>C_None)
+{$if defined(aarch64)}
+                        { can't have conditional branches to
+                          global labels on AArch64, because the
+                          offset may become too big }
+                        and (NCJLabel.bind=AB_LOCAL)
+{$endif aarch64}
+                        then
+                          begin
+{$endif arm or aarch64}
+                            DebugWrite('JUMP DEBUG: Conditional jump inversion');
+
+                            taicpu(p).condition:=inverse_cond(taicpu(p).condition);
+                            CJLabel.decrefs;
+
+                            CJLabel := NCJLabel;
+                            JumpTargetOp(taicpu(p))^.ref^.symbol := NCJLabel;
+
+                            { when freeing hp1, the reference count
+                              isn't decreased, so don't increase }
+{$ifdef cpudelayslot}
+                            RemoveDelaySlot(hp1);
+{$endif cpudelayslot}
+                            asml.remove(hp1);
+                            hp1.free;
+
+                            hp1 := tai(p.Next);
+
+                            { Attempt another iteration in case more jumps follow }
+                            Continue;
+{$if defined(arm) or defined(aarch64)}
+                          end;
+{$endif arm or aarch64}
+                      end
+                    else if Assigned(hp2) and CollapseZeroDistJump(hp1, hp2, NCJLabel) then
+                      begin
+                        { Attempt another iteration in case more jumps follow }
+                        Continue;
+                      end;
+                  end
+                else
+                  begin
+                    GetNextInstruction(hp1, hp2);
+
+                    { Check for:
+                        jmp<cond1>    @Lbl1
+                        jmp<cond2>    @Lbl2
+
+                        Remove 2nd jump if conditions are equal or cond2 is fully covered by cond1
+                    }
+
+                    if condition_in(taicpu(hp1).condition, taicpu(p).condition) then
+                      begin
+                        DebugWrite('JUMP DEBUG: Dominated conditional jump');
+
+                        NCJLabel.decrefs;
+                        AsmL.Remove(hp1);
+                        hp1.Free;
+                        hp1 := tai(p.Next);
+
+                        { Flag another pass in case @Lbl2 appeared earlier in the procedure and is now a dead label }
+                        stoploop := False;
+                        { Attempt another iteration in case more jumps follow }
+                        Continue;
+                      end;
+
+                    { Check for:
+                        jmp<cond1>  @Lbl1
+                        jmp<cond2>  @Lbl2
+
+                      And inv(cond2) is a subset of cond1 (e.g. je followed by jne, or jae followed by jbe) )
+                    }
+                    if condition_in(inverse_cond(taicpu(hp1).condition), taicpu(p).condition) then
+                      begin
+                        { If @lbl1 immediately follows jmp<cond2>, we can remove
+                          the first jump completely }
+                        if FindLabel(CJLabel, hp2) then
+                          begin
+                            DebugWrite('JUMP DEBUG: jmp<cond> before jmp<inv_cond> - removed first jump');
+
+                            CJLabel.decrefs;
+{$ifdef cpudelayslot}
+                            RemoveDelaySlot(p);
+{$endif cpudelayslot}
+                            UpdateUsedRegs(tai(p.Next));
+                            AsmL.Remove(p);
+                            p.Free;
+                            p := hp1;
+
+                            Result := True;
+                            Exit;
+
+{$if not defined(avr) and not defined(riscv32) and not defined(riscv64)}
+                          end
+                        else
+                          { NOTE: There is currently no watertight, cross-platform way to create
+                            an unconditional jump without access to the cg object.  If anyone can
+                            improve this particular optimisation to work on AVR and RISC-V,
+                            please do. [Kit]}
+                          begin
+                            { Since cond1 is a subset of inv(cond2), jmp<cond2> will always branch if
+                              jmp<cond1> does not, so change jmp<cond2> to an unconditional jump. }
+
+                            DebugWrite('JUMP DEBUG: jmp<cond> before jmp<inv_cond> - made second jump unconditional');
+
+{$if defined(powerpc) or defined(powerpc64)}
+                            taicpu(hp2).condition.cond := C_None;
+                            taicpu(hp2).condition.simple := True;
+{$else powerpc}
+                            taicpu(hp2).condition := C_None;
+{$endif powerpc}
+                            taicpu(hp2).opcode := aopt_uncondjmp;
+
+                            { NOTE: Changing the jump to unconditional won't open up new opportunities
+                              for GetFinalDestination on earlier jumps because there's no live label
+                              between the two jump instructions, so setting 'stoploop' to False only
+                              wastes time. [Kit] }
+
+                            { See if more optimisations are possible }
+                            Continue;
+{$endif}
+                          end;
+                      end;
+                  end;
+            end;
+
+            if GetFinalDestination(taicpu(p),0) then
+              stoploop := False;
+
+            Exit;
+          end;
+
+      end;
+{$endif JVM}
+
+    function TAOptObj.CollapseZeroDistJump(var p: tai; hp1: tai; ThisLabel: TAsmLabel): Boolean;
+      var
+        tmp: tai;
+      begin
+        Result := False;
+
+        { remove jumps to labels coming right after them }
+        if FindLabel(ThisLabel, hp1) and
+            { Cannot remove the first instruction }
+            (p<>StartPoint) then
+          begin
+            ThisLabel.decrefs;
+
+            tmp := tai(p.Next); { Might be an align before the label }
+{$ifdef cpudelayslot}
+            RemoveDelaySlot(p);
+{$endif cpudelayslot}
+            asml.remove(p);
+            p.free;
+
+            StripDeadLabels(tmp, p);
+
+            if p.typ <> ait_instruction then
+              GetNextInstruction(UpdateUsedRegsAndOptimize(p), p);
+
+            Result := True;
+          end;
+
+    end;
+
+
+    function TAOptObj.DoJumpOptimizations(var p: tai; var stoploop: Boolean): Boolean;
+      var
+        hp1, hp2: tai;
+        ThisLabel: TAsmLabel;
+        ThisPassResult: Boolean;
+      begin
+        Result := False;
+        if (p.typ <> ait_instruction) or not IsJumpToLabel(taicpu(p)) then
+          Exit;
+
+        repeat
+          ThisPassResult := False;
+
+          if GetNextInstruction(p, hp1) and (hp1 <> BlockEnd) then
+            begin
+              SkipEntryExitMarker(hp1,hp1);
+              if (hp1 = BlockEnd) then
+                Exit;
+
+              ThisLabel := TAsmLabel(JumpTargetOp(taicpu(p))^.ref^.symbol);
+
+              hp2 := getlabelwithsym(ThisLabel);
+
+              { getlabelwithsym returning nil occurs if a label is in a
+                different block (e.g. on the other side of an asm...end pair). }
+              if Assigned(hp2) then
+                begin
+                  { If there are multiple labels in a row, change the destination to the last one
+                    in order to aid optimisation later }
+                  ThisLabel := CollapseLabelCluster(p, hp2);
+
+                  if CollapseZeroDistJump(p, hp1, ThisLabel) then
+                    begin
+                      stoploop := False;
+                      Result := True;
+                      Exit;
+                    end;
+
+                  if IsJumpToLabelUncond(taicpu(p)) then
+                    begin
+                      { Remove unreachable code between the jump and the next label }
+                      RemoveDeadCodeAfterJump(taicpu(p));
+
+                      ThisPassResult := GetFinalDestination(taicpu(p), 0);
+
+                      if ThisPassResult then
+                        { Might have caused some earlier labels to become dead }
+                        stoploop := False;
+                    end
+{$ifndef JVM}
+                  else if (taicpu(p).opcode = aopt_condjmp) then
+                    ThisPassResult := OptimizeConditionalJump(ThisLabel, p, hp1, stoploop)
+{$endif JVM}
+                    ;
+                end;
+
+            end;
+
+          Result := Result or ThisPassResult;
+        until not (ThisPassResult and (p.typ = ait_instruction) and IsJumpToLabel(taicpu(p)));
+
+      end;
+
+
     function TAOptObj.GetFinalDestination(hp: taicpu; level: longint): boolean;
       {traces sucessive jumps to their final destination and sets it, e.g.
-       je l1                je l3
-       <code>               <code>
+       je l1                je l3       <code>               <code>
        l1:       becomes    l1:
        je l2                je l3
        <code>               <code>
@@ -1434,100 +2141,158 @@
        l2:                  l2:
        jmp l3               jmp l3
 
-       the level parameter denotes how deeep we have already followed the jump,
+       the level parameter denotes how deep we have already followed the jump,
        to avoid endless loops with constructs such as "l5: ; jmp l5"           }
 
       var p1: tai;
-          {$if not defined(MIPS) and not defined(riscv64) and not defined(riscv32) and not defined(JVM)}
           p2: tai;
-          l: tasmlabel;
-          {$endif}
+{$if not defined(MIPS) and not defined(riscv64) and not defined(riscv32) and not defined(JVM)}
+          p3: tai;
+{$endif}
+          ThisLabel, l: tasmlabel;
 
       begin
-        GetfinalDestination := false;
+        GetFinalDestination := false;
         if level > 20 then
           exit;
-        p1 := getlabelwithsym(tasmlabel(JumpTargetOp(hp)^.ref^.symbol));
+
+        ThisLabel := TAsmLabel(JumpTargetOp(hp)^.ref^.symbol);
+        p1 := getlabelwithsym(ThisLabel);
         if assigned(p1) then
           begin
             SkipLabels(p1,p1);
-            if (tai(p1).typ = ait_instruction) and
+            if (p1.typ = ait_instruction) and
                (taicpu(p1).is_jmp) then
-              if { the next instruction after the label where the jump hp arrives}
-                 { is unconditional or of the same type as hp, so continue       }
-                 IsJumpToLabelUncond(taicpu(p1))
+              begin
+                p2 := tai(p1.Next);
+
+                { Collapse any zero distance jumps we stumble across }
+                while (p1<>StartPoint) and CollapseZeroDistJump(p1, p2, TAsmLabel(JumpTargetOp(taicpu(p1))^.ref^.symbol)) do
+                  begin
+                    { Note: Cannot remove the first instruction }
+                    if (p1.typ = ait_label) then
+                      SkipLabels(p1, p1);
+
+                    if not Assigned(p1) then
+                      { No more valid commands }
+                      Exit;
+
+                    { Check to see that we are actually still at a jump }
+                    if not ((tai(p1).typ = ait_instruction) and (taicpu(p1).is_jmp)) then
+                      begin
+                        { Required to ensure recursion works properly, but to also
+                          return false if a jump isn't modified. [Kit] }
+                        if level > 0 then GetFinalDestination := True;
+                        Exit;
+                      end;
+
+                    p2 := tai(p1.Next);
+                    if p2 = BlockEnd then
+                      Exit;
+                  end;
+
 {$if not defined(MIPS) and not defined(riscv64) and not defined(riscv32) and not defined(JVM)}
-{ for MIPS, it isn't enough to check the condition; first operands must be same, too. }
-                 or
-                 conditions_equal(taicpu(p1).condition,hp.condition) or
+                p3 := p2;
+{$endif not MIPS and not RV64 and not RV32 and not JVM}
 
-                 { the next instruction after the label where the jump hp arrives
-                   is the opposite of hp (so this one is never taken), but after
-                   that one there is a branch that will be taken, so perform a
-                   little hack: set p1 equal to this instruction (that's what the
-                   last SkipLabels is for, only works with short bool evaluation)}
-                 (conditions_equal(taicpu(p1).condition,inverse_cond(hp.condition)) and
-                  SkipLabels(p1,p2) and
-                  (p2.typ = ait_instruction) and
-                  (taicpu(p2).is_jmp) and
-                   (IsJumpToLabelUncond(taicpu(p2)) or
-                   (conditions_equal(taicpu(p2).condition,hp.condition))) and
-                  SkipLabels(p1,p1))
+                if { the next instruction after the label where the jump hp arrives}
+                   { is unconditional or of the same type as hp, so continue       }
+                   IsJumpToLabelUncond(taicpu(p1))
+
+                   { TODO: For anyone with experience with MIPS or RISC-V, please add support for tracing
+                     conditional jumps. [Kit] }
+
+{$if not defined(MIPS) and not defined(riscv64) and not defined(riscv32) and not defined(JVM)}
+  { for MIPS, it isn't enough to check the condition; first operands must be same, too. }
+                   or
+                   condition_in(hp.condition, taicpu(p1).condition) or
+
+                   { the next instruction after the label where the jump hp arrives
+                     is the opposite of hp (so this one is never taken), but after
+                     that one there is a branch that will be taken, so perform a
+                     little hack: set p1 equal to this instruction }
+                   (condition_in(hp.condition, inverse_cond(taicpu(p1).condition)) and
+                     SkipLabels(p3,p2) and
+                     (p2.typ = ait_instruction) and
+                     (taicpu(p2).is_jmp) and
+                       (IsJumpToLabelUncond(taicpu(p2)) or
+                       (condition_in(hp.condition, taicpu(p2).condition))
+                     ) and
+                     SetAndTest(p2,p1)
+                   )
 {$endif not MIPS and not RV64 and not RV32 and not JVM}
-                 then
-                begin
-                  { quick check for loops of the form "l5: ; jmp l5 }
-                  if (tasmlabel(JumpTargetOp(taicpu(p1))^.ref^.symbol).labelnr =
-                       tasmlabel(JumpTargetOp(hp)^.ref^.symbol).labelnr) then
-                    exit;
-                  if not GetFinalDestination(taicpu(p1),succ(level)) then
-                    exit;
+                   then
+                  begin
+                    { quick check for loops of the form "l5: ; jmp l5" }
+                    if (TAsmLabel(JumpTargetOp(taicpu(p1))^.ref^.symbol).labelnr = ThisLabel.labelnr) then
+                      exit;
+                    if not GetFinalDestination(taicpu(p1),succ(level)) then
+                      exit;
+
+                    { NOTE: Do not move this before the "l5: ; jmp l5" check,
+                      because GetFinalDestination may change the destination
+                      label of p1. [Kit] }
+
+                    l := tasmlabel(JumpTargetOp(taicpu(p1))^.ref^.symbol);
+
 {$if defined(aarch64)}
-                  { can't have conditional branches to
-                    global labels on AArch64, because the
-                    offset may become too big }
-                  if not(taicpu(hp).condition in [C_None,C_AL,C_NV]) and
-                     (tasmlabel(JumpTargetOp(taicpu(p1))^.ref^.symbol).bind<>AB_LOCAL) then
-                    exit;
+                    { can't have conditional branches to
+                      global labels on AArch64, because the
+                      offset may become too big }
+                    if not(taicpu(hp).condition in [C_None,C_AL,C_NV]) and
+                       (l.bind<>AB_LOCAL) then
+                      exit;
 {$endif aarch64}
-                  tasmlabel(JumpTargetOp(hp)^.ref^.symbol).decrefs;
-                  JumpTargetOp(hp)^.ref^.symbol:=JumpTargetOp(taicpu(p1))^.ref^.symbol;
-                  tasmlabel(JumpTargetOp(hp)^.ref^.symbol).increfs;
-                end
+                    ThisLabel.decrefs;
+                    JumpTargetOp(hp)^.ref^.symbol:=l;
+                    l.increfs;
+                    GetFinalDestination := True;
+                    Exit;
+                  end
 {$if not defined(MIPS) and not defined(riscv64) and not defined(riscv32) and not defined(JVM)}
-              else
-                if conditions_equal(taicpu(p1).condition,inverse_cond(hp.condition)) then
-                  if not FindAnyLabel(p1,l) then
+                else
+                  if condition_in(inverse_cond(hp.condition), taicpu(p1).condition) then
                     begin
-      {$ifdef finaldestdebug}
-                      insertllitem(asml,p1,p1.next,tai_comment.Create(
-                        strpnew('previous label inserted'))));
-      {$endif finaldestdebug}
-                      current_asmdata.getjumplabel(l);
-                      insertllitem(p1,p1.next,tai_label.Create(l));
-                      tasmlabel(JumpTargetOp(hp)^.ref^.symbol).decrefs;
-                      JumpTargetOp(hp)^.ref^.symbol := l;
-                      l.increfs;
-      {               this won't work, since the new label isn't in the labeltable }
-      {               so it will fail the rangecheck. Labeltable should become a   }
-      {               hashtable to support this:                                   }
-      {               GetFinalDestination(asml, hp);                               }
-                    end
-                  else
-                    begin
-      {$ifdef finaldestdebug}
-                      insertllitem(asml,p1,p1.next,tai_comment.Create(
-                        strpnew('next label reused'))));
-      {$endif finaldestdebug}
-                      l.increfs;
-                      tasmlabel(JumpTargetOp(hp)^.ref^.symbol).decrefs;
-                      JumpTargetOp(hp)^.ref^.symbol := l;
-                      if not GetFinalDestination(hp,succ(level)) then
-                        exit;
+                      if not FindLiveLabel(p1,l) then
+                        begin
+{$ifdef finaldestdebug}
+                          insertllitem(asml,p1,p1.next,tai_comment.Create(
+                            strpnew('previous label inserted'))));
+{$endif finaldestdebug}
+                          current_asmdata.getjumplabel(l);
+                          insertllitem(p1,p1.next,tai_label.Create(l));
+
+                          ThisLabel.decrefs;
+                          JumpTargetOp(hp)^.ref^.symbol := l;
+                          l.increfs;
+                          GetFinalDestination := True;
+          {               this won't work, since the new label isn't in the labeltable }
+          {               so it will fail the rangecheck. Labeltable should become a   }
+          {               hashtable to support this:                                   }
+          {               GetFinalDestination(asml, hp);                               }
+                        end
+                      else
+                        begin
+{$ifdef finaldestdebug}
+                          insertllitem(asml,p1,p1.next,tai_comment.Create(
+                            strpnew('next label reused'))));
+{$endif finaldestdebug}
+                          l.increfs;
+                          ThisLabel.decrefs;
+                          JumpTargetOp(hp)^.ref^.symbol := l;
+                          if not GetFinalDestination(hp,succ(level)) then
+                            exit;
+                        end;
+                      GetFinalDestination := True;
+                      Exit;
                     end;
 {$endif not MIPS and not RV64 and not RV32 and not JVM}
+              end;
           end;
-        GetFinalDestination := true;
+
+        { Required to ensure recursion works properly, but to also
+          return false if a jump isn't modified. [Kit] }
+        if level > 0 then GetFinalDestination := True;
       end;
 
 
@@ -1554,14 +2319,20 @@
     procedure TAOptObj.PeepHoleOptPass1;
       var
         p,hp1,hp2,hp3 : tai;
-        stoploop:boolean;
+        stoploop, FirstInstruction: boolean;
       begin
+        StartPoint := BlockStart;
+
         repeat
           stoploop:=true;
-          p := BlockStart;
+          p := StartPoint;
+          FirstInstruction := True;
           ClearUsedRegs;
-          while (p <> BlockEnd) Do
+
+          while Assigned(p) and (p <> BlockEnd) Do
             begin
+              prefetch(p.Next);
+
               { I'am not sure why this is done, UsedRegs should reflect the register usage before the instruction
                 If an instruction needs the information of this, it can easily create a TempUsedRegs (FK)
               UpdateUsedRegs(tai(p.next));
@@ -1570,155 +2341,39 @@
               if p.Typ=ait_instruction then
                 InsertLLItem(tai(p.Previous),p,tai_comment.create(strpnew(GetAllocationString(UsedRegs))));
   {$endif DEBUG_OPTALLOC}
+
+              { Handle Jmp Optimizations first }
+              if DoJumpOptimizations(p, stoploop) then
+                begin
+                  UpdateUsedRegs(p);
+                  if FirstInstruction then
+                    { Update StartPoint, since the old p was removed;
+                      don't set FirstInstruction to False though, as
+                      the new p might get removed too. }
+                    StartPoint := p;
+
+                  if (p.typ = ait_instruction) and IsJumpToLabel(taicpu(p)) then
+                    Continue;
+                end;
+
               if PeepHoleOptPass1Cpu(p) then
                 begin
                   stoploop:=false;
                   UpdateUsedRegs(p);
+
+                  if FirstInstruction then
+                    { Update StartPoint, since the old p was modified;
+                      don't set FirstInstruction to False though, as
+                      the new p might get modified too. }
+                    StartPoint := p;
+
                   continue;
                 end;
-              case p.Typ Of
-                ait_instruction:
-                  begin
-                    { Handle Jmp Optimizations }
-                    if taicpu(p).is_jmp then
-                      begin
-                        { the following if-block removes all code between a jmp and the next label,
-                          because it can never be executed
-                        }
-                        if IsJumpToLabelUncond(taicpu(p)) then
-                          begin
-                            hp2:=p;
-                            while GetNextInstruction(hp2, hp1) and
-                                  (hp1.typ <> ait_label)
-{$ifdef JVM}
-                                  and (hp1.typ <> ait_jcatch)
-{$endif}
-                                  do
-                              if not(hp1.typ in ([ait_label]+skipinstr)) then
-                                begin
-                                  if (hp1.typ = ait_instruction) and
-                                     taicpu(hp1).is_jmp and
-                                     (JumpTargetOp(taicpu(hp1))^.typ = top_ref) and
-                                     (JumpTargetOp(taicpu(hp1))^.ref^.symbol is TAsmLabel) then
-                                     TAsmLabel(JumpTargetOp(taicpu(hp1))^.ref^.symbol).decrefs;
-                                  { don't kill start/end of assembler block,
-                                    no-line-info-start/end, cfi end, etc }
-                                  if not(hp1.typ in [ait_align,ait_marker]) and
-                                     ((hp1.typ<>ait_cfi) or
-                                      (tai_cfi_base(hp1).cfityp<>cfi_endproc)) then
-                                    begin
-{$ifdef cpudelayslot}
-                                      if (hp1.typ=ait_instruction) and (taicpu(hp1).is_jmp) then
-                                        RemoveDelaySlot(hp1);
-{$endif cpudelayslot}
-                                      asml.remove(hp1);
-                                      hp1.free;
-                                      stoploop:=false;
-                                    end
-                                  else
-                                    hp2:=hp1;
-                                end
-                              else break;
-                            end;
-                        if GetNextInstruction(p, hp1) then
-                          begin
-                            SkipEntryExitMarker(hp1,hp1);
-                            { remove unconditional jumps to a label coming right after them }
-                            if IsJumpToLabelUncond(taicpu(p)) and
-                              FindLabel(tasmlabel(JumpTargetOp(taicpu(p))^.ref^.symbol), hp1) and
-          { TODO: FIXME removing the first instruction fails}
-                                (p<>blockstart) then
-                              begin
-                                tasmlabel(JumpTargetOp(taicpu(p))^.ref^.symbol).decrefs;
-{$ifdef cpudelayslot}
-                                RemoveDelaySlot(p);
-{$endif cpudelayslot}
-                                hp2:=tai(hp1.next);
-                                asml.remove(p);
-                                p.free;
-                                p:=hp2;
-                                stoploop:=false;
-                                continue;
-                              end
-                            else if assigned(hp1) then
-                              begin
-                                { change the following jumps:
-                                    jmp<cond> lab_1         jmp<cond_inverted> lab_2
-                                    jmp       lab_2  >>>    <code>
-                                  lab_1:                  lab_2:
-                                    <code>
-                                  lab_2:
-                                }
-                                hp3:=hp1;
-                                if hp1.typ = ait_label then
-                                  SkipLabels(hp1,hp1);
-                                if (tai(hp1).typ=ait_instruction) and
-                                  IsJumpToLabelUncond(taicpu(hp1)) and
-                                  GetNextInstruction(hp1, hp2) and
-                                  IsJumpToLabel(taicpu(p)) and
-                                  FindLabel(tasmlabel(JumpTargetOp(taicpu(p))^.ref^.symbol), hp2) and
-                                  { prevent removal of infinite loop from
-                                        jmp<cond> lab_1
-                                      lab_2:
-                                        jmp lab2
-                                      ..
-                                      lab_1:
-                                  }
-                                  not FindLabel(tasmlabel(JumpTargetOp(taicpu(hp1))^.ref^.symbol), hp3) then
-                                  begin
-                                    if (taicpu(p).opcode=aopt_condjmp)
-{$if defined(arm) or defined(aarch64)}
-                                      and (taicpu(p).condition<>C_None)
-{$endif arm or aarch64}
-{$if defined(aarch64)}
-                                      { can't have conditional branches to
-                                        global labels on AArch64, because the
-                                        offset may become too big }
-                                      and (tasmlabel(JumpTargetOp(taicpu(hp1))^.ref^.symbol).bind=AB_LOCAL)
-{$endif aarch64}
-                                    then
-                                      begin
-                                        taicpu(p).condition:=inverse_cond(taicpu(p).condition);
-                                        tai_label(hp2).labsym.decrefs;
-                                        JumpTargetOp(taicpu(p))^.ref^.symbol:=JumpTargetOp(taicpu(hp1))^.ref^.symbol;
-                                        { when freeing hp1, the reference count
-                                          isn't decreased, so don't increase
 
-                                         taicpu(p).oper[0]^.ref^.symbol.increfs;
-                                        }
-{$ifdef cpudelayslot}
-                                        RemoveDelaySlot(hp1);
-{$endif cpudelayslot}
-                                        asml.remove(hp1);
-                                        hp1.free;
-                                        stoploop:=false;
-                                        GetFinalDestination(taicpu(p),0);
-                                      end
-                                    else
-                                      begin
-                                        GetFinalDestination(taicpu(p),0);
-                                        p:=tai(p.next);
-                                        continue;
-                                      end;
-                                  end
-                                else if IsJumpToLabel(taicpu(p)) then
-                                  GetFinalDestination(taicpu(p),0);
-                              end;
-                          end;
-                      end
-                    else
-                    { All other optimizes }
-                      begin
-                      end; { if is_jmp }
-                  end;
-                else
-                  ;
-              end;
+              FirstInstruction := False;
               if assigned(p) then
-                begin
-                  UpdateUsedRegs(p);
-                  p:=tai(p.next);
-                end;
+                p := tai(UpdateUsedRegsAndOptimize(p).Next);
+
             end;
         until stoploop or not(cs_opt_level3 in current_settings.optimizerswitches);
       end;
@@ -1735,10 +2390,7 @@
             if PeepHoleOptPass2Cpu(p) then
               continue;
             if assigned(p) then
-              begin
-                UpdateUsedRegs(p);
-                p:=tai(p.next);
-              end;
+              p := tai(UpdateUsedRegsAndOptimize(p).Next);
           end;
       end;
 

J. Gareth Moreton

2019-11-09 21:28

developer   ~0119176

Added a relationship to an old closed issue because the comment did say "Investigate", which is pretty much the same as "TODO", but by coincidence the answer to the problem revealed itself, so the workaround in OptPass2Jcc is no longer necessary. Everything relevant is in the x86-specific patch.

Florian

2019-11-10 14:30

administrator   ~0119188

If you think inlining huge pieces of code is usefull, write a separate pass for the compiler doing so based on wpo and runtime profile. Everything else is wasting time and cluttering code.

J. Gareth Moreton

2019-11-10 14:45

developer   ~0119190

Which inline are you referring to? I've removed the directive from the functions in the patches. The new "StripLabelFast" procedure is inline because the loop is pretty short and it directly replaces three similar loops in "OptPass2Jcc" (in this case, my justification is reducing code duplication).

As discussed on the mailing list though, some good points were raised in regards to why inlining large, single-use routines is not always best, so I stand corrected. It would likely have to be something using WPO and maybe analysis of the particular assembler instructions to determine the output size.

Florian

2019-11-10 17:13

administrator   ~0119192

I was refering to the comment above.

Florian

2019-11-10 17:15

administrator   ~0119193

Thanks, I have applied it. I tested it on multiple targets.

Issue History

Date Modified Username Field Change
2019-11-06 15:01 J. Gareth Moreton New Issue
2019-11-06 15:01 J. Gareth Moreton File Added: Jump Optimisation Specs.pdf
2019-11-06 15:01 J. Gareth Moreton File Added: jump-optimizations-condition_in.patch
2019-11-06 15:01 J. Gareth Moreton File Added: jump-optimizations.patch
2019-11-06 15:01 J. Gareth Moreton File Added: jump-optimizations-x86-specific.patch
2019-11-06 15:09 J. Gareth Moreton Relationship added child of 0034628
2019-11-06 15:10 J. Gareth Moreton Tag Attached: patch
2019-11-06 15:10 J. Gareth Moreton Tag Attached: compiler
2019-11-06 15:10 J. Gareth Moreton Tag Attached: optimization
2019-11-06 15:10 J. Gareth Moreton Tag Attached: optimizations
2019-11-06 15:12 J. Gareth Moreton Additional Information Updated View Revisions
2019-11-06 15:12 J. Gareth Moreton FPCTarget => -
2019-11-06 15:21 J. Gareth Moreton Note Added: 0119101
2019-11-06 18:14 Florian Note Added: 0119105
2019-11-06 19:07 J. Gareth Moreton Note Added: 0119107
2019-11-06 19:47 Florian Note Added: 0119110
2019-11-06 21:22 J. Gareth Moreton Note Added: 0119112
2019-11-06 22:41 J. Gareth Moreton Note Added: 0119113
2019-11-06 23:11 J. Gareth Moreton File Deleted: jump-optimizations-condition_in.patch
2019-11-06 23:11 J. Gareth Moreton File Deleted: jump-optimizations.patch
2019-11-06 23:11 J. Gareth Moreton File Deleted: jump-optimizations-x86-specific.patch
2019-11-06 23:13 J. Gareth Moreton File Added: jump-optimizations-condition_in.patch
2019-11-06 23:13 J. Gareth Moreton File Added: jump-optimizations.patch
2019-11-06 23:13 J. Gareth Moreton File Added: jump-optimizations-x86-specific.patch
2019-11-06 23:13 J. Gareth Moreton Note Added: 0119115
2019-11-06 23:53 J. Gareth Moreton Note Added: 0119116
2019-11-06 23:54 J. Gareth Moreton Note Edited: 0119116 View Revisions
2019-11-06 23:54 J. Gareth Moreton Note Edited: 0119116 View Revisions
2019-11-06 23:56 J. Gareth Moreton Note Edited: 0119116 View Revisions
2019-11-07 00:37 J. Gareth Moreton Note Added: 0119119
2019-11-07 04:15 J. Gareth Moreton Note Added: 0119121
2019-11-07 04:16 J. Gareth Moreton Note Edited: 0119121 View Revisions
2019-11-08 19:53 J. Gareth Moreton File Deleted: jump-optimizations.patch
2019-11-08 19:53 J. Gareth Moreton File Deleted: jump-optimizations-x86-specific.patch
2019-11-08 19:57 J. Gareth Moreton File Added: jump-optimizations-x86-specific-v3.patch
2019-11-08 19:57 J. Gareth Moreton File Added: jump-optimizations-v3.patch
2019-11-08 19:57 J. Gareth Moreton Note Added: 0119169
2019-11-08 20:05 J. Gareth Moreton Note Edited: 0119169 View Revisions
2019-11-09 21:26 J. Gareth Moreton Relationship added related to 0033909
2019-11-09 21:28 J. Gareth Moreton Note Added: 0119176
2019-11-10 14:30 Florian Note Added: 0119188
2019-11-10 14:45 J. Gareth Moreton Note Added: 0119190
2019-11-10 17:13 Florian Note Added: 0119192
2019-11-10 17:15 Florian Assigned To => Florian
2019-11-10 17:15 Florian Status new => resolved
2019-11-10 17:15 Florian Resolution open => fixed
2019-11-10 17:15 Florian Fixed in Version => 3.3.1
2019-11-10 17:15 Florian Fixed in Revision => 43439, 43440, 43441
2019-11-10 17:15 Florian Note Added: 0119193
2019-11-11 20:40 J. Gareth Moreton Relationship added parent of 0036295
2019-11-12 03:19 J. Gareth Moreton Relationship added parent of 0036299
2019-11-24 15:12 J. Gareth Moreton Relationship added related to 0036352