View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0038579 | FPC | Compiler | public | 2021-03-04 09:15 | 2021-03-15 22:12 |
Reporter | J. Gareth Moreton | Assigned To | Florian | ||
Priority | low | Severity | tweak | Reproducibility | N/A |
Status | resolved | Resolution | fixed | ||
Platform | i386 and x86_64 | OS | Microsoft Windows | ||
Product Version | 3.3.1 | ||||
Fixed in Version | 3.3.1 | ||||
Summary | 0038579: [Patch] AddMov2LeaAdd and -Os change for SubMov2LeaSub | ||||
Description | This patch adds the logical partner to the SubMov2LeaSub optimisation, namely AddMov2LeaAdd. It also modifies SubMov2LeaSub slightly to be selective in when to apply the optimisation under -Os (size over speed) settings, in that it will apply the optimisation if it can remove the Sub instruction (the same goes for the Add instruction in this new optimisation). Tested on i386-win32 and x86_64-win64 and confirmed no regressions in test suite when run normally and when run with "-O4 -gl" options. | ||||
Steps To Reproduce | Apply patch and confirm correct compilation. | ||||
Additional Information | There is a slight risk of the optimisation causing diminishing returns when applied multiple times in a row. For example, in the System unit under -O4: .Lj749: addq $1,%rax movq %rax,%rcx # Peephole Optimization: %rcx = %rax; changed to minimise pipeline stall (MovXXX2MovXXX) movq %rax,%rdx shrq $3,%rdx andq $7,%rcx movzbl (%rbx,%rdx),%r8d ... Becomes... .Lj749: leaq 1(%rax),%rcx # Peephole Optimization: AddMov2LeaAdd leaq 1(%rax),%rdx # Peephole Optimization: AddMov2LeaAdd addq $1,%rax shrq $3,%rdx andq $7,%rcx movzbl (%rbx,%rdx),%r8d ... There comes a point where there is no speed gain because the processor runs out of available AGUs (used for LEA), while ALUs (used for MOV and ADD) are more numerous (Broadwell CPUs, for example, have 4 integer ALUs and 3 AGUs). Such a diminishing return is an edge case though, and might be possible to mitigate in the post-peephole stage by, say, changing ADD or MOV instructions to LEA or vice versa in order to balance out the AGU and ALU usage. In the example given though, it is likely there will still be a gain of 1 cycle, since the ADD instruction will run at the same time as the two LEA instructions, or simultaneously with the SHR and AND instructions. | ||||
Tags | i386, optimizations, patch.compiler, x86, x86_64 | ||||
Fixed in Revision | 48989 | ||||
FPCOldBugId | |||||
FPCTarget | - | ||||
Attached Files |
|
|
Would you please provide a more useful title than a revision number? |
|
Separated the SubMov2LeaSub change to a separate patch to keep it distinct. (Also updated them because the patch was accidentally out of date and had a bug) |
|
addmov2lea.patch (6,683 bytes)
Index: compiler/x86/aoptx86.pas =================================================================== --- compiler/x86/aoptx86.pas (revision 48872) +++ compiler/x86/aoptx86.pas (working copy) @@ -7025,7 +7025,7 @@ function TX86AsmOptimizer.OptPass2ADD(var p : tai) : boolean; var - hp1: tai; + hp1: tai; NewRef: TReference; { This entire nested function is used in an if-statement below, but we want to avoid all the used reg transfers and GetNextInstruction calls @@ -7045,45 +7045,103 @@ begin Result := False; + if not GetNextInstruction(p, hp1) or (hp1.typ <> ait_instruction) then + Exit; - { Change: - add %reg2,%reg1 - mov/s/z #(%reg1),%reg1 (%reg1 superregisters must be the same) + if (taicpu(p).opsize in [S_L{$ifdef x86_64}, S_Q{$endif}]) then + begin + { Change: + add %reg2,%reg1 + mov/s/z #(%reg1),%reg1 (%reg1 superregisters must be the same) - To: - mov/s/z #(%reg1,%reg2),%reg1 - } + To: + mov/s/z #(%reg1,%reg2),%reg1 + } + if MatchOpType(taicpu(p), top_reg, top_reg) and + MatchInstruction(hp1, [A_MOV, A_MOVZX, A_MOVSX{$ifdef x86_64}, A_MOVSXD{$endif}], []) and + MatchOpType(taicpu(hp1), top_ref, top_reg) and + (taicpu(hp1).oper[0]^.ref^.scalefactor <= 1) and + ( + ( + (taicpu(hp1).oper[0]^.ref^.base = taicpu(p).oper[1]^.reg) and + (taicpu(hp1).oper[0]^.ref^.index = NR_NO) + ) or ( + (taicpu(hp1).oper[0]^.ref^.index = taicpu(p).oper[1]^.reg) and + (taicpu(hp1).oper[0]^.ref^.base = NR_NO) + ) + ) and ( + Reg1WriteOverwritesReg2Entirely(taicpu(p).oper[1]^.reg, taicpu(hp1).oper[1]^.reg) or + ( + { If the super registers ARE equal, then this MOV/S/Z does a partial write } + not SuperRegistersEqual(taicpu(p).oper[1]^.reg, taicpu(hp1).oper[1]^.reg) and + MemRegisterNotUsedLater + ) + ) then + begin + taicpu(hp1).oper[0]^.ref^.base := taicpu(p).oper[1]^.reg; + taicpu(hp1).oper[0]^.ref^.index := taicpu(p).oper[0]^.reg; - if (taicpu(p).opsize in [S_L{$ifdef x86_64}, S_Q{$endif}]) and - MatchOpType(taicpu(p), top_reg, top_reg) and - GetNextInstruction(p, hp1) and - MatchInstruction(hp1, [A_MOV, A_MOVZX, A_MOVSX{$ifdef x86_64}, A_MOVSXD{$endif}], []) and - MatchOpType(taicpu(hp1), top_ref, top_reg) and - (taicpu(hp1).oper[0]^.ref^.scalefactor <= 1) and - ( - ( - (taicpu(hp1).oper[0]^.ref^.base = taicpu(p).oper[1]^.reg) and - (taicpu(hp1).oper[0]^.ref^.index = NR_NO) - ) or ( - (taicpu(hp1).oper[0]^.ref^.index = taicpu(p).oper[1]^.reg) and - (taicpu(hp1).oper[0]^.ref^.base = NR_NO) - ) - ) and ( - Reg1WriteOverwritesReg2Entirely(taicpu(p).oper[1]^.reg, taicpu(hp1).oper[1]^.reg) or - ( - { If the super registers ARE equal, then this MOV/S/Z does a partial write } - not SuperRegistersEqual(taicpu(p).oper[1]^.reg, taicpu(hp1).oper[1]^.reg) and - MemRegisterNotUsedLater - ) - ) then - begin - taicpu(hp1).oper[0]^.ref^.base := taicpu(p).oper[1]^.reg; - taicpu(hp1).oper[0]^.ref^.index := taicpu(p).oper[0]^.reg; + DebugMsg(SPeepholeOptimization + 'AddMov2Mov done', p); + RemoveCurrentp(p, hp1); + Result := True; + Exit; + end; - DebugMsg(SPeepholeOptimization + 'AddMov2Mov done', p); - RemoveCurrentp(p, hp1); - Result := True; - Exit; + { Change: + addl/q $x,%reg1 + movl/q %reg1,%reg2 + To: + leal/q $x(%reg1),%reg2 + addl/q $x,%reg1 (can be removed if %reg1 or the flags are not used afterwards) + + Breaks the dependency chain. + } + if MatchOpType(taicpu(p),top_const,top_reg) and + MatchInstruction(hp1, A_MOV, [taicpu(p).opsize]) and + (taicpu(hp1).oper[1]^.typ = top_reg) and + MatchOperand(taicpu(hp1).oper[0]^, taicpu(p).oper[1]^.reg) and + ( + { Don't do AddMov2LeaAdd under -Os, but do allow AddMov2Lea } + not (cs_opt_size in current_settings.optimizerswitches) or + ( + not RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp1, TmpUsedRegs) and + RegUsedAfterInstruction(NR_DEFAULTFLAGS, hp1, TmpUsedRegs) + ) + ) then + begin + { Change the MOV instruction to a LEA instruction, and update the + first operand } + + reference_reset(NewRef, 1, []); + NewRef.base := taicpu(p).oper[1]^.reg; + NewRef.scalefactor := 1; + NewRef.offset := taicpu(p).oper[0]^.val; + + taicpu(hp1).opcode := A_LEA; + taicpu(hp1).loadref(0, NewRef); + + TransferUsedRegs(TmpUsedRegs); + UpdateUsedRegs(TmpUsedRegs, tai(p.Next)); + if RegUsedAfterInstruction(NewRef.base, hp1, TmpUsedRegs) or + RegUsedAfterInstruction(NR_DEFAULTFLAGS, hp1, TmpUsedRegs) then + begin + { Move what is now the LEA instruction to before the SUB instruction } + Asml.Remove(hp1); + Asml.InsertBefore(hp1, p); + AllocRegBetween(taicpu(hp1).oper[1]^.reg, hp1, p, UsedRegs); + + DebugMsg(SPeepholeOptimization + 'AddMov2LeaAdd', p); + p := hp1; + end + else + begin + { Since %reg1 or the flags aren't used afterwards, we can delete p completely } + RemoveCurrentP(p, hp1); + DebugMsg(SPeepholeOptimization + 'AddMov2Lea', p); + end; + + Result := True; + end; end; end; submov2lea-upgrade.patch (1,197 bytes)
@@ -7131,13 +7189,20 @@ a CMP instruction if one follows. } Result := False; - if not (cs_opt_size in current_settings.optimizerswitches) and - (taicpu(p).opsize in [S_L{$ifdef x86_64}, S_Q{$endif x86_64}]) and + if (taicpu(p).opsize in [S_L{$ifdef x86_64}, S_Q{$endif x86_64}]) and MatchOpType(taicpu(p),top_const,top_reg) and GetNextInstruction(p, hp1) and MatchInstruction(hp1, A_MOV, [taicpu(p).opsize]) and (taicpu(hp1).oper[1]^.typ = top_reg) and - MatchOperand(taicpu(hp1).oper[0]^, taicpu(p).oper[1]^.reg) then + MatchOperand(taicpu(hp1).oper[0]^, taicpu(p).oper[1]^.reg) and + ( + { Don't do SubMov2LeaSub under -Os, but do allow SubMov2Lea } + not (cs_opt_size in current_settings.optimizerswitches) or + ( + not RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp1, TmpUsedRegs) and + RegUsedAfterInstruction(NR_DEFAULTFLAGS, hp1, TmpUsedRegs) + ) + ) then begin { Change the MOV instruction to a LEA instruction, and update the first operand } |
|
What's an example of a useful title, Sven, that isn't specified in another field? (Oh, sorry, I got the title and build number back to front somehow) |
|
Thank you for fixing the title. :) |
|
Thanks, both applied. |
Date Modified | Username | Field | Change |
---|---|---|---|
2021-03-04 09:15 | J. Gareth Moreton | New Issue | |
2021-03-04 09:15 | J. Gareth Moreton | File Added: addmov2lea.patch | |
2021-03-04 09:16 | Sven Barth | Note Added: 0129370 | |
2021-03-04 09:17 | J. Gareth Moreton | File Deleted: addmov2lea.patch | |
2021-03-04 09:17 | J. Gareth Moreton | Note Added: 0129371 | |
2021-03-04 09:17 | J. Gareth Moreton | File Added: addmov2lea.patch | |
2021-03-04 09:17 | J. Gareth Moreton | File Added: submov2lea-upgrade.patch | |
2021-03-04 09:20 | J. Gareth Moreton | File Deleted: addmov2lea.patch | |
2021-03-04 09:20 | J. Gareth Moreton | File Deleted: submov2lea-upgrade.patch | |
2021-03-04 09:20 | J. Gareth Moreton | Note Added: 0129372 | |
2021-03-04 09:20 | J. Gareth Moreton | File Added: addmov2lea.patch | |
2021-03-04 09:20 | J. Gareth Moreton | File Added: submov2lea-upgrade.patch | |
2021-03-04 09:21 | J. Gareth Moreton | Note Added: 0129374 | |
2021-03-04 09:22 | J. Gareth Moreton | Tag Attached: patch.compiler | |
2021-03-04 09:22 | J. Gareth Moreton | Tag Attached: optimizations | |
2021-03-04 09:22 | J. Gareth Moreton | Tag Attached: i386 | |
2021-03-04 09:22 | J. Gareth Moreton | Tag Attached: x86 | |
2021-03-04 09:22 | J. Gareth Moreton | Tag Attached: x86_64 | |
2021-03-04 09:22 | J. Gareth Moreton | Note Edited: 0129371 | View Revisions |
2021-03-04 09:23 | J. Gareth Moreton | Priority | normal => low |
2021-03-04 09:23 | J. Gareth Moreton | Severity | minor => tweak |
2021-03-04 09:23 | J. Gareth Moreton | Build | [Patch] AddMov2LeaAdd => r48871 |
2021-03-04 09:23 | J. Gareth Moreton | Summary | r48871 => [Patch] AddMov2LeaAdd |
2021-03-04 09:23 | J. Gareth Moreton | FPCTarget | => - |
2021-03-04 09:23 | J. Gareth Moreton | Note Edited: 0129374 | View Revisions |
2021-03-04 09:23 | J. Gareth Moreton | Note Edited: 0129374 | View Revisions |
2021-03-04 09:24 | J. Gareth Moreton | Summary | [Patch] AddMov2LeaAdd => [Patch] AddMov2LeaAdd and -Os change for SubMov2LeaSub |
2021-03-04 09:30 | J. Gareth Moreton | Relationship added | related to 0038555 |
2021-03-04 09:31 | J. Gareth Moreton | Relationship deleted | related to 0038555 |
2021-03-04 09:31 | J. Gareth Moreton | Relationship added | child of 0038555 |
2021-03-04 13:15 | Sven Barth | Note Added: 0129379 | |
2021-03-15 22:12 | Florian | Assigned To | => Florian |
2021-03-15 22:12 | Florian | Status | new => resolved |
2021-03-15 22:12 | Florian | Resolution | open => fixed |
2021-03-15 22:12 | Florian | Fixed in Version | => 3.3.1 |
2021-03-15 22:12 | Florian | Fixed in Revision | => 48989 |
2021-03-15 22:12 | Florian | Note Added: 0129698 |