View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0038130 | FPC | Compiler | public | 2020-11-25 09:34 | 2020-12-23 05:10 |
Reporter | J. Gareth Moreton | Assigned To | Florian | ||
Priority | normal | Severity | tweak | Reproducibility | N/A |
Status | resolved | Resolution | fixed | ||
Platform | i386 and x86_64 | OS | Microsoft Windows | ||
Product Version | 3.3.1 | ||||
Fixed in Version | 3.3.1 | ||||
Summary | 0038130: [Patch] Miscellaneous x86 optimisations | ||||
Description | This patch contains some small, miscellaneous optimisations for x86 platforms (for illustration, before/after sketches of several of these transformations appear after the details table and after the attached patch below):

- If "movzx %reg,%reg; shr x,%reg" is found (same register, just different sizes) and x is small enough (<= 7 for movzbl, for example), the "shr" instruction is moved before the "movzx", as this might allow further optimisation on the next pass.
- If "movsx/movsxd %reg,%reg; sar x,%reg" is found (same register, just different sizes) and x is small enough (<= 7 for movsbl, for example), the "sar" instruction is moved before the "movsx", as this might allow further optimisation on the next pass.
- If "and x,%reg; shr y,%reg" is found and the two instructions together cover all the bits (e.g. "andb $248,%reg; shrb $3,%reg"), the "and" instruction is removed. The "movzx; shr" optimisation above allows this to happen much more frequently.
- If "and x,%reg; movsx/movsxd %reg,%reg" is found (same register, just different sizes) and the "and" instruction guarantees that the sign bit is zero, the movsx/movsxd instruction is changed to movzx. By itself this doesn't yet cause a speed improvement, but it is part of a larger, more in-depth optimisation routine that is still in development and couldn't easily be removed from the patch; it is otherwise harmless.
- "and %reg,%reg" for B, W and L sizes is now removed if the FLAGS register is not in use and the previous instruction wrote to %reg at a compatible size. In that case the instruction is a no-op: apart from setting FLAGS, its only architectural effect is that an L-sized operand zeroes the upper 32 bits of the 64-bit register, which the previous write has already done.
- (Pass 2) If "add %reg2,%reg1; mov/movsx/movzx x(%reg1),%reg1" is found (usually produced by the Lea2AddBase optimisation), it is changed to "mov/movsx/movzx x(%reg1,%reg2),%reg1". Removing the "add" not only reduces the instruction count but also eliminates a potential bottleneck.
- (Post peephole) If "and x,%ax; movzwl %ax,%eax" is found (it has to be %ax/%eax) and x guarantees that the sign bit is zero (i.e. x <= $7FFF), then "movzwl %ax,%eax" is changed to "cwtl" (or CWDE, depending on the asm mode). Note that "cwtl" is a shorter encoding of "movswl %ax,%eax", but since the sign bit is zero, it acts as a zero extension. This partly reverses the "and; movsx" to "and; movzx" change above when that change didn't open up any new optimisations, but occasionally it is an improvement by itself.
Steps To Reproduce | Apply the patch, then confirm correct compilation and small speed boosts. | ||||
Additional Information | Some of these optimisations, notably the ones based around movzx, are part of a larger, more in-depth optimisation routine that is under development, but do a good job by themselves. x86_64-win64 has been fully tested with no regressions. i386-win32 requires further testing but is currently blocked due to an unrelated package compilation failure. | ||||
Tags | compiler, i386, optimization, patch, x86, x86_64 | ||||
Fixed in Revision | 47824 | ||||
FPCOldBugId | |||||
FPCTarget | - | ||||
Attached Files | misc-ops.patch (25,225 bytes) |
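
To make the pass-1 transformations in the description concrete, here are before/after sketches in AT&T syntax. These are illustrative only - the register choices and constants are not taken from actual compiler output - but the names match the DebugMsg identifiers in the attached patch.

    # MovzShr2ShrMovz: move the shift before the zero-extension
    # (legal because the shift count 3 is less than 8, the bit width
    # of %al; MovsSar2SarMovs does the same for movsx/sar)
        movzbl  %al,%eax
        shrl    $3,%eax
    # becomes:
        shrb    $3,%al
        movzbl  %al,%eax

    # AndShrToShr: $248 = %11111000, and "shrb $3" discards the low
    # 3 bits anyway, so the mask and the shift together cover all
    # 8 bits and the "and" can be dropped
        andb    $248,%al
        shrb    $3,%al
    # becomes:
        shrb    $3,%al

    # And2Nop: FLAGS are not in use (assumed here) and %eax was
    # written at the same size by the previous instruction, so the
    # "and" is a no-op
        movl    (%rcx),%eax
        andl    %eax,%eax
    # becomes:
        movl    (%rcx),%eax
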
Relationships

parent of | 0038247 | resolved | Pierre Muller | [Patch] Optimisation fix for Internal Error 201810201 |
Activities

J. Gareth Moreton | 2020-11-26 18:40 | Note 0127201 |
Okay, now that I've got i386-win32 working again, the build crashes on -CriotR - but oddly, it crashes during the ppcwpo1 stage, which I've never seen before, so I'll have to find out what's going on there.
J. Gareth Moreton | 2020-11-27 05:34 | Note 0127204 |
Okay, maybe not. The crash occurs on the trunk too!
J. Gareth Moreton | 2020-11-27 21:27 | Note 0127214 |
Made OptPass1AND more efficient and the new OptPass2ADD a bit safer. i386-win32 tests pass without issue (well, the only issue is my antivirus preventing the writing of certain temporary files!)

misc-ops.patch (25,225 bytes):
Index: compiler/i386/aoptcpu.pas =================================================================== --- compiler/i386/aoptcpu.pas (revision 47604) +++ compiler/i386/aoptcpu.pas (working copy) @@ -238,6 +238,8 @@ if InsContainsSegRef(taicpu(p)) then exit; case taicpu(p).opcode Of + A_ADD: + Result:=OptPass2ADD(p); A_Jcc: Result:=OptPass2Jcc(p); A_Lea: @@ -334,6 +336,8 @@ end; A_TEST, A_OR: Result:=PostPeepholeOptTestOr(p); + A_AND: + Result:=PostPeepholeOptAnd(p); A_MOVSX: Result:=PostPeepholeOptMOVSX(p); else Index: compiler/x86/aoptx86.pas =================================================================== --- compiler/x86/aoptx86.pas (revision 47604) +++ compiler/x86/aoptx86.pas (working copy) @@ -146,6 +146,7 @@ function OptPass2Jcc(var p : tai) : boolean; function OptPass2Lea(var p: tai): Boolean; function OptPass2SUB(var p: tai): Boolean; + function OptPass2ADD(var p : tai): Boolean; function PostPeepholeOptMov(var p : tai) : Boolean; {$ifdef x86_64} { These post-peephole optimisations only affect 64-bit registers. [Kit] } @@ -152,6 +153,7 @@ function PostPeepholeOptMovzx(var p : tai) : Boolean; function PostPeepholeOptXor(var p : tai) : Boolean; {$endif} + function PostPeepholeOptAnd(var p : tai) : boolean; function PostPeepholeOptMOVSX(var p : tai) : boolean; function PostPeepholeOptCmp(var p : tai) : Boolean; function PostPeepholeOptTestOr(var p : tai) : Boolean; @@ -5706,6 +5708,55 @@ Result := True; Exit; end + else if reg_and_hp1_is_instr and + (taicpu(p).oper[0]^.typ = top_reg) and + ( + ((taicpu(hp1).opcode = A_SHR) and (taicpu(p).opcode = A_MOVZX)) or + ((taicpu(hp1).opcode = A_SAR) and (taicpu(p).opcode <> A_MOVZX)) + ) and + (taicpu(hp1).oper[0]^.typ = top_const) and + SuperRegistersEqual(taicpu(p).oper[0]^.reg, taicpu(p).oper[1]^.reg) and + MatchOperand(taicpu(hp1).oper[1]^, taicpu(p).oper[1]^.reg) and + { Minimum shift value allowed is the bit size of the smallest register - 1 } + (taicpu(hp1).oper[0]^.val < + { Multiply by 8 because tcgsize2size returns bytes, not bits } + 8 * ( + tcgsize2size[reg_cgsize(taicpu(p).oper[0]^.reg)] + ) + ) then + begin + { For: + movsx %reg1,%reg1 movzx %reg1,%reg1 (same register, just different sizes) + sar ##, %reg1 shr ##, %reg1 + + Move the shift to before the movx instruction if the shift value + is not too large. 
+ } + asml.Remove(hp1); + asml.InsertBefore(hp1, p); + + taicpu(hp1).oper[1]^.reg := taicpu(p).oper[0]^.reg; + + case taicpu(p).opsize of + s_BW, S_BL{$ifdef x86_64}, S_BQ{$endif}: + taicpu(hp1).opsize := S_B; + S_WL{$ifdef x86_64}, S_WQ{$endif}: + taicpu(hp1).opsize := S_W; + {$ifdef x86_64} + S_LQ: + taicpu(hp1).opsize := S_L; + {$endif} + else + InternalError(2020112401); + end; + + if (taicpu(hp1).opcode = A_SHR) then + DebugMsg(SPeepholeOptimization + 'MovzShr2ShrMovz', hp1) + else + DebugMsg(SPeepholeOptimization + 'MovsSar2SarMovs', hp1); + + Result := True; + end else if taicpu(p).opcode=A_MOVZX then begin { removes superfluous And's after movzx's } @@ -5952,132 +6003,181 @@ function TX86AsmOptimizer.OptPass1AND(var p : tai) : boolean; var - hp1 : tai; + hp1, hp2 : tai; MaskLength : Cardinal; + MaskedBits : TCgInt; begin Result:=false; - if GetNextInstruction(p, hp1) then + { There are no optimisations for reference targets } + if (taicpu(p).oper[1]^.typ <> top_reg) then + Exit; + + while GetNextInstruction(p, hp1) and + (hp1.typ = ait_instruction) do begin - if MatchOpType(taicpu(p),top_const,top_reg) and - MatchInstruction(hp1,A_AND,[]) and - MatchOpType(taicpu(hp1),top_const,top_reg) and - (getsupreg(taicpu(p).oper[1]^.reg) = getsupreg(taicpu(hp1).oper[1]^.reg)) and - { the second register must contain the first one, so compare their subreg types } - (getsubreg(taicpu(p).oper[1]^.reg)<=getsubreg(taicpu(hp1).oper[1]^.reg)) and - (abs(taicpu(p).oper[0]^.val and taicpu(hp1).oper[0]^.val)<$80000000) then - { change - and const1, reg - and const2, reg - to - and (const1 and const2), reg - } + if (taicpu(p).oper[0]^.typ = top_const) then begin - taicpu(hp1).loadConst(0, taicpu(p).oper[0]^.val and taicpu(hp1).oper[0]^.val); - DebugMsg(SPeepholeOptimization + 'AndAnd2And done',hp1); - RemoveCurrentP(p, hp1); - Result:=true; - exit; - end - else if MatchOpType(taicpu(p),top_const,top_reg) and - MatchInstruction(hp1,A_MOVZX,[]) and - MatchOpType(taicpu(hp1),top_reg,top_reg) and - SuperRegistersEqual(taicpu(p).oper[1]^.reg,taicpu(hp1).oper[1]^.reg) and - (getsupreg(taicpu(hp1).oper[0]^.reg)=getsupreg(taicpu(hp1).oper[1]^.reg)) and - (((taicpu(p).opsize=S_W) and - (taicpu(hp1).opsize=S_BW)) or - ((taicpu(p).opsize=S_L) and - (taicpu(hp1).opsize in [S_WL,S_BL{$ifdef x86_64},S_BQ,S_WQ{$endif x86_64}])) + if (taicpu(hp1).opcode = A_AND) and + MatchOpType(taicpu(hp1),top_const,top_reg) and + (getsupreg(taicpu(p).oper[1]^.reg) = getsupreg(taicpu(hp1).oper[1]^.reg)) and + { the second register must contain the first one, so compare their subreg types } + (getsubreg(taicpu(p).oper[1]^.reg)<=getsubreg(taicpu(hp1).oper[1]^.reg)) and + (abs(taicpu(p).oper[0]^.val and taicpu(hp1).oper[0]^.val)<$80000000) then + { change + and const1, reg + and const2, reg + to + and (const1 and const2), reg + } + begin + taicpu(hp1).loadConst(0, taicpu(p).oper[0]^.val and taicpu(hp1).oper[0]^.val); + DebugMsg(SPeepholeOptimization + 'AndAnd2And done',hp1); + RemoveCurrentP(p, hp1); + Result:=true; + exit; + end + else if (taicpu(hp1).opcode = A_MOVZX) and + MatchOpType(taicpu(hp1),top_reg,top_reg) and + SuperRegistersEqual(taicpu(p).oper[1]^.reg,taicpu(hp1).oper[1]^.reg) and + (getsupreg(taicpu(hp1).oper[0]^.reg)=getsupreg(taicpu(hp1).oper[1]^.reg)) and + (((taicpu(p).opsize=S_W) and + (taicpu(hp1).opsize=S_BW)) or + ((taicpu(p).opsize=S_L) and + (taicpu(hp1).opsize in [S_WL,S_BL{$ifdef x86_64},S_BQ,S_WQ{$endif x86_64}])) {$ifdef x86_64} - or - ((taicpu(p).opsize=S_Q) and - (taicpu(hp1).opsize in 
[S_BQ,S_WQ,S_BL,S_WL])) + or + ((taicpu(p).opsize=S_Q) and + (taicpu(hp1).opsize in [S_BQ,S_WQ,S_BL,S_WL])) {$endif x86_64} - ) then - begin - if (((taicpu(hp1).opsize) in [S_BW,S_BL{$ifdef x86_64},S_BQ{$endif x86_64}]) and - ((taicpu(p).oper[0]^.val and $ff)=taicpu(p).oper[0]^.val) - ) or - (((taicpu(hp1).opsize) in [S_WL{$ifdef x86_64},S_WQ{$endif x86_64}]) and - ((taicpu(p).oper[0]^.val and $ffff)=taicpu(p).oper[0]^.val)) - then + ) then begin - { Unlike MOVSX, MOVZX doesn't actually have a version that zero-extends a - 32-bit register to a 64-bit register, or even a version called MOVZXD, so - code that tests for the presence of AND 0xffffffff followed by MOVZX is - wasted, and is indictive of a compiler bug if it were triggered. [Kit] + if (((taicpu(hp1).opsize) in [S_BW,S_BL{$ifdef x86_64},S_BQ{$endif x86_64}]) and + ((taicpu(p).oper[0]^.val and $ff)=taicpu(p).oper[0]^.val) + ) or + (((taicpu(hp1).opsize) in [S_WL{$ifdef x86_64},S_WQ{$endif x86_64}]) and + ((taicpu(p).oper[0]^.val and $ffff)=taicpu(p).oper[0]^.val)) + then + begin + { Unlike MOVSX, MOVZX doesn't actually have a version that zero-extends a + 32-bit register to a 64-bit register, or even a version called MOVZXD, so + code that tests for the presence of AND 0xffffffff followed by MOVZX is + wasted, and is indictive of a compiler bug if it were triggered. [Kit] - NOTE: To zero-extend from 32 bits to 64 bits, simply use the standard MOV. - } - DebugMsg(SPeepholeOptimization + 'AndMovzToAnd done',p); + NOTE: To zero-extend from 32 bits to 64 bits, simply use the standard MOV. + } + DebugMsg(SPeepholeOptimization + 'AndMovzToAnd done',p); - RemoveInstruction(hp1); - Exit; - end; - end - else if MatchOpType(taicpu(p),top_const,top_reg) and - MatchInstruction(hp1,A_SHL,[]) and - MatchOpType(taicpu(hp1),top_const,top_reg) and - (getsupreg(taicpu(p).oper[1]^.reg)=getsupreg(taicpu(hp1).oper[1]^.reg)) then - begin + RemoveInstruction(hp1); + + { See if there are other optimisations possible } + Continue; + end; + end + else if (taicpu(hp1).opcode = A_SHL) and + MatchOpType(taicpu(hp1),top_const,top_reg) and + (getsupreg(taicpu(p).oper[1]^.reg)=getsupreg(taicpu(hp1).oper[1]^.reg)) then + begin {$ifopt R+} {$define RANGE_WAS_ON} {$R-} {$endif} - { get length of potential and mask } - MaskLength:=SizeOf(taicpu(p).oper[0]^.val)*8-BsrQWord(taicpu(p).oper[0]^.val)-1; + { get length of potential and mask } + MaskLength:=SizeOf(taicpu(p).oper[0]^.val)*8-BsrQWord(taicpu(p).oper[0]^.val)-1; - { really a mask? } + { really a mask? } {$ifdef RANGE_WAS_ON} {$R+} {$endif} - if (((QWord(1) shl MaskLength)-1)=taicpu(p).oper[0]^.val) and - { unmasked part shifted out? } - ((MaskLength+taicpu(hp1).oper[0]^.val)>=topsize2memsize[taicpu(hp1).opsize]) then + if (((QWord(1) shl MaskLength)-1)=taicpu(p).oper[0]^.val) and + { unmasked part shifted out? 
} + ((MaskLength+taicpu(hp1).oper[0]^.val)>=topsize2memsize[taicpu(hp1).opsize]) then + begin + DebugMsg(SPeepholeOptimization + 'AndShlToShl done',p); + RemoveCurrentP(p, hp1); + Result:=true; + exit; + end; + end + else if (taicpu(hp1).opcode = A_SHR) and + MatchOpType(taicpu(hp1),top_const,top_reg) and + (taicpu(p).oper[1]^.reg = taicpu(hp1).oper[1]^.reg) and + (taicpu(hp1).oper[0]^.val <= 63) then begin - DebugMsg(SPeepholeOptimization + 'AndShlToShl done',p); - RemoveCurrentP(p, hp1); - Result:=true; - exit; - end; - end - else if MatchOpType(taicpu(p),top_const,top_reg) and - MatchInstruction(hp1,A_MOVSX{$ifdef x86_64},A_MOVSXD{$endif x86_64},[]) and - (taicpu(hp1).oper[0]^.typ = top_reg) and - MatchOperand(taicpu(p).oper[1]^,taicpu(hp1).oper[1]^) and - (getsupreg(taicpu(hp1).oper[0]^.reg)=getsupreg(taicpu(hp1).oper[1]^.reg)) and - (((taicpu(p).opsize=S_W) and - (taicpu(hp1).opsize=S_BW)) or - ((taicpu(p).opsize=S_L) and - (taicpu(hp1).opsize in [S_WL,S_BL])) + { Does SHR combined with the AND cover all the bits? + + e.g. for "andb $252,%reg; shrb $2,%reg" - the "and" can be removed } + + MaskedBits := taicpu(p).oper[0]^.val or ((TCgInt(1) shl taicpu(hp1).oper[0]^.val) - 1); + + if ((taicpu(p).opsize = S_B) and ((MaskedBits and $FF) = $FF)) or + ((taicpu(p).opsize = S_W) and ((MaskedBits and $FFFF) = $FFFF)) or + ((taicpu(p).opsize = S_L) and ((MaskedBits and $FFFFFFFF) = $FFFFFFFF)) then + begin + DebugMsg(SPeepholeOptimization + 'AndShrToShr done', p); + RemoveCurrentP(p, hp1); + Result := True; + Exit; + end; + end + else if ((taicpu(hp1).opcode = A_MOVSX){$ifdef x86_64} or (taicpu(hp1).opcode = A_MOVSXD){$endif x86_64}) and + (taicpu(hp1).oper[0]^.typ = top_reg) and + SuperRegistersEqual(taicpu(hp1).oper[0]^.reg, taicpu(hp1).oper[1]^.reg) then + begin + if SuperRegistersEqual(taicpu(p).oper[1]^.reg, taicpu(hp1).oper[1]^.reg) and + ( + ( + (taicpu(hp1).opsize in [S_BW,S_BL{$ifdef x86_64},S_BQ{$endif x86_64}]) and + ((taicpu(p).oper[0]^.val and $7F) = taicpu(p).oper[0]^.val) + ) or ( + (taicpu(hp1).opsize in [S_WL{$ifdef x86_64},S_WQ{$endif x86_64}]) and + ((taicpu(p).oper[0]^.val and $7FFF) = taicpu(p).oper[0]^.val) {$ifdef x86_64} - or - ((taicpu(p).opsize=S_Q) and - (taicpu(hp1).opsize in [S_BQ,S_WQ,S_LQ])) + ) or ( + (taicpu(hp1).opsize = S_LQ) and + ((taicpu(p).oper[0]^.val and $7fffffff) = taicpu(p).oper[0]^.val) {$endif x86_64} - ) then - begin - if (((taicpu(hp1).opsize) in [S_BW,S_BL{$ifdef x86_64},S_BQ{$endif x86_64}]) and - ((taicpu(p).oper[0]^.val and $7f)=taicpu(p).oper[0]^.val) - ) or - (((taicpu(hp1).opsize) in [S_WL{$ifdef x86_64},S_WQ{$endif x86_64}]) and - ((taicpu(p).oper[0]^.val and $7fff)=taicpu(p).oper[0]^.val)) + ) + ) then + begin + if (taicpu(p).oper[1]^.reg = taicpu(hp1).oper[1]^.reg){$ifdef x86_64} or (taicpu(hp1).opsize = S_LQ){$endif x86_64} then + begin + DebugMsg(SPeepholeOptimization + 'AndMovsxToAnd',p); + RemoveInstruction(hp1); + { See if there are other optimisations possible } + Continue; + end; + + { The super-registers are the same though. + + Note that this change by itself doesn't improve + code speed, but it opens up other optimisations. 
} {$ifdef x86_64} - or - (((taicpu(hp1).opsize)=S_LQ) and - ((taicpu(p).oper[0]^.val and $7fffffff)=taicpu(p).oper[0]^.val) - ) + { Convert 64-bit register to 32-bit } + case taicpu(hp1).opsize of + S_BQ: + begin + taicpu(hp1).opsize := S_BL; + taicpu(hp1).oper[1]^.reg := newreg(R_INTREGISTER, getsupreg(taicpu(hp1).oper[1]^.reg), R_SUBD); + end; + S_WQ: + begin + taicpu(hp1).opsize := S_WL; + taicpu(hp1).oper[1]^.reg := newreg(R_INTREGISTER, getsupreg(taicpu(hp1).oper[1]^.reg), R_SUBD); + end + else + ; + end; {$endif x86_64} - then - begin - DebugMsg(SPeepholeOptimization + 'AndMovsxToAnd',p); - RemoveInstruction(hp1); - Exit; - end; - end - else if (taicpu(p).oper[1]^.typ = top_reg) and - (hp1.typ = ait_instruction) and - (taicpu(hp1).is_jmp) and + DebugMsg(SPeepholeOptimization + 'AndMovsxToAndMovzx', hp1); + taicpu(hp1).opcode := A_MOVZX; + { See if there are other optimisations possible } + Continue; + end; + end; + end; + + if (taicpu(hp1).is_jmp) and (taicpu(hp1).opcode<>A_JMP) and not(RegInUsedRegs(taicpu(p).oper[1]^.reg,UsedRegs)) then begin @@ -6093,10 +6193,12 @@ taicpu(p).opcode := A_TEST; Exit; end; + + Break; end; { Lone AND tests } - if MatchOpType(taicpu(p),top_const,top_reg) then + if (taicpu(p).oper[0]^.typ = top_const) then begin { - Convert and $0xFF,reg to and reg,reg if reg is 8-bit @@ -6116,9 +6218,97 @@ end; end; + { Backward check to determine necessity of and %reg,%reg } + if (taicpu(p).oper[0]^.typ = top_reg) and + (taicpu(p).oper[0]^.reg = taicpu(p).oper[1]^.reg) and + not RegInUsedRegs(NR_DEFAULTFLAGS, UsedRegs) and + GetLastInstruction(p, hp2) and + RegModifiedByInstruction(taicpu(p).oper[1]^.reg, hp2) and + { Check size of adjacent instruction to determine if the AND is + effectively a null operation } + ( + (taicpu(p).opsize = taicpu(hp2).opsize) or + { Note: Don't include S_Q } + ((taicpu(p).opsize = S_L) and (taicpu(hp2).opsize in [S_BL, S_WL])) or + ((taicpu(p).opsize = S_W) and (taicpu(hp2).opsize in [S_BW, S_BL, S_WL, S_L])) or + ((taicpu(p).opsize = S_B) and (taicpu(hp2).opsize in [S_BW, S_BL, S_WL, S_W, S_L])) + ) then + begin + DebugMsg(SPeepholeOptimization + 'And2Nop', p); + { If GetNextInstruction returned False, hp1 will be nil } + RemoveCurrentP(p, hp1); + Result := True; + Exit; + end; + end; + function TX86AsmOptimizer.OptPass2ADD(var p : tai) : boolean; + var + hp1: tai; + + { This entire nested function is used in an if-statement below, but we + want to avoid all the used reg transfers and GetNextInstruction calls + until we really have to check } + function MemRegisterNotUsedLater: Boolean; inline; + var + hp2: tai; + begin + TransferUsedRegs(TmpUsedRegs); + hp2 := p; + repeat + UpdateUsedRegs(TmpUsedRegs, tai(hp2.Next)); + until not GetNextInstruction(hp2, hp2) or (hp2 = hp1); + + Result := not RegUsedAfterInstruction(taicpu(p).oper[1]^.reg, hp1, TmpUsedRegs); + end; + + begin + Result := False; + + { Change: + add %reg2,%reg1 + mov/s/z #(%reg1),%reg1 (%reg1 superregisters must be the same) + + To: + mov/s/z #(%reg1,%reg2),%reg1 + } + + if (taicpu(p).opsize in [S_L{$ifdef x86_64}, S_Q{$endif}]) and + MatchOpType(taicpu(p), top_reg, top_reg) and + GetNextInstruction(p, hp1) and + MatchInstruction(hp1, [A_MOV, A_MOVZX, A_MOVSX{$ifdef x86_64}, A_MOVSXD{$endif}], []) and + MatchOpType(taicpu(hp1), top_ref, top_reg) and + (taicpu(hp1).oper[0]^.ref^.scalefactor <= 1) and + ( + ( + (taicpu(hp1).oper[0]^.ref^.base = taicpu(p).oper[1]^.reg) and + (taicpu(hp1).oper[0]^.ref^.index = NR_NO) + ) or ( + (taicpu(hp1).oper[0]^.ref^.index = 
taicpu(p).oper[1]^.reg) and + (taicpu(hp1).oper[0]^.ref^.base = NR_NO) + ) + ) and ( + Reg1WriteOverwritesReg2Entirely(taicpu(p).oper[1]^.reg, taicpu(hp1).oper[1]^.reg) or + ( + { If the super registers ARE equal, then this MOV/S/Z does a partial write } + not SuperRegistersEqual(taicpu(p).oper[1]^.reg, taicpu(hp1).oper[1]^.reg) and + MemRegisterNotUsedLater + ) + ) then + begin + taicpu(hp1).oper[0]^.ref^.base := taicpu(p).oper[1]^.reg; + taicpu(hp1).oper[0]^.ref^.index := taicpu(p).oper[0]^.reg; + + DebugMsg(SPeepholeOptimization + 'AddMov2Mov done', p); + RemoveCurrentp(p, hp1); + Result := True; + Exit; + end; + end; + + function TX86AsmOptimizer.OptPass2Lea(var p : tai) : Boolean; begin Result:=false; @@ -6405,6 +6595,41 @@ end; + function TX86AsmOptimizer.PostPeepholeOptAnd(var p : tai) : boolean; + var + hp1: tai; + begin + { Detect: + andw x, %ax (0 <= x < $8000) + ... + movzwl %ax,%eax + + Change movzwl %ax,%eax to cwtl (shorter encoding for movswl %ax,%eax) + } + + Result := False; + if MatchOpType(taicpu(p), top_const, top_reg) and + (taicpu(p).oper[1]^.reg = NR_AX) and { This is also enough to determine that opsize = S_W } + ((taicpu(p).oper[0]^.val and $7FFF) = taicpu(p).oper[0]^.val) and + GetNextInstructionUsingReg(p, hp1, NR_EAX) and + MatchInstruction(hp1, A_MOVZX, [S_WL]) and + MatchOperand(taicpu(hp1).oper[0]^, NR_AX) and + MatchOperand(taicpu(hp1).oper[1]^, NR_EAX) then + begin + DebugMsg(SPeepholeOptimization + 'Converted movzwl %ax,%eax to cwtl (via AndMovz2AndCwtl)', hp1); + taicpu(hp1).opcode := A_CWDE; + taicpu(hp1).clearop(0); + taicpu(hp1).clearop(1); + taicpu(hp1).ops := 0; + + { A change was made, but not with p, so move forward 1 } + p := tai(p.Next); + Result := True; + end; + + end; + + function TX86AsmOptimizer.PostPeepholeOptMOVSX(var p : tai) : boolean; begin Result := False; Index: compiler/x86_64/aoptcpu.pas =================================================================== --- compiler/x86_64/aoptcpu.pas (revision 47604) +++ compiler/x86_64/aoptcpu.pas (working copy) @@ -169,6 +169,8 @@ Result:=OptPass2Lea(p); A_SUB: Result:=OptPass2SUB(p); + A_ADD: + Result:=OptPass2ADD(p); else ; end; @@ -188,6 +190,8 @@ case taicpu(p).opcode of A_MOV: Result:=PostPeepholeOptMov(p); + A_AND: + Result:=PostPeepholeOptAnd(p); A_MOVSX: Result:=PostPeepholeOptMOVSX(p); A_MOVZX: |
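
For reference, here are before/after sketches of the pass-2 and post-peephole transformations implemented in the patch above. Again, the registers and constants are illustrative rather than actual compiler output; the names match the patch's DebugMsg identifiers.

    # AddMov2Mov (OptPass2ADD): fold the addition into the addressing
    # mode; the 32-bit load overwrites %rax entirely, so the "add"
    # can be removed
        addq    %rdx,%rax
        movl    (%rax),%eax
    # becomes:
        movl    (%rax,%rdx),%eax

    # AndMovz2AndCwtl (PostPeepholeOptAnd): $4095 <= $7FFF guarantees
    # that the sign bit of %ax is clear, so the shorter sign-extending
    # "cwtl" encoding behaves exactly like "movzwl %ax,%eax"
        andw    $4095,%ax
        movzwl  %ax,%eax
    # becomes:
        andw    $4095,%ax
        cwtl
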
Florian | 2020-12-20 19:06 | Note 0127729 |
Thanks, applied.
Issue History

Date Modified | Username | Field | Change |
---|---|---|---|
2020-11-25 09:34 | J. Gareth Moreton | New Issue | |
2020-11-25 09:34 | J. Gareth Moreton | File Added: misc-ops.patch | |
2020-11-25 09:34 | J. Gareth Moreton | Tag Attached: x86 | |
2020-11-25 09:34 | J. Gareth Moreton | Tag Attached: x86_64 | |
2020-11-25 09:34 | J. Gareth Moreton | Tag Attached: compiler | |
2020-11-25 09:34 | J. Gareth Moreton | Tag Attached: i386 | |
2020-11-25 09:34 | J. Gareth Moreton | Tag Attached: optimization | |
2020-11-25 09:34 | J. Gareth Moreton | Tag Attached: patch | |
2020-11-25 09:36 | J. Gareth Moreton | Severity | minor => tweak |
2020-11-25 09:36 | J. Gareth Moreton | Description Updated | View Revisions |
2020-11-25 09:36 | J. Gareth Moreton | FPCTarget | => - |
2020-11-25 09:37 | J. Gareth Moreton | Description Updated | View Revisions |
2020-11-25 09:40 | J. Gareth Moreton | Additional Information Updated | View Revisions |
2020-11-26 18:40 | J. Gareth Moreton | Note Added: 0127201 | |
2020-11-27 05:34 | J. Gareth Moreton | Note Added: 0127204 | |
2020-11-27 21:25 | J. Gareth Moreton | File Deleted: misc-ops.patch | |
2020-11-27 21:27 | J. Gareth Moreton | Note Added: 0127214 | |
2020-11-27 21:27 | J. Gareth Moreton | File Added: misc-ops.patch | |
2020-12-20 19:06 | Florian | Assigned To | => Florian |
2020-12-20 19:06 | Florian | Status | new => resolved |
2020-12-20 19:06 | Florian | Resolution | open => fixed |
2020-12-20 19:06 | Florian | Fixed in Version | => 3.3.1 |
2020-12-20 19:06 | Florian | Fixed in Revision | => 47824 |
2020-12-20 19:06 | Florian | Note Added: 0127729 | |
2020-12-23 05:10 | J. Gareth Moreton | Relationship added | parent of 0038247 |