View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0036437 | FPC | Compiler | public | 2019-12-14 10:56 | 2020-01-28 22:40 |
Reporter | J. Gareth Moreton | Assigned To | J. Gareth Moreton | ||
Priority | low | Severity | tweak | Reproducibility | N/A |
Status | closed | Resolution | suspended | ||
Platform | Cross-platform | OS | Microsoft Windows | ||
Product Version | 3.3.1 | ||||
Summary | 0036437: [Patch] Efficiency boosts in post-peephole optimisation stage | ||||
Description | This patch makes some general-purpose changes to the post-peephole optimisation stage. Since this stage is only meant as a final chance to convert instructions to more efficient forms, the platform-specific PostPeepholeOptsCPU is no longer called if the current list entry is not an instruction. As an additional note, keeping track of the registers one instruction ahead proved to be less efficient both in terms of compiler speed and instruction conversion. On x86 platforms, certain "mov $0,%reg" instructions weren't being converted to "xor %reg,%reg" because a comparison instruction immediately followed (the FLAGS register is already marked as allocated at that point, implying that xor would scramble it, even though that is not an issue here because the comparison overwrites the flags anyway). | ||||
Steps To Reproduce | Apply patch and confirm correct compilation and testing. | ||||
Additional Information | i386-win32 and x86_64-win64 compile without problems, and "make fullcycle" is successful. The individual PostPeepholeOptsCPU routines were reviewed to determine whether the register tracking changes would cause any adverse effects, and none were found. | ||||
Tags | compiler, optimizations, patch | ||||
Fixed in Revision | |||||
FPCOldBugId | |||||
FPCTarget | - | ||||
Attached Files |
|
|
PostPeepholeRegisters.patch (32,906 bytes)
Index: compiler/aarch64/aoptcpu.pas =================================================================== --- compiler/aarch64/aoptcpu.pas (revision 43679) +++ compiler/aarch64/aoptcpu.pas (working copy) @@ -550,15 +550,12 @@ function TCpuAsmOptimizer.PostPeepHoleOptsCpu(var p: tai): boolean; begin result := false; - if p.typ=ait_instruction then - begin - case taicpu(p).opcode of - A_CMP: - Result:=OptPostCMP(p); - else - ; - end; - end; + case taicpu(p).opcode of + A_CMP: + Result:=OptPostCMP(p); + else + ; + end; end; begin Index: compiler/aoptobj.pas =================================================================== --- compiler/aoptobj.pas (revision 43679) +++ compiler/aoptobj.pas (working copy) @@ -2497,14 +2497,15 @@ ClearUsedRegs; while (p <> BlockEnd) Do begin - UpdateUsedRegs(tai(p.next)); - if PostPeepHoleOptsCpu(p) then - continue; - if assigned(p) then + if (p.typ = ait_instruction) and PostPeepHoleOptsCpu(p) then begin - UpdateUsedRegs(p); - p:=tai(p.next); + if (p.typ <> ait_instruction) then + UpdateUsedRegs(p); + Continue; end; + + UpdateUsedRegs(tai(p.Next)); + GetNextInstruction(p, p); end; end; Index: compiler/arm/aoptcpu.pas =================================================================== --- compiler/arm/aoptcpu.pas (revision 43679) +++ compiler/arm/aoptcpu.pas (working copy) @@ -3064,156 +3064,153 @@ begin result:=false; - if p.typ = ait_instruction then + if MatchInstruction(p, A_MOV, [C_None], [PF_None]) and + (taicpu(p).oper[1]^.typ=top_const) and + (taicpu(p).oper[1]^.val >= 0) and + (taicpu(p).oper[1]^.val < 256) and + (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then begin - if MatchInstruction(p, A_MOV, [C_None], [PF_None]) and - (taicpu(p).oper[1]^.typ=top_const) and - (taicpu(p).oper[1]^.val >= 0) and - (taicpu(p).oper[1]^.val < 256) and - (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then - begin - DebugMsg('Peephole Mov2Movs done', p); - asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p); - 
asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p); - IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs); - taicpu(p).oppostfix:=PF_S; - result:=true; - end - else if MatchInstruction(p, A_MVN, [C_None], [PF_None]) and - (taicpu(p).oper[1]^.typ=top_reg) and - (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then - begin - DebugMsg('Peephole Mvn2Mvns done', p); - asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p); - asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p); - IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs); - taicpu(p).oppostfix:=PF_S; - result:=true; - end - else if MatchInstruction(p, A_RSB, [C_None], [PF_None]) and - (taicpu(p).ops = 3) and - (taicpu(p).oper[2]^.typ=top_const) and - (taicpu(p).oper[2]^.val=0) and - (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then - begin - DebugMsg('Peephole Rsb2Rsbs done', p); - asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p); - asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p); - IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs); - taicpu(p).oppostfix:=PF_S; - result:=true; - end - else if MatchInstruction(p, [A_ADD,A_SUB], [C_None], [PF_None]) and - (taicpu(p).ops = 3) and - MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and - (not MatchOperand(taicpu(p).oper[0]^, NR_STACK_POINTER_REG)) and - (taicpu(p).oper[2]^.typ=top_const) and - (taicpu(p).oper[2]^.val >= 0) and - (taicpu(p).oper[2]^.val < 256) and - (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then - begin - DebugMsg('Peephole AddSub2*s done', p); - asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p); - asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p); - IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs); - taicpu(p).loadconst(1,taicpu(p).oper[2]^.val); - taicpu(p).oppostfix:=PF_S; - taicpu(p).ops := 2; - result:=true; - end - else if MatchInstruction(p, [A_ADD,A_SUB], [C_None], [PF_None]) and - (taicpu(p).ops = 2) and - (taicpu(p).oper[1]^.typ=top_reg) and - (not MatchOperand(taicpu(p).oper[0]^, 
NR_STACK_POINTER_REG)) and - (not MatchOperand(taicpu(p).oper[1]^, NR_STACK_POINTER_REG)) and - (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then - begin - DebugMsg('Peephole AddSub2*s done', p); - asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p); - asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p); - IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs); - taicpu(p).oppostfix:=PF_S; - result:=true; - end - else if MatchInstruction(p, [A_ADD], [C_None], [PF_None]) and - (taicpu(p).ops = 3) and - MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and - (taicpu(p).oper[2]^.typ=top_reg) then - begin - DebugMsg('Peephole AddRRR2AddRR done', p); - taicpu(p).ops := 2; - taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg); - result:=true; - end - else if MatchInstruction(p, [A_AND,A_ORR,A_EOR,A_BIC,A_LSL,A_LSR,A_ASR,A_ROR], [C_None], [PF_None]) and - (taicpu(p).ops = 3) and - MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and - (taicpu(p).oper[2]^.typ=top_reg) and - (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then - begin - DebugMsg('Peephole opXXY2opsXY done', p); - asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p); - asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p); - IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs); - taicpu(p).ops := 2; - taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg); - taicpu(p).oppostfix:=PF_S; - result:=true; - end - else if MatchInstruction(p, [A_AND,A_ORR,A_EOR,A_BIC,A_LSL,A_LSR,A_ASR,A_ROR], [C_None], [PF_S]) and - (taicpu(p).ops = 3) and - MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and - (taicpu(p).oper[2]^.typ in [top_reg,top_const]) then - begin - DebugMsg('Peephole opXXY2opXY done', p); - taicpu(p).ops := 2; - if taicpu(p).oper[2]^.typ=top_reg then - taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg) - else - taicpu(p).loadconst(1,taicpu(p).oper[2]^.val); - result:=true; - end - else if MatchInstruction(p, [A_AND,A_ORR,A_EOR], [C_None], [PF_None,PF_S]) and - (taicpu(p).ops = 3) and - 
MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[2]^) and - (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then - begin - DebugMsg('Peephole opXYX2opsXY done', p); - asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p); - asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p); - IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs); - taicpu(p).oppostfix:=PF_S; - taicpu(p).ops := 2; - result:=true; - end - else if MatchInstruction(p, [A_MOV], [C_None], [PF_None]) and - (taicpu(p).ops=3) and - (taicpu(p).oper[2]^.typ=top_shifterop) and - (taicpu(p).oper[2]^.shifterop^.shiftmode in [SM_LSL,SM_LSR,SM_ASR,SM_ROR]) and - //MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and - (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then - begin - DebugMsg('Peephole Mov2Shift done', p); - asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p); - asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p); - IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs); - taicpu(p).oppostfix:=PF_S; + DebugMsg('Peephole Mov2Movs done', p); + asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p); + asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p); + IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs); + taicpu(p).oppostfix:=PF_S; + result:=true; + end + else if MatchInstruction(p, A_MVN, [C_None], [PF_None]) and + (taicpu(p).oper[1]^.typ=top_reg) and + (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then + begin + DebugMsg('Peephole Mvn2Mvns done', p); + asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p); + asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p); + IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs); + taicpu(p).oppostfix:=PF_S; + result:=true; + end + else if MatchInstruction(p, A_RSB, [C_None], [PF_None]) and + (taicpu(p).ops = 3) and + (taicpu(p).oper[2]^.typ=top_const) and + (taicpu(p).oper[2]^.val=0) and + (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then + begin + DebugMsg('Peephole Rsb2Rsbs done', p); + 
asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p); + asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p); + IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs); + taicpu(p).oppostfix:=PF_S; + result:=true; + end + else if MatchInstruction(p, [A_ADD,A_SUB], [C_None], [PF_None]) and + (taicpu(p).ops = 3) and + MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and + (not MatchOperand(taicpu(p).oper[0]^, NR_STACK_POINTER_REG)) and + (taicpu(p).oper[2]^.typ=top_const) and + (taicpu(p).oper[2]^.val >= 0) and + (taicpu(p).oper[2]^.val < 256) and + (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then + begin + DebugMsg('Peephole AddSub2*s done', p); + asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p); + asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p); + IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs); + taicpu(p).loadconst(1,taicpu(p).oper[2]^.val); + taicpu(p).oppostfix:=PF_S; + taicpu(p).ops := 2; + result:=true; + end + else if MatchInstruction(p, [A_ADD,A_SUB], [C_None], [PF_None]) and + (taicpu(p).ops = 2) and + (taicpu(p).oper[1]^.typ=top_reg) and + (not MatchOperand(taicpu(p).oper[0]^, NR_STACK_POINTER_REG)) and + (not MatchOperand(taicpu(p).oper[1]^, NR_STACK_POINTER_REG)) and + (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then + begin + DebugMsg('Peephole AddSub2*s done', p); + asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p); + asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p); + IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs); + taicpu(p).oppostfix:=PF_S; + result:=true; + end + else if MatchInstruction(p, [A_ADD], [C_None], [PF_None]) and + (taicpu(p).ops = 3) and + MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and + (taicpu(p).oper[2]^.typ=top_reg) then + begin + DebugMsg('Peephole AddRRR2AddRR done', p); + taicpu(p).ops := 2; + taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg); + result:=true; + end + else if MatchInstruction(p, [A_AND,A_ORR,A_EOR,A_BIC,A_LSL,A_LSR,A_ASR,A_ROR], [C_None], [PF_None]) 
and + (taicpu(p).ops = 3) and + MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and + (taicpu(p).oper[2]^.typ=top_reg) and + (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then + begin + DebugMsg('Peephole opXXY2opsXY done', p); + asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p); + asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p); + IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs); + taicpu(p).ops := 2; + taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg); + taicpu(p).oppostfix:=PF_S; + result:=true; + end + else if MatchInstruction(p, [A_AND,A_ORR,A_EOR,A_BIC,A_LSL,A_LSR,A_ASR,A_ROR], [C_None], [PF_S]) and + (taicpu(p).ops = 3) and + MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and + (taicpu(p).oper[2]^.typ in [top_reg,top_const]) then + begin + DebugMsg('Peephole opXXY2opXY done', p); + taicpu(p).ops := 2; + if taicpu(p).oper[2]^.typ=top_reg then + taicpu(p).loadreg(1,taicpu(p).oper[2]^.reg) + else + taicpu(p).loadconst(1,taicpu(p).oper[2]^.val); + result:=true; + end + else if MatchInstruction(p, [A_AND,A_ORR,A_EOR], [C_None], [PF_None,PF_S]) and + (taicpu(p).ops = 3) and + MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[2]^) and + (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then + begin + DebugMsg('Peephole opXYX2opsXY done', p); + asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p); + asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p); + IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs); + taicpu(p).oppostfix:=PF_S; + taicpu(p).ops := 2; + result:=true; + end + else if MatchInstruction(p, [A_MOV], [C_None], [PF_None]) and + (taicpu(p).ops=3) and + (taicpu(p).oper[2]^.typ=top_shifterop) and + (taicpu(p).oper[2]^.shifterop^.shiftmode in [SM_LSL,SM_LSR,SM_ASR,SM_ROR]) and + //MatchOperand(taicpu(p).oper[0]^, taicpu(p).oper[1]^) and + (not RegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs)) then + begin + DebugMsg('Peephole Mov2Shift done', p); + asml.InsertBefore(tai_regalloc.alloc(NR_DEFAULTFLAGS,p), p); + 
asml.InsertAfter(tai_regalloc.dealloc(NR_DEFAULTFLAGS,p), p); + IncludeRegInUsedRegs(NR_DEFAULTFLAGS,UsedRegs); + taicpu(p).oppostfix:=PF_S; - case taicpu(p).oper[2]^.shifterop^.shiftmode of - SM_LSL: taicpu(p).opcode:=A_LSL; - SM_LSR: taicpu(p).opcode:=A_LSR; - SM_ASR: taicpu(p).opcode:=A_ASR; - SM_ROR: taicpu(p).opcode:=A_ROR; - else - internalerror(2019050912); - end; + case taicpu(p).oper[2]^.shifterop^.shiftmode of + SM_LSL: taicpu(p).opcode:=A_LSL; + SM_LSR: taicpu(p).opcode:=A_LSR; + SM_ASR: taicpu(p).opcode:=A_ASR; + SM_ROR: taicpu(p).opcode:=A_ROR; + else + internalerror(2019050912); + end; - if taicpu(p).oper[2]^.shifterop^.rs<>NR_NO then - taicpu(p).loadreg(2, taicpu(p).oper[2]^.shifterop^.rs) - else - taicpu(p).loadconst(2, taicpu(p).oper[2]^.shifterop^.shiftimm); - result:=true; - end + if taicpu(p).oper[2]^.shifterop^.rs<>NR_NO then + taicpu(p).loadreg(2, taicpu(p).oper[2]^.shifterop^.rs) + else + taicpu(p).loadconst(2, taicpu(p).oper[2]^.shifterop^.shiftimm); + result:=true; end; end; Index: compiler/i386/aoptcpu.pas =================================================================== --- compiler/i386/aoptcpu.pas (revision 43679) +++ compiler/i386/aoptcpu.pas (working copy) @@ -263,77 +263,70 @@ hp1: tai; begin Result:=false; - case p.Typ Of - Ait_Instruction: - begin - if InsContainsSegRef(taicpu(p)) then - Exit; - case taicpu(p).opcode Of - A_CALL: - Result:=PostPeepHoleOptCall(p); - A_LEA: - Result:=PostPeepholeOptLea(p); - A_CMP: - Result:=PostPeepholeOptCmp(p); - A_MOV: - Result:=PostPeepholeOptMov(p); - A_MOVZX: - { if register vars are on, it's possible there is code like } - { "cmpl $3,%eax; movzbl 8(%ebp),%ebx; je .Lxxx" } - { so we can't safely replace the movzx then with xor/mov, } - { since that would change the flags (JM) } - if not(cs_opt_regvar in current_settings.optimizerswitches) then + if InsContainsSegRef(taicpu(p)) then + Exit; + case taicpu(p).opcode Of + A_CALL: + Result:=PostPeepHoleOptCall(p); + A_LEA: + 
Result:=PostPeepholeOptLea(p); + A_CMP: + Result:=PostPeepholeOptCmp(p); + A_MOV: + Result:=PostPeepholeOptMov(p); + A_MOVZX: + { if register vars are on, it's possible there is code like } + { "cmpl $3,%eax; movzbl 8(%ebp),%ebx; je .Lxxx" } + { so we can't safely replace the movzx then with xor/mov, } + { since that would change the flags (JM) } + if not(cs_opt_regvar in current_settings.optimizerswitches) then + begin + if (taicpu(p).oper[1]^.typ = top_reg) then + if (taicpu(p).oper[0]^.typ = top_reg) + then + case taicpu(p).opsize of + S_BL: + begin + if IsGP32Reg(taicpu(p).oper[1]^.reg) and + not(cs_opt_size in current_settings.optimizerswitches) and + (current_settings.optimizecputype = cpu_Pentium) then + {Change "movzbl %reg1, %reg2" to + "xorl %reg2, %reg2; movb %reg1, %reg2" for Pentium and + PentiumMMX} + begin + hp1 := taicpu.op_reg_reg(A_XOR, S_L, + taicpu(p).oper[1]^.reg, taicpu(p).oper[1]^.reg); + InsertLLItem(p.previous, p, hp1); + taicpu(p).opcode := A_MOV; + taicpu(p).changeopsize(S_B); + setsubreg(taicpu(p).oper[1]^.reg,R_SUBL); + end; + end; + else + ; + end + else if (taicpu(p).oper[0]^.typ = top_ref) and + (taicpu(p).oper[0]^.ref^.base <> taicpu(p).oper[1]^.reg) and + (taicpu(p).oper[0]^.ref^.index <> taicpu(p).oper[1]^.reg) and + not(cs_opt_size in current_settings.optimizerswitches) and + IsGP32Reg(taicpu(p).oper[1]^.reg) and + (current_settings.optimizecputype = cpu_Pentium) and + (taicpu(p).opsize = S_BL) then + {changes "movzbl mem, %reg" to "xorl %reg, %reg; movb mem, %reg8" for + Pentium and PentiumMMX} begin - if (taicpu(p).oper[1]^.typ = top_reg) then - if (taicpu(p).oper[0]^.typ = top_reg) - then - case taicpu(p).opsize of - S_BL: - begin - if IsGP32Reg(taicpu(p).oper[1]^.reg) and - not(cs_opt_size in current_settings.optimizerswitches) and - (current_settings.optimizecputype = cpu_Pentium) then - {Change "movzbl %reg1, %reg2" to - "xorl %reg2, %reg2; movb %reg1, %reg2" for Pentium and - PentiumMMX} - begin - hp1 := 
taicpu.op_reg_reg(A_XOR, S_L, - taicpu(p).oper[1]^.reg, taicpu(p).oper[1]^.reg); - InsertLLItem(p.previous, p, hp1); - taicpu(p).opcode := A_MOV; - taicpu(p).changeopsize(S_B); - setsubreg(taicpu(p).oper[1]^.reg,R_SUBL); - end; - end; - else - ; - end - else if (taicpu(p).oper[0]^.typ = top_ref) and - (taicpu(p).oper[0]^.ref^.base <> taicpu(p).oper[1]^.reg) and - (taicpu(p).oper[0]^.ref^.index <> taicpu(p).oper[1]^.reg) and - not(cs_opt_size in current_settings.optimizerswitches) and - IsGP32Reg(taicpu(p).oper[1]^.reg) and - (current_settings.optimizecputype = cpu_Pentium) and - (taicpu(p).opsize = S_BL) then - {changes "movzbl mem, %reg" to "xorl %reg, %reg; movb mem, %reg8" for - Pentium and PentiumMMX} - begin - hp1 := taicpu.Op_reg_reg(A_XOR, S_L, taicpu(p).oper[1]^.reg, - taicpu(p).oper[1]^.reg); - taicpu(p).opcode := A_MOV; - taicpu(p).changeopsize(S_B); - setsubreg(taicpu(p).oper[1]^.reg,R_SUBL); - InsertLLItem(p.previous, p, hp1); - end; - end; - A_TEST, A_OR: - Result:=PostPeepholeOptTestOr(p); - else - ; - end; - end; + hp1 := taicpu.Op_reg_reg(A_XOR, S_L, taicpu(p).oper[1]^.reg, + taicpu(p).oper[1]^.reg); + taicpu(p).opcode := A_MOV; + taicpu(p).changeopsize(S_B); + setsubreg(taicpu(p).oper[1]^.reg,R_SUBL); + InsertLLItem(p.previous, p, hp1); + end; + end; + A_TEST, A_OR: + Result:=PostPeepholeOptTestOr(p); else - ; + { Do nothing }; end; end; Index: compiler/i8086/aoptcpu.pas =================================================================== --- compiler/i8086/aoptcpu.pas (revision 43679) +++ compiler/i8086/aoptcpu.pas (working copy) @@ -151,22 +151,15 @@ function TCpuAsmOptimizer.PostPeepHoleOptsCpu(var p: tai): boolean; begin result := false; - case p.typ of - ait_instruction: - begin - case taicpu(p).opcode of - {A_MOV commented out, because it still breaks some i8086 code :( } - {A_MOV: - Result:=PostPeepholeOptMov(p);} - A_CMP: - Result:=PostPeepholeOptCmp(p); - A_OR, - A_TEST: - Result:=PostPeepholeOptTestOr(p); - else - ; - end; - end; + case 
taicpu(p).opcode of + {A_MOV commented out, because it still breaks some i8086 code :( } + {A_MOV: + Result:=PostPeepholeOptMov(p);} + A_CMP: + Result:=PostPeepholeOptCmp(p); + A_OR, + A_TEST: + Result:=PostPeepholeOptTestOr(p); else ; end; Index: compiler/jvm/aoptcpu.pas =================================================================== --- compiler/jvm/aoptcpu.pas (revision 43679) +++ compiler/jvm/aoptcpu.pas (working copy) @@ -175,9 +175,7 @@ function TCpuAsmOptimizer.PostPeepHoleOptsCpu(var p: tai): boolean; begin - result:= - (p.typ=ait_instruction) and - RemoveLoadLoadSwap(p); + Result := RemoveLoadLoadSwap(p); end; begin Index: compiler/powerpc/aoptcpu.pas =================================================================== --- compiler/powerpc/aoptcpu.pas (revision 43679) +++ compiler/powerpc/aoptcpu.pas (working copy) @@ -441,70 +441,63 @@ next1: tai; begin result := false; - case p.typ of - ait_instruction: + case taicpu(p).opcode of + A_RLWINM_: begin - case taicpu(p).opcode of - A_RLWINM_: + // rlwinm_ is cracked on the G5, andi_/andis_ aren't + if (taicpu(p).oper[2]^.val = 0) then + if (taicpu(p).oper[3]^.val < 16) and + (taicpu(p).oper[4]^.val < 16) then begin - // rlwinm_ is cracked on the G5, andi_/andis_ aren't - if (taicpu(p).oper[2]^.val = 0) then - if (taicpu(p).oper[3]^.val < 16) and - (taicpu(p).oper[4]^.val < 16) then - begin - taicpu(p).opcode := A_ANDIS_; - taicpu(p).oper[2]^.val := word( - ((1 shl (16-taicpu(p).oper[3]^.val)) - 1) xor - ((1 shl (15-taicpu(p).oper[4]^.val)) - 1)); - taicpu(p).freeop(3); - taicpu(p).freeop(4); - taicpu(p).ops := 3; - taicpu(p).opercnt := 3; - end - else if (taicpu(p).oper[3]^.val >= 16) and - (taicpu(p).oper[4]^.val >= 16) then - begin - taicpu(p).opcode := A_ANDI_; - taicpu(p).oper[2]^.val := word(rlwinm2mask(taicpu(p).oper[3]^.val,taicpu(p).oper[4]^.val)); - taicpu(p).freeop(3); - taicpu(p).freeop(4); - taicpu(p).ops := 3; - taicpu(p).opercnt := 3; - end; + taicpu(p).opcode := A_ANDIS_; + 
taicpu(p).oper[2]^.val := word( + ((1 shl (16-taicpu(p).oper[3]^.val)) - 1) xor + ((1 shl (15-taicpu(p).oper[4]^.val)) - 1)); + taicpu(p).freeop(3); + taicpu(p).freeop(4); + taicpu(p).ops := 3; + taicpu(p).opercnt := 3; + end + else if (taicpu(p).oper[3]^.val >= 16) and + (taicpu(p).oper[4]^.val >= 16) then + begin + taicpu(p).opcode := A_ANDI_; + taicpu(p).oper[2]^.val := word(rlwinm2mask(taicpu(p).oper[3]^.val,taicpu(p).oper[4]^.val)); + taicpu(p).freeop(3); + taicpu(p).freeop(4); + taicpu(p).ops := 3; + taicpu(p).opercnt := 3; end; - else - ; - end; - - // change "integer operation with destination reg" followed by a - // comparison to zero of that reg, with a variant of that integer - // operation which sets the flags (if it exists) - if not(result) and - (taicpu(p).ops >= 2) and - (taicpu(p).oper[0]^.typ = top_reg) and - (taicpu(p).oper[1]^.typ = top_reg) and - getnextinstruction(p,next1) and - (next1.typ = ait_instruction) and - (taicpu(next1).opcode = A_CMPWI) and - // make sure it the result goes to cr0 - (((taicpu(next1).ops = 2) and - (taicpu(next1).oper[1]^.val = 0) and - (taicpu(next1).oper[0]^.reg = taicpu(p).oper[0]^.reg)) or - ((taicpu(next1).ops = 3) and - (taicpu(next1).oper[2]^.val = 0) and - (taicpu(next1).oper[0]^.typ = top_reg) and - (getsupreg(taicpu(next1).oper[0]^.reg) = RS_CR0) and - (taicpu(next1).oper[1]^.reg = taicpu(p).oper[0]^.reg))) and - changetomodifyflags(taicpu(p)) then - begin - asml.remove(next1); - next1.free; - result := true; - end; end; else ; end; + + // change "integer operation with destination reg" followed by a + // comparison to zero of that reg, with a variant of that integer + // operation which sets the flags (if it exists) + if not(result) and + (taicpu(p).ops >= 2) and + (taicpu(p).oper[0]^.typ = top_reg) and + (taicpu(p).oper[1]^.typ = top_reg) and + getnextinstruction(p,next1) and + (next1.typ = ait_instruction) and + (taicpu(next1).opcode = A_CMPWI) and + // make sure it the result goes to cr0 + 
(((taicpu(next1).ops = 2) and + (taicpu(next1).oper[1]^.val = 0) and + (taicpu(next1).oper[0]^.reg = taicpu(p).oper[0]^.reg)) or + ((taicpu(next1).ops = 3) and + (taicpu(next1).oper[2]^.val = 0) and + (taicpu(next1).oper[0]^.typ = top_reg) and + (getsupreg(taicpu(next1).oper[0]^.reg) = RS_CR0) and + (taicpu(next1).oper[1]^.reg = taicpu(p).oper[0]^.reg))) and + changetomodifyflags(taicpu(p)) then + begin + asml.remove(next1); + next1.free; + result := true; + end; end; begin Index: compiler/riscv32/aoptcpu.pas =================================================================== --- compiler/riscv32/aoptcpu.pas (revision 43679) +++ compiler/riscv32/aoptcpu.pas (working copy) @@ -63,17 +63,8 @@ function TCpuAsmOptimizer.PostPeepHoleOptsCpu(var p: tai): boolean; - var - next1: tai; begin - result := false; - case p.typ of - ait_instruction: - begin - end; - else - ; - end; + Result := False; end; begin Index: compiler/x86_64/aoptcpu.pas =================================================================== --- compiler/x86_64/aoptcpu.pas (revision 43679) +++ compiler/x86_64/aoptcpu.pas (working copy) @@ -168,29 +168,22 @@ function TCpuAsmOptimizer.PostPeepHoleOptsCpu(var p: tai): boolean; begin result := false; - case p.typ of - ait_instruction: - begin - case taicpu(p).opcode of - A_MOV: - Result:=PostPeepholeOptMov(p); - A_MOVZX: - Result:=PostPeepholeOptMovzx(p); - A_CMP: - Result:=PostPeepholeOptCmp(p); - A_OR, - A_TEST: - Result:=PostPeepholeOptTestOr(p); - A_XOR: - Result:=PostPeepholeOptXor(p); - A_CALL: - Result:=PostPeepholeOptCall(p); - A_LEA: - Result:=PostPeepholeOptLea(p); - else - ; - end; - end; + case taicpu(p).opcode of + A_MOV: + Result:=PostPeepholeOptMov(p); + A_MOVZX: + Result:=PostPeepholeOptMovzx(p); + A_CMP: + Result:=PostPeepholeOptCmp(p); + A_OR, + A_TEST: + Result:=PostPeepholeOptTestOr(p); + A_XOR: + Result:=PostPeepholeOptXor(p); + A_CALL: + Result:=PostPeepholeOptCall(p); + A_LEA: + Result:=PostPeepholeOptLea(p); else ; end; |
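For readers who find the flattened aoptobj.pas hunk hard to follow, the control flow it introduces can be sketched as a small standalone program. This is only a simplified model, not the compiler's code: `TEntryKind`, `TEntry`, `Entries`, `PostPeepholeCpu` and `CpuCalls` are hypothetical stand-ins for `tai`, the assembler list, `PostPeepHoleOptsCpu` and so on, and register tracking (`UpdateUsedRegs`) is left out entirely.

```pascal
program PostPeepholeLoopSketch;
{$mode objfpc}

type
  { Hypothetical stand-ins for the kinds of entry found in the assembler list }
  TEntryKind = (ekInstruction, ekLabel, ekRegAlloc);

  TEntry = record
    Kind: TEntryKind;
    Text: string;
  end;

const
  { A tiny list: only two of the four entries are real instructions }
  Entries: array[0..3] of TEntry = (
    (Kind: ekLabel;       Text: '.L1:'),
    (Kind: ekInstruction; Text: 'mov $0,%eax'),
    (Kind: ekRegAlloc;    Text: '# alloc flags'),
    (Kind: ekInstruction; Text: 'cmp %eax,%ebx')
  );

var
  CpuCalls: Integer = 0;
  i: Integer;

{ Stand-in for the platform-specific routine. With the type check moved
  into the caller, it may assume that it only ever sees instructions. }
function PostPeepholeCpu(const E: TEntry): Boolean;
begin
  Inc(CpuCalls);
  WriteLn('inspecting: ', E.Text);
  Result := False; { this sketch never rewrites anything }
end;

begin
  i := 0;
  while i <= High(Entries) do
  begin
    { The entry type is tested once, here, instead of in every CPU unit }
    if (Entries[i].Kind = ekInstruction) and PostPeepholeCpu(Entries[i]) then
      Continue; { an optimisation fired: re-examine the current position }
    Inc(i);
  end;
  WriteLn('CPU-specific calls: ', CpuCalls);
end.
```

In this model only the two instruction entries reach the CPU-specific routine; the label and the register allocation marker are skipped by the generic loop, which is the saving the patch is after.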
|
Completed a full regression run with a number of patches combined, namely this one, the two over at 0036382, and a patch I sent Florian privately that addresses an internal error for x86_64-darwin - no regressions noted on i386-win32 and x86_64-win64. Will attempt i386-linux tonight. |
|
One new failure in i386-linux: webtbs/tw2377.pp - now to determine if that is due to my additions or me unintentionally pressing something on the keyboard during that test (it tests the Keyboard unit). |
|
Can't seem to reproduce the failure. Seemed to be a glitch elsewhere. Granted, my changes were to the peephole optimiser, so a failure on this test is nonsensical anyway. Still, might need another party to test. |
|
I do not like the patch for two (connected) reasons: (1) it makes PostPeepHoleOpts behave differently from the other helpers, and thus harder to understand; (2) even if the first point might not be valid: if the parameter is guaranteed to be a taicpu, it should have this type |
|
I guess the points are valid. I always thought the idea with the post-Peephole stage is to convert instructions into more efficient forms after all other optimisations are complete. I'll see what I can do in making it easier to understand. |
|
I don't think the parameter type can be easily changed from tai to taicpu though because if an instruction gets deleted, what appears next in the list may not be an instruction. The GetNextInstruction method cannot be used because this doesn't update UsedRegs, so something else is required. All I can do is come up with a potential showcase and hope it isn't too complex. |
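A rough illustration of the typing problem described above, using a hypothetical miniature of the class hierarchy (`TEntry`, `TInstr`, `TMarker` and `RemoveRedundant` are invented for this sketch and are not compiler identifiers): once the routine deletes the current instruction and advances `p`, the new `p` may be a marker rather than an instruction, so a `var` parameter of the narrower type would not fit.

```pascal
program VarParamSketch;
{$mode objfpc}

type
  { Hypothetical miniature of the tai / taicpu relationship }
  TEntry = class
    Next: TEntry;
    Name: string;
    constructor Create(const AName: string);
  end;

  TInstr = class(TEntry);   { plays the role of taicpu }
  TMarker = class(TEntry);  { e.g. a label or a register deallocation }

constructor TEntry.Create(const AName: string);
begin
  Name := AName;
end;

{ Deletes the instruction at p and advances p to whatever follows it.
  p has to be declared as TEntry, because the successor need not be
  a TInstr. }
function RemoveRedundant(var p: TEntry): Boolean;
var
  Dead: TEntry;
begin
  Dead := p;
  p := p.Next;
  Dead.Free;
  Result := True;
end;

var
  p, m: TEntry;
begin
  m := TMarker.Create('dealloc flags');
  p := TInstr.Create('test %eax,%eax');
  p.Next := m;

  if RemoveRedundant(p) then
    WriteLn('now at: ', p.Name, ' (is instruction: ', p is TInstr, ')');

  m.Free;
end.
```

After the call, `p` refers to the marker, so the caller still has to check the entry type before treating it as an instruction again.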
|
Suspending this one for now because there is a possibility of improving this on all peephole stages - requires further investigation. |
Date Modified | Username | Field | Change |
---|---|---|---|
2019-12-14 10:56 | J. Gareth Moreton | New Issue | |
2019-12-14 10:56 | J. Gareth Moreton | File Added: PostPeepholeRegisters.patch | |
2019-12-14 10:57 | J. Gareth Moreton | Tag Attached: patch | |
2019-12-14 10:57 | J. Gareth Moreton | Tag Attached: compiler | |
2019-12-14 10:57 | J. Gareth Moreton | Tag Attached: optimizations | |
2019-12-14 10:57 | J. Gareth Moreton | Priority | normal => low |
2019-12-14 10:57 | J. Gareth Moreton | Severity | minor => tweak |
2019-12-14 10:57 | J. Gareth Moreton | FPCTarget | => - |
2019-12-16 02:25 | J. Gareth Moreton | Note Added: 0119872 | |
2019-12-16 02:30 | J. Gareth Moreton | Note Edited: 0119872 | View Revisions |
2019-12-16 15:57 | J. Gareth Moreton | Note Added: 0119885 | |
2019-12-16 18:50 | J. Gareth Moreton | Note Added: 0119889 | |
2019-12-29 10:35 | Florian | Note Added: 0120122 | |
2019-12-29 15:15 | J. Gareth Moreton | Note Added: 0120132 | |
2019-12-29 15:25 | J. Gareth Moreton | Note Added: 0120134 | |
2020-01-04 13:08 | J. Gareth Moreton | Assigned To | => J. Gareth Moreton |
2020-01-04 13:08 | J. Gareth Moreton | Status | new => closed |
2020-01-04 13:08 | J. Gareth Moreton | Resolution | open => suspended |
2020-01-04 13:08 | J. Gareth Moreton | Note Added: 0120208 |