useless instructions in rtl/arm/arm.inc move_pld
Original Reporter info from Mantis: red
- Reporter name: Simon Ley
Description:
The move_pld function in rtl/arm/arm.inc uses the preload instruction "pld", as its name suggests, but it preloads both the source and the destination, which causes unnecessary overhead. I have written some test code that measures the speed of the original move_pld and move_blended versions, libc memmove, and a slightly modified move_pld version (move_pld2 in the output) that issues pld only for the source. My results are:
-bash-4.1# ./testmove
move_pld : 14394
move_pld2 : 8252
Move_blended: 8166
memmove : 5774
system.move : 14402
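To illustrate the idea behind the modified version, here is a minimal sketch in C rather than the RTL's ARM assembly. It is not the code from arm.inc or from test.pas. The function name copy_prefetch_src, the 32-byte stride, and the use of the GCC/Clang builtin __builtin_prefetch are assumptions made for illustration. Unlike Move, it only handles non-overlapping buffers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the RTL routine: a forward copy that issues a
   prefetch hint for the source only and never for the destination.
   The 32-byte stride is an assumed cache-line size. */
static void copy_prefetch_src(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 32) {
        /* Hint that the next source block will be read (second arg 0 = read).
           s + 32 is at most one past the end of the source buffer. */
        __builtin_prefetch(s + 32, 0);
        memcpy(d, s, 32);          /* copy the current block */
        d += 32;
        s += 32;
        n -= 32;
    }
    memcpy(d, s, n);               /* copy the remaining tail, if any */
}
```

The destination is written in full anyway, so a preload of it would only fetch data that is about to be overwritten. Whether that accounts for the whole difference measured above is not something this sketch can show.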
Steps to reproduce:
- compile test.pas for ARMv5
- run, see results
Additional information:
-bash-4.1# cat /proc/cpuinfo
Processor : Feroceon 88FR131 rev 1 (v5l)
BogoMIPS : 1199.30
Features : swp half thumb fastmult edsp
CPU implementer : 0x56
CPU architecture: 5TE
CPU variant : 0x2
CPU part : 0x131
CPU revision : 1
Hardware : Marvell OpenRD Ultimate Board
Revision : 0000
Serial : 0000000000000000
Fun trivia: when the original move_pld is used to move data into a buffer that was mapped with mmap, the copied data might end up corrupted when it is accessed from another process or by the kernel. I do not know why this happens. It might be related to calling pld for the destination buffer and immediately overwriting it afterwards, because it does NOT happen when my modified move_pld function is used instead.