AMD64 versions of FillWord, FillDWord and FillQWord are poorly optimised
Original Reporter info from Mantis: CuriousKit @CuriousKit
- Reporter name: J. Gareth Moreton
Description:
The implementations of FillWord, FillDWord and FillQWord are very poorly optimised on AMD64 platforms (e.g. Win64): they fall back on general-purpose Pascal code. This may catch out programmers who expect these functions to be faster than FillChar when initialising memory of a known type larger than a byte. In practice they are generally about the same speed.
Attached is a patch that implements assembly language optimisations for Win64 and 64-bit Linux (System V ABI).
Steps to reproduce:
Use QueryPerformanceCounter or an equivalent high-resolution timer to measure the average running time of FillWord, FillDWord and FillQWord. Then apply the patch and repeat the tests to (hopefully) see drastic improvements.
Additional information:
The implementations use SSE2, which every x86-64 CPU is guaranteed to support. They also use non-temporal store hints when filling blocks of one megabyte or larger. Smaller blocks use "rep stosq".
The Windows versions have been thoroughly tested for correctness, including on memory not aligned to a 16-byte boundary and with counts that are not a power of two. The Linux versions have NOT been tested for correctness, because the submitter currently cannot compile and test for Linux.
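The Windows versions possibly require additional code for proper stack unwinding during Structured Exception Handling, because they contain "push %rdi". RDI is a non-volatile register in the Win64 calling convention, so it has to be saved before it is used. Linux does not have this issue, as the System V versions use neither the stack nor any non-volatile registers.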
Limitation:
The pointer to x must be aligned to a 2-byte, 4-byte or 8-byte boundary for FillWord, FillDWord and FillQWord respectively. Otherwise an exception will likely be raised, caused by executing "movntdq" on a misaligned address (movntdq requires a 16-byte-aligned destination). This limitation is fair because writing across such a boundary under normal conditions (e.g. writing a Word to an odd-numbered address) is highly unusual and normally deliberately contrived. Implicit and explicit memory allocation routines tend to place a memory block on a boundary that suits the requested type or the machine word size.
Mantis conversion info:
- Mantis ID: 32637
- OS: Windows 7 (64-bit)
- OS Build: Enterprise
- Build: x86_64-win64-win32/win64
- Platform: Win64
- Version: 3.1.1
- Monitored by: » @xhajt03 (Tomas Hajny), » @neurolabusc1 (Chris Rorden), » @MageSlayer (Denis Golovan), » @CuriousKit (J. Gareth Moreton)