[Suggestion] Assembler optimisation w/ small 64-bit integers
Original Reporter info from Mantis: CuriousKit @CuriousKit
- Reporter name: J. Gareth Moreton
Description:
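There are times when assembler instructions, including compiler-generated code, produce needlessly large machine code on 64-bit Intel (x86_64) systems, specifically when a 64-bit register is loaded with a constant that fits in 32 bits.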
Situation 1:
When loading a small negative number (between -(2^31) and -1) into a 64-bit register, the "MOV reg, immediate" instruction (and its compiler-generated equivalent) produces 10 bytes of machine code.
For example, MOV RAX, -1 (Intel syntax) assembles to 48 b8 ff ff ff ff ff ff ff ff. The 48 is the REX.W prefix, b8 is the "MOV r64, imm64" opcode with RAX encoded in its low three bits, and the eight ff bytes are the 64-bit representation of -1.
Using the alternative MOV opcode c7 ("MOV r/m64, imm32"), this can be shrunk to 7 bytes: 48 c7 c0 ff ff ff ff. (For a different destination register, adjust the c0 ModRM byte and the 48 REX prefix accordingly: the low three bits of the register number go into the ModRM byte, and R8-R15 also need REX.B, giving 49.)
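As a rough sketch of how the c7 form is assembled for an arbitrary destination register, the Free Pascal program below builds the byte sequence as a hex string. The function name EncodeMovC7 and the register numbering (0 = RAX through 15 = R15) are illustrative assumptions, not part of the compiler's internals.

```pascal
program MovSignExtImm32;
{$mode objfpc}{$H+}

const
  HexDigits = '0123456789ABCDEF';

{ Formats one byte as two uppercase hex digits. }
function ByteHex(B: Byte): string;
begin
  Result := HexDigits[(B shr 4) + 1] + HexDigits[(B and $F) + 1];
end;

{ Hypothetical helper: encodes "MOV r64, imm32" as REX.W + C7 /0.
  Reg is assumed to be 0..15 (0 = RAX, 1 = RCX, ..., 8 = R8, ..., 15 = R15).
  The CPU sign-extends the 32-bit immediate to 64 bits. }
function EncodeMovC7(Reg: Integer; Value: LongInt): string;
var
  Imm: LongWord;
  I: Integer;
begin
  { REX.W (48) selects a 64-bit operand; REX.B (+1) is needed for R8..R15 }
  Result := ByteHex(Byte($48 or (Reg shr 3)));
  Result := Result + ' C7';
  { ModRM: mod = 11 (register operand), reg = 000 (the /0 extension), rm = low 3 bits of Reg }
  Result := Result + ' ' + ByteHex(Byte($C0 or (Reg and 7)));
  Imm := LongWord(Value);          { same bits, viewed as unsigned }
  for I := 0 to 3 do               { 32-bit immediate, least significant byte first }
    Result := Result + ' ' + ByteHex(Byte((Imm shr (8 * I)) and $FF));
end;

begin
  WriteLn(EncodeMovC7(0, -1));     { MOV RAX, -1 }
  WriteLn(EncodeMovC7(9, -2));     { MOV R9, -2 }
end.
```

For MOV RAX, -1 this prints 48 C7 C0 FF FF FF FF, matching the bytes above; for MOV R9, -2 the REX prefix becomes 49 and the ModRM byte C1.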
Situation 2:
When loading a small non-negative number (between 0 and 2^32 - 1) into a 64-bit register, the "MOV reg, immediate" instruction (and its compiler-generated equivalent) also produces 10 bytes of machine code.
For example, MOV RAX, 4000000000 (Intel syntax) assembles to 48 b8 00 28 6b ee 00 00 00 00. As before, 48 is the REX.W prefix and b8 is the "MOV r64, imm64" opcode for RAX; the remaining eight bytes are the little-endian 64-bit representation of 4,000,000,000 (0xEE6B2800).
This can be optimised by using the equivalent of MOV EAX, 4000000000 instead, because writing to a 32-bit register always zeroes the upper 32 bits of the corresponding 64-bit register (RAX, or whichever destination register is being used).
MOV EAX, 4000000000 assembles to b8 00 28 6b ee, only 5 bytes. (A REX prefix, 41, is still required for R8D-R15D, making the instruction 6 bytes in those cases, which is still a saving.)
In hand-written assembly language, however, instructions such as "MOV RAX, 4000000000" should be replaced with "MOV EAX, 4000000000" by the programmer rather than by the compiler.
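A similar sketch for Situation 2, again assuming a hypothetical helper (EncodeMovB8Imm32) and the same 0-15 register numbering. Here the register is folded into the opcode byte itself (b8 plus the low three bits), and only R8D-R15D need the 41 (REX.B) prefix; no REX.W is emitted because the operand is 32-bit.

```pascal
program MovZeroExtImm32;
{$mode objfpc}{$H+}

const
  HexDigits = '0123456789ABCDEF';

{ Formats one byte as two uppercase hex digits. }
function ByteHex(B: Byte): string;
begin
  Result := HexDigits[(B shr 4) + 1] + HexDigits[(B and $F) + 1];
end;

{ Hypothetical helper: encodes "MOV r32, imm32" (opcode B8+rd).
  Writing a 32-bit register zero-extends into the full 64-bit register.
  Reg is assumed to be 0..15 (0 = EAX/RAX, ..., 8 = R8D/R8, ..., 15 = R15D/R15). }
function EncodeMovB8Imm32(Reg: Integer; Value: LongWord): string;
var
  I: Integer;
begin
  if Reg >= 8 then
    Result := '41 '               { REX.B only; REX.W is not needed }
  else
    Result := '';
  { The low three bits of the register number are added to the opcode }
  Result := Result + ByteHex(Byte($B8 + (Reg and 7)));
  for I := 0 to 3 do              { 32-bit immediate, least significant byte first }
    Result := Result + ' ' + ByteHex(Byte((Value shr (8 * I)) and $FF));
end;

begin
  WriteLn(EncodeMovB8Imm32(0, 4000000000));   { MOV EAX, 4000000000 }
  WriteLn(EncodeMovB8Imm32(8, 4000000000));   { MOV R8D, 4000000000 }
end.
```

MOV EAX, 4000000000 comes out as B8 00 28 6B EE (5 bytes) and MOV R8D, 4000000000 as 41 B8 00 28 6B EE (6 bytes).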
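Taken together, the two situations suggest a simple size-based choice that a compiler or assembler could make when it sees MOV r64, imm. The following Free Pascal sketch (EncodeMovR64Imm is a hypothetical name, not an existing FPC routine) tries the 5/6-byte zero-extending form first, then the 7-byte sign-extending c7 form, and falls back to the 10-byte imm64 encoding only when neither fits.

```pascal
program MovShortestImm;
{$mode objfpc}{$H+}

const
  HexDigits = '0123456789ABCDEF';

{ Formats one byte as two uppercase hex digits. }
function ByteHex(B: Byte): string;
begin
  Result := HexDigits[(B shr 4) + 1] + HexDigits[(B and $F) + 1];
end;

{ Appends the low Count bytes of Value to S, least significant byte first. }
procedure AppendLE(var S: string; Value: QWord; Count: Integer);
var
  I: Integer;
begin
  for I := 0 to Count - 1 do
    S := S + ' ' + ByteHex(Byte((Value shr (8 * I)) and $FF));
end;

{ Hypothetical selector: picks the shortest encoding for "MOV r64, imm".
  Reg is assumed to be 0..15 (0 = RAX, ..., 8 = R8, ..., 15 = R15). }
function EncodeMovR64Imm(Reg: Integer; Value: Int64): string;
begin
  if (Value >= 0) and (Value <= High(LongWord)) then
  begin
    { Situation 2: MOV r32, imm32 (B8+rd); the upper 32 bits are zeroed }
    if Reg >= 8 then
      Result := '41 '
    else
      Result := '';
    Result := Result + ByteHex(Byte($B8 + (Reg and 7)));
    AppendLE(Result, QWord(Value), 4);
  end
  else if (Value >= Low(LongInt)) and (Value < 0) then
  begin
    { Situation 1: REX.W C7 /0 with a sign-extended imm32 }
    Result := ByteHex(Byte($48 or (Reg shr 3))) + ' C7 ' +
      ByteHex(Byte($C0 or (Reg and 7)));
    AppendLE(Result, QWord(Value), 4);
  end
  else
  begin
    { Anything else still needs the full REX.W B8+rd imm64 form (10 bytes) }
    Result := ByteHex(Byte($48 or (Reg shr 3))) + ' ' +
      ByteHex(Byte($B8 + (Reg and 7)));
    AppendLE(Result, QWord(Value), 8);
  end;
end;

begin
  WriteLn(EncodeMovR64Imm(0, -1));           { 7-byte C7 form }
  WriteLn(EncodeMovR64Imm(0, 4000000000));   { 5-byte B8 imm32 form }
  WriteLn(EncodeMovR64Imm(0, 5000000000));   { needs the 10-byte imm64 form }
end.
```

Checking the zero-extending form first is deliberate: it is never longer than the c7 form, so non-negative values below 2^31, which both forms could encode, get the shorter instruction.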
Steps to reproduce:
Situation 1:
Write the following routine, put a breakpoint on "Result := -1;", and call it elsewhere. When the breakpoint is triggered, press Ctrl+Alt+D to observe the disassembly at this point.
function TestFunction: Int64;
begin
Result := -1;
end;
Situation 2:
Write the following routine, put a breakpoint on "Result := 4000000000;", and call it elsewhere. When the breakpoint is triggered, press Ctrl+Alt+D to observe the disassembly at this point.
function TestFunction: Int64;
begin
Result := 4000000000;
end;
Additional information:
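As specified in the "Intel® 64 and IA-32 Architectures Software Developer's Manual", Volume 2A, page 3-530, the form of MOV that corresponds to "MOV r/m64, imm32" (the c7 opcode) sign-extends its 32-bit immediate rather than zero-extending it. This is why c7 is only used for the negative range in Situation 1: a value such as 4000000000 (0xEE6B2800) has bit 31 set, so c7 would sign-extend it to a negative number, and it needs the zero-extending 32-bit MOV from Situation 2 instead.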
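The difference between the two extension rules can be modelled with ordinary Pascal typecasts. This sketch (SignExtend32 and ZeroExtend32 are illustrative names) shows what the CPU would load for the same 32-bit immediate under each rule, and why the c7 form cannot be used for 4000000000.

```pascal
program ImmExtension;
{$mode objfpc}{$H+}

{ Models the C7 form (MOV r/m64, imm32): the immediate is sign-extended. }
function SignExtend32(Imm: LongWord): Int64;
begin
  Result := Int64(LongInt(Imm));   { reinterpret as signed, then widen }
end;

{ Models writing a 32-bit register (B8+rd): the upper half is zeroed. }
function ZeroExtend32(Imm: LongWord): Int64;
begin
  Result := Int64(Imm);            { unsigned widening keeps bits 32..63 clear }
end;

begin
  WriteLn(SignExtend32($EE6B2800));  { C7 path: a negative number }
  WriteLn(ZeroExtend32($EE6B2800));  { B8+rd path: 4000000000 }
  WriteLn(SignExtend32($FFFFFFFF));  { -1, as in Situation 1 }
end.
```

The first line prints -294967296 (4000000000 - 2^32), the second 4000000000, and the third -1.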
All compilation tests were done at the highest optimization level (-O4), and the results are identical whether optimization is configured for speed or for size.
Mantis conversion info:
- Mantis ID: 32037
- OS: Windows 7 (64-bit)
- OS Build: Enterprise
- Build: x86_64-win64-win32/win64
- Platform: Win64
- Version: 3.0.2
- Fixed in version: 3.1.1
- Fixed in revision: 37376 (#198c53a9), 37377 (#ce7487b7)
- Monitored by: Vincent (Vincent Snijders), @CuriousKit (J. Gareth Moreton)