[Patch] AArch64 Improved speed and efficiency with constant generation

Original Reporter info from Mantis: CuriousKit @CuriousKit

Reporter name: J. Gareth Moreton

Description:

After the introduction of "magic division", a new class of constants was revealed: reciprocals of small divisors that are massive 64-bit numbers but can be encoded with an "orr/movk" pair, since often only the first (lowest) 16-bit word differs from the rest. This patch therefore permits the encoding of constants where copying the 3rd word onto the 1st word produces a value that is valid as an AArch64 logical (bitmask) immediate, and then uses movk to correct the 1st word. For example, instead of encoding $AAAAAAAAAAAAAAAB (reciprocal of 3) as "movz reg,#0xAAAB; movk reg,#0xAAAA,lsl 16; movk reg,#0xAAAA,lsl 32; movk reg,#0xAAAA,lsl 48", it is instead encoded as "orr reg,xzr,#0xAAAAAAAAAAAAAAAA; movk reg,#0xAAAB". Cycle count is identical (2), but overall code size is 8 bytes smaller.
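The orr/movk selection described above can be sketched in Python as follows. This is only an illustrative model and not the compiler's actual routine; the names `is_logical_imm` and `encode_constant` are hypothetical, and the fallback path is deliberately naive (it always emits four instructions).

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def is_logical_imm(value, width=64):
    """Return True if value is encodable as an AArch64 logical (bitmask) immediate.

    Such an immediate is an element of 2, 4, 8, 16, 32 or 64 bits, repeated
    across the register, where the element is a rotated contiguous run of ones
    (neither all zeros nor all ones).
    """
    mask = (1 << width) - 1
    value &= mask
    if value == 0 or value == mask:
        return False
    size = 2
    while size <= width:
        elem_mask = (1 << size) - 1
        elem = value & elem_mask
        repeats = all(((value >> shift) & elem_mask) == elem
                      for shift in range(0, width, size))
        if repeats and elem != 0 and elem != elem_mask:
            # A rotated run of ones has exactly two bit transitions
            # when the element is viewed as a circle.
            rotated = (elem >> 1) | ((elem & 1) << (size - 1))
            if bin(elem ^ rotated).count("1") == 2:
                return True
        size *= 2
    return False

def encode_constant(value):
    """Hypothetical model of the strategy in this report (64-bit registers only)."""
    value &= MASK64
    if is_logical_imm(value):
        # A lone ORR with the zero register is shown as its MOV alias.
        return [f"mov reg,#0x{value:X}"]
    # Copy the 3rd 16-bit word onto the 1st and test the result.
    third_word = (value >> 32) & 0xFFFF
    candidate = (value & ~0xFFFF) | third_word
    if is_logical_imm(candidate):
        return [f"orr reg,xzr,#0x{candidate:X}",
                f"movk reg,#0x{value & 0xFFFF:X}"]
    # Naive fallback: one MOVZ plus three MOVKs.
    words = [(value >> shift) & 0xFFFF for shift in (0, 16, 32, 48)]
    seq = [f"movz reg,#0x{words[0]:X}"]
    for i in range(1, 4):
        seq.append(f"movk reg,#0x{words[i]:X},lsl {16 * i}")
    return seq

print("; ".join(encode_constant(0xAAAAAAAAAAAAAAAB)))
# prints: orr reg,xzr,#0xAAAAAAAAAAAAAAAA; movk reg,#0xAAAB
```

The same check with `width=32` models the test for whether a 32-bit constant fits in a single instruction.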

Additionally, 32-bit numbers are now encoded as a single ORR instruction (emitted as its MOV alias for clarity) where possible.

Steps to reproduce:

Apply patch and confirm correct compilation and slightly smaller code size.

Additional information:

Some minor speed-ups have been applied to the routine by rearranging some conditional checks. To follow convention and aid analysis of assembly dumps, ORR instructions are written as MOV instructions when they appear by themselves. Also, in some situations a 64-bit constant was applied to a 32-bit register (usually as the result of sign-extension) and would appear as such in the assembly dump. Such constants are now truncated by the compiler to remove confusion and prevent potential future assembler errors due to the oversized value.
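The truncation of sign-extended constants can be illustrated with a short Python sketch. The helper names `sign_extend_32_to_64` and `truncate_for_w_register` are hypothetical; the sketch only models the masking step and is not taken from the compiler source.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def sign_extend_32_to_64(value):
    """Model how a negative 32-bit constant ends up as a 64-bit value."""
    value &= MASK32
    if value & 0x80000000:
        # Fill the upper 32 bits with copies of the sign bit.
        return value | (MASK64 ^ MASK32)
    return value

def truncate_for_w_register(value):
    """Keep only the low 32 bits, which is all a 32-bit register can hold."""
    return value & MASK32

extended = sign_extend_32_to_64(0xFFFFFFFE)  # the 32-bit pattern for -2
print(f"{extended:#x} -> {truncate_for_w_register(extended):#x}")
# prints: 0xfffffffffffffffe -> 0xfffffffe
```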

Mantis conversion info:

Mantis ID: 38837
OS: Debian GNU/Linux (Raspberry Pi)
OS Build: 10
Build: r49298
Platform: aarch64-linux
Version: 3.3.1
Fixed in version: 3.3.1
Fixed in revision: 49321 (#210674b9)
