[AVR] Incorrect overflow checking when h/w MUL* instructions are involved
Original Reporter info from Mantis: Stein2051 @Stein2051
-
Reporter name: Max Nazhalov
Original Reporter info from Mantis: Stein2051 @Stein2051
- Reporter name: Max Nazhalov
Description:
For AVR subarchitectures which support hardware MUL* instructions compiler utilizes them to perform 8/16-bit multiplications, but generates wrong overflow checking code when {$Q+}.
Example 1: unsigned8*unsigned8 produces following assembler output
mul r18,r19
clr r1
mov r18,r0
brcc .skip
call FPC_OVERFLOW
.skip:
MUL instruction sets C-flag equal to bit15 of the product, so BRCC only checks bit15 ignoring bits 8..14.
The proper way could be something like this:
mul r18,r19
mov r18,r0
tst r1
breq .skip
clr r1
call FPC_OVERFLOW
.skip:
Example 2: signed8*signed8 produces following assembler output
muls r18,r19
clr r1
mov r18, r0
brvc .skip
call FPC_OVERFLOW
.skip:
First -- MULS does not alter V-flag.
Second -- CLR R1 clears V-flag.
So the following BRVC is completely senseless.
Simple check is also possible here:
muls r18,r19
mov r18, r0
sbrc r0,7
inc r1
tst r1
breq .skip
clr r1
call FPC_OVERFLOW
.skip:
since valid (no-overflow) values for r1 after mul either $00 or $FF, depending on the bit7 of the result.
Example 3: unsigned16*unsigned16 produces following assembler output [lets denote A=Ah:AL, B=Bh:Bl]
mul r20,r22 [Al*Bl]
mov r18,r0
mov r19,r1
mul r21,r22 [Ah*BL]
add r19,r0 {!#1}
mul r20,r23 [Al*Bh]
add r19,r0 {!#2}
clr r1
brcc .skip
call FPC_OVERFLOW
.skip:
Note {!#1}: contents of R1 is ignored, overflow out of R19 is ignored too.
Note {!#2}: contents of R1 is ignored, following BRCC checks only overflow out of R19.
This way the entire check is far incomplete, it doesn't ever bothers to check whenever Ah*Bh=0 or not.
While inline fixing seems doable, the resulting code will be much (~10 instr.) longer, so I'd prefer to specialize fpc_mul_word_checkoverflow helper instead.
Example 4: signed16signed16 produces following assembler output
.. same as ex.3 ...
clr r1
brvc .skip
call FPC_OVERFLOW
.skip:
To the problems of the example 3 it adds the use of BRVC instructions.
Here V-flag has nothing to do with any kind of overflow, moreover it is cleared by the preceding CLR R1.
I doubt inline fixing is possible -- there are too many caveats when doing signed multiword multiplication with overflow checks.
The best way seems to introduce correct signed 16x16-to-32 helper, and check the resulting longint does fit into smallint.
Sample implementation can be found in Atmel AVR Instruction Set Manual, section "MULSU" -- it rather long, and utilizes all 3 available MUL variants -- MUL, MULS, MULSU.
Example 5: signed8*unsigned8 produces following assembler output
lds r22,(A)
mov r18,r1
sbrc r22,7
com r18
lds r23,(B)
mov r19,r1
mul .. same as ex.4 ...
I.e. it promotes operation to the 16-bit and follows ex.4 way, with all its problems.
Here would be much more logical to utilize MULSU instruction directly producing correct signed16 result, then checking it as proposed in ex.2.
Or leave the result as a signed16 for further calcs -- no overflow check is required at all then, only range checking on the following assign.
---
Note: all that fixes mentioned above are not required for {$Q-} -- generated code seems to be adequately correct.