View Issue Details

ID: 0038806
Project: FPC
Category: Compiler
View Status: public
Last Update: 2021-04-29 22:01
Reporter: J. Gareth Moreton
Assigned To: Florian
Priority: normal
Severity: minor
Reproducibility: N/A
Status: resolved
Resolution: fixed
Platform: aarch64-linux
OS: Debian GNU/Linux (Raspberry Pi)
Product Version: 3.3.1
Fixed in Version: 3.3.1
Summary: 0038806: [Patch] AArch64 "magic division" (replace division by constant with multiplication)
Description: This patch implements the compile-time optimisation that turns division by a constant into multiplication by a precomputed reciprocal, which speeds up a wide range of applications.
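
As a minimal illustration of the technique (not code from the patch), the sketch below divides an unsigned 32-bit value by 10 using a multiply and a shift. The name DivBy10 and the constant are assumptions chosen for this example; the constant is ceil(2^35 / 10), derived by hand rather than taken from the compiler's magic number generator.

```pascal
program magicdiv10;

{$mode objfpc}

{ Hypothetical helper, not part of the patch: divides a 32-bit unsigned
  value by 10 with a multiplication and a right shift. }
function DivBy10(N: Cardinal): Cardinal;
  const
    { ceil(2^35 / 10), hand-derived for this sketch }
    RECIPROCAL: QWord = $CCCCCCCD;
  begin
    { Form the full 64-bit product, then drop the 35 fractional bits }
    Result := Cardinal((QWord(N) * RECIPROCAL) shr 35);
  end;

const
  Samples: array[0..5] of Cardinal = (0, 9, 10, 99, 123456789, $FFFFFFFF);
var
  I: Integer;
begin
  for I := Low(Samples) to High(Samples) do
    if DivBy10(Samples[I]) <> Samples[I] div 10 then
      begin
        WriteLn('FAIL - ', Samples[I], ' div 10');
        Halt(1);
      end;
  WriteLn('ok');
end.
```

On AArch64 the patch emits the corresponding multiply (UMULH for 64-bit operands, a 64-bit MUL for narrower ones) followed by a shift.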
Steps To Reproduce: Apply the patch and confirm correct compilation and no regressions in the test suite.
Additional Information: These optimisations are generally not applied if -Os is specified. The patch reuses the existing magic number generation code where possible.

A new bench test, which also acts as a regression test, is included in a separate patch. It was developed to measure speed gains and to catch a coding error that other tests were not cleanly detecting (I was adding a carry bit in the wrong place due to an internal overflow).
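
The carry problem arises for divisors whose reciprocal needs one more bit than the operand width. The sketch below is a Pascal emulation of that case for unsigned 32-bit division by 7, loosely modelled on the add, shift and carry-insert steps in the patch. DivBy7 and its constants are hypothetical and hand-derived; they are not output from calc_divconst_magic_unsigned.

```pascal
program magicdiv7;

{$mode objfpc}

{ Hypothetical helper for illustration only. The full reciprocal
  ceil(2^35 / 7) = $124924925 needs 33 bits, so only its low 32 bits are
  multiplied and the numerator is added back afterwards. }
function DivBy7(N: Cardinal): Cardinal;
  const
    RECIPROCAL_LOW: QWord = $24924925;
    SHIFT = 3;
  var
    HighWord, Sum, Carry: Cardinal;
    Wide: QWord;
  begin
    { High 32 bits of the 64-bit product }
    HighWord := Cardinal((QWord(N) * RECIPROCAL_LOW) shr 32);

    { The addition can overflow 32 bits; keep the carry separately }
    Wide := QWord(HighWord) + N;
    Sum := Cardinal(Wide and $FFFFFFFF);
    Carry := Cardinal(Wide shr 32);

    { The carry is bit 32 of the sum, so after shifting right by SHIFT
      it belongs at bit 32 - SHIFT }
    Result := (Sum shr SHIFT) or (Carry shl (32 - SHIFT));
  end;

const
  Samples: array[0..6] of Cardinal = (0, 6, 7, 48, 49, 123456789, $FFFFFFFF);
var
  I: Integer;
begin
  for I := Low(Samples) to High(Samples) do
    if DivBy7(Samples[I]) <> Samples[I] div 7 then
      begin
        WriteLn('FAIL - ', Samples[I], ' div 7');
        Halt(1);
      end;
  WriteLn('ok');
end.
```

Placing the carry at any other bit position only corrupts results for numerators large enough to overflow the addition, which is why inputs near the top of the range, such as $FFFFFFFF, are useful in a regression test.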

NOTE: Signed mod operations have not yet been optimised due to incorrect results being returned at times. Signed and unsigned division, and unsigned mod, have been optimised.
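
For signed mod the patch falls back to a real division and rebuilds the remainder from the quotient with a multiply-subtract (MSUB). The identity behind that step can be sketched as follows; ModViaDiv is a hypothetical name used only for this example.

```pascal
program signedmod;

{$mode objfpc}

{ Hypothetical helper: remainder = numerator - quotient * divisor.
  Free Pascal's div truncates toward zero, so the remainder takes the
  sign of the numerator. }
function ModViaDiv(N, D: Integer): Integer;
  begin
    Result := N - (N div D) * D;
  end;

const
  Numerators: array[0..4] of Integer = (7, -7, 0, 100, -2147483647);
  Divisors: array[0..2] of Integer = (3, -3, 10);
var
  I, J: Integer;
begin
  for I := Low(Numerators) to High(Numerators) do
    for J := Low(Divisors) to High(Divisors) do
      if ModViaDiv(Numerators[I], Divisors[J]) <> Numerators[I] mod Divisors[J] then
        begin
          WriteLn('FAIL - ', Numerators[I], ' mod ', Divisors[J]);
          Halt(1);
        end;
  WriteLn('ok');
end.
```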
Tags: aarch64, optimization, patch
Fixed in Revision: 49290, 49291
FPCOldBugId:
FPCTarget: -
Attached Files

Activities

J. Gareth Moreton

2021-04-27 00:14

developer  

a64-magic-div.patch (22,372 bytes)   
Index: compiler/aarch64/ncpumat.pas
===================================================================
--- compiler/aarch64/ncpumat.pas	(revision 49247)
+++ compiler/aarch64/ncpumat.pas	(working copy)
@@ -71,20 +71,35 @@
       var
          op         : tasmop;
          tmpreg,
+         zeroreg,
          numerator,
          divider,
+         largernumreg,
+         largerresreg,
          resultreg  : tregister;
-         hl : tasmlabel;
+         hl         : tasmlabel;
          overflowloc: tlocation;
-         power: longint;
+         power      : longint;
+         opsize     : tcgsize;
 
+         dividend   : Int64;
+         high_bit,
+         reciprocal : QWord;
+         { Just to save on stack space and the like }
+         reciprocal_signed : Int64 absolute reciprocal;
+
+         expandword,
+         magic_add  : Boolean;
+         shift      : byte;
+
+         shifterop  : tshifterop;
+         hp         : taicpu;
+
        procedure genOrdConstNodeDiv;
          var
            helper1, helper2: TRegister;
            so: tshifterop;
-           opsize: TCgSize;
          begin
-           opsize:=def_cgsize(resultdef);
            if tordconstnode(right).value=0 then
              internalerror(2020021601)
            else if tordconstnode(right).value=1 then
@@ -98,7 +113,7 @@
                current_asmdata.CurrAsmList.concat(setoppostfix(taicpu.op_reg_reg(A_NEG,
                  resultreg,numerator),toppostfix(ord(cs_check_overflow in current_settings.localswitches)*ord(PF_S))));
              end
-           else if ispowerof2(tordconstnode(right).value,power) then
+           else if isabspowerof2(tordconstnode(right).value,power) then
              begin
                if (is_signed(right.resultdef)) then
                  begin
@@ -115,98 +130,318 @@
                     so.shiftimm:=resultdef.size*8-power;
                     current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg_shifterop(A_ADD,helper2,numerator,helper1,so));
                     cg.a_op_const_reg_reg(current_asmdata.CurrAsmList,OP_SAR,def_cgsize(resultdef),power,helper2,resultreg);
+
+                    if (tordconstnode(right).value < 0) then
+                      { Invert the result }
+                      current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg(A_NEG,resultreg,resultreg));
                   end
-               else
-                 cg.a_op_const_reg_reg(current_asmdata.CurrAsmList,OP_SHR,opsize,power,numerator,resultreg)
+                else
+                  cg.a_op_const_reg_reg(current_asmdata.CurrAsmList,OP_SHR,opsize,power,numerator,resultreg)
              end
            else
-             { Everything else is handled in the generic code }
-             cg.g_div_const_reg_reg(current_asmdata.CurrAsmList,opsize,
-               tordconstnode(right).value.svalue,numerator,resultreg);
-         end;
+             { Generic division }
+             begin
+               if is_signed(left.resultdef) then
+                 op:=A_SDIV
+               else
+                 op:=A_UDIV;
 
-      begin
-       secondpass(left);
-       secondpass(right);
-       { avoid warning }
-       divider:=NR_NO;
+               { If we didn't acquire the original divisor earlier, grab it now }
+               if divider = NR_NO then
+                 begin
+                   divider:=cg.getintregister(current_asmdata.CurrAsmList,opsize);
+                   cg.a_load_const_reg(current_asmdata.CurrAsmList,opsize,tordconstnode(right).value.svalue,divider);
+                 end;
 
-       { set result location }
-       location_reset(location,LOC_REGISTER,def_cgsize(resultdef));
-       location.register:=cg.getintregister(current_asmdata.CurrAsmList,def_cgsize(resultdef));
-       resultreg:=location.register;
+               current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg(op,resultreg,numerator,divider));
+             end;
+         end;
 
-       { put numerator in register }
-       hlcg.location_force_reg(current_asmdata.CurrAsmList,left.location,left.resultdef,left.resultdef,true);
-       numerator:=left.location.register;
-
-       if (right.nodetype=ordconstn) and
-          ((tordconstnode(right).value=1) or
-           (tordconstnode(right).value=int64(-1)) or
-           (tordconstnode(right).value=0) or
-           ispowerof2(tordconstnode(right).value,power)) then
+       procedure genOverflowCheck;
          begin
-           genOrdConstNodeDiv;
-           if nodetype=modn then
+           { in case of overflow checking, also check for low(int64) div (-1)
+             (no hardware support for this either) }
+           if (cs_check_overflow in current_settings.localswitches) and
+              is_signed(left.resultdef) and
+              ((right.nodetype<>ordconstn) or
+               (tordconstnode(right).value=-1)) then
              begin
-               divider:=cg.getintregister(current_asmdata.CurrAsmList,def_cgsize(resultdef));
-               cg.a_load_const_reg(current_asmdata.CurrAsmList,def_cgsize(resultdef),int64(tordconstnode(right).value),divider);
+               { num=ffff... and div=8000... <=>
+                 num xor not(div xor 8000...) = 0
+                 (and we have the "eon" operation, which performs "xor not(...)" }
+               tmpreg:=hlcg.getintregister(current_asmdata.CurrAsmList,left.resultdef);
+               hlcg.a_op_const_reg_reg(current_asmdata.CurrAsmList,OP_XOR,left.resultdef,low(int64),numerator,tmpreg);
+               current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg(A_EON,
+                 tmpreg,numerator,tmpreg));
+               current_asmdata.CurrAsmList.concat(taicpu.op_reg_const(A_CMP,tmpreg,0));
+               { now the zero/equal flag is set in case we divided low(int64) by
+                 (-1) }
+               location_reset(overflowloc,LOC_FLAGS,OS_NO);
+               overflowloc.resflags:=F_EQ;
+               cg.g_overflowcheck_loc(current_asmdata.CurrAsmList,location,resultdef,overflowloc);
              end;
-         end
-       else
-         begin
-           { load divider in a register }
-           hlcg.location_force_reg(current_asmdata.CurrAsmList,right.location,right.resultdef,right.resultdef,true);
-           divider:=right.location.register;
-
-           { start division }
-           if is_signed(left.resultdef) then
-             op:=A_SDIV
-           else
-             op:=A_UDIV;
-           current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg(op,location.register,numerator,divider));
          end;
 
-       { no divide-by-zero detection available in hardware, emulate (if it's a
-         constant, this will have been detected earlier already) }
-       if (right.nodetype<>ordconstn) then
-         begin
-           current_asmdata.CurrAsmList.concat(taicpu.op_reg_const(A_CMP,
-             right.location.register,0));
+      begin
+        secondpass(left);
+        secondpass(right);
+        { avoid warning }
+        divider := NR_NO;
+        largernumreg := NR_NO;
+        expandword := False;
 
-           current_asmdata.getjumplabel(hl);
-           current_asmdata.CurrAsmList.concat(taicpu.op_cond_sym(A_B,C_NE,hl));
-           cg.a_call_name(current_asmdata.CurrAsmList,'FPC_DIVBYZERO',false);
-           cg.a_label(current_asmdata.CurrAsmList,hl);
-         end;
+        opsize := def_cgsize(resultdef);
 
-       { in case of overflow checking, also check for low(int64) div (-1)
-         (no hardware support for this either) }
-       if (cs_check_overflow in current_settings.localswitches) and
-          is_signed(left.resultdef) and
-          ((right.nodetype<>ordconstn) or
-           (tordconstnode(right).value=-1)) then
-         begin
-           { num=ffff... and div=8000... <=>
-             num xor not(div xor 8000...) = 0
-             (and we have the "eon" operation, which performs "xor not(...)" }
-           tmpreg:=hlcg.getintregister(current_asmdata.CurrAsmList,left.resultdef);
-           hlcg.a_op_const_reg_reg(current_asmdata.CurrAsmList,OP_XOR,left.resultdef,low(int64),left.location.register,tmpreg);
-           current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg(A_EON,
-             tmpreg,left.location.register,tmpreg));
-           current_asmdata.CurrAsmList.concat(taicpu.op_reg_const(A_CMP,tmpreg,0));
-           { now the zero/equal flag is set in case we divided low(int64) by
-             (-1) }
-           location_reset(overflowloc,LOC_FLAGS,OS_NO);
-           overflowloc.resflags:=F_EQ;
-           cg.g_overflowcheck_loc(current_asmdata.CurrAsmList,location,resultdef,overflowloc);
-         end;
+        { set result location }
+        location_reset(location,LOC_REGISTER,opsize);
+        location.register:=cg.getintregister(current_asmdata.CurrAsmList,opsize);
+        resultreg:=location.register;
 
-       { in case of modulo, multiply result again by the divider and subtract
-         from the numerator }
-       if nodetype=modn then
-         current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg_reg(A_MSUB,resultreg,
-           resultreg,divider,numerator));
+        { put numerator in register }
+        hlcg.location_force_reg(current_asmdata.CurrAsmList,left.location,left.resultdef,left.resultdef,true);
+        numerator:=left.location.register;
+
+        if (right.nodetype=ordconstn) then
+          begin
+            { If optimising for size, just use regular division operations }
+            if (cs_opt_size in current_settings.optimizerswitches) or
+              ((tordconstnode(right).value=1) or
+              (tordconstnode(right).value=int64(-1)) or
+              isabspowerof2(tordconstnode(right).value,power)) then
+              begin
+
+                { Store divisor for later (and executed at the same time as the multiplication) }
+                if (nodetype=modn) then
+                  begin
+                    if (tordconstnode(right).value = 1) or (tordconstnode(right).value = int64(-1)) then
+                      begin
+                        { Just evaluates to zero }
+                        current_asmdata.CurrAsmList.concat(taicpu.op_reg_const(A_MOVZ,resultreg, 0));
+                        Exit;
+                      end
+                    { "not cs_opt_size" saves from checking the value of the divisor again
+                      (if cs_opt_size is not set, then the divisor is a power of 2) }
+                    else if not (cs_opt_size in current_settings.optimizerswitches) then
+                      begin
+                        divider:=cg.getintregister(current_asmdata.CurrAsmList,opsize);
+                        cg.a_load_const_reg(current_asmdata.CurrAsmList,opsize,tordconstnode(right).value.svalue,divider);
+                      end
+                  end;
+
+                genOrdConstNodeDiv;
+                genOverflowCheck;
+
+                { in case of modulo, multiply result again by the divider and subtract
+                  from the numerator }
+                if (nodetype=modn) then
+                  begin
+                    if ispowerof2(tordconstnode(right).value,power) then
+                      begin
+                        shifterop.shiftmode := SM_LSL;
+                        shifterop.shiftimm := power;
+
+                        current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg_shifterop(A_SUB,resultreg,numerator,resultreg,shifterop));
+                      end
+                    else
+                      current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg_reg(A_MSUB,resultreg,
+                        resultreg,divider,numerator));
+                  end;
+
+                Exit;
+              end
+            else
+              begin
+                if is_signed(left.resultdef) then
+                  begin
+                    if (nodetype=modn) then { Signed mod doesn't work properly }
+                      begin
+                        divider:=cg.getintregister(current_asmdata.CurrAsmList,opsize);
+                        cg.a_load_const_reg(current_asmdata.CurrAsmList,opsize,tordconstnode(right).value.svalue,divider);
+                        genOrdConstNodeDiv;
+                      end
+                    else
+                      begin
+                        { Read signed value to avoid Internal Error 200706094 }
+                        dividend := tordconstnode(right).value.svalue;
+
+                        calc_divconst_magic_signed(resultdef.size * 8, dividend, reciprocal_signed, shift);
+                        cg.a_load_const_reg(current_asmdata.CurrAsmList, opsize, reciprocal_signed, resultreg);
+
+                        { SMULH is only available for the full 64-bit registers }
+                        if opsize in [OS_64, OS_S64] then
+                          begin
+                            current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg(A_SMULH,resultreg,resultreg,numerator));
+                            largerresreg := resultreg;
+                          end
+                        else
+                          begin
+                            largerresreg := newreg(getregtype(resultreg), getsupreg(resultreg), R_SUBWHOLE);
+                            largernumreg := newreg(getregtype(numerator), getsupreg(numerator), R_SUBWHOLE);
+                            current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg(A_MUL,largerresreg,largerresreg,largernumreg));
+                            expandword := True; { Merge the shift operation with something below }
+                          end;
+
+                        { Store divisor for later (and executed at the same time as the multiplication) }
+                        if nodetype=modn then
+                          begin
+                            divider:=cg.getintregister(current_asmdata.CurrAsmList,opsize);
+                            cg.a_load_const_reg(current_asmdata.CurrAsmList,opsize,dividend,divider);
+                          end;
+
+                        { add or subtract dividend }
+                        if (dividend > 0) and (reciprocal_signed < 0) then
+                          begin
+                            if expandword then
+                              begin
+                                shifterop.shiftmode := SM_ASR;
+                                shifterop.shiftimm := 32;
+                                expandword := False;
+                                current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg_shifterop(A_ADD,largerresreg,largernumreg,largerresreg,shifterop));
+                              end
+                            else
+                              current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg(A_ADD,resultreg,resultreg,numerator));
+                          end
+                        else if (dividend < 0) and (reciprocal_signed > 0) then
+                          begin
+                            if expandword then
+                              begin
+                                { We can't append ASR to the SUB below because it's on the wrong operand }
+                                expandword := False;
+                                current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_const(A_ASR,largerresreg,largerresreg,32));
+                              end;
+
+                            current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg(A_SUB,resultreg,resultreg,numerator));
+                          end
+                        else if expandword then
+                          Inc(shift,32);
+
+                        { shift if necessary }
+                        if (shift <> 0) then
+                          begin
+                            if expandword then
+                              current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_const(A_ASR,largerresreg,largerresreg,shift))
+                            else
+                              current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_const(A_ASR,resultreg,resultreg,shift));
+                          end;
+
+                        { extract and add the sign bit }
+                        shifterop.shiftmode := SM_LSR;
+                        shifterop.shiftimm := left.resultdef.size*8 - 1;
+
+                        if (dividend < 0) then
+                          current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg_shifterop(A_ADD,resultreg,resultreg,resultreg,shifterop))
+                        else
+                          current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg_shifterop(A_ADD,resultreg,resultreg,numerator,shifterop));
+                      end;
+                  end
+                else
+                  begin
+                    calc_divconst_magic_unsigned(resultdef.size * 8, tordconstnode(right).value, reciprocal, magic_add, shift);
+                    cg.a_load_const_reg(current_asmdata.CurrAsmList, opsize, reciprocal, resultreg);
+
+                    { UMULH is only available for the full 64-bit registers }
+                    if opsize in [OS_64, OS_S64] then
+                      begin
+                        current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg(A_UMULH,resultreg,resultreg,numerator));
+                        largerresreg := resultreg;
+                      end
+                    else
+                      begin
+                        largerresreg := newreg(getregtype(resultreg), getsupreg(resultreg), R_SUBWHOLE);
+                        largernumreg := newreg(getregtype(numerator), getsupreg(numerator), R_SUBWHOLE);
+                        current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg(A_MUL,largerresreg,largerresreg,largernumreg));
+                        expandword := True; { Try to merge the shift operation with something below }
+                      end;
+
+                    { Store divisor for later (and executed at the same time as the multiplication) }
+                    if (nodetype=modn) then
+                      begin
+                        divider:=cg.getintregister(current_asmdata.CurrAsmList,opsize);
+                        cg.a_load_const_reg(current_asmdata.CurrAsmList,opsize,tordconstnode(right).value.svalue,divider);
+                      end;
+
+                    if magic_add then
+                      begin
+                        { We can't append LSR to the ADD below because it would require extending the registers
+                          and interfere with the carry bit }
+                        if expandword then
+                          current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_const(A_LSR,largerresreg,largerresreg,32));
+
+                        { Add the numerator to the high-order word of the product, tracking the carry bit,
+                          shift, then insert the carry bit via CSEL and ORR }
+
+                        if opsize in [OS_64,OS_S64] then
+                          zeroreg := NR_XZR
+                        else
+                          zeroreg := NR_WZR;
+
+                        high_bit := QWord(1) shl ((resultdef.size * 8) - shift);
+
+                        tmpreg := cg.getintregister(current_asmdata.CurrAsmList, opsize);
+                        cg.a_load_const_reg(current_asmdata.CurrAsmList, opsize, high_bit, tmpreg);
+
+                        { Generate ADDS instruction }
+                        hp := taicpu.op_reg_reg_reg(A_ADD,resultreg,resultreg,numerator);
+                        hp.oppostfix := PF_S;
+                        current_asmdata.CurrAsmList.concat(hp);
+
+                        current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg_cond(A_CSEL,tmpreg,tmpreg,zeroreg, C_CS));
+
+                        shifterop.shiftmode := SM_LSR;
+                        shifterop.shiftimm := shift;
+
+                        current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg_shifterop(A_ORR,resultreg,tmpreg,resultreg,shifterop));
+                      end
+                    else if expandword then
+                      { Include the right-shift by 32 to get the high-order DWord }
+                      current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_const(A_LSR,largerresreg,largerresreg,shift + 32))
+                    else
+                      current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_const(A_LSR,resultreg,resultreg,shift));
+                  end;
+
+              end;
+
+          end
+        { no divide-by-zero detection available in hardware, emulate (if it's a
+          constant, this will have been detected earlier already) }
+        else
+          begin
+            { load divider in a register }
+            hlcg.location_force_reg(current_asmdata.CurrAsmList,right.location,right.resultdef,right.resultdef,true);
+            divider:=right.location.register;
+
+            { ARM-64 developer guides recommend checking for division by zero conditions
+              AFTER the division, since the check and the division can be done in tandem }
+            if is_signed(left.resultdef) then
+              op:=A_SDIV
+            else
+              op:=A_UDIV;
+
+            current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg(op,resultreg,numerator,divider));
+
+            current_asmdata.CurrAsmList.concat(taicpu.op_reg_const(A_CMP,divider,0));
+            current_asmdata.getjumplabel(hl);
+            current_asmdata.CurrAsmList.concat(taicpu.op_cond_sym(A_B,C_NE,hl));
+            cg.a_call_name(current_asmdata.CurrAsmList,'FPC_DIVBYZERO',false);
+            cg.a_label(current_asmdata.CurrAsmList,hl);
+          end;
+
+        genOverflowCheck;
+
+        { in case of modulo, multiply result again by the divider and subtract
+          from the numerator }
+        if (nodetype=modn) then
+          begin
+            { If we didn't acquire the original divisor earlier, grab it now }
+            if divider = NR_NO then
+              begin
+                divider:=cg.getintregister(current_asmdata.CurrAsmList,opsize);
+                cg.a_load_const_reg(current_asmdata.CurrAsmList,opsize,tordconstnode(right).value.svalue,divider);
+              end;
+
+            current_asmdata.CurrAsmList.concat(taicpu.op_reg_reg_reg_reg(A_MSUB,resultreg,
+              resultreg,divider,numerator));
+          end;
     end;
 
 
div-bench-test.patch (71,660 bytes)   
Index: tests/bench/bdiv.pp
===================================================================
--- /dev/null
+++ tests/bench/bdiv.pp
@@ -0,0 +1,469 @@
+{ %OPT=-O2 }
+program bdiv;
+
+{$mode objfpc}{$H+}
+
+uses
+  SysUtils;
+
+{ Utility functions }
+function GetRealTime(const st: TSystemTime): Real;
+  begin
+    Result := st.Hour*3600.0 + st.Minute*60.0 + st.Second + st.MilliSecond/1000.0;
+  end;
+
+{$push}
+{$warn 5057 off}
+function GetRealTime : Real;
+  var
+    st:TSystemTime;
+  begin
+    GetLocalTime(st);
+    result:=GetRealTime(st);
+  end;
+{$pop}
+
+const
+  ITERATIONS = 524288;
+  INTERNAL_LOOPS = 64;
+
+{ TTestAncestor }
+type
+  TTestAncestor = class
+    private
+      FStartTime: Real;
+      FEndTime: Real;
+      FAvgTime: Real;
+      procedure SetStartTime;
+      procedure SetEndTime;
+    protected
+      procedure DoTestIteration(Iteration: Integer); virtual; abstract;
+    public
+      constructor Create; virtual;
+      destructor Destroy; override;
+      procedure Run;
+      function TestTitle: shortstring; virtual; abstract;
+      function WriteResults: Boolean; virtual; abstract;
+      property RunTime: Real read FAvgTime;
+  end;
+
+  TTestClass = class of TTestAncestor;
+
+  TUInt32DivTest = class(TTestAncestor)
+    protected
+      FInputArray: array[$00..$FF] of Cardinal;
+      FResultArray: array[$00..$FF] of Cardinal;
+      function GetDivisor: Cardinal; virtual; abstract;
+      function DoVariableDiv(Numerator: Cardinal): Cardinal; inline;
+    public
+      function WriteResults: Boolean; override;
+  end;
+
+  TUInt32ModTest = class(TUInt32DivTest)
+    protected
+      function DoVariableMod(Numerator: Cardinal): Cardinal; inline;
+    public
+      function WriteResults: Boolean; override;
+  end;
+
+  TSInt32DivTest = class(TTestAncestor)
+    protected
+      FInputArray: array[$00..$FF] of Integer;
+      FResultArray: array[$00..$FF] of Integer;
+      function GetDivisor: Integer; virtual; abstract;
+      function DoVariableDiv(Numerator: Integer): Integer; inline;
+    public
+      function WriteResults: Boolean; override;
+  end;
+
+  TSInt32ModTest = class(TSInt32DivTest)
+    protected
+      function DoVariableMod(Numerator: Integer): Integer; inline;
+    public
+      function WriteResults: Boolean; override;
+  end;
+
+  TUInt64DivTest = class(TTestAncestor)
+    protected
+      FInputArray: array[$00..$FF] of QWord;
+      FResultArray: array[$00..$FF] of QWord;
+      function GetDivisor: QWord; virtual; abstract;
+      function DoVariableDiv(Numerator: QWord): QWord; inline;
+    public
+      function WriteResults: Boolean; override;
+  end;
+
+  TUInt64ModTest = class(TUInt64DivTest)
+    protected
+      function DoVariableMod(Numerator: QWord): QWord; inline;
+    public
+      function WriteResults: Boolean; override;
+  end;
+
+  TSInt64DivTest = class(TTestAncestor)
+    protected
+      FInputArray: array[$00..$FF] of Int64;
+      FResultArray: array[$00..$FF] of Int64;
+      function GetDivisor: Int64; virtual; abstract;
+      function DoVariableDiv(Numerator: Int64): Int64; inline;
+    public
+      function WriteResults: Boolean; override;
+  end;
+
+  TSInt64ModTest = class(TSInt64DivTest)
+    protected
+      function DoVariableMod(Numerator: Int64): Int64; inline;
+    public
+      function WriteResults: Boolean; override;
+  end;
+
+{$I bdiv_u32.inc}
+{$I bdiv_u64.inc}
+{$I bdiv_s32.inc}
+{$I bdiv_s64.inc}
+
+{ TTestAncestor }
+
+constructor TTestAncestor.Create;
+  begin
+    FStartTime := 0;
+    FEndTime := 0;
+    FAvgTime := 0;
+  end;
+
+destructor TTestAncestor.Destroy;
+  begin
+    inherited Destroy;
+  end;
+
+procedure TTestAncestor.SetStartTime;
+  begin
+    FStartTime := GetRealTime();
+  end;
+
+procedure TTestAncestor.SetEndTime;
+  begin
+    FEndTime := GetRealTime();
+    if FEndTime < FStartTime then { Happens if the test runs past midnight }
+      FEndTime := FEndTime + 86400.0;
+  end;
+
+procedure TTestAncestor.Run;
+  var
+    X: Integer;
+  begin
+    SetStartTime;
+    for X := 0 to ITERATIONS - 1 do
+      DoTestIteration(X);
+
+    SetEndTime;
+
+    FAvgTime := FEndTime - FStartTime;
+  end;
+
+{ TUInt32DivTest }
+
+function TUInt32DivTest.DoVariableDiv(Numerator: Cardinal): Cardinal;
+  begin
+    Result := Numerator div GetDivisor;
+  end;
+
+function TUInt32DivTest.WriteResults: Boolean;
+  var
+    X: Integer;
+    Expected: Cardinal;
+  begin
+    Result := True;
+    for X := 0 to 255 do
+      begin
+        Expected := DoVariableDiv(FInputArray[X]);
+        if FResultArray[X] <> Expected then
+          begin
+            WriteLn('FAIL - ', FInputArray[X], ' div ', GetDivisor, '; expected ', Expected, ' got ', FResultArray[X]);
+            Result := False;
+            Exit;
+          end;
+      end;
+  end;
+
+{ TUInt32ModTest }
+
+function TUInt32ModTest.DoVariableMod(Numerator: Cardinal): Cardinal;
+  begin
+    Result := Numerator mod GetDivisor;
+  end;
+
+function TUInt32ModTest.WriteResults: Boolean;
+  var
+    X: Integer;
+    Expected: Cardinal;
+  begin
+    Result := True;
+    for X := 0 to 255 do
+      begin
+        Expected := DoVariableMod(FInputArray[X]);
+        if FResultArray[X] <> Expected then
+          begin
+            WriteLn('FAIL - ', FInputArray[X], ' mod ', GetDivisor, '; expected ', Expected, ' got ', FResultArray[X]);
+            Result := False;
+            Exit;
+          end;
+      end;
+  end;
+
+{ TSInt32DivTest }
+
+function TSInt32DivTest.DoVariableDiv(Numerator: Integer): Integer;
+  begin
+    Result := Numerator div GetDivisor;
+  end;
+
+function TSInt32DivTest.WriteResults: Boolean;
+  var
+    X: Integer;
+    Expected: Integer;
+  begin
+    Result := True;
+    for X := 0 to 255 do
+      begin
+        Expected := DoVariableDiv(FInputArray[X]);
+        if FResultArray[X] <> Expected then
+          begin
+            WriteLn('FAIL - ', FInputArray[X], ' div ', GetDivisor, '; expected ', Expected, ' got ', FResultArray[X]);
+            Result := False;
+            Exit;
+          end;
+      end;
+  end;
+
+{ TSInt32ModTest }
+
+function TSInt32ModTest.DoVariableMod(Numerator: Integer): Integer;
+  begin
+    Result := Numerator mod GetDivisor;
+  end;
+
+function TSInt32ModTest.WriteResults: Boolean;
+  var
+    X: Integer;
+    Expected: Integer;
+  begin
+    Result := True;
+    for X := 0 to 255 do
+      begin
+        Expected := DoVariableMod(FInputArray[X]);
+        if FResultArray[X] <> Expected then
+          begin
+            WriteLn('FAIL - ', FInputArray[X], ' mod ', GetDivisor, '; expected ', Expected, ' got ', FResultArray[X]);
+            Result := False;
+            Exit;
+          end;
+      end;
+  end;
+
+{ TUInt64DivTest }
+
+function TUInt64DivTest.DoVariableDiv(Numerator: QWord): QWord;
+  begin
+    Result := Numerator div GetDivisor;
+  end;
+
+function TUInt64DivTest.WriteResults: Boolean;
+  var
+    X: Integer;
+    Expected: QWord;
+  begin
+    Result := True;
+    for X := 0 to 255 do
+      begin
+        Expected := DoVariableDiv(FInputArray[X]);
+        if FResultArray[X] <> Expected then
+          begin
+            WriteLn('FAIL - ', FInputArray[X], ' div ', GetDivisor, '; expected ', Expected, ' got ', FResultArray[X]);
+            Result := False;
+            Exit;
+          end;
+      end;
+  end;
+
+{ TUInt64ModTest }
+
+function TUInt64ModTest.DoVariableMod(Numerator: QWord): QWord;
+  begin
+    Result := Numerator mod GetDivisor;
+  end;
+
+function TUInt64ModTest.WriteResults: Boolean;
+  var
+    X: Integer;
+    Expected: QWord;
+  begin
+    Result := True;
+    for X := 0 to 255 do
+      begin
+        Expected := DoVariableMod(FInputArray[X]);
+        if FResultArray[X] <> Expected then
+          begin
+            WriteLn('FAIL - ', FInputArray[X], ' mod ', GetDivisor, '; expected ', Expected, ' got ', FResultArray[X]);
+            Result := False;
+            Exit;
+          end;
+      end;
+  end;
+
+{ TSInt64DivTest }
+
+function TSInt64DivTest.DoVariableDiv(Numerator: Int64): Int64;
+  begin
+    Result := Numerator div GetDivisor;
+  end;
+
+function TSInt64DivTest.WriteResults: Boolean;
+  var
+    X: Integer;
+    Expected: Int64;
+  begin
+    Result := True;
+    for X := 0 to 255 do
+      begin
+        Expected := DoVariableDiv(FInputArray[X]);
+        if FResultArray[X] <> Expected then
+          begin
+            WriteLn('FAIL - ', FInputArray[X], ' div ', GetDivisor, '; expected ', Expected, ' got ', FResultArray[X]);
+            Result := False;
+            Exit;
+          end;
+      end;
+  end;
+
+{ TSInt64ModTest }
+
+function TSInt64ModTest.DoVariableMod(Numerator: Int64): Int64;
+  begin
+    Result := Numerator mod GetDivisor;
+  end;
+
+function TSInt64ModTest.WriteResults: Boolean;
+  var
+    X: Integer;
+    Expected: Int64;
+  begin
+    Result := True;
+    for X := 0 to 255 do
+      begin
+        Expected := DoVariableMod(FInputArray[X]);
+        if FResultArray[X] <> Expected then
+          begin
+            WriteLn('FAIL - ', FInputArray[X], ' mod ', GetDivisor, '; expected ', Expected, ' got ', FResultArray[X]);
+            Result := False;
+            Exit;
+          end;
+      end;
+  end;
+
+{ Main function }
+const
+  TestClasses: array[0..53] of TTestClass = (
+    TUInt32Bit1Test,
+    TUInt32Bit1ModTest,
+    TUInt32Bit2Test,
+    TUInt32Bit2ModTest,
+    TUInt32Bit3Test,
+    TUInt32Bit3ModTest,
+    TUInt32Bit10Test,
+    TUInt32Bit10ModTest,
+    TUInt32Bit100Test,
+    TUInt32Bit100ModTest,
+    TUInt32Bit1000Test,
+    TUInt32Bit1000ModTest,
+    TUInt32Bit60000Test,
+    TUInt32Bit60000ModTest,
+    TUInt32Bit146097Test,
+    TUInt32Bit146097ModTest,
+    TUInt32Bit3600000Test,
+    TUInt32Bit3600000ModTest,
+    TUInt64Bit1Test,
+    TUInt64Bit1ModTest,
+    TUInt64Bit2Test,
+    TUInt64Bit2ModTest,
+    TUInt64Bit3Test,
+    TUInt64Bit3ModTest,
+    TUInt64Bit5Test,
+    TUInt64Bit5ModTest,
+    TUInt64Bit10Test,
+    TUInt64Bit10ModTest,
+    TUInt64Bit100Test,
+    TUInt64Bit100ModTest,
+    TUInt64Bit1000000000Test,
+    TUInt64Bit1000000000ModTest,
+    TSInt32Bit1Test,
+    TSInt32Bit1ModTest,
+    TSInt32Bit100Test,
+    TSInt32Bit100ModTest,
+    TSInt64Bit1Test,
+    TSInt64Bit1ModTest,
+    TSInt64Bit10Test,
+    TSInt64Bit10ModTest,
+    TSInt64Bit18Test,
+    TSInt64Bit18ModTest,
+    TSInt64Bit24Test,
+    TSInt64Bit24ModTest,
+    TSInt64Bit100Test,
+    TSInt64Bit100ModTest,
+    TSInt64Bit153Test,
+    TSInt64Bit153ModTest,
+    TSInt64Bit1461Test,
+    TSInt64Bit1461ModTest,
+    TSInt64Bit10000Test,
+    TSInt64Bit10000ModTest,
+    TSInt64Bit86400000Test,
+    TSInt64Bit86400000ModTest
+  );
+
+var
+  CurrentObject: TTestAncestor;
+  Failed: Boolean;
+  X: Integer;
+  SummedUpAverageDuration, AverageDuration : Double;
+begin
+  SummedUpAverageDuration := 0.0;
+  Failed := False;
+  WriteLn('Division compilation and timing test (using constants from System and Sysutils)');
+  WriteLn('-------------------------------------------------------------------------------');
+  for X := Low(TestClasses) to High(TestClasses) do
+    begin
+      try
+        CurrentObject := TestClasses[X].Create;
+        try
+          Write(CurrentObject.TestTitle:43, ' - ');
+          CurrentObject.Run;
+
+          if CurrentObject.WriteResults then
+            begin
+              AverageDuration := ((CurrentObject.RunTime * 1000000000.0) / (ITERATIONS * INTERNAL_LOOPS));
+              WriteLn('Pass - average iteration duration: ', AverageDuration:1:3, ' ns');
+              SummedUpAverageDuration := SummedUpAverageDuration + AverageDuration;
+            end
+          else
+            { Final average isn't processed if a test failed, so there's no need
+              to calculate and add the average duration to it }
+            Failed := True;
+
+        finally
+          CurrentObject.Free;
+        end;
+      except on E: Exception do
+        begin
+          WriteLn('Exception "', E.ClassName, '" raised while running test object of class "', TestClasses[X].ClassName, '"');
+          Failed := True;
+        end;
+      end;
+    end;
+
+  if Failed then
+    Halt(1);
+
+  WriteLn(#10'ok');
+  WriteLn('- Sum of average durations: ', SummedUpAverageDuration:1:3, ' ns');
+  WriteLn('- Overall average duration: ', (SummedUpAverageDuration / Length(TestClasses)):1:3, ' ns');
+end.

Index: tests/bench/bdiv_s32.inc
===================================================================
--- /dev/null
+++ tests/bench/bdiv_s32.inc
@@ -0,0 +1,208 @@
+type
+  { TSInt32Bit1Test }
+
+  TSInt32Bit1Test = class(TSInt32DivTest)
+    protected
+      function GetDivisor: Integer; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt32Bit1ModTest }
+
+  TSInt32Bit1ModTest = class(TSInt32ModTest)
+    protected
+      function GetDivisor: Integer; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt32Bit100Test }
+
+  TSInt32Bit100Test = class(TSInt32DivTest)
+    protected
+      function GetDivisor: Integer; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt32Bit100ModTest }
+
+  TSInt32Bit100ModTest = class(TSInt32ModTest)
+    protected
+      function GetDivisor: Integer; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+{ TSInt32Bit1Test }
+
+function TSInt32Bit1Test.TestTitle: shortstring;
+  begin
+    Result := 'Signed 32-bit division by 1';
+  end;
+
+function TSInt32Bit1Test.GetDivisor: Integer;
+  begin
+    Result := 1;
+  end;
+
+procedure TSInt32Bit1Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Integer;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      0:
+        Numerator := -2147483648;
+      1:
+        Numerator := -2147483600;
+      2:
+        Numerator := -2147483599;
+      253:
+        Numerator := 2147483599;
+      254:
+        Numerator := 2147483600;
+      255:
+        Numerator := 2147483647;
+      else
+        Numerator := Index - 128;
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 1;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt32Bit1ModTest }
+
+function TSInt32Bit1ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Signed 32-bit modulus by 1';
+  end;
+
+function TSInt32Bit1ModTest.GetDivisor: Integer;
+  begin
+    Result := 1;
+  end;
+
+procedure TSInt32Bit1ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Integer;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      0:
+        Numerator := -2147483648;
+      1:
+        Numerator := -2147483600;
+      2:
+        Numerator := -2147483599;
+      253:
+        Numerator := 2147483599;
+      254:
+        Numerator := 2147483600;
+      255:
+        Numerator := 2147483647;
+      else
+        Numerator := Index - 128;
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 1;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt32Bit100Test }
+
+function TSInt32Bit100Test.TestTitle: shortstring;
+  begin
+    Result := 'Signed 32-bit division by 100';
+  end;
+
+function TSInt32Bit100Test.GetDivisor: Integer;
+  begin
+    Result := 100;
+  end;
+
+procedure TSInt32Bit100Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Integer;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      0:
+        Numerator := -2147483648;
+      1:
+        Numerator := -2147483600;
+      2:
+        Numerator := -2147483599;
+      253:
+        Numerator := 2147483599;
+      254:
+        Numerator := 2147483600;
+      255:
+        Numerator := 2147483647;
+      else
+        Numerator := Index - 128;
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 100;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt32Bit100ModTest }
+
+function TSInt32Bit100ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Signed 32-bit modulus by 100';
+  end;
+
+function TSInt32Bit100ModTest.GetDivisor: Integer;
+  begin
+    Result := 100;
+  end;
+
+procedure TSInt32Bit100ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Integer;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      0:
+        Numerator := -2147483648;
+      1:
+        Numerator := -2147483600;
+      2:
+        Numerator := -2147483599;
+      253:
+        Numerator := 2147483599;
+      254:
+        Numerator := 2147483600;
+      255:
+        Numerator := 2147483647;
+      else
+        Numerator := Index - 128;
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 100;
+      
+    FResultArray[Index] := Answer;
+  end;

Index: tests/bench/bdiv_s64.inc
===================================================================
--- /dev/null
+++ tests/bench/bdiv_s64.inc
@@ -0,0 +1,772 @@
+type
+  { TSInt64Bit1Test }
+
+  TSInt64Bit1Test = class(TSInt64DivTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt64Bit1ModTest }
+
+  TSInt64Bit1ModTest = class(TSInt64ModTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt64Bit10Test }
+
+  TSInt64Bit10Test = class(TSInt64DivTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt64Bit10ModTest }
+
+  TSInt64Bit10ModTest = class(TSInt64ModTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt64Bit18Test }
+
+  TSInt64Bit18Test = class(TSInt64DivTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt64Bit18ModTest }
+
+  TSInt64Bit18ModTest = class(TSInt64ModTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt64Bit24Test }
+
+  TSInt64Bit24Test = class(TSInt64DivTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt64Bit24ModTest }
+
+  TSInt64Bit24ModTest = class(TSInt64ModTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt64Bit100Test }
+
+  TSInt64Bit100Test = class(TSInt64DivTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt64Bit100ModTest }
+
+  TSInt64Bit100ModTest = class(TSInt64ModTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt64Bit153Test }
+const
+  FS64_153Input: array[$0..$F] of Int64 =
+    (0, 1, 152, 153, 154, -1, -152, -153, -154,
+    8000000000000000117, 8000000000000000118, 8000000000000000119, 
+    -8000000000000000117, -8000000000000000118, -8000000000000000119,
+    Int64($8000000000000000));
+
+type
+  TSInt64Bit153Test = class(TSInt64DivTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt64Bit153ModTest }
+
+  TSInt64Bit153ModTest = class(TSInt64ModTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt64Bit1461Test }
+const
+  FS64_1461Input: array[$0..$F] of Int64 =
+    (0, 1, 1460, 1461, 1462, -1, -1460, -1461, -1462,
+    8000000000000000582, 8000000000000000583, 8000000000000000584, 
+    -8000000000000000582, -8000000000000000583, -8000000000000000584,
+    Int64($8000000000000000));
+
+type
+  TSInt64Bit1461Test = class(TSInt64DivTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt64Bit1461ModTest }
+
+  TSInt64Bit1461ModTest = class(TSInt64ModTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt64Bit10000Test }
+const
+  FS64_10000Input: array[$0..$F] of Int64 =
+    (0, 1, 9999, 10000, 10001, -1, -9999, -10000, -10001,
+    7999999999999999999, 8000000000000000000, 8000000000000000001, 
+    -7999999999999999999, -8000000000000000000, -8000000000000000001,
+    Int64($8000000000000000));
+
+type
+  TSInt64Bit10000Test = class(TSInt64DivTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt64Bit10000ModTest }
+
+  TSInt64Bit10000ModTest = class(TSInt64ModTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+  
+  { TSInt64Bit86400000Test }
+const
+  FS64_86400000Input: array[$0..$F] of Int64 =
+    (0, 1, 86399999, 86400000, 86400001, -1, -86399999, -86400000, -86400001,
+    8639999999999999999, 8640000000000000000, 8640000000000000001, 
+    -8639999999999999999, -8640000000000000000, -8640000000000000001,
+    Int64($8000000000000000));
+
+type
+  TSInt64Bit86400000Test = class(TSInt64DivTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TSInt64Bit86400000ModTest }
+
+  TSInt64Bit86400000ModTest = class(TSInt64ModTest)
+    protected
+      function GetDivisor: Int64; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+{ TSInt64Bit1Test }
+
+function TSInt64Bit1Test.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit division by 1';
+  end;
+
+function TSInt64Bit1Test.GetDivisor: Int64;
+  begin
+    Result := 1;
+  end;
+
+procedure TSInt64Bit1Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      0:
+        Numerator := Int64($8000000000000000);
+      1:
+        Numerator := Int64($8000000000000006);
+      2:
+        Numerator := Int64($8000000000000007);
+      253:
+        Numerator := Int64($7FFFFFFFFFFFFFF9);
+      254:
+        Numerator := Int64($7FFFFFFFFFFFFFFA);
+      255:
+        Numerator := Int64($7FFFFFFFFFFFFFFF);
+      else
+        Numerator := Index - 128;
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 1;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt64Bit1ModTest }
+
+function TSInt64Bit1ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit modulus by 1';
+  end;
+
+function TSInt64Bit1ModTest.GetDivisor: Int64;
+  begin
+    Result := 1;
+  end;
+
+procedure TSInt64Bit1ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      0:
+        Numerator := Int64($8000000000000000);
+      1:
+        Numerator := Int64($8000000000000006);
+      2:
+        Numerator := Int64($8000000000000007);
+      253:
+        Numerator := Int64($7FFFFFFFFFFFFFF9);
+      254:
+        Numerator := Int64($7FFFFFFFFFFFFFFA);
+      255:
+        Numerator := Int64($7FFFFFFFFFFFFFFF);
+      else
+        Numerator := Index - 128;
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 1;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt64Bit10Test }
+
+function TSInt64Bit10Test.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit division by 10';
+  end;
+
+function TSInt64Bit10Test.GetDivisor: Int64;
+  begin
+    Result := 10;
+  end;
+
+procedure TSInt64Bit10Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      0:
+        Numerator := Int64($8000000000000000);
+      1:
+        Numerator := Int64($8000000000000006);
+      2:
+        Numerator := Int64($8000000000000007);
+      253:
+        Numerator := Int64($7FFFFFFFFFFFFFF9);
+      254:
+        Numerator := Int64($7FFFFFFFFFFFFFFA);
+      255:
+        Numerator := Int64($7FFFFFFFFFFFFFFF);
+      else
+        Numerator := Index - 128;
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 10;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt64Bit10ModTest }
+
+function TSInt64Bit10ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit modulus by 10';
+  end;
+
+function TSInt64Bit10ModTest.GetDivisor: Int64;
+  begin
+    Result := 10;
+  end;
+
+procedure TSInt64Bit10ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      0:
+        Numerator := Int64($8000000000000000);
+      1:
+        Numerator := Int64($8000000000000006);
+      2:
+        Numerator := Int64($8000000000000007);
+      253:
+        Numerator := Int64($7FFFFFFFFFFFFFF9);
+      254:
+        Numerator := Int64($7FFFFFFFFFFFFFFA);
+      255:
+        Numerator := Int64($7FFFFFFFFFFFFFFF);
+      else
+        Numerator := Index - 128;
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 10;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt64Bit18Test }
+
+function TSInt64Bit18Test.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit division by 18';
+  end;
+
+function TSInt64Bit18Test.GetDivisor: Int64;
+  begin
+    Result := 18;
+  end;
+
+procedure TSInt64Bit18Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := Index - 128;
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 18;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt64Bit18ModTest }
+
+function TSInt64Bit18ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit modulus by 18';
+  end;
+
+function TSInt64Bit18ModTest.GetDivisor: Int64;
+  begin
+    Result := 18;
+  end;
+
+procedure TSInt64Bit18ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := Index - 128;
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 18;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt64Bit24Test }
+
+function TSInt64Bit24Test.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit division by 24';
+  end;
+
+function TSInt64Bit24Test.GetDivisor: Int64;
+  begin
+    Result := 24;
+  end;
+
+procedure TSInt64Bit24Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := Index - 128;
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 24;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt64Bit24ModTest }
+
+function TSInt64Bit24ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit modulus by 24';
+  end;
+
+function TSInt64Bit24ModTest.GetDivisor: Int64;
+  begin
+    Result := 24;
+  end;
+
+procedure TSInt64Bit24ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := Index - 128;
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 24;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt64Bit100Test }
+
+function TSInt64Bit100Test.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit division by 100';
+  end;
+
+function TSInt64Bit100Test.GetDivisor: Int64;
+  begin
+    Result := 100;
+  end;
+
+procedure TSInt64Bit100Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      0:
+        Numerator := Int64($8000000000000000);
+      1:
+        Numerator := Int64($8000000000000008);
+      2:
+        Numerator := Int64($8000000000000009);
+      253:
+        Numerator := Int64($7FFFFFFFFFFFFFF7);
+      254:
+        Numerator := Int64($7FFFFFFFFFFFFFF8);
+      255:
+        Numerator := Int64($7FFFFFFFFFFFFFFF);
+      else
+        Numerator := Index - 128;
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 100;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt64Bit100ModTest }
+
+function TSInt64Bit100ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit modulus by 100';
+  end;
+
+function TSInt64Bit100ModTest.GetDivisor: Int64;
+  begin
+    Result := 100;
+  end;
+
+procedure TSInt64Bit100ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      0:
+        Numerator := Int64($8000000000000000);
+      1:
+        Numerator := Int64($8000000000000008);
+      2:
+        Numerator := Int64($8000000000000009);
+      253:
+        Numerator := Int64($7FFFFFFFFFFFFFF7);
+      254:
+        Numerator := Int64($7FFFFFFFFFFFFFF8);
+      255:
+        Numerator := Int64($7FFFFFFFFFFFFFFF);
+      else
+        Numerator := Index - 128;
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 100;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt64Bit153Test }
+
+function TSInt64Bit153Test.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit division by 153';
+  end;
+
+function TSInt64Bit153Test.GetDivisor: Int64;
+  begin
+    Result := 153;
+  end;
+
+procedure TSInt64Bit153Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FS64_153Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 153;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt64Bit153ModTest }
+
+function TSInt64Bit153ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit modulus by 153';
+  end;
+
+function TSInt64Bit153ModTest.GetDivisor: Int64;
+  begin
+    Result := 153;
+  end;
+
+procedure TSInt64Bit153ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FS64_153Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 153;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt64Bit1461Test }
+
+function TSInt64Bit1461Test.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit division by 1,461';
+  end;
+
+function TSInt64Bit1461Test.GetDivisor: Int64;
+  begin
+    Result := 1461;
+  end;
+
+procedure TSInt64Bit1461Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FS64_1461Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 1461;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt64Bit1461ModTest }
+
+function TSInt64Bit1461ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit modulus by 1,461';
+  end;
+
+function TSInt64Bit1461ModTest.GetDivisor: Int64;
+  begin
+    Result := 1461;
+  end;
+
+procedure TSInt64Bit1461ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FS64_1461Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 1461;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt64Bit10000Test }
+
+function TSInt64Bit10000Test.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit division by 10,000 (Currency)';
+  end;
+
+function TSInt64Bit10000Test.GetDivisor: Int64;
+  begin
+    Result := 10000;
+  end;
+
+procedure TSInt64Bit10000Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FS64_10000Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 10000;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt64Bit10000ModTest }
+
+function TSInt64Bit10000ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit modulus by 10,000 (Currency)';
+  end;
+
+function TSInt64Bit10000ModTest.GetDivisor: Int64;
+  begin
+    Result := 10000;
+  end;
+
+procedure TSInt64Bit10000ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FS64_10000Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 10000;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt64Bit86400000Test }
+
+function TSInt64Bit86400000Test.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit division by 86,400,000';
+  end;
+
+function TSInt64Bit86400000Test.GetDivisor: Int64;
+  begin
+    Result := 86400000;
+  end;
+
+procedure TSInt64Bit86400000Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FS64_86400000Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 86400000;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TSInt64Bit86400000ModTest }
+
+function TSInt64Bit86400000ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Signed 64-bit modulus by 86,400,000';
+  end;
+
+function TSInt64Bit86400000ModTest.GetDivisor: Int64;
+  begin
+    Result := 86400000;
+  end;
+
+procedure TSInt64Bit86400000ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Int64;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FS64_86400000Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 86400000;
+      
+    FResultArray[Index] := Answer;
+  end;

Index: tests/bench/bdiv_u32.inc
===================================================================
--- /dev/null
+++ tests/bench/bdiv_u32.inc
@@ -0,0 +1,769 @@
+type
+  { TUInt32Bit1Test }
+
+  TUInt32Bit1Test = class(TUInt32DivTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt32Bit1ModTest }
+
+  TUInt32Bit1ModTest = class(TUInt32ModTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt32Bit2Test }
+
+  TUInt32Bit2Test = class(TUInt32DivTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt32Bit2ModTest }
+
+  TUInt32Bit2ModTest = class(TUInt32ModTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt32Bit3Test }
+
+  TUInt32Bit3Test = class(TUInt32DivTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt32Bit3ModTest }
+
+  TUInt32Bit3ModTest = class(TUInt32ModTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt32Bit10Test }
+
+  TUInt32Bit10Test = class(TUInt32DivTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt32Bit10ModTest }
+
+  TUInt32Bit10ModTest = class(TUInt32ModTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt32Bit100Test }
+
+  TUInt32Bit100Test = class(TUInt32DivTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt32Bit100ModTest }
+
+  TUInt32Bit100ModTest = class(TUInt32ModTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt32Bit1000Test }
+const
+  FU32_1000Input: array[$0..$F] of Cardinal =
+    (0, 1, 999, 1000, 1001, 1999, 2000, 2001,
+    4294958999, 4294959000, 4294959001,
+    $7FFFFFFE, $7FFFFFFF, $80000000, $80000001, $FFFFFFFF);
+
+type
+  TUInt32Bit1000Test = class(TUInt32DivTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt32Bit1000ModTest }
+
+  TUInt32Bit1000ModTest = class(TUInt32ModTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt32Bit60000Test }
+const
+  FU32_60000Input: array[$0..$F] of Cardinal =
+    (0, 1, 59999, 60000, 60001, 119999, 120000, 120001,
+    4294919999, 4294920000, 4294920001,
+    $7FFFFFFE, $7FFFFFFF, $80000000, $80000001, $FFFFFFFF);
+
+type
+  TUInt32Bit60000Test = class(TUInt32DivTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt32Bit60000ModTest }
+
+  TUInt32Bit60000ModTest = class(TUInt32ModTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt32Bit146097Test }
+const
+  FU32_146097Input: array[$0..$F] of Cardinal =
+    (0, 1, 146096, 146097, 146098, 292193, 292194, 292195,
+    4294959605, 4294959606, 4294959607,
+    $7FFFFFFE, $7FFFFFFF, $80000000, $80000001, $FFFFFFFF);
+
+type
+  TUInt32Bit146097Test = class(TUInt32DivTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt32Bit146097ModTest }
+
+  TUInt32Bit146097ModTest = class(TUInt32ModTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt32Bit3600000Test }
+const
+  FU32_3600000Input: array[$0..$F] of Cardinal =
+    (0, 1, 3599999, 3600000, 3600001, 7199999, 7200000, 7200001,
+    3600000000, 4294799999, 4294800000, 4294800001,
+    $7FFFFFFF, $80000000, $80000001, $FFFFFFFF);
+
+type
+  TUInt32Bit3600000Test = class(TUInt32DivTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt32Bit3600000ModTest }
+
+  TUInt32Bit3600000ModTest = class(TUInt32ModTest)
+    protected
+      function GetDivisor: Cardinal; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+
+{ TUInt32Bit1Test }
+
+function TUInt32Bit1Test.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit division by 1';
+  end;
+
+function TUInt32Bit1Test.GetDivisor: Cardinal;
+  begin
+    Result := 1;
+  end;
+
+procedure TUInt32Bit1Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      253:
+        Numerator := 4294967293;
+      254:
+        Numerator := 4294967294;
+      255:
+        Numerator := 4294967295;
+      else
+        Numerator := Cardinal(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 1;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt32Bit1ModTest }
+
+function TUInt32Bit1ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit modulus by 1';
+  end;
+
+function TUInt32Bit1ModTest.GetDivisor: Cardinal;
+  begin
+    Result := 1;
+  end;
+
+procedure TUInt32Bit1ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      253:
+        Numerator := 4294967293;
+      254:
+        Numerator := 4294967294;
+      255:
+        Numerator := 4294967295;
+      else
+        Numerator := Cardinal(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 1;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt32Bit2Test }
+
+function TUInt32Bit2Test.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit division by 2';
+  end;
+
+function TUInt32Bit2Test.GetDivisor: Cardinal;
+  begin
+    Result := 2;
+  end;
+
+procedure TUInt32Bit2Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      253:
+        Numerator := 4294967293;
+      254:
+        Numerator := 4294967294;
+      255:
+        Numerator := 4294967295;
+      else
+        Numerator := Cardinal(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 2;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt32Bit2ModTest }
+
+function TUInt32Bit2ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit modulus by 2';
+  end;
+
+function TUInt32Bit2ModTest.GetDivisor: Cardinal;
+  begin
+    Result := 2;
+  end;
+
+procedure TUInt32Bit2ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      253:
+        Numerator := 4294967293;
+      254:
+        Numerator := 4294967294;
+      255:
+        Numerator := 4294967295;
+      else
+        Numerator := Cardinal(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 2;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt32Bit3Test }
+
+function TUInt32Bit3Test.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit division by 3';
+  end;
+
+function TUInt32Bit3Test.GetDivisor: Cardinal;
+  begin
+    Result := 3;
+  end;
+
+procedure TUInt32Bit3Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      254:
+        Numerator := 4294967294;
+      255:
+        Numerator := 4294967295;
+      else
+        Numerator := Cardinal(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 3;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt32Bit3ModTest }
+
+function TUInt32Bit3ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit modulus by 3';
+  end;
+
+function TUInt32Bit3ModTest.GetDivisor: Cardinal;
+  begin
+    Result := 3;
+  end;
+
+procedure TUInt32Bit3ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      254:
+        Numerator := 4294967294;
+      255:
+        Numerator := 4294967295;
+      else
+        Numerator := Cardinal(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 3;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt32Bit10Test }
+
+function TUInt32Bit10Test.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit division by 10';
+  end;
+
+function TUInt32Bit10Test.GetDivisor: Cardinal;
+  begin
+    Result := 10;
+  end;
+
+procedure TUInt32Bit10Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      253:
+        Numerator := 4294967289;
+      254:
+        Numerator := 4294967290;
+      255:
+        Numerator := 4294967295;
+      else
+        Numerator := Cardinal(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 10;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt32Bit10ModTest }
+
+function TUInt32Bit10ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit modulus by 10';
+  end;
+
+function TUInt32Bit10ModTest.GetDivisor: Cardinal;
+  begin
+    Result := 10;
+  end;
+
+procedure TUInt32Bit10ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      253:
+        Numerator := 4294967289;
+      254:
+        Numerator := 4294967290;
+      255:
+        Numerator := 4294967295;
+      else
+        Numerator := Cardinal(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 10;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt32Bit100Test }
+
+function TUInt32Bit100Test.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit division by 100';
+  end;
+
+function TUInt32Bit100Test.GetDivisor: Cardinal;
+  begin
+    Result := 100;
+  end;
+
+procedure TUInt32Bit100Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      253:
+        Numerator := 4294967199;
+      254:
+        Numerator := 4294967200;
+      255:
+        Numerator := 4294967295;
+      else
+        Numerator := Cardinal(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 100;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt32Bit100ModTest }
+
+function TUInt32Bit100ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit modulus by 100';
+  end;
+
+function TUInt32Bit100ModTest.GetDivisor: Cardinal;
+  begin
+    Result := 100;
+  end;
+
+procedure TUInt32Bit100ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      253:
+        Numerator := 4294967199;
+      254:
+        Numerator := 4294967200;
+      255:
+        Numerator := 4294967295;
+      else
+        Numerator := Cardinal(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 100;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt32Bit1000Test }
+
+function TUInt32Bit1000Test.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit division by 1,000';
+  end;
+
+function TUInt32Bit1000Test.GetDivisor: Cardinal;
+  begin
+    Result := 1000;
+  end;
+
+procedure TUInt32Bit1000Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FU32_1000Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 1000;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt32Bit1000ModTest }
+
+function TUInt32Bit1000ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit modulus by 1,000';
+  end;
+
+function TUInt32Bit1000ModTest.GetDivisor: Cardinal;
+  begin
+    Result := 1000;
+  end;
+
+procedure TUInt32Bit1000ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FU32_1000Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 1000;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt32Bit60000Test }
+
+function TUInt32Bit60000Test.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit division by 60,000';
+  end;
+
+function TUInt32Bit60000Test.GetDivisor: Cardinal;
+  begin
+    Result := 60000;
+  end;
+
+procedure TUInt32Bit60000Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FU32_60000Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 60000;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt32Bit60000ModTest }
+
+function TUInt32Bit60000ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit modulus by 60,000';
+  end;
+
+function TUInt32Bit60000ModTest.GetDivisor: Cardinal;
+  begin
+    Result := 60000;
+  end;
+
+procedure TUInt32Bit60000ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FU32_60000Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 60000;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt32Bit146097Test }
+
+function TUInt32Bit146097Test.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit division by 146,097';
+  end;
+
+function TUInt32Bit146097Test.GetDivisor: Cardinal;
+  begin
+    Result := 146097;
+  end;
+
+procedure TUInt32Bit146097Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FU32_146097Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 146097;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt32Bit146097ModTest }
+
+function TUInt32Bit146097ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit modulus by 146,097';
+  end;
+
+function TUInt32Bit146097ModTest.GetDivisor: Cardinal;
+  begin
+    Result := 146097;
+  end;
+
+procedure TUInt32Bit146097ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FU32_146097Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 146097;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt32Bit3600000Test }
+
+function TUInt32Bit3600000Test.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit division by 3,600,000';
+  end;
+
+function TUInt32Bit3600000Test.GetDivisor: Cardinal;
+  begin
+    Result := 3600000;
+  end;
+
+procedure TUInt32Bit3600000Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FU32_3600000Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 3600000;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt32Bit3600000ModTest }
+
+function TUInt32Bit3600000ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 32-bit modulus by 3,600,000';
+  end;
+
+function TUInt32Bit3600000ModTest.GetDivisor: Cardinal;
+  begin
+    Result := 3600000;
+  end;
+
+procedure TUInt32Bit3600000ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: Cardinal;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FU32_3600000Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 3600000;
+
+    FResultArray[Index] := Answer;
+  end;

Index: tests/bench/bdiv_u64.inc
===================================================================
--- /dev/null
+++ tests/bench/bdiv_u64.inc
@@ -0,0 +1,621 @@
+type
+  { TUInt64Bit1Test }
+
+  TUInt64Bit1Test = class(TUInt64DivTest)
+    protected
+      function GetDivisor: QWord; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt64Bit1ModTest }
+
+  TUInt64Bit1ModTest = class(TUInt64ModTest)
+    protected
+      function GetDivisor: QWord; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt64Bit2Test }
+
+  TUInt64Bit2Test = class(TUInt64DivTest)
+    protected
+      function GetDivisor: QWord; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt64Bit2ModTest }
+
+  TUInt64Bit2ModTest = class(TUInt64ModTest)
+    protected
+      function GetDivisor: QWord; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt64Bit3Test }
+
+  TUInt64Bit3Test = class(TUInt64DivTest)
+    protected
+      function GetDivisor: QWord; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt64Bit3ModTest }
+
+  TUInt64Bit3ModTest = class(TUInt64ModTest)
+    protected
+      function GetDivisor: QWord; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt64Bit5Test }
+
+  TUInt64Bit5Test = class(TUInt64DivTest)
+    protected
+      function GetDivisor: QWord; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt64Bit5ModTest }
+
+  TUInt64Bit5ModTest = class(TUInt64ModTest)
+    protected
+      function GetDivisor: QWord; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt64Bit10Test }
+
+  TUInt64Bit10Test = class(TUInt64DivTest)
+    protected
+      function GetDivisor: QWord; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt64Bit10ModTest }
+
+  TUInt64Bit10ModTest = class(TUInt64ModTest)
+    protected
+      function GetDivisor: QWord; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt64Bit100Test }
+
+  TUInt64Bit100Test = class(TUInt64DivTest)
+    protected
+      function GetDivisor: QWord; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt64Bit100ModTest }
+
+  TUInt64Bit100ModTest = class(TUInt64ModTest)
+    protected
+      function GetDivisor: QWord; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  { TUInt64Bit1000000000Test }
+const
+  FU64_1000000000Input: array[$0..$F] of QWord =
+    (0, 1, 999999999, 1000000000, 1000000001, 5000000000,
+    7999999999999999999, 8000000000000000000, 8000000000000000001,
+    QWord(15999999999999999999), QWord(16000000000000000000), QWord(16000000000000000001),
+    $7FFFFFFFFFFFFFFF, QWord($8000000000000000), QWord($8000000000000001), QWord($FFFFFFFFFFFFFFFF));
+
+type
+  TUInt64Bit1000000000Test = class(TUInt64DivTest)
+    protected
+      function GetDivisor: QWord; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+  TUInt64Bit1000000000ModTest = class(TUInt64ModTest)
+    protected
+      function GetDivisor: QWord; override;
+      procedure DoTestIteration(Iteration: Integer); override;
+    public
+      function TestTitle: shortstring; override;
+  end;
+
+{ TUInt64Bit1Test }
+
+function TUInt64Bit1Test.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 64-bit division by 1';
+  end;
+
+function TUInt64Bit1Test.GetDivisor: QWord;
+  begin
+    Result := 1;
+  end;
+
+procedure TUInt64Bit1Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: QWord;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      253:
+        Numerator := QWord($FFFFFFFFFFFFFFFD);
+      254:
+        Numerator := QWord($FFFFFFFFFFFFFFFE);
+      255:
+        Numerator := QWord($FFFFFFFFFFFFFFFF);
+      else
+        Numerator := QWord(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 1;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt64Bit1ModTest }
+
+function TUInt64Bit1ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 64-bit modulus by 1';
+  end;
+
+function TUInt64Bit1ModTest.GetDivisor: QWord;
+  begin
+    Result := 1;
+  end;
+
+procedure TUInt64Bit1ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: QWord;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      253:
+        Numerator := QWord($FFFFFFFFFFFFFFFD);
+      254:
+        Numerator := QWord($FFFFFFFFFFFFFFFE);
+      255:
+        Numerator := QWord($FFFFFFFFFFFFFFFF);
+      else
+        Numerator := QWord(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 1;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt64Bit2Test }
+
+function TUInt64Bit2Test.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 64-bit division by 2';
+  end;
+
+function TUInt64Bit2Test.GetDivisor: QWord;
+  begin
+    Result := 2;
+  end;
+
+procedure TUInt64Bit2Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: QWord;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      253:
+        Numerator := QWord($FFFFFFFFFFFFFFFD);
+      254:
+        Numerator := QWord($FFFFFFFFFFFFFFFE);
+      255:
+        Numerator := QWord($FFFFFFFFFFFFFFFF);
+      else
+        Numerator := QWord(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 2;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt64Bit2ModTest }
+
+function TUInt64Bit2ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 64-bit modulus by 2';
+  end;
+
+function TUInt64Bit2ModTest.GetDivisor: QWord;
+  begin
+    Result := 2;
+  end;
+
+procedure TUInt64Bit2ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: QWord;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      253:
+        Numerator := QWord($FFFFFFFFFFFFFFFD);
+      254:
+        Numerator := QWord($FFFFFFFFFFFFFFFE);
+      255:
+        Numerator := QWord($FFFFFFFFFFFFFFFF);
+      else
+        Numerator := QWord(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 2;
+
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt64Bit3Test }
+
+function TUInt64Bit3Test.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 64-bit division by 3';
+  end;
+
+function TUInt64Bit3Test.GetDivisor: QWord;
+  begin
+    Result := 3;
+  end;
+
+procedure TUInt64Bit3Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: QWord;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      254:
+        Numerator := QWord($FFFFFFFFFFFFFFFE);
+      255:
+        Numerator := QWord($FFFFFFFFFFFFFFFF);
+      else
+        Numerator := QWord(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 3;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt64Bit3ModTest }
+
+function TUInt64Bit3ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 64-bit modulus by 3';
+  end;
+
+function TUInt64Bit3ModTest.GetDivisor: QWord;
+  begin
+    Result := 3;
+  end;
+
+procedure TUInt64Bit3ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: QWord;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      254:
+        Numerator := QWord($FFFFFFFFFFFFFFFE);
+      255:
+        Numerator := QWord($FFFFFFFFFFFFFFFF);
+      else
+        Numerator := QWord(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 3;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt64Bit5Test }
+
+function TUInt64Bit5Test.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 64-bit division by 5';
+  end;
+
+function TUInt64Bit5Test.GetDivisor: QWord;
+  begin
+    Result := 5;
+  end;
+
+procedure TUInt64Bit5Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: QWord;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      254:
+        Numerator := QWord($FFFFFFFFFFFFFFFE);
+      255:
+        Numerator := QWord($FFFFFFFFFFFFFFFF);
+      else
+        Numerator := QWord(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 5;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt64Bit5ModTest }
+
+function TUInt64Bit5ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 64-bit modulus by 5';
+  end;
+
+function TUInt64Bit5ModTest.GetDivisor: QWord;
+  begin
+    Result := 5;
+  end;
+
+procedure TUInt64Bit5ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: QWord;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      254:
+        Numerator := QWord($FFFFFFFFFFFFFFFE);
+      255:
+        Numerator := QWord($FFFFFFFFFFFFFFFF);
+      else
+        Numerator := QWord(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 5;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt64Bit10Test }
+
+function TUInt64Bit10Test.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 64-bit division by 10';
+  end;
+
+function TUInt64Bit10Test.GetDivisor: QWord;
+  begin
+    Result := 10;
+  end;
+
+procedure TUInt64Bit10Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: QWord;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      253:
+        Numerator := QWord($FFFFFFFFFFFFFFF9);
+      254:
+        Numerator := QWord($FFFFFFFFFFFFFFFA);
+      255:
+        Numerator := QWord($FFFFFFFFFFFFFFFF);
+      else
+        Numerator := QWord(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 10;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt64Bit10ModTest }
+
+function TUInt64Bit10ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 64-bit modulus by 10';
+  end;
+
+function TUInt64Bit10ModTest.GetDivisor: QWord;
+  begin
+    Result := 10;
+  end;
+
+procedure TUInt64Bit10ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: QWord;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      253:
+        Numerator := QWord($FFFFFFFFFFFFFFF9);
+      254:
+        Numerator := QWord($FFFFFFFFFFFFFFFA);
+      255:
+        Numerator := QWord($FFFFFFFFFFFFFFFF);
+      else
+        Numerator := QWord(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 10;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt64Bit100Test }
+
+function TUInt64Bit100Test.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 64-bit division by 100';
+  end;
+
+function TUInt64Bit100Test.GetDivisor: QWord;
+  begin
+    Result := 100;
+  end;
+
+procedure TUInt64Bit100Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: QWord;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      253:
+        Numerator := QWord($FFFFFFFFFFFFFFEF);
+      254:
+        Numerator := QWord($FFFFFFFFFFFFFFF0);
+      255:
+        Numerator := QWord($FFFFFFFFFFFFFFFF);
+      else
+        Numerator := QWord(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 100;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt64Bit100ModTest }
+
+function TUInt64Bit100ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 64-bit modulus by 100';
+  end;
+
+function TUInt64Bit100ModTest.GetDivisor: QWord;
+  begin
+    Result := 100;
+  end;
+
+procedure TUInt64Bit100ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: QWord;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    case Index of
+      253:
+        Numerator := QWord($FFFFFFFFFFFFFFEF);
+      254:
+        Numerator := QWord($FFFFFFFFFFFFFFF0);
+      255:
+        Numerator := QWord($FFFFFFFFFFFFFFFF);
+      else
+        Numerator := QWord(Index);
+    end;
+
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 100;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt64Bit1000000000Test }
+
+function TUInt64Bit1000000000Test.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 64-bit division by 1,000,000,000';
+  end;
+
+function TUInt64Bit1000000000Test.GetDivisor: QWord;
+  begin
+    Result := 1000000000;
+  end;
+
+procedure TUInt64Bit1000000000Test.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: QWord;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FU64_1000000000Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator div 1000000000;
+      
+    FResultArray[Index] := Answer;
+  end;
+
+{ TUInt64Bit1000000000ModTest }
+
+function TUInt64Bit1000000000ModTest.TestTitle: shortstring;
+  begin
+    Result := 'Unsigned 64-bit modulus by 1,000,000,000';
+  end;
+
+function TUInt64Bit1000000000ModTest.GetDivisor: QWord;
+  begin
+    Result := 1000000000;
+  end;
+
+procedure TUInt64Bit1000000000ModTest.DoTestIteration(Iteration: Integer);
+  var
+    Numerator, Answer: QWord;
+    Index, X: Integer;
+  begin
+    Index := Iteration and $FF;
+    Numerator := FU64_1000000000Input[Index and $F];
+    FInputArray[Index] := Numerator;
+    for X := 0 to INTERNAL_LOOPS - 1 do
+      Answer := Numerator mod 1000000000;
+      
+    FResultArray[Index] := Answer;
+  end;

Index: tests/test/cg/tmoddiv6.pp
===================================================================
--- /dev/null
+++ tests/test/cg/tmoddiv6.pp
@@ -0,0 +1,3 @@
+{ %OPT=-O2 }
+{ this benchmark can also be used as a test case }
+{$I ../../bench/bdiv.pp}
div-bench-test.patch (71,660 bytes)   

J. Gareth Moreton

2021-04-27 00:16

developer   ~0130602

Sample log from new bench test (compiled under -O2):

------
Trunk:
------

Division compilation and timing test (using constants from System and Sysutils)
-------------------------------------------------------------------------------
              Unsigned 32-bit division by 1 - Pass - average iteration duration: 3.027 ns
               Unsigned 32-bit modulus by 1 - Pass - average iteration duration: 1.863 ns
              Unsigned 32-bit division by 2 - Pass - average iteration duration: 2.095 ns
               Unsigned 32-bit modulus by 2 - Pass - average iteration duration: 2.328 ns
              Unsigned 32-bit division by 3 - Pass - average iteration duration: 3.958 ns
               Unsigned 32-bit modulus by 3 - Pass - average iteration duration: 4.889 ns
             Unsigned 32-bit division by 10 - Pass - average iteration duration: 3.958 ns
              Unsigned 32-bit modulus by 10 - Pass - average iteration duration: 4.657 ns
            Unsigned 32-bit division by 100 - Pass - average iteration duration: 3.725 ns
             Unsigned 32-bit modulus by 100 - Pass - average iteration duration: 4.191 ns
          Unsigned 32-bit division by 1,000 - Pass - average iteration duration: 5.122 ns
           Unsigned 32-bit modulus by 1,000 - Pass - average iteration duration: 5.821 ns
         Unsigned 32-bit division by 60,000 - Pass - average iteration duration: 4.657 ns
          Unsigned 32-bit modulus by 60,000 - Pass - average iteration duration: 5.588 ns
        Unsigned 32-bit division by 146,097 - Pass - average iteration duration: 4.424 ns
         Unsigned 32-bit modulus by 146,097 - Pass - average iteration duration: 5.122 ns
      Unsigned 32-bit division by 3,600,000 - Pass - average iteration duration: 4.191 ns
       Unsigned 32-bit modulus by 3,600,000 - Pass - average iteration duration: 4.889 ns
              Unsigned 64-bit division by 1 - Pass - average iteration duration: 1.630 ns
               Unsigned 64-bit modulus by 1 - Pass - average iteration duration: 1.630 ns
              Unsigned 64-bit division by 2 - Pass - average iteration duration: 2.328 ns
               Unsigned 64-bit modulus by 2 - Pass - average iteration duration: 1.863 ns
              Unsigned 64-bit division by 3 - Pass - average iteration duration: 4.191 ns
               Unsigned 64-bit modulus by 3 - Pass - average iteration duration: 4.889 ns
              Unsigned 64-bit division by 5 - Pass - average iteration duration: 3.958 ns
               Unsigned 64-bit modulus by 5 - Pass - average iteration duration: 4.889 ns
             Unsigned 64-bit division by 10 - Pass - average iteration duration: 4.191 ns
              Unsigned 64-bit modulus by 10 - Pass - average iteration duration: 4.889 ns
            Unsigned 64-bit division by 100 - Pass - average iteration duration: 3.492 ns
             Unsigned 64-bit modulus by 100 - Pass - average iteration duration: 4.657 ns
  Unsigned 64-bit division by 1,000,000,000 - Pass - average iteration duration: 6.752 ns
   Unsigned 64-bit modulus by 1,000,000,000 - Pass - average iteration duration: 7.451 ns
                Signed 32-bit division by 1 - Pass - average iteration duration: 1.863 ns
                 Signed 32-bit modulus by 1 - Pass - average iteration duration: 1.863 ns
              Signed 32-bit division by 100 - Pass - average iteration duration: 3.260 ns
               Signed 32-bit modulus by 100 - Pass - average iteration duration: 4.424 ns
                Signed 64-bit division by 1 - Pass - average iteration duration: 1.863 ns
                 Signed 64-bit modulus by 1 - Pass - average iteration duration: 1.863 ns
               Signed 64-bit division by 10 - Pass - average iteration duration: 4.191 ns
                Signed 64-bit modulus by 10 - Pass - average iteration duration: 4.889 ns
               Signed 64-bit division by 18 - Pass - average iteration duration: 3.958 ns
                Signed 64-bit modulus by 18 - Pass - average iteration duration: 4.657 ns
               Signed 64-bit division by 24 - Pass - average iteration duration: 3.725 ns
                Signed 64-bit modulus by 24 - Pass - average iteration duration: 4.657 ns
              Signed 64-bit division by 100 - Pass - average iteration duration: 3.492 ns
               Signed 64-bit modulus by 100 - Pass - average iteration duration: 4.424 ns
              Signed 64-bit division by 153 - Pass - average iteration duration: 7.218 ns
               Signed 64-bit modulus by 153 - Pass - average iteration duration: 8.149 ns
            Signed 64-bit division by 1,461 - Pass - average iteration duration: 6.985 ns
             Signed 64-bit modulus by 1,461 - Pass - average iteration duration: 7.916 ns
Signed 64-bit division by 10,000 (Currency) - Pass - average iteration duration: 6.752 ns
 Signed 64-bit modulus by 10,000 (Currency) - Pass - average iteration duration: 7.916 ns
       Signed 64-bit division by 86,400,000 - Pass - average iteration duration: 5.821 ns
        Signed 64-bit modulus by 86,400,000 - Pass - average iteration duration: 6.985 ns

ok
- Sum of average durations: 238.186 ns
- Overall average duration: 4.411 ns

------
Patch:
------

Division compilation and timing test (using constants from System and Sysutils)
-------------------------------------------------------------------------------
              Unsigned 32-bit division by 1 - Pass - average iteration duration: 1.630 ns
               Unsigned 32-bit modulus by 1 - Pass - average iteration duration: 1.630 ns
              Unsigned 32-bit division by 2 - Pass - average iteration duration: 1.630 ns
               Unsigned 32-bit modulus by 2 - Pass - average iteration duration: 1.630 ns
              Unsigned 32-bit division by 3 - Pass - average iteration duration: 2.328 ns
               Unsigned 32-bit modulus by 3 - Pass - average iteration duration: 3.027 ns
             Unsigned 32-bit division by 10 - Pass - average iteration duration: 2.561 ns
              Unsigned 32-bit modulus by 10 - Pass - average iteration duration: 4.191 ns
            Unsigned 32-bit division by 100 - Pass - average iteration duration: 2.561 ns
             Unsigned 32-bit modulus by 100 - Pass - average iteration duration: 2.794 ns
          Unsigned 32-bit division by 1,000 - Pass - average iteration duration: 2.328 ns
           Unsigned 32-bit modulus by 1,000 - Pass - average iteration duration: 3.027 ns
         Unsigned 32-bit division by 60,000 - Pass - average iteration duration: 2.328 ns
          Unsigned 32-bit modulus by 60,000 - Pass - average iteration duration: 3.027 ns
        Unsigned 32-bit division by 146,097 - Pass - average iteration duration: 2.328 ns
         Unsigned 32-bit modulus by 146,097 - Pass - average iteration duration: 3.260 ns
      Unsigned 32-bit division by 3,600,000 - Pass - average iteration duration: 2.328 ns
       Unsigned 32-bit modulus by 3,600,000 - Pass - average iteration duration: 3.260 ns
              Unsigned 64-bit division by 1 - Pass - average iteration duration: 1.863 ns
               Unsigned 64-bit modulus by 1 - Pass - average iteration duration: 1.630 ns
              Unsigned 64-bit division by 2 - Pass - average iteration duration: 1.630 ns
               Unsigned 64-bit modulus by 2 - Pass - average iteration duration: 1.630 ns
              Unsigned 64-bit division by 3 - Pass - average iteration duration: 3.027 ns
               Unsigned 64-bit modulus by 3 - Pass - average iteration duration: 4.889 ns
              Unsigned 64-bit division by 5 - Pass - average iteration duration: 2.794 ns
               Unsigned 64-bit modulus by 5 - Pass - average iteration duration: 4.889 ns
             Unsigned 64-bit division by 10 - Pass - average iteration duration: 3.027 ns
              Unsigned 64-bit modulus by 10 - Pass - average iteration duration: 4.889 ns
            Unsigned 64-bit division by 100 - Pass - average iteration duration: 3.492 ns
             Unsigned 64-bit modulus by 100 - Pass - average iteration duration: 5.355 ns
  Unsigned 64-bit division by 1,000,000,000 - Pass - average iteration duration: 3.492 ns
   Unsigned 64-bit modulus by 1,000,000,000 - Pass - average iteration duration: 5.588 ns
                Signed 32-bit division by 1 - Pass - average iteration duration: 1.630 ns
                 Signed 32-bit modulus by 1 - Pass - average iteration duration: 1.630 ns
              Signed 32-bit division by 100 - Pass - average iteration duration: 4.657 ns
               Signed 32-bit modulus by 100 - Pass - average iteration duration: 4.424 ns
                Signed 64-bit division by 1 - Pass - average iteration duration: 1.863 ns
                 Signed 64-bit modulus by 1 - Pass - average iteration duration: 1.630 ns
               Signed 64-bit division by 10 - Pass - average iteration duration: 3.027 ns
                Signed 64-bit modulus by 10 - Pass - average iteration duration: 5.122 ns
               Signed 64-bit division by 18 - Pass - average iteration duration: 2.794 ns
                Signed 64-bit modulus by 18 - Pass - average iteration duration: 4.889 ns
               Signed 64-bit division by 24 - Pass - average iteration duration: 2.794 ns
                Signed 64-bit modulus by 24 - Pass - average iteration duration: 4.657 ns
              Signed 64-bit division by 100 - Pass - average iteration duration: 3.725 ns
               Signed 64-bit modulus by 100 - Pass - average iteration duration: 4.657 ns
              Signed 64-bit division by 153 - Pass - average iteration duration: 3.027 ns
               Signed 64-bit modulus by 153 - Pass - average iteration duration: 8.149 ns
            Signed 64-bit division by 1,461 - Pass - average iteration duration: 3.492 ns
             Signed 64-bit modulus by 1,461 - Pass - average iteration duration: 8.149 ns
Signed 64-bit division by 10,000 (Currency) - Pass - average iteration duration: 3.027 ns
 Signed 64-bit modulus by 10,000 (Currency) - Pass - average iteration duration: 7.683 ns
       Signed 64-bit division by 86,400,000 - Pass - average iteration duration: 3.027 ns
        Signed 64-bit modulus by 86,400,000 - Pass - average iteration duration: 6.985 ns

ok
- Sum of average durations: 185.100 ns
- Overall average duration: 3.428 ns
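
For context on what the log above is timing, here is a minimal Free Pascal sketch of the reciprocal-multiplication idea for a single case, unsigned 32-bit division by 10. It is not taken from the patch. The names MagicDivSketch, DivBy10, ModBy10, MAGIC_DIV10 and SHIFT_DIV10 are hypothetical, and the constant $CCCCCCCD = ceil(2^35 / 10) is the commonly quoted reciprocal for this divisor. The compiler's own magic-number generation and AArch64 instruction selection live in a64-magic-div.patch and may differ from this sketch.

```pascal
program MagicDivSketch;

{$mode objfpc}

const
  { Assumption for illustration: ceil(2^35 / 10) = 3435973837 = $CCCCCCCD }
  MAGIC_DIV10 = QWord($CCCCCCCD);
  SHIFT_DIV10 = 35;

{ Quotient via a 64-bit multiply and a right shift instead of a divide.
  The product of two 32-bit values always fits in a QWord. }
function DivBy10(N: LongWord): LongWord;
  begin
    Result := LongWord((QWord(N) * MAGIC_DIV10) shr SHIFT_DIV10);
  end;

{ Remainder recovered from the quotient: N - (N div 10) * 10 }
function ModBy10(N: LongWord): LongWord;
  begin
    Result := N - DivBy10(N) * 10;
  end;

var
  Samples: array[0..5] of LongWord = (0, 9, 10, 99, 1000000007, $FFFFFFFF);
  I: Integer;
begin
  { Compare against the built-in operators for a few boundary values }
  for I := Low(Samples) to High(Samples) do
    begin
      if (DivBy10(Samples[I]) <> Samples[I] div 10) or
         (ModBy10(Samples[I]) <> Samples[I] mod 10) then
        begin
          WriteLn('Mismatch for ', Samples[I]);
          Halt(1);
        end;
    end;
  WriteLn('ok');
end.
```

The sketch covers only one divisor and one operand width. Other divisors need their own multiplier and shift, and some need an extra correction step that is not shown here.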

Florian

2021-04-29 22:01

administrator   ~0130663

Thanks, applied!

Issue History

Date Modified     Username           Field                             Change
2021-04-27 00:14  J. Gareth Moreton  New Issue
2021-04-27 00:14  J. Gareth Moreton  File Added: a64-magic-div.patch
2021-04-27 00:14  J. Gareth Moreton  File Added: div-bench-test.patch
2021-04-27 00:16  J. Gareth Moreton  Note Added: 0130602
2021-04-27 00:16  J. Gareth Moreton  OS                                Debian GNU/Linux => Debian GNU/Linux (Raspberry Pi)
2021-04-27 00:16  J. Gareth Moreton  Additional Information Updated
2021-04-27 00:16  J. Gareth Moreton  FPCTarget                         => -
2021-04-27 00:21  J. Gareth Moreton  Tag Attached: patch
2021-04-27 00:21  J. Gareth Moreton  Tag Attached: aarch64
2021-04-27 00:21  J. Gareth Moreton  Tag Attached: optimization
2021-04-29 22:01  Florian            Assigned To                       => Florian
2021-04-29 22:01  Florian            Status                            new => resolved
2021-04-29 22:01  Florian            Resolution                        open => fixed
2021-04-29 22:01  Florian            Fixed in Version                  => 3.3.1
2021-04-29 22:01  Florian            Fixed in Revision                 => 49290, 49291
2021-04-29 22:01  Florian            Note Added: 0130663