Feature request: aligned setlength()
Original Reporter info from Mantis: crlab @neurolabusc1
- Reporter name: Chris Rorden
Description:
The reference-counted dynamic arrays provided by FPC are terrifically useful and dramatically simplify code in many situations. However, SetLength does not support custom pointer alignment. I think many users, particularly those using SIMD instructions (SSE) or GPU APIs (OpenCL, OpenGL, Metal), would benefit if the SetLength function were overloaded:
procedure SetLength(var A: DynArrayType; Len: SizeInt); overload;
procedure SetLength(var A: DynArrayType; Len, Alignment: SizeInt); overload;
Existing software would use the first version and behave exactly as before. The second version would ensure that the buffer pointer is aligned to the requested boundary.
I realize that FPC dynamic arrays contain more information than just the pointer (that is what makes them so useful), so this is probably not a trivial change. However, I do think including this would allow users to take advantage of the elegant features of dynamic arrays and the power of many modern SIMD accelerations.
I am happy to provide a $200 USD bounty to anyone who can implement this or provide an elegant/efficient method to ensure byte alignment of FPC dynamic arrays.
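As a rough illustration of the kind of workaround that is possible today, here is a minimal sketch that over-allocates a dynamic array and returns the index of the first aligned element. The helper name SetLengthAligned and the type TByteArray are hypothetical and not part of the RTL. The sketch assumes Len > 0 and Alignment > 0.

```pascal
program AlignedDynSketch;
{$mode objfpc}

type
  TByteArray = array of Byte;

// Hypothetical helper (not an RTL routine): over-allocates Buf by
// Alignment-1 bytes and returns the index of the first element whose
// address is a multiple of Alignment. Assumes Len > 0 and Alignment > 0.
function SetLengthAligned(var Buf: TByteArray; Len, Alignment: SizeInt): SizeInt;
var
  Misalign: PtrUInt;
begin
  SetLength(Buf, Len + Alignment - 1);
  Misalign := PtrUInt(@Buf[0]) mod PtrUInt(Alignment);
  if Misalign = 0 then
    Result := 0
  else
    Result := Alignment - SizeInt(Misalign);
end;

var
  Data: TByteArray;
  First: SizeInt;
begin
  First := SetLengthAligned(Data, 4096, 4096);
  // Data[First] .. Data[First + 4095] is the aligned window.
  WriteLn('misalignment of window start: ',
    PtrUInt(@Data[First]) mod PtrUInt(4096));
end.
```

This keeps reference counting and automatic freeing, but the caller has to carry the offset around, and any later SetLength on the same array may move the data and invalidate that offset. That is why built-in support would be preferable.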
Steps to reproduce:
program Hello;
uses sysutils;

procedure testAlign (ptr: pointer; lengthBytes: integer);
const
  kAlign = 4096;
var
  inAlign, newAlign: PtrUint;
  uptr, aptr: pointer;
begin
  inAlign := PtrUint(ptr) mod kAlign;
  writeln(format('input alignment (0=desired) %d', [inAlign]));
  if (inAlign = 0) then begin
    //do something with ptr here, e.g. shift to GPU
    exit; //already aligned offset and length
  end;
  getmem(uptr, lengthBytes+kAlign); //unaligned, larger
  aptr := System.Align(uptr, kAlign); //aligned
  newAlign := PtrUint(aptr) mod kAlign;
  writeln(format('new alignment (0=desired) %d', [newAlign]));
  system.move(ptr^, aptr^, lengthBytes);
  //do something with aptr here, e.g. shift to GPU
  freemem(uptr);
end;

procedure CreateDynArray (lengthBytes: integer);
var
  dyn: array of byte;
begin
  setlength(dyn, lengthBytes);
  testAlign (@dyn[0], lengthBytes);
  writeln('Created array');
end;

begin
  CreateDynArray(4096);
end.
Additional information:
As a concrete example, consider Apple's Metal Framework, which supports both GPU-based graphics and compute. Ryan Joseph has written a nice FPC wrapper:
https://github.com/genericptr/Metal-Framework
However, the Metal Framework requires buffers to be aligned to 4096 byte boundaries:
If you want to use newBufferWithBytesNoCopy, your allocated buffer storage needs to be perfectly page-aligned, both the start and the end. So specifying alignment of the start (with __attribute__((aligned(4096))) is not enough, you also need to allocate a multiple of 4K bytes. A quick and dirty code using statically allocated arrays:
I can handle this with the code in "steps to reproduce", but the code becomes much harder to follow, requires more memory, and incurs the penalty of a memory copy. Every developer needs to track both the unaligned pointer (for freeing memory) and the aligned pointer (for copying). In addition, any use of Delphi-style dynamic arrays requires copying back and forth, which is a big penalty for many compute uses.
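The "both the start and the end" requirement quoted above also means the byte length has to be rounded up to a whole number of pages. A minimal sketch of that arithmetic follows. The helper name RoundUpToPage is hypothetical, and the 4096-byte page size is taken from the quoted requirement rather than queried from the OS.

```pascal
program PageRoundSketch;
{$mode objfpc}

const
  kPage = 4096; // page size assumed from the requirement quoted above

// Hypothetical helper: rounds Len up to the next multiple of kPage.
// Assumes Len >= 0.
function RoundUpToPage(Len: SizeInt): SizeInt;
begin
  Result := ((Len + kPage - 1) div kPage) * kPage;
end;

begin
  WriteLn(RoundUpToPage(1));     // prints 4096
  WriteLn(RoundUpToPage(4096));  // prints 4096
  WriteLn(RoundUpToPage(4097));  // prints 8192
end.
```

An aligned SetLength overload would presumably still leave this rounding to the caller, since it concerns the length rather than the start address.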
Mantis conversion info:
- Mantis ID: 34031
- OS: Darwin
- OS Build: 10.11.6
- Platform: MacBook 2012 Retina 13"
- Version: 3.0.4
- Monitored by: » @neurolabusc1 (Chris Rorden), » @genericptr (Ryan Joseph), » @CuriousKit (J. Gareth Moreton)