View Issue Details

ID: 0037039
Project: FPC
Category: RTL
View Status: public
Last Update: 2020-05-24 10:33
Reporter: NoName
Assigned To:
Priority: normal
Severity: minor
Reproducibility: always
Status: new
Resolution: open
Product Version: 3.3.1
Summary: 0037039: Add more 64bit optimized versions of often used string functions
Description: While the FPC 32-bit string RTL includes many hand-crafted assembly versions, the 64-bit RTL provides only one (StrComp).
The same issue applies to ARM, which is also becoming more important (phones and servers).
Steps To Reproduce:
32-bit optimizations:
https://github.com/graemeg/freepascal/blob/master/rtl/i386/strings.inc

64-bit optimizations:
https://github.com/graemeg/freepascal/blob/master/rtl/x86_64/strings.inc
Tags: No tags attached.
Fixed in Revision:
FPCOldBugId:
FPCTarget:
Attached Files:

Activities

NoName

2020-05-08 02:42

reporter   ~0122665

Is it possible to combine Intel and AT&T asm syntax in one file, or is the file parsed once with a single syntax fixed for the whole file?
What would be the requirements for new patches? Benchmarks? Tests?

Sven Barth

2020-05-08 09:59

manager   ~0122667

The assembler mode is a local directive which can be set per assembler block:

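program tasmtest;

procedure Test;
begin
  // Intel syntax: destination operand first, no register/immediate prefixes
  {$asmmode intel}
  asm
    mov ebx, 2
  end ['ebx'];  // tell the compiler which registers the block modifies

  // AT&T syntax: source operand first, % marks registers, $ marks immediates
  {$asmmode att}
  asm
    movl $2, %ebx
  end ['ebx'];
end;

begin
  Test;
end.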


Tests should in theory already be covered by the testsuite, so you mainly need to make sure that new routines don't break any existing tests (and considering that we're talking about rather important, low-level routines, if they're broken you will know it :P ). For running the testsuite and a recommended workflow, please look here: https://wiki.freepascal.org/Testing_FPC#Running_the_testsuite

Benchmarks would be a good idea, however, so that we can judge whether it pays off.
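A benchmark along these lines could serve as a starting point. This is a minimal sketch, assuming FPC 3.x with the `SysUtils` and `strings` units: it times a naive byte-by-byte routine (the hypothetical `ByteStrLen`) against the RTL's `strings.StrLen` on a 1 MiB string. `GetTickCount64` only has millisecond resolution, so the buffer size and round count are arbitrary values picked to make the timings measurable.

```pascal
program strlenbench;
{$mode objfpc}

uses
  SysUtils, strings;

const
  BufSize = 1 shl 20;  // 1 MiB test string (arbitrary)
  Rounds  = 200;       // repetitions per routine (arbitrary)

// Hypothetical naive reference implementation: one byte per iteration.
function ByteStrLen(P: PChar): SizeInt;
begin
  Result := 0;
  while P[Result] <> #0 do
    Inc(Result);
end;

var
  Buf: PChar;
  I: Integer;
  Total: SizeInt;
  T0: QWord;
begin
  GetMem(Buf, BufSize + 1);
  FillChar(Buf^, BufSize, 'a');
  Buf[BufSize] := #0;

  // Summing the results keeps the calls from being optimized away.
  Total := 0;
  T0 := GetTickCount64;
  for I := 1 to Rounds do
    Inc(Total, ByteStrLen(Buf));
  WriteLn('ByteStrLen:     ', GetTickCount64 - T0, ' ms (', Total, ')');

  Total := 0;
  T0 := GetTickCount64;
  for I := 1 to Rounds do
    Inc(Total, strings.StrLen(Buf));
  WriteLn('strings.StrLen: ', GetTickCount64 - T0, ' ms (', Total, ')');

  FreeMem(Buf);
end.
```

A meaningful comparison would also cover short strings and unaligned start addresses, since hand-written routines often win or lose mainly on those cases.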

Florian

2020-05-08 21:41

administrator   ~0122674

I am not a fan of adding more assembler routines. There must be a really good reason not to improve the optimizer instead, or at least to use intrinsic-based versions.
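As an illustration of portable Pascal that avoids assembler but still processes more than one byte per step, here is a sketch of a word-at-a-time `StrLen`. The name `WordStrLen` is hypothetical and not an RTL routine. It uses the well-known zero-byte test `(W - $01..01) and not W and $80..80`, which is non-zero exactly when the machine word `W` contains a zero byte. It assumes that reading a whole aligned word past the terminator is harmless; this holds on common platforms because an aligned word never crosses a page boundary, but memory checkers may still flag it.

```pascal
program wordstrlen;
{$mode objfpc}
{$Q-}{$R-}  // the zero-byte test relies on wrap-around arithmetic

// Hypothetical word-at-a-time StrLen (not the RTL's implementation).
function WordStrLen(P: PChar): SizeInt;
var
  Start: PChar;
  W, Ones, Highs: PtrUInt;
begin
  Ones := High(PtrUInt) div 255;  // $0101...01 for the native word size
  Highs := Ones shl 7;            // $8080...80
  Start := P;
  // Byte steps until P is word-aligned, so word reads never cross a page.
  while (PtrUInt(P) and (SizeOf(PtrUInt) - 1)) <> 0 do
  begin
    if P^ = #0 then
      Exit(P - Start);
    Inc(P);
  end;
  // Word steps: stop at the first word that contains a zero byte.
  repeat
    W := PPtrUInt(P)^;
    if ((W - Ones) and not W and Highs) <> 0 then
      Break;
    Inc(P, SizeOf(PtrUInt));
  until False;
  // Locate the exact terminator inside that word.
  while P^ <> #0 do
    Inc(P);
  Result := P - Start;
end;

begin
  WriteLn(WordStrLen('a'));                                            // 1
  WriteLn(WordStrLen('The quick brown fox jumps over the lazy dog'));  // 43
end.
```

Whether the compiler generates code for this that competes with a hand-written SSE or NEON routine is exactly the kind of question the benchmarks mentioned above would have to answer.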

Arnaud Bouchez

2020-05-10 15:07

reporter   ~0122697

I have included x86_64 asm patches for the fpc_dynarray_*, fpc_ansistr_* and fpc_unicodestr_* low-level RTL functions in our mORMot open source framework.
It patches the RTL at runtime, replacing the default Pascal versions with optimized x86_64 asm.
Please check https://github.com/synopse/mORMot2/blob/master/src/core/mormot.core.rtti.fpc.inc#L547
This code may be used as reference for including in the RTL.

The performance gain is noticeable, but not huge, as @Florian estimated.
If we had intrinsics for atomic integer operations instead of calls to subroutines, we might indeed reach the very same level of performance.
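For context, FPC's System unit exposes atomic integer operations as ordinary routines such as `InterLockedIncrement` and `InterLockedDecrement` (each returns the new value). The sketch below shows the reference-counting pattern such operations serve; the record `TRefHeader` and the helpers `AddRef` and `Release` are hypothetical and only loosely modelled on how managed types keep a counter, not the RTL's actual internals.

```pascal
program refcountsketch;
{$mode objfpc}

type
  // Hypothetical header for a reference-counted block.
  PRefHeader = ^TRefHeader;
  TRefHeader = record
    RefCount: LongInt;
  end;

// Atomically takes an additional reference.
procedure AddRef(H: PRefHeader);
begin
  InterLockedIncrement(H^.RefCount);
end;

// Atomically drops a reference; True means it was the last one.
function Release(H: PRefHeader): Boolean;
begin
  Result := InterLockedDecrement(H^.RefCount) = 0;
end;

var
  H: PRefHeader;
begin
  New(H);
  H^.RefCount := 1;
  AddRef(H);                  // count is now 2
  if not Release(H) then      // back to 1, block still in use
    WriteLn('still referenced');
  if Release(H) then          // reaches 0, the last owner frees the block
  begin
    WriteLn('freed');
    Dispose(H);
  end;
end.
```

Each `AddRef`/`Release` here is a real call; an intrinsic would let the compiler emit the locked instruction inline instead.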

Sven Barth

2020-05-10 22:42

manager   ~0122706

@Arnaud Bouchez:
- the functions you mention are not what this bug report is about; it is about functions like strlower, strupper, strcopy or strscan. These have define checks in place so that the generic functions can be replaced with optimized implementations; the functions for the managed types have no such checks and thus are not intended to be replaced
- your code's license is not compatible with FPC's anyway: FPC requires LGPL with a static linking exception, which is stricter than pure LGPL, and we can't simply use the code under the MPL either
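The replacement mechanism described here works at compile time: a CPU-specific include file defines a symbol when it provides its own routine, and the generic implementation is only compiled when that symbol is absent. The sketch below imitates the pattern in a single program. The symbol `HAS_OPTIMIZED_STRSCAN` and the function `StrScanSketch` are hypothetical names, not the RTL's actual ones, and the "optimized" branch is a plain Pascal placeholder standing in for an assembler version.

```pascal
program strscanguard;
{$mode objfpc}

// Hypothetical symbol: a CPU-specific include file would define it next to
// its own implementation. Comment it out to compile the generic fallback.
{$define HAS_OPTIMIZED_STRSCAN}

{$ifdef HAS_OPTIMIZED_STRSCAN}
// Placeholder for the architecture-specific version (normally assembler).
// Returns a pointer to the first C in P, or nil if the terminator comes first.
function StrScanSketch(P: PChar; C: Char): PChar;
begin
  while (P^ <> C) and (P^ <> #0) do
    Inc(P);
  if P^ = C then
    Result := P
  else
    Result := nil;
end;
{$endif HAS_OPTIMIZED_STRSCAN}

{$ifndef HAS_OPTIMIZED_STRSCAN}
// Generic fallback, only compiled when no optimized version was provided.
function StrScanSketch(P: PChar; C: Char): PChar;
var
  I: SizeInt;
begin
  I := 0;
  while (P[I] <> C) and (P[I] <> #0) do
    Inc(I);
  if P[I] = C then
    Result := @P[I]
  else
    Result := nil;
end;
{$endif HAS_OPTIMIZED_STRSCAN}

const
  S: PChar = 'hello';
begin
  WriteLn(StrScanSketch(S, 'l') - S);   // 2
  WriteLn(StrScanSketch(S, 'z') = nil);
end.
```

Because the choice happens in the preprocessor, callers see a single `StrScanSketch` symbol regardless of which body was compiled.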

J. Gareth Moreton

2020-05-11 02:00

developer   ~0122708

I will add that there is one use for pure assembler implementations, at least for good ones: they give us a target to aim for when improving the compiler and optimizer. Generally, though, if you try hard enough, hand-crafted assembly language will beat a compiler in many cases, but it is of course not cross-platform.

There are some instances where pure assembler would still win out, though, such as using platform-specific operations (or operations that are more efficient on some architectures than on others), low-level trickery (e.g. 0033693 reads the raw bits of a floating-point number), or fast instructions that require a very specific setup (e.g. MOVNTDQA, which uses a non-temporal hint but requires its memory operand to be aligned on a 16-byte boundary).
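Alignment requirements like this usually mean a routine first handles a few leading bytes with scalar code until its pointer reaches the required boundary. A minimal sketch of the pointer arithmetic involved, assuming power-of-two alignments; `IsAligned` and `AlignUp` are hypothetical helper names, and MOVNTDQA itself is not used here.

```pascal
program alignsketch;
{$mode objfpc}

// True if P lies on an Align-byte boundary; Align must be a power of two.
function IsAligned(P: Pointer; Align: PtrUInt): Boolean;
begin
  Result := (PtrUInt(P) and (Align - 1)) = 0;
end;

// Rounds P up to the next Align-byte boundary (power-of-two Align only).
function AlignUp(P: Pointer; Align: PtrUInt): Pointer;
begin
  Result := Pointer((PtrUInt(P) + Align - 1) and not (Align - 1));
end;

var
  Raw: array[0..63] of Byte;
  P: Pointer;
begin
  // First 16-byte boundary at or after @Raw[1]
  P := AlignUp(@Raw[1], 16);
  WriteLn(IsAligned(P, 16));
  // Number of leading bytes a routine would handle before its aligned loop
  WriteLn(PtrUInt(P) - PtrUInt(@Raw[1]));
end.
```

The same mask trick (`and (Align - 1)`) is what an assembler routine would typically use to compute how many head bytes to process before switching to aligned vector loads.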

NoName

2020-05-23 21:41

reporter   ~0123023

Last edited: 2020-05-23 21:41

@Sven Barth
Arnaud Bouchez offered to change the licence as needed, see https://synopse.info/forum/viewtopic.php?pid=31041#p31041
"Anyone is welcome using my code and trying to push it to FPC trunk!
I could of course change the licence of this code to FPC if it helps."

Sven Barth

2020-05-24 10:33

manager   ~0123027

That still leaves the more problematic point that we're talking about two different sets of functions here. The functions that Arnaud Bouchez presented are not supposed to be replaced by assembly functions. The string functions in strings.inc, however, are (even if Florian isn't a fan of that).

Issue History

Date Modified | Username | Field | Change
2020-05-08 02:26 | NoName | New Issue |
2020-05-08 02:42 | NoName | Note Added: 0122665 |
2020-05-08 09:59 | Sven Barth | Note Added: 0122667 |
2020-05-08 21:41 | Florian | Note Added: 0122674 |
2020-05-10 15:07 | Arnaud Bouchez | Note Added: 0122697 |
2020-05-10 22:42 | Sven Barth | Note Added: 0122706 |
2020-05-11 02:00 | J. Gareth Moreton | Note Added: 0122708 |
2020-05-13 10:35 | Michael Van Canneyt | Relationship added | related to 0037060
2020-05-13 10:35 | Michael Van Canneyt | Relationship deleted | related to 0037060
2020-05-23 21:41 | NoName | Note Added: 0123023 |
2020-05-23 21:41 | NoName | Note Edited: 0123023 |
2020-05-24 10:33 | Sven Barth | Note Added: 0123027 |