View Issue Details

ID: 0037786
Project: Lazarus
Category: -
View Status: public
Last Update: 2020-10-06 19:29
Reporter: Klaus1
Assigned To: Juha Manninen
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: resolved
Resolution: no change required
Product Version: 2.0.10
Summary: 0037786: Proposal for the next release: I have developed assembler routines for the SSE and VEX units of Intel and AMD processors
Description: Using the SSE and VEX units of Intel or AMD processors is difficult because FPC does not support them.
(C compilers offer compiler-level intrinsics for this; Free Pascal does not.)
To let developers take advantage of the newer processor hardware even without assembler knowledge,
I have developed the routines presented here to make development easier.
Normally this would be the compiler's job, but...
The units are in the zip files.

Tags: No tags attached.
Fixed in Revision:
LazTarget: -
Widgetset:
Attached Files

Activities

Klaus1

2020-09-21 12:31

reporter  

lazsse.zip (103,627 bytes)
lazvex.zip (85,786 bytes)

Sven Barth

2020-09-21 13:26

manager   ~0125709

Support for SIMD intrinsics is work in progress in FPC trunk (see for example https://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/rtl/x86_64/cpummprocs.inc?revision=45778&view=markup ). It might be better to work on that implementation as that will provide greater flexibility for generating optimized code.

Juha Manninen

2020-09-21 13:53

developer   ~0125711

Ark (comes with KDE) is not able to unzip the ZIP packages. It shows the file list but extraction gives an error:
 "Compression method not supported"
How did you compress them?
This is the first time Ark fails to decompress a ZIP package.

Sven Barth

2020-09-22 09:31

manager   ~0125748

7-zip says that the files are compressed with “Deflate64”.
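
For readers who want to check an archive themselves: the compression method is stored as a 16-bit little-endian value at byte offset 8 of each ZIP local file header, and method 9 is Deflate64 (8 is plain Deflate, 0 is stored). The sketch below, in C, reads that field from the first entry of a buffer. The function names `zip_first_method` and `zip_method_name` are made up for this illustration and do not come from Ark, 7-Zip, or the attached archives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the compression method of the first ZIP entry in buf,
   or -1 if buf does not start with a local file header. */
int zip_first_method(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 10)
        return -1;
    /* Local file header signature 0x04034b50, stored little-endian. */
    if (buf[0] != 'P' || buf[1] != 'K' || buf[2] != 3 || buf[3] != 4)
        return -1;
    /* Layout: signature (4), version needed (2), flags (2), method (2). */
    return buf[8] | (buf[9] << 8);
}

/* Maps a few well-known method numbers to readable names. */
const char *zip_method_name(int method)
{
    switch (method) {
    case 0:  return "Stored";
    case 8:  return "Deflate";
    case 9:  return "Deflate64";
    default: return "other";
    }
}
```

An extractor that only implements methods 0 and 8 can list the entries of such an archive but cannot unpack them, which matches the behaviour described above.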

Juha Manninen

2020-09-22 10:55

developer   ~0125751

Ark also reports Deflate64 and even shows the uncompressed size (921,2 KiB).
I installed more optional backends (unrar, lzop, lrzip) but still the same error. This is the first time Ark fails me.

Anyway, I don't think the Lazarus project should have assembly versions of functions provided by the FPC project.
Functions provided by the Lazarus project itself could have them if they are speed-critical enough.

Klaus1

2020-09-22 18:00

reporter   ~0125758

Hello,
here are the lpk files for SSE and VEX. Sorry, but I had no problems with 7-Zip.
For Sven:
Intrinsics help the developer when developing for different CPUs. Assembler is only for one CPU. Assembler is (normally) highly optimized but takes a lot of time to develop. If the routines have problems, please contact me here.
FPC is for many operating systems, but Lazarus is for machines with Intel or AMD processors. OK, when Mac changes to ARM processors...
Regards, Klaus
lazsse.lpk (3,012 bytes)   
<?xml version="1.0" encoding="UTF-8"?>
<CONFIG>
  <Package Version="4">
    <PathDelim Value="\"/>
    <Name Value="lazsse"/>
    <Type Value="RunTimeOnly"/>
    <Author Value="Klaus Stöhr"/>
    <CompilerOptions>
      <Version Value="11"/>
      <PathDelim Value="\"/>
      <SearchPaths>
        <UnitOutputDirectory Value="lib\$(TargetCPU)-$(TargetOS)\"/>
      </SearchPaths>
      <Parsing>
        <Style Value="1"/>
        <SyntaxOptions>
          <CStyleOperator Value="False"/>
          <CPPInline Value="False"/>
        </SyntaxOptions>
      </Parsing>
      <Linking>
        <Debugging>
          <GenerateDebugInfo Value="False"/>
        </Debugging>
      </Linking>
      <Other>
        <WriteFPCLogo Value="False"/>
        <CompilerMessages>
          <IgnoredMessages idx7122="True" idx7121="True"/>
        </CompilerMessages>
        <CustomOptions Value="-al"/>
      </Other>
    </CompilerOptions>
    <Description Value="Assembler procedures for use the SSE unit of the Intel or AMD processors. You MUST have SSE4.1 or higher
ONLY FOR 64 bit working systems You must have FPC >=3.2 for use
The Test_xmmint and Test_xmmfloat is for documentation. All assembler routines are documented with the correct input and otput parameters.      "/>
    <License Value="LGPL 3 for all assembler units
GPL 3 for the Test_xmmint and Test_xmmfloat units"/>
    <Version Major="1"/>
    <Files Count="11">
      <Item1>
        <Filename Value="Copying_GPLv3.txt"/>
        <Type Value="Text"/>
      </Item1>
      <Item2>
        <Filename Value="Copying_LGPLv3.txt"/>
        <Type Value="Text"/>
      </Item2>
      <Item3>
        <Filename Value="Liesmich.txt"/>
        <Type Value="Text"/>
      </Item3>
      <Item4>
        <Filename Value="Readme.txt"/>
        <Type Value="Text"/>
      </Item4>
      <Item5>
        <Filename Value="simd.inc"/>
        <Type Value="Include"/>
      </Item5>
      <Item6>
        <Filename Value="simd_fpu.inc"/>
        <Type Value="Include"/>
      </Item6>
      <Item7>
        <Filename Value="simdconv.pas"/>
        <UnitName Value="simdconv"/>
      </Item7>
      <Item8>
        <Filename Value="Test_xmmfloat.pas"/>
        <UnitName Value="Test_xmmfloat"/>
      </Item8>
      <Item9>
        <Filename Value="Test_xmmint.pas"/>
        <UnitName Value="Test_xmmint"/>
      </Item9>
      <Item10>
        <Filename Value="xmmfloat.pas"/>
        <UnitName Value="xmmfloat"/>
      </Item10>
      <Item11>
        <Filename Value="xmmint.pas"/>
        <UnitName Value="xmmint"/>
      </Item11>
    </Files>
    <RequiredPkgs Count="1">
      <Item1>
        <PackageName Value="FCL"/>
      </Item1>
    </RequiredPkgs>
    <UsageOptions>
      <UnitPath Value="$(PkgOutDir)"/>
    </UsageOptions>
    <PublishOptions>
      <Version Value="2"/>
      <UseFileFilters Value="True"/>
    </PublishOptions>
  </Package>
</CONFIG>
lazvex.lpk (2,735 bytes)   
<?xml version="1.0" encoding="UTF-8"?>
<CONFIG>
  <Package Version="4">
    <PathDelim Value="\"/>
    <Name Value="lazvex"/>
    <Type Value="RunTimeOnly"/>
    <Author Value="Klaus Stöhr"/>
    <CompilerOptions>
      <Version Value="11"/>
      <PathDelim Value="\"/>
      <SearchPaths>
        <UnitOutputDirectory Value="lib\$(TargetCPU)-$(TargetOS)\"/>
      </SearchPaths>
      <Parsing>
        <Style Value="1"/>
        <SyntaxOptions>
          <CStyleOperator Value="False"/>
          <CPPInline Value="False"/>
        </SyntaxOptions>
      </Parsing>
      <Linking>
        <Debugging>
          <GenerateDebugInfo Value="False"/>
        </Debugging>
      </Linking>
      <Other>
        <WriteFPCLogo Value="False"/>
        <CompilerMessages>
          <IgnoredMessages idx7122="True" idx7121="True"/>
        </CompilerMessages>
        <CustomOptions Value="-al"/>
      </Other>
    </CompilerOptions>
    <Description Value="Assembler procedures for use the VEX unit of the Intel or AMD processors. ONLY for 64 bit working systems. You must have the FPC >= 3.2 for compile. The Test_ymm unit is for documentation and MUST delivered. In this unit is for all procedures the descriptiion for the input and output parameters."/>
    <License Value="LGPL 3 for all assembler routines
GPL 3 for the Test_ymm unit"/>
    <Version Major="1"/>
    <Files Count="9">
      <Item1>
        <Filename Value="simd.inc"/>
        <Type Value="Include"/>
      </Item1>
      <Item2>
        <Filename Value="simd_fpu.inc"/>
        <Type Value="Include"/>
      </Item2>
      <Item3>
        <Filename Value="simdconv.pas"/>
        <UnitName Value="simdconv"/>
      </Item3>
      <Item4>
        <Filename Value="Test_ymm.pas"/>
        <UnitName Value="Test_ymm"/>
      </Item4>
      <Item5>
        <Filename Value="ymmfloat.pas"/>
        <UnitName Value="ymmfloat"/>
      </Item5>
      <Item6>
        <Filename Value="Copying_GPLv3.txt"/>
        <Type Value="Text"/>
      </Item6>
      <Item7>
        <Filename Value="Copying_LGPLv3.txt"/>
        <Type Value="Text"/>
      </Item7>
      <Item8>
        <Filename Value="Liesmich.txt"/>
        <Type Value="Text"/>
      </Item8>
      <Item9>
        <Filename Value="Readme.txt"/>
        <Type Value="Text"/>
      </Item9>
    </Files>
    <RequiredPkgs Count="1">
      <Item1>
        <PackageName Value="FCL"/>
      </Item1>
    </RequiredPkgs>
    <UsageOptions>
      <UnitPath Value="$(PkgOutDir)"/>
    </UsageOptions>
    <PublishOptions>
      <Version Value="2"/>
      <UseFileFilters Value="True"/>
    </PublishOptions>
  </Package>
</CONFIG>

Sven Barth

2020-09-23 10:17

manager   ~0125770

Last edited: 2020-09-23 10:18


No, intrinsics are not necessarily for different CPUs. E.g. the typical C compilers (GCC, Clang, MSVC) all have intrinsics for Intel SSE etc., and these are not available for non-x86 CPUs. FPC is the same: while it has many cross-platform intrinsics, there are also CPU-specific intrinsics, which in the future will include the SSE intrinsics (they are in fact already available in principle in trunk). Intrinsics also allow much better optimization by the compiler, as no calls are inserted and the register allocator can use registers freely.

Thus, as long as you don't have a specific algorithm that you implement purely in assembler, intrinsics are superior.
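
To make the comparison concrete, here is a minimal sketch in C (not FPC, whose SIMD intrinsics are only described above as work in progress) of what CPU-specific intrinsics look like. `_mm_loadu_ps`, `_mm_add_ps` and `_mm_storeu_ps` are the standard SSE intrinsics shipped with GCC, Clang and MSVC; the function name `add_floats` and the `HAVE_SSE2` macro are made up for this illustration. The intrinsics compile inline, so the compiler can choose registers and schedule the instructions itself.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE/SSE2 intrinsics; SSE2 is baseline on x86-64 */
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0      /* non-x86 target: only the scalar loop is built */
#endif

/* Writes a[i] + b[i] to out[i] for every i in [0, n). */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
#if HAVE_SSE2
    /* Four floats per iteration; unaligned loads keep the sketch simple. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail on x86, and the whole loop on other targets. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The preprocessor guard also shows why such intrinsics are not portable by themselves: on a non-x86 target only the scalar loop remains.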

Also Lazarus is not only for Intel CPUs, it also works on e.g. Raspberry Pi or PowerPC systems.

Chris Rorden

2020-09-23 23:33

reporter   ~0125800

Klaus
 - I also had problems uncompressing your archive on macOS; in the end Keka worked. Maybe set this up as a GitHub archive.
 - I would suggest plain Pascal units instead of a Lazarus package, so your work could be used by BOTH Lazarus and FPC.
 - The readme describes issues with dynamic arrays, and says "(see discussion in the Lazarus Bugtracker...)". It might be worth including a link to the discussion. I think this refers to https://bugs.freepascal.org/view.php?id=34031
 - In general, this looks terrific.
 - As Sven notes, most intrinsics are specific to specific architectures. I note that some have looked for intrinsics that replicate SSE functions on Neon, which improves portability. I doubt there is 100% coverage, but it seems intriguing. I am happy to provide you with access to ARM hardware if it can help you.
     https://github.com/DLTcollab/sse2neon
 - In general, this looks very nice. You have gone beyond just porting exponents, and provide a lot of nice functionality and test units. Thanks!

Klaus1

2020-09-28 16:14

reporter   ~0125927

Last edited: 2020-09-28 16:16


Here are the programs in tar format. I think all Unix, Linux, etc. systems can read this format. There is no lpk file. Keep in mind that the procedures are in ASSEMBLER and work ONLY on
64-bit systems with Intel or AMD processors, NOT on other systems such as ARM.
Regards, Klaus
lazsse.tar (949,248 bytes)
lazvex.tar (772,096 bytes)

Juha Manninen

2020-10-03 11:49

developer   ~0126056

The Lazarus project is for cross-platform development. It is not the right place for libraries that support only one CPU architecture.
Such a library should be hosted somewhere else, maybe GitHub or SourceForge. It can be part of Lazarus CCR if you wish.
Some speed-critical functions in the LCL or in the IDE could be optimized with assembly for some architectures, but a pure Pascal version must then be provided as well.
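
The "optimized routine plus portable fallback" pattern can be sketched as follows, in C rather than Pascal. This is only an illustration under stated assumptions: `__builtin_cpu_supports` is a GCC/Clang builtin for x86, SSE4.1 is picked because the lazsse package description requires it, and all function names are hypothetical. `sum_ints_unrolled` is merely a stand-in for the slot where a hand-tuned assembler or intrinsic routine would go.

```c
#include <assert.h>
#include <stddef.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define CAN_QUERY_X86_CPU 1
#else
#define CAN_QUERY_X86_CPU 0
#endif

/* Portable reference version: always available on every target. */
long sum_ints_portable(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += v[i];
    return s;
}

/* Stand-in for a tuned routine (hypothetical); must return the same result. */
long sum_ints_unrolled(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; ++i)
        s0 += v[i];
    return s0 + s1 + s2 + s3;
}

/* Returns 1 when the running CPU reports SSE4.1, otherwise 0. */
int cpu_has_sse41(void)
{
#if CAN_QUERY_X86_CPU
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.1") ? 1 : 0;
#else
    return 0;   /* unknown or non-x86 target: use the portable path */
#endif
}

typedef long (*sum_fn)(const int *, size_t);

/* Picks the implementation once, at run time. */
sum_fn select_sum(void)
{
    return cpu_has_sse41() ? sum_ints_unrolled : sum_ints_portable;
}
```

Callers only ever use the pointer returned by `select_sum`, so the code still builds and runs on targets where the tuned path is unavailable.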

Klaus1, this is not to belittle your effort in any way. I believe the code is good and useful. Others have clearly studied it more than I did.
Please decide where you want to host the code.

Chris Rorden, you suggest plain Pascal units instead of Lazarus package. Those options are not exclusive. You get plain Pascal units in any case. The Lazarus package is an extra bonus. It even allows this library to be distributed through the Lazarus Online Package Manager (maybe?).

Klaus1

2020-10-04 12:56

reporter   ~0126076

Hello Juha,
each Lazarus version is compiled for different architectures (Windows, Linux, Mac and others); what matters is the underlying CPU.
When the CPU is the same, the plain assembly works on any of these platforms, and Intel or AMD CPUs are found in many operating systems.
Lazarus uses the FPC compiler, and the package is optimized for this compiler. My suggestion is to include this as (optional) packages for architectures with Intel or AMD CPUs, for using the xmm or ymm registers. You can look at the download counts for Lazarus and see that Windows, Linux and Mac are the main
operating systems on which Lazarus is used. I do not understand the obstinate refusal. Cross-compiling is no argument: when this package is optional you can always
compile. There is no obligation to use it.
Regards, Klaus

Juha Manninen

2020-10-04 15:39

developer   ~0126082

> I not understand obstinate refusal.

Why don't you understand it?
Lazarus project distribution only includes libraries/components which are Delphi compatible or are needed by the IDE itself.
The idea is to keep the distributions as slim as possible. What's more, in a cross-platform system all included code must be cross-platform, supporting the platforms of today and of the future.
To help the usage of external packages we now have an Online Package Manager (OPM). A user can install packages by a simple selection and button click. Super-easy!
Lazarus distribution used to include an "Industrial" package with various gauge controls. It is a cross-platform package but did not match the other criteria above. After OPM was implemented, it was moved to Lazarus CCR repo in SourceForge. Now it can be installed through OPM but does not bloat the Lazarus distribution. Cool!

You can choose your favorite source host (GitHub, SourceForge, whatever). If you choose Lazarus CCR, you get commit access there.
I don't understand why you see this as a negative thing.

Klaus1

2020-10-06 16:33

reporter   ~0126119

Hello Juha,
I have read your article. I think I will distribute the routines on GitHub or SourceForge and then test your OPM. If that works, I have no problems. I only want to help
developers use the SSE and VEX units in modern processors. A normal compiler cannot see which code could run in the parallel units, but the developer can. (If interested, look at the assembler listing to see how the compiler translated the code.) Assembler is difficult and takes a lot of time to test.
Here the developer stays in Pascal. The routines are like those in the RTL... and simple to use.
Regards, Klaus

Juha Manninen

2020-10-06 19:29

developer   ~0126123

OK, I am resolving this one.
For more information about Online Package Manager and how to deliver your package there please read
 https://wiki.freepascal.org/Online_Package_Manager

If you want to host your code in Lazarus CCR (a project on SourceForge), please write about it on the mailing list. Somebody must give you write access there.
Otherwise just host the code anywhere you want.

Issue History

Date Modified Username Field Change
2020-09-21 12:31 Klaus1 New Issue
2020-09-21 12:31 Klaus1 File Added: lazsse.zip
2020-09-21 12:31 Klaus1 File Added: lazvex.zip
2020-09-21 13:26 Sven Barth Note Added: 0125709
2020-09-21 13:53 Juha Manninen Note Added: 0125711
2020-09-22 09:31 Sven Barth Note Added: 0125748
2020-09-22 10:55 Juha Manninen Note Added: 0125751
2020-09-22 18:00 Klaus1 Note Added: 0125758
2020-09-22 18:00 Klaus1 File Added: lazsse.lpk
2020-09-22 18:00 Klaus1 File Added: lazvex.lpk
2020-09-23 10:17 Sven Barth Note Added: 0125770
2020-09-23 10:18 Sven Barth Note Edited: 0125770 View Revisions
2020-09-23 23:33 Chris Rorden Note Added: 0125800
2020-09-28 16:14 Klaus1 Note Added: 0125927
2020-09-28 16:14 Klaus1 File Added: lazsse.tar
2020-09-28 16:14 Klaus1 File Added: lazvex.tar
2020-09-28 16:16 Klaus1 Note Edited: 0125927 View Revisions
2020-10-03 11:49 Juha Manninen Note Added: 0126056
2020-10-04 12:56 Klaus1 Note Added: 0126076
2020-10-04 15:39 Juha Manninen Note Added: 0126082
2020-10-06 16:33 Klaus1 Note Added: 0126119
2020-10-06 19:25 Juha Manninen Assigned To => Juha Manninen
2020-10-06 19:25 Juha Manninen Status new => assigned
2020-10-06 19:29 Juha Manninen Status assigned => resolved
2020-10-06 19:29 Juha Manninen Resolution open => no change required
2020-10-06 19:29 Juha Manninen LazTarget => -
2020-10-06 19:29 Juha Manninen Note Added: 0126123