View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0015582||FPC||Compiler||public||2010-01-25 11:10||2020-02-04 16:24|
|Reporter||Burkhard Carstens||Assigned To||Florian|
|Platform||i386||OS||linux||OS Version||suse 10.0|
|Product Version||2.4.0||Product Build|
|Target Version||Fixed in Version||3.3.1|
|Summary||0015582: "-OaLOCALMIN=16" / "codealign LOCALMIN=16" is not reliable|
|Description||running attached testprogram...|
1. ... compiled with -O1 shows wrong alignment of local variables of "procedure l_unit" (in ca_unit.pas) when called from ca_unit.initialization.
2. ... compiled with -O2 / -O3 shows allways wrong alignment of local variables in "procedure l" (codealign.pas) and "procedure l_unit" (in ca_unit.pas).
|Steps To Reproduce||compile and run attached test|
|Additional Information||Tested on:|
- 32bit suse-10.0, binutils 2.16.91
- 64bit suse 11.2 (using 32-bit compiler), binutils 2.19.51
Tested with options:
"-Aelf" and "-Aas" show identical behaviour.
"-Anasmelf" does not produce a runable binary (test dies on startup).
|Tags||No tags attached.|
|Fixed in Revision||43176|
codealign.zip (1,038 bytes)
||The current code for aligning local variables is indeed broken. It currently only works if the alignment of the stack pointer is greater or equal to the localalign setting. On Linux/i386, the stack pointer is only aligned to 4 bytes.|
Is there a chance to get this fixed (soon)? If not, is there a workaround (e.g. mess with the compiler sources to force 16-byte-alignment of the stack pointer)?
I'd need this for some SSE asm code which is much faster when using movapd instead of movupd ..
> Is there a chance to get this fixed (soon)?
I don't know. Not by me, at least.
> If not, is there a workaround (e.g. mess with the compiler sources to
> force 16-byte-alignment of the stack pointer)?
You can try to add system_i386_linux to the systems_need_16_byte_stack_alignment set in compiler/systems.pas (2.5.1-only). That will however only help if
a) the stack pointer is guaranteed to be 16-byte-aligned before PASCALMAIN is called from the assembler startup code, and I'm not sure that is the case.
b) your Pascal code is never called as a callback from external code (e.g. C code), as that code won't guarantee the alignment.
The general/proper fix requires adding explicit stack pointer alignment code in the entry code of every routine that has local variables whose alignment is > guaranteed stack pointer alignment for the given target.
Is this one good candidate to have something similar to gcc -mstackrealign ?
I'm asking since more and more 32bit apps (especially libraries) uses sse2/3 and they expect 16byte stack instead of 4 under linux.
Currently: Lazarus qtlcl uses libQt4Pas which is compiled with -mstackrealign and no problems, but libpixman used by gtk2 isn't compiled with -mstackrealign so lazarus apps crashes on 32bit recent distros (GtkFileChooser exactly).
The compiler is actually already able to generate Linux/i386 code that guarantees 16 byte alignment since r22279 (if you change the stackalign field of system_i386_linux_info in compiler/systems/i_linux.pas from 4 to 16).
The problem is that everyone and their mother insists on writing i386 assembler routines (including people working on the RTL), and many of those routines will break when we do that (often in non-obvious ways). I'm not interested in dealing with the fallout of that, nor with adding and supporting a separate Linux/i386 target that has stack alignment 16 (compiling only some code with stack alignment 16 is useless, because if a caller doesn't guarantee the stack alignment then it will be gone)
In r22280 I also added support for keeping track of the required stack alignment per routine, which could be used to implement something like -mstackrealign. But I'm still not interested in implementing/supporting/maintaining such functionality.
||hm..then I'm wondering how fpc will support C libs which uses sse2/sse3 and compiled with recent gcc in not so far future.|
> hm..then I'm wondering how fpc will support C libs which uses sse2/sse3
> and compiled with recent gcc in not so far future.
That is not the future, that's how it is today.
And I'm not saying that this support will not be added, just that I personally won't do it. Deciding whether to annoy all Linux/i386 assembler programmers or whether to add a separate platform to FPC is not something I want to deal with (and regardless of what we do, people will complain).
||I still think the callee has to align the stack to 16 bytes if it needs a 16 byte aligned stack on i386-linux, this is how it is defined in the SYSV-ABI. Everything else breaks a lot of code so libpixman must be recompiled with -mstackrealign.|
||So you expect distro packagers to pack libraries with -mstackrealign because of fpc ? ;)|
@Florian: the problem is that GCC developers decided to create/follow a new ABI for Linux/i386, and no longer consider the i386 SYSV ABI to be the Linux ABI. See e.g. the post starting the discussion at http://sourceforge.net/p/fbc/bugs/659/ and the linked discussions and GCC bug reports.
Since GCC is developed/maintained/used by the people who are involved most with Linux ABI issues, it's not even an illogical body of people to make such a decision. The main issue I have with it is that it was never made official or properly documented, and you have to figure it out from GCC changelogs and bug report discussions. As a result, we cannot just point to an official document explaining what was done, why it was done and what people should do with legacy code, and we will have to deal with the same fallout as GCC (but they asked for it, since they took the initiative to make the change in the first place).
@Zeljan: don't be facetious. This is not "because of fpc". There are several other compilers in the same boat (the linked bug report is for another compiler and is from December 2013). The reason for the issue is explained above: no one likes being forced by another independent project to change their compiler in a way that makes them having to deal with lots of angry users, especially if there is confusion about what the "right" way to do things is because there is still only one ABI document (official or not) that applies to Linux/i386.
The GCC people did not write a new ABI document, and the GCC source code or bug tracker does not count as an ABI document either; you can't expect all compiler developers to follow the GCC bug tracker to keep abreast of any possible changes to the Linux ABI, or to decide all for themselves when the discussions on the GCC bug tracker and mailing lists have died down enough to consider the issue settled. It's a whole can of worms you're opening by "giving in" to this kind of practice, and not having an official document to follow but just observed behaviour or discussions between angry users and annoyed developers are not the best technical specifications (who says that we'll get all the details right by just always aligning the stack to 16 bytes, and that there are no exceptions or special cases anywhere?)
It's one big mess.
@Zeljan: No, not because of fpc but for all reasons Jonas gave. Do not forget: this change means that potentially a lot of dyn. binaries created during the last twenty years might get broken. So I consider as a first line of defense against this mess that libraries must be compiled with -mstackrealign where needed.
IMO -mstackrealign is also from a technical point of view the right decision: only a few routines benefit from a 16 byte aligned stack, those can align it: basically number crunching code (but if I do number crunching, I use a 64 bit system which is probably faster anyways in this scenario, regardless of the needed memory). All others suffer from it because of L1 cache space being wasted for extra stack and extra code in case everything is 16 aligned. Not to mention the fact that AVX cares much less about memory alignment than SSE did, so when compiling AVX code realigning the stack is much less needed.
I agree that Jonas gave perfect arguments (especially about mess in their doumentation) from our (fpc users) point of view.I just want to say that fpc (actually lazarus gtk2) probably won't be useable with recent distros from now on.
Do not misunderstood me, I don't mind if fpc will have stack realign or not, since I'm using qtlcl for my job, and that one doesn't have any problem with it, just want to mention potential problem.
Would it help if we'd have a local $stackalign directive for Pascal functions that would additionally generate alignment stubs for external functions (if $stackalign of external function > native alignment defined in $fpc/compiler/systems/i_*)?
It's impossible to generate stubs in the general case, e.g. when you have a variadic function. Aligning the stack has to occur before calling the function. And there are multiple issues:
a) calling (only external?) cdecl functions
b) routines implemented in FPC that want to use e.g. SSE instructions that require 16 byte alignment to load local variables in inline assembler code
Since the much newer bug report one is closed as related:
Do we have any code? Can we massage it into working? (I can for my C++ 6 code, but I need an example that fails as this report suggests)
The compiler already has full support for it (you just have to change the stack alignment for Linux/i386 from 4 to 16 bytes). The problem is updating all pure assembler routines in the RTL that call other routines to ensure they guarantee a 16 byte alignment (or disabling them, as is done for Darwin/i386 which required 16 byte stack alignment from the start).
This change will also break many third party pure assembler routines, because the way 16 byte stack alignment is implemented in FPC, is that the caller removes all arguments from the stack (because the stack gets aligned to 16 bytes on entry, which includes reserving space for arguments that will be passed to subroutines), while with 4 byte alignment it's the callee that is responsible for this. That is the main hold-up for changing this, because especially for i386 there is a ton of legacy inline assembly code out there.
The safest would be to create a new compiler target altogether, probably. But creating a schism like that is not ideal either.
||How about realigning the stack right before calling a cdecl function?|
I like to say it's possible, but somewhat dangerous, and there may still be shortcomings where external code is concerned (e.g. an external routine calling a function in your library, and possibly not abiding by the same stack rules).
||Currently this is causing a segmentation fault in Hedgewars upon game exit on Fedora. What are my options to fix if nothing is going to change on the fpc side?|
||As mentioned in my comment in 0033311 (https://bugs.freepascal.org/view.php?id=33311#c106838), this is not related: this one is about i386, 0033311 is/was about x86-64. x86-64 requires always (and has required always, so for 15 years) a 16 byte aligned stack.|
|2010-01-25 11:10||Burkhard Carstens||New Issue|
|2010-01-25 11:10||Burkhard Carstens||File Added: codealign.zip|
|2010-01-25 12:14||Jonas Maebe||Note Added: 0033885|
|2010-01-25 15:13||Burkhard Carstens||Note Added: 0033890|
|2010-01-25 15:15||Burkhard Carstens||Note Edited: 0033890|
|2010-01-25 15:21||Jonas Maebe||Note Added: 0033891|
|2014-03-17 16:52||Zeljan Rikalo||Note Added: 0073789|
|2014-03-17 17:06||Jonas Maebe||Note Added: 0073790|
|2014-03-17 20:56||Zeljan Rikalo||Note Added: 0073794|
|2014-03-17 21:08||Jonas Maebe||Note Added: 0073796|
|2014-03-18 00:05||Florian||Note Added: 0073800|
|2014-03-18 08:13||Zeljan Rikalo||Note Added: 0073803|
|2014-03-18 08:56||Jonas Maebe||Note Added: 0073804|
|2014-03-18 09:50||Florian||Note Added: 0073808|
|2014-03-18 09:52||Florian||Note Edited: 0073808||View Revisions|
|2014-03-18 12:53||Zeljan Rikalo||Note Added: 0073815|
|2014-03-26 13:40||Sven Barth||Note Added: 0073987|
|2014-03-26 19:53||Jonas Maebe||Note Added: 0073993|
|2015-05-07 00:07||Jonas Maebe||Relationship added||has duplicate 0028037|
|2017-12-05 21:01||Jonas Maebe||Relationship added||has duplicate 0032710|
|2017-12-05 22:00||Thaddy de Koning||Note Added: 0104482|
|2017-12-05 22:12||Jonas Maebe||Note Added: 0104484|
|2017-12-17 11:44||Daniel Glöckner||Note Added: 0104773|
|2017-12-17 21:09||J. Gareth Moreton||Note Added: 0104786|
|2017-12-17 21:09||J. Gareth Moreton||Note Edited: 0104786||View Revisions|
|2017-12-17 21:11||J. Gareth Moreton||Note Edited: 0104786||View Revisions|
|2018-03-03 16:18||Jonas Maebe||Relationship added||has duplicate 0033311|
|2018-03-03 17:20||Florian||Relationship deleted||has duplicate 0033311|
|2018-03-03 17:23||Richard Shaw||Note Added: 0106840|
|2018-03-03 17:37||Florian||Note Added: 0106841|
|2018-10-29 09:36||Juha Manninen||Relationship added||related to 0033425|
|2019-10-21 21:45||Florian||Assigned To||=> Florian|
|2019-10-21 21:45||Florian||Status||new => resolved|
|2019-10-21 21:45||Florian||Resolution||open => fixed|
|2019-10-21 21:45||Florian||Fixed in Version||=> 3.3.1|
|2019-10-21 21:45||Florian||Fixed in Revision||=> 43176|
|2019-10-21 21:45||Florian||FPCTarget||=> -|