View Issue Details

ID: 0012942
Project: FPC
Category: Compiler
View Status: public
Last Update: 2009-03-21 13:00
Reporter: Udo Giacomozzi
Assigned To: Jonas Maebe
Priority: normal
Severity: crash
Reproducibility: always
Status: closed
Resolution: fixed
Product Version: 2.2.2
Target Version: 2.4.0
Fixed in Version: 2.4.0
Summary: 0012942: Segmentation fault on stack overflow even with -Ct!
Description: When a program causes a stack overflow, it raises an "Unknown Run-Time error : 202" exception, which can be safely caught. Fine.

When the same error happens a second time, the program crashes with a segmentation fault.

See the attached test case.

Note that stack checking (-Ct) is on!

I guess either the stack gets corrupted on stack overflow or the exception handling is buggy.

I'd be glad if there were some workaround for this problem, as it occasionally crashes (full program exit!) a multithreaded server application.
Additional Information: I compile the program with:
fpc-2.2.2/bin/ppc386 -Ct stacktest.pp

I get this output on various machines:

---------------------------------------------
Starting single-thread test in about 1 second.
Testing stack, please wait...
Exception after 241664 bytes: Unknown Run-Time error : 202
Testing stack, please wait...
Segmentation fault
---------------------------------------------

gdb is unable to produce a meaningful backtrace:

---------------------------------------------
(gdb) bt
#0 0x08048221 in ?? ()
Cannot access memory at address 0xbf7fffd0
---------------------------------------------

Tested with:
Debian 3.1, Kernel 2.6.8-3-686, AMD Sempron
Debian 4.0, Kernel 2.6.18-6-686-bigmem, AMD Athlon 64 X2
Red Hat Ent. 3 (Taroon), Kernel 2.4.21-4.ELsmp, Intel Xeon
Tags: No tags attached.
Fixed in Revision: 12528
Attached Files

Relationships

related to 0006068 (closed): Stack check error
parent of 0013105 (closed, Jonas Maebe): Max concurrent thread count limited to 383 on Linux
child of 0012948 (closed, Michael Van Canneyt): Documentation on stack checking should mention that it is not usable in production environments

Activities

2009-01-08 11:00

 

stacktest.pp (903 bytes)

Jonas Maebe

2009-01-08 11:30

manager   ~0024275

While the program shouldn't crash on the second exception, you should not rely on -Ct to keep your stack usage to a particular limit either. -Ct is only meant to be used to introduce an artificial stack limit in the program to help finding out where your program uses inordinate amounts of stack space. It's not meant (nor usable) as a way to impose a particular stack limit on the program and to ensure it never goes over that (if only because the stack checking code itself also uses stack space).

See the related bug report for more information.

It is also very little used in practice and not thoroughly tested on most platforms as far as I know, so I would recommend against using it in production environments.

Udo Giacomozzi

2009-01-08 11:47

reporter   ~0024277

Okay, but compiling without the -Ct switch aborts the program with a horrible segmentation fault after it consumes slightly more than 8 MB of stack (according to the docs, the stack is only limited by system memory).

The big problem is that these errors crash the whole process in a multithreading program (not just the thread that caused it). In my case I have a server application that handles external requests and should not crash completely when a handling thread causes any kind of error.

That application has no stack limit set via -Cs, but it occasionally causes stack overflows inside FieldByName(xxx).AsString, and my DB fields are never bigger than 32kb. The backtrace shows just 8 function calls (i.e. no recursion problem).

What can I do to make such an application run safely?

Also, since the stack is said to be handled by the O/S: shouldn't Free Pascal be able to handle such an error via normal exception handling?

Jonas Maebe

2009-01-08 14:06

manager   ~0024281

Last edited: 2009-01-08 14:06

> That application has no stack limit set via -Cs

If you don't set a stack limit using -Cs, the default for the OS is used (for Linux/i386, that default is currently 256kb, which is rather low). However, on Linux this number is only used by the software stack checking that is inserted when you use -Ct. It is not used to tell the OS to limit the stack size of your program in any way, because there seems to be no way for the compiler to do so under Linux (at least I can't find any information about it in the linker's man page).

It's possible that the default for your linux system is simply 8MB. You can check by executing "ulimit -a". If you need more stack space, change it using ulimit before starting your program.
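For reference, "ulimit -s" is the shell's front end to the getrlimit()/setrlimit() calls with RLIMIT_STACK, so a program can read the same value itself. A minimal C sketch (C because that is the interface the OS exposes; the helper names are illustrative and not part of FPC or any library):

```c
#include <stdio.h>
#include <sys/resource.h>

/* Soft stack limit in bytes, as reported by "ulimit -s" (which shows KiB).
 * Returns 0 if the limit is unlimited or the query fails. */
static unsigned long long main_stack_limit_bytes(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return 0;                   /* query failed */
    if (rl.rlim_cur == RLIM_INFINITY)
        return 0;                   /* "ulimit -s unlimited" */
    return (unsigned long long)rl.rlim_cur;
}

/* Print the limit in the same unit that "ulimit -s" uses. */
static void print_main_stack_limit(void)
{
    unsigned long long bytes = main_stack_limit_bytes();

    if (bytes == 0)
        printf("stack limit: unlimited or unknown\n");
    else
        printf("stack limit: %llu KiB\n", bytes / 1024);
}
```

On a system where "ulimit -s" reports 8192, print_main_stack_limit() prints "stack limit: 8192 KiB". This limit governs the main thread's stack only.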

> Also, since the stack is said to be handled by the O/S: shouldn't Freepascal
> be able to handle such an error via normal exception handling?

It is implemented that way for Windows, but not for any other platform. There are also the additional problems that
a) exception handling itself also needs stack space. While a signal handler can be programmed to execute on an alternate stack, many operations are unsafe to perform at signal-handling time, and therefore on many targets we return from the signal handler as soon as possible, back to the regular program (and its stack), to handle everything
b) an out-of-stack exception usually happens somewhere during the setup of a stack frame. This means that e.g. not all local reference-counted variables may have been allocated or initialised yet. As a result, when an exception handler subsequently tries to finalise such variables while unwinding the stack, it can cause memory corruption. There are probably more things that can go wrong due to partially set up stack frames.

In short: I think it is best to treat an "out of stack" error always as a fatal error at this time (because there is always a risk of memory corruption).

It is probably possible to handle all cases of "out of stack" errors cleanly, but nobody has ever spent time on implementing this properly in FPC (and definitely not for all platforms), and I don't see that happening any time soon either (because it would be a lot of work to rewrite all stack setup code for all targets and OSes to support this, and that is one of the more complex parts of the compiler and often also OS-dependent).

Jonas Maebe

2009-01-08 14:58

manager   ~0024282

Also note that the ulimit value only applies to the main program. Every thread has its own stack, and the default pthread stack size under Linux is indeed 8MB, if I remember correctly.

While tthread.create allows you to specify a different stack size (http://www.freepascal.org/docs-html/rtl/classes/tthread.create.html), it seems that it is currently not passed on to pthread_create in any way :( That definitely *is* something that needs fixing in the short term.
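When the requested size is dropped, the thread simply gets whatever default the C library uses. A minimal C sketch for inspecting that default (the helper name is hypothetical): on glibc, pthread_attr_getstacksize() on a freshly initialised attribute object reports the default the library will use, which NPTL derives from the RLIMIT_STACK soft limit unless that limit is unlimited. Other C libraries may behave differently.

```c
#include <pthread.h>
#include <stddef.h>

/* Stack size a new thread gets when the creator does not set one.
 * Returns 0 on failure. On glibc an unset size is reported as the
 * library's current default; other C libraries may differ. */
static size_t default_thread_stack_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}
```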

That said, 8 MB of stack really is a huge amount. Independently of the above bugs in FPC, you may want to research why it is using so much stack space. If it's caused by unbounded recursion of functions with a huge amount of local variables, you may want to consider rewriting that code anyway (that's a very bad property for server-style programs to have, even if the error can be contained to a single thread).

Udo Giacomozzi

2009-01-08 17:17

reporter   ~0024289

>It's possible that the default for your linux system is simply 8MB.
>You can check by executing "ulimit -a".

Indeed, it's 8 MB.

>It is implemented that way for Windows, but not for any other
>platform. There's also the additional problems

Yes, I searched the net on this topic and this seems to be a common problem under Linux and apparently only Windows is able to handle it nicely with some paging tricks.
Just for reference: Linux (POSIX?) normally adds at least one additional page with "no access" rights at the overflow end of the stack. This way a segmentation fault is signaled as soon as that page is accessed (which happens on stack overflow).
The SIGSEGV handler can't use the normal stack in that case so it must get an extra stack, which is possible via sigaction(). However, LinuxThreads seems to use the stack pointer to find the thread that caused the signal, and so using an extra stack is also a problem. So, it's definitely hard to catch stack overflows.

>In short: I think it is best to treat an "out of stack" error always
>as a fatal error at this time

Agree (since there is no solution anyway and since FPC is unable to detect it internally).

>Every thread has its own stack, and the default pthread stack size
>under Linux is indeed 8MB, if I remember correctly.

They say 1MB on some sites/forums I found, but it's probably system-dependent. There also seems to be some sort of growing stack in LinuxThreads.

>it seems that it is currently not passed on to pthread_create in any way

Correct, the relevant BeginThread() function just calls another BeginThread() version discarding the StackSize argument. However, even 1 MB is really enough for me because I do not have any big local variables and all strings are AnsiStrings.

>you may want to research why it is using so much stack space

I get the feeling that the -Ct switch is causing false alarms. My endless recursion routine often (not always) causes a 202 runtime error after consuming just 14 kb or so. Without the -Ct switch I reliably get a SIGSEGV after consuming slightly less than 8 MB of stack (at the same position in program).

In my real-world application I get 202 exceptions in TField.GetAsString, which uses two local variables of 8kb each, so this matches the virtual ~16 kb limit caused by the -Ct switch.

Now I recompiled my server application without the -Ct switch and let's see how well it works. However, it's hard to debug a complex multithreaded application when it just says "Segmentation fault" at some point :(

Anyway, the documentation and ppc386 help page should make clear that the -Ct switch should be used with care and that it won't guarantee a safe stack...

Jonas Maebe

2009-01-08 18:02

manager   ~0024293

Last edited: 2009-01-08 18:24

> They say 1MB on some sites/forums I found but it's probably system-dependent.

Yes, probably.

EDIT: nptl uses the maximal stack size for the main program (as shown by ulimit -a) also for threads if no stack size is specified.

> There also seems to exist some sort of growing stack in LinuxThreads..

Note that LinuxThreads is no longer used on modern systems. LinuxThreads has been superseded by a proper pthreads implementation (nptl) since late kernel 2.4.x distros, and on all kernel 2.6.x systems (technically it's an add-on to glibc and compiled as part of it, but I guess it needs some features from newer kernel versions).

> >it seems that it is currently not passed on to pthread_create in any way
>
> Correct, the relevant BeginThread() function just calls another BeginThread()
> version discarding the StackSize argument.

That's not true, as far as I can tell. The problem is however that CBeginThread in rtl/unix/cthreads.pp does not call pthread_attr_setstacksize() before calling pthread_create.

> I get the feeling that the -Ct switch is causing false alarms. My endless recursion
> routine often (not always) causes a 202 runtime error after consuming just 14 kb
> or so. Without the -Ct switch I reliably get a SIGSEGV after consuming slightly less
> than 8 MB of stack (at the same position in program).

Since -Ct is purely based on software checking on Linux, it does perform checks even though no stack size is passed on to pthread_create. If you don't explicitly specify a stack size to the tthread.create constructor, the default is used. This default is defined in rtl/inc/threadh.inc, and is 32kb (I have no idea where this value comes from, but it's obviously way too low). The accompanying comment says "{ including 16384 margin for stackchecking }", so I guess that's where your 14kb limit comes from (32kb-16kb - some initial overhead).
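The arithmetic behind the ~14 kb figure can be written out as a small model. Assuming the software check reserves the 16 KiB margin, a 32 KiB thread stack leaves 32768 - 16384 = 16384 bytes, minus whatever the thread start-up code has already used. The C sketch below is hypothetical; in particular, the 2048-byte overhead used in the example is an assumed value chosen for illustration, not a measured one.

```c
#include <stddef.h>

/* Hypothetical model: bytes a thread can use before software stack
 * checking reports error 202, if the check keeps "margin" bytes free and
 * start-up code has already consumed "overhead" bytes. */
static size_t checked_stack_budget(size_t stack_size, size_t margin,
                                   size_t overhead)
{
    if (stack_size <= margin + overhead)
        return 0;                   /* nothing left for user code */
    return stack_size - margin - overhead;
}
```

With the assumed overhead, checked_stack_budget(32768, 16384, 2048) gives 14336 bytes (14 KiB), so a routine with two 8 kb locals (16384 bytes) cannot fit. A 4 MiB stack with the same margin would leave 4177920 bytes before any overhead.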

> Anyway, the documentation and ppc386 help page should make clear that
> the -Ct switch should be used with care and that it won't guarantee a safe stack...

It should indeed be documented in the manual, and the help page should refer there.

Jonas Maebe

2009-01-08 19:46

manager   ~0024300

Threads created on unix platforms now honour the specified stack size.

The default stack size for threads is now 4 MiB instead of 32 KiB.

Clarified in the help pages that -Cs sets the "stack *checking* size", and added "(only for testing, see manual)" to the -Ct explanation.

The default stack checking size for the main program under Linux/i386 (and some other Linux targets that had a very low stack limit) is now 8 MiB, instead of 256 KiB.

Fixed the generic stack checking code, which until now counted the size of the to-be-checked stack frame twice.
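To see why counting the frame twice matters, here is a hypothetical model of a generic pre-frame stack check in C. It is not FPC's stack-checking routine; it only shows that subtracting the frame size twice demands twice the headroom, so the check can fail while there is still enough room.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check run before a routine allocates frame_size bytes:
 * the stack pointer must stay at or above stack_low + margin.
 * count_frame_twice reproduces the double-counting bug. */
static bool stack_check_passes(uintptr_t sp, uintptr_t frame_size,
                               uintptr_t stack_low, uintptr_t margin,
                               bool count_frame_twice)
{
    uintptr_t needed = count_frame_twice ? 2 * frame_size : frame_size;
    uintptr_t limit = stack_low + margin;

    /* Compare distances instead of computing sp - needed, so an
     * unsigned subtraction can never wrap around below zero. */
    return sp >= limit && sp - limit >= needed;
}
```

With the stack's low end at 0x10000, a 0x4000 margin and sp = 0x16000, an 8 KiB (0x2000) frame fits exactly: the corrected check passes, while the double-counting version rejects it.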

I added a child bug report for the change to the documentation.

Issue History

Date Modified Username Field Change
2009-01-08 11:00 Udo Giacomozzi New Issue
2009-01-08 11:00 Udo Giacomozzi File Added: stacktest.pp
2009-01-08 11:25 Jonas Maebe Relationship added related to 0006068
2009-01-08 11:30 Jonas Maebe Note Added: 0024275
2009-01-08 11:47 Udo Giacomozzi Note Added: 0024277
2009-01-08 14:06 Jonas Maebe Note Added: 0024281
2009-01-08 14:06 Jonas Maebe Note Edited: 0024281
2009-01-08 14:58 Jonas Maebe Note Added: 0024282
2009-01-08 17:17 Udo Giacomozzi Note Added: 0024289
2009-01-08 18:02 Jonas Maebe Note Added: 0024293
2009-01-08 18:06 Jonas Maebe Issue cloned: 0012948
2009-01-08 18:06 Jonas Maebe Relationship added parent of 0012948
2009-01-08 18:24 Jonas Maebe Note Edited: 0024293
2009-01-08 19:46 Jonas Maebe Fixed in Revision => 12528
2009-01-08 19:46 Jonas Maebe Status new => resolved
2009-01-08 19:46 Jonas Maebe Fixed in Version => 2.3.1
2009-01-08 19:46 Jonas Maebe Resolution open => fixed
2009-01-08 19:46 Jonas Maebe Assigned To => Jonas Maebe
2009-01-08 19:46 Jonas Maebe Note Added: 0024300
2009-01-08 19:46 Jonas Maebe Target Version => 2.4.0
2009-02-03 12:56 Jonas Maebe Relationship added parent of 0013105
2009-03-21 12:59 Jonas Maebe Relationship deleted parent of 0012948
2009-03-21 12:59 Jonas Maebe Relationship added child of 0012948
2009-03-21 13:00 Jonas Maebe Status resolved => closed