View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0017383||FPC||RTL||public||2010-09-10 22:17||2011-08-26 17:07|
|Reporter||Seth Grover||Assigned To||Jonas Maebe|
|Platform||i386 and x86_64||OS||Linux|
|Fixed in Version||2.6.0|
|Summary||0017383: unhandled exception caught by .so's handler causes infinite loop|
|Description||I'm logging this bug after mentioning it on the mailing list and receiving Jonas' response (see http://lists.freepascal.org/lists/fpc-pascal/2010-September/026405.html ).|
If a shared object library compiled with FPC under Linux installs its signal handler with HookSignal(RTL_SIGDEFAULT), and an unhandled access violation then occurs (either in the calling program or in the .so outside of a try/except block), the .so puts the program into an infinite loop of runtime error 217 and access violations.
Jonas says (http://lists.freepascal.org/lists/fpc-pascal/2010-September/026408.html ) this is caused by revision 14184 (http://svn.freepascal.org/cgi-bin/viewvc.cgi?view=rev&revision=14184 ) which is a response to bug 0014958, quote, "The problem is probably that the lib's exit code is called both when the library is unloaded (in which case you don't want the process to terminate) and when the "library" terminates (either via an unhandled exception, or by calling halt)."
|Steps To Reproduce||1. Create a .so under Linux with FPC. |
2. In the initialization code for the .so, call HookSignal(RTL_SIGDEFAULT);
3. In some routine in the .so, cause a segfault outside of the context of
a try/except block.
4. In some calling program (C or FPC, doesn't matter), call that routine.
You will see an infinite printout of runtime error 217 and access violations.
|Tags||No tags attached.|
|Fixed in Revision||16418|
So here's what I've learned:
When a library is "dlclosed" (in 32-bit Linux for this example):
#0 INTERNALEXIT () at /home/tlacuache/fpc/rtl/inc/system.inc:812
#1 0xf7e082b5 in LIB_EXIT () at /home/tlacuache/fpc/rtl/inc/system.inc:881
#2 0xf7e12535 in _FPC_SHARED_LIB_HALTPROC () at ./i386/si_dll.inc:78
#3 0xf7ff336e in ?? () from /lib/ld-linux.so.2
#4 0xf7ff3e07 in ?? () from /lib/ld-linux.so.2
#5 0xf7fb7ca4 in dlclose_doit (handle=0x809c020) at dlclose.c:37
#6 0xf7fee2f6 in ?? () from /lib/ld-linux.so.2
#7 0xf7fb809c in _dlerror_run (operate=<value optimized out>, args=<value optimized out>) at dlerror.c:164
#8 0xf7fb7cda in __dlclose (handle=0xf7e12530) at dlclose.c:48
#9 0x08048353 in main () at tw14958b.pp:20
When an unhandled access violation occurs in the main program when the .so owns the signal handlers:
#0 INTERNALEXIT () at /home/tlacuache/fpc/rtl/inc/system.inc:812
#1 0xf7de2b25 in LIB_EXIT () at /home/tlacuache/fpc/rtl/inc/system.inc:881
#2 0xf7e081e5 in _FPC_SHARED_LIB_HALTPROC () at ./i386/si_dll.inc:78
#3 0xf7dea354 in SYSTEM_EXIT () at system.pp:98
#4 0xf7de2b1a in DO_EXIT () at /home/tlacuache/fpc/rtl/inc/system.inc:875
#5 0xf7de2b3a in HALT (ERRNUM=0) at /home/tlacuache/fpc/rtl/inc/system.inc:888
#6 0xf7ddfade in DOUNHANDLEDEXCEPTION () at /home/tlacuache/fpc/rtl/inc/except.inc:173
#7 0xf7ddfb64 in fpc_raiseexception (OBJ=0xf7dfed50, ANADDR=0x0, AFRAME=0x40000) at /home/tlacuache/fpc/rtl/inc/except.inc:199
#8 0xf7dff555 in RUNERRORTOEXCEPT (ERRNO=-136319664, ADDRESS=0x0, FRAME=0x0) at /home/tlacuache/fpc/rtl/objpas/sysutils/sysutils.inc:344
#9 0xf7de2bae in HANDLEERRORADDRFRAME (ERRNO=-136319664, ADDR=0xf7dfed50, FRAME=0xf7d6e020) at /home/tlacuache/fpc/rtl/inc/system.inc:901
#10 0xffffd2c8 in ?? ()
#11 0x08080893 in _FPC_PROC_START () at ./i386/si_prc.inc:105
I don't see a way to decouple the two cases in _FPC_SHARED_LIB_HALTPROC or lower, because _FPC_SHARED_LIB_HALTPROC seems to be called directly by ld-linux.so. Would it be feasible to do something higher up, though? I.e., in the "Halt" procedure or in "DoUnhandledException", could we detect that "IsLibrary = true" and either set some global variable which could be checked in InternalExit and cause a true stop-the-process halt, or, if we don't want to do it that deep, just do it in the "Halt" procedure?
Maybe I should step back and ask: is my assumption about what should happen correct? If an FPC-compiled .so gets an unhandled exception (either generated inside the .so or caused by, for example, an access violation in the calling program when the .so has installed its signal handlers), what is the correct behavior? The application should halt, correct? The current behavior (going into an infinite loop of access violations and runtime errors) is definitely broken.
If I can get some feedback about which direction would be acceptable to take here I'll do my best to provide a patch. I'm very anxious to get this fixed as I see this as a critical problem.
I've reverted the merge of that revision to 2.4.x, so 2.4.2 final will not contain it.
The difference between halt() and an unload of a library is that in the former case do_exit from system.inc is called (by halt()), and in the latter case lib_exit from system.inc is called (by the dynamic linker, as instructed by the compiler).
Both call internal_exit, but do_exit afterwards calls system_exit while lib_exit does not. System_exit is in linux/system.pp and calls haltproc.
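The two call chains described above can be modelled in a few lines of C. This is only an illustrative sketch: the function names mirror the RTL routines named in this note, but the bodies are hypothetical stand-ins that record a trace instead of doing real work, and nothing here is taken from the RTL sources.

```c
#include <stdio.h>
#include <string.h>

/* Records the order in which the modelled routines are entered. */
static char trace[256];

static void record(const char *name) {
    if (trace[0] != '\0')
        strncat(trace, " -> ", sizeof(trace) - strlen(trace) - 1);
    strncat(trace, name, sizeof(trace) - strlen(trace) - 1);
}

/* Hypothetical stand-ins for the routines named in the note. */
static void haltproc(void)      { record("haltproc"); }       /* would end the process */
static void system_exit(void)   { record("system_exit"); haltproc(); }
static void internal_exit(void) { record("internal_exit"); }  /* shared cleanup step */

/* Path taken by halt(): cleanup, then system_exit and haltproc. */
static void do_exit(void) { record("do_exit"); internal_exit(); system_exit(); }
static void halt(void)    { record("halt"); do_exit(); }

/* Path taken on library unload: cleanup only, no system_exit. */
static void lib_exit(void) { record("lib_exit"); internal_exit(); }

/* Run one path from a clean trace and return the recorded chain. */
const char *trace_halt(void)   { trace[0] = '\0'; halt();     return trace; }
const char *trace_unload(void) { trace[0] = '\0'; lib_exit(); return trace; }
```

Printing `trace_halt()` and `trace_unload()` side by side shows the intended difference: only the halt path reaches haltproc. The backtraces in this issue show haltproc being reached on the unload path as well, which is the discrepancy under discussion.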
So it's not really clear anymore to me why the program from 0014958 terminated inside the library's unload routine without the patch. I'll have to debug it again.
Thanks for reverting the regression in 2.4.2.
From what I can see in GDB:
Breakpoint 7, _FPC_SHARED_LIB_HALTPROC () at ./i386/si_dll.inc:78
#0 _FPC_SHARED_LIB_HALTPROC () at ./i386/si_dll.inc:78
#1 0xf7ff336e in ?? () from /lib/ld-linux.so.2
#2 0xf7ff3e07 in ?? () from /lib/ld-linux.so.2
#3 0xf7fb7ca4 in dlclose_doit (handle=0x809c020) at dlclose.c:37
#4 0xf7fee2f6 in ?? () from /lib/ld-linux.so.2
#5 0xf7fb809c in _dlerror_run (operate=<value optimized out>, args=<value optimized out>) at dlerror.c:164
#6 0xf7fb7cda in __dlclose (handle=0xf7e08120) at dlclose.c:48
#7 0x08048375 in main () at tw14958b.pp:27
It looks like ld-linux.so also calls haltproc as part of dlclose.
Perhaps we need two "haltprocs", one called by System_exit internally which does actually halt the process, and one to be called by ld-linux.so in dlclose which does not?
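As a rough illustration of the "two haltprocs" idea, the sketch below separates a terminating entry point from an unload-only one. All names are hypothetical, process termination is modelled with a flag rather than a real exit call, and this is not the change that was eventually committed for this issue.

```c
#include <stdbool.h>

/* Model state: a flag stands in for actually ending the process. */
static bool process_terminated = false;
static bool library_finalized  = false;

/* Cleanup that both entry points need to run. */
static void finalize_library(void) { library_finalized = true; }

/* Hypothetical haltproc reached via system_exit: cleans up, then ends the process. */
void haltproc_terminate(void) {
    finalize_library();
    process_terminated = true;   /* a real version would not return */
}

/* Hypothetical haltproc registered for dlclose: cleans up and returns to the caller. */
void haltproc_unload(void) {
    finalize_library();
}

/* Helpers for inspecting and resetting the model. */
bool is_process_terminated(void) { return process_terminated; }
bool is_library_finalized(void)  { return library_finalized; }
void reset_model(void) { process_terminated = false; library_finalized = false; }
```

With this split, a dlclose leaves the host program running after the library has cleaned up, while an unhandled exception or halt still stops the process.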
> Thanks for reverting the regression in 2.4.2.
I didn't even realise it had been merged (the bug report still only lists 2.5.1 as the version it's fixed in, and that's also what I thought you were talking about on the mailing list).
> It looks like ld-linux.so also calls haltproc as part of dlclose.
It's possible that some functions in the backtrace are missing due to a stack frame optimisation in the ld-linux.so.2 code (that often affects the next stack frame too).
> Perhaps we need two "haltprocs", one called by System_exit
> internally which does actually halt the process, and one to be
> called by ld-linux.so in dlclose which does not?
As mentioned in my previous comment, normally that is more or less the situation that we /should/ have right now: halt calls system_exit which calls haltproc, while ld-linux.so should call lib_exit which does not call system_exit and hence not haltproc. Somewhere something's going wrong however.
A clarification, please:
> I've reverted the merge of that revision to 2.4.x, so 2.4.2 final will
> not contain it.
I can see that you reverted it in fixes_2_4, but hasn't the fixes branch already diverged from the 2.4.2 release? 2.4.2 has already been tagged with release candidates...
> 2.4.2 has already been tagged with release candidates...
A tag is not a branch. There is no "branch from which 2.4.2 will be tagged" to merge the change to. And 2.4.2rc1 has been tagged and some binaries have already been built from it, so it cannot be merged there either since that would result in some platforms including that change and some not. It can only be included in 2.4.2rc2 (if there is such a release) or 2.4.2 final.
Since you reverted the fix for issue 14184 should it be reopened? I'd do it but I don't think I've got the authority. :)
Sorry, I mean issue 14958.
I think the fact that this issue is still open is fine. I could resolve this one and open the other one, but that wouldn't change much.
I just verified this in the fixes_2_6 branch, verifying that my example from this issue and the example from issue 14958 both behave correctly. Thanks.
|2010-09-10 22:17||Seth Grover||New Issue|
|2010-09-22 18:18||Seth Grover||Note Added: 0041277|
|2010-09-23 13:14||Jonas Maebe||FPCOldBugId||=> 0|
|2010-09-23 13:14||Jonas Maebe||Description Updated|
|2010-09-23 13:58||Jonas Maebe||Note Added: 0041283|
|2010-09-23 16:32||Seth Grover||Note Added: 0041289|
|2010-09-23 16:36||Seth Grover||Note Added: 0041290|
|2010-09-23 16:51||Jonas Maebe||Note Added: 0041291|
|2010-09-24 21:49||Seth Grover||Note Added: 0041303|
|2010-09-24 21:57||Jonas Maebe||Note Added: 0041304|
|2010-11-19 23:16||Seth Grover||Note Added: 0043282|
|2010-11-19 23:16||Seth Grover||Note Added: 0043283|
|2010-11-20 12:24||Jonas Maebe||Note Added: 0043297|
|2010-11-20 12:24||Jonas Maebe||Relationship added||related to 0014958|
|2010-11-24 16:33||Jonas Maebe||Fixed in Revision||=> 16418|
|2010-11-24 16:33||Jonas Maebe||Status||new => resolved|
|2010-11-24 16:33||Jonas Maebe||Fixed in Version||=> 2.5.1|
|2010-11-24 16:33||Jonas Maebe||Resolution||open => fixed|
|2010-11-24 16:33||Jonas Maebe||Assigned To||=> Jonas Maebe|
|2011-03-04 10:43||Jonas Maebe||Relationship added||related to 0018831|
|2011-08-26 17:07||Seth Grover||Status||resolved => closed|
|2011-08-26 17:07||Seth Grover||Note Added: 0051160|