View Issue Details

ID: 0017383
Project: FPC
Category: RTL
View Status: public
Last Update: 2011-08-26 17:07
Reporter: Seth Grover
Assigned To: Jonas Maebe
Priority: normal
Severity: major
Reproducibility: always
Status: closed
Resolution: fixed
Platform: i386 and x86_64
OS: Linux
Product Version: 2.4.3
Fixed in Version: 2.6.0
Summary: 0017383: unhandled exception caught by .so's handler causes infinite loop
Description: I'm logging this bug after mentioning it on the mailing list and receiving Jonas' response (see http://lists.freepascal.org/lists/fpc-pascal/2010-September/026405.html ).

If a shared object library compiled with FPC under Linux installs its signal handler with HookSignal(RTL_SIGDEFAULT), and an unhandled access violation then occurs (either in the calling program or in the .so outside of a try/except block), the .so puts the program into an infinite loop of runtime error 217 and access violations.
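
The reason a fault in the calling program can end up in the library's handler is that signal dispositions on Linux are process-wide: whichever module installs a handler owns that signal for the whole process. The following is a minimal C sketch of that one point only. It is not the FPC RTL's HookSignal; `lib_hook_signals`, `lib_handler` and `host_triggers_signal` are hypothetical names, and the signal is delivered with raise() so that returning from the handler is well-defined.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written by the handler; sig_atomic_t may be safely set from a signal handler. */
static volatile sig_atomic_t last_signal_seen = 0;

/* Hypothetical stand-in for the handler a library would install. */
static void lib_handler(int signo)
{
    last_signal_seen = signo;
}

/* Hypothetical stand-in for the library's initialization code.
   Returns 0 on success and -1 on failure, like sigaction() itself. */
int lib_hook_signals(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = lib_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Stand-in for the calling program: it never installed a handler itself,
   yet the signal it triggers is delivered to lib_handler.
   Returns the signal number the handler observed (0 if none). */
int host_triggers_signal(void)
{
    last_signal_seen = 0;
    raise(SIGSEGV);
    return (int)last_signal_seen;
}
```

After `lib_hook_signals()` succeeds, `host_triggers_signal()` reports SIGSEGV even though the "host" side never called sigaction().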

Jonas says (http://lists.freepascal.org/lists/fpc-pascal/2010-September/026408.html ) that this is caused by revision 14184 (http://svn.freepascal.org/cgi-bin/viewvc.cgi?view=rev&revision=14184 ), which was a response to bug 0014958. Quote: "The problem is probably that the lib's exit code is called both when the library is unloaded (in which case you don't want the process to terminate) and when the "library" terminates (either via an unhandled exception, or by calling halt)."
Steps To Reproduce:
1. Create a .so under Linux with FPC.
2. In the initialization code for the .so, call HookSignal(RTL_SIGDEFAULT);
3. In some routine in the .so, cause a segfault outside of the context of
   a try/except block.
4. In some calling program (C or FPC, doesn't matter), call that routine.

Results:

You will see an infinite printout of runtime error 217 and access violations.
Tags: No tags attached.
Fixed in Revision: 16418
FPCOldBugId: 0
FPCTarget:
Attached Files

Relationships

related to 0014958 (closed, Jonas Maebe): Cannot unload shared library
related to 0018831 (resolved, Jonas Maebe): unloadlibrary() terminates program

Activities

Seth Grover

2010-09-22 18:18

reporter   ~0041277

So here's what I've learned:

When a library is "dlclosed" (in 32-bit Linux for this example):

(gdb) bt
#0 INTERNALEXIT () at /home/tlacuache/fpc/rtl/inc/system.inc:812
#1 0xf7e082b5 in LIB_EXIT () at /home/tlacuache/fpc/rtl/inc/system.inc:881
#2 0xf7e12535 in _FPC_SHARED_LIB_HALTPROC () at ./i386/si_dll.inc:78
#3 0xf7ff336e in ?? () from /lib/ld-linux.so.2
#4 0xf7ff3e07 in ?? () from /lib/ld-linux.so.2
#5 0xf7fb7ca4 in dlclose_doit (handle=0x809c020) at dlclose.c:37
#6 0xf7fee2f6 in ?? () from /lib/ld-linux.so.2
#7 0xf7fb809c in _dlerror_run (operate=<value optimized out>, args=<value optimized out>) at dlerror.c:164
#8 0xf7fb7cda in __dlclose (handle=0xf7e12530) at dlclose.c:48
#9 0x08048353 in main () at tw14958b.pp:20

When an unhandled access violation occurs in the main program when the .so owns the signal handlers:

(gdb) bt
#0 INTERNALEXIT () at /home/tlacuache/fpc/rtl/inc/system.inc:812
#1 0xf7de2b25 in LIB_EXIT () at /home/tlacuache/fpc/rtl/inc/system.inc:881
#2 0xf7e081e5 in _FPC_SHARED_LIB_HALTPROC () at ./i386/si_dll.inc:78
#3 0xf7dea354 in SYSTEM_EXIT () at system.pp:98
#4 0xf7de2b1a in DO_EXIT () at /home/tlacuache/fpc/rtl/inc/system.inc:875
#5 0xf7de2b3a in HALT (ERRNUM=0) at /home/tlacuache/fpc/rtl/inc/system.inc:888
#6 0xf7ddfade in DOUNHANDLEDEXCEPTION () at /home/tlacuache/fpc/rtl/inc/except.inc:173
#7 0xf7ddfb64 in fpc_raiseexception (OBJ=0xf7dfed50, ANADDR=0x0, AFRAME=0x40000) at /home/tlacuache/fpc/rtl/inc/except.inc:199
#8 0xf7dff555 in RUNERRORTOEXCEPT (ERRNO=-136319664, ADDRESS=0x0, FRAME=0x0) at /home/tlacuache/fpc/rtl/objpas/sysutils/sysutils.inc:344
#9 0xf7de2bae in HANDLEERRORADDRFRAME (ERRNO=-136319664, ADDR=0xf7dfed50, FRAME=0xf7d6e020) at /home/tlacuache/fpc/rtl/inc/system.inc:901
#10 0xffffd2c8 in ?? ()
#11 0x08080893 in _FPC_PROC_START () at ./i386/si_prc.inc:105


I don't see a way you can decouple it in _FPC_SHARED_LIB_HALTPROC or lower, because it seems _FPC_SHARED_LIB_HALTPROC is called directly by ld-linux.so. Would it be feasible to do something higher though? I.e., in the "Halt" procedure or in "DoUnhandledException", could we detect that "IsLibrary = true" and either set some global variable which could be checked in InternalExit and cause a true stop-the-process halt, or, if we don't want to do it that deep, just do it in the "Halt" procedure?

Maybe I should step back and ask: is my assumption about what should happen correct? If a FPC-compiled .so gets an unhandled exception (either generated inside the .so or caused by, for example, an access violation in the calling program when the .so has installed its signal handlers) what is the correct behavior? The application should halt, correct? The current behavior (to go into an infinite loop of access violations and runtime errors) is definitely broken.

If I can get some feedback about which direction would be acceptable to take here I'll do my best to provide a patch. I'm very anxious to get this fixed as I see this as a critical problem.

Jonas Maebe

2010-09-23 13:58

manager   ~0041283

I've reverted the merge of that revision to 2.4.x, so 2.4.2 final will not contain it.

The difference between halt() and an unload of a library is that in the former case do_exit from system.inc is called (by halt()), and in the latter case lib_exit from system.inc is called (by the dynamic linker, as instructed by the compiler).

Both call internal_exit, but do_exit afterwards calls system_exit while lib_exit does not. System_exit is in linux/system.pp and calls haltproc.
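
As a reading aid, the two call chains described in this comment can be modelled like this. This is a simplified C sketch, not RTL code: the function names do_exit, lib_exit, internal_exit, system_exit and haltproc are taken from the comment, but their bodies here only record that they were called, and the trace helpers (`record`, `reset_trace`, `trace_contains`) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Simplified model of the call chains described above. Each function only
   records its own name; nothing here is actual RTL code. */
#define TRACE_MAX 8
static const char *trace[TRACE_MAX];
static size_t trace_len = 0;

static void record(const char *name)
{
    if (trace_len < TRACE_MAX)
        trace[trace_len++] = name;
}

void reset_trace(void) { trace_len = 0; }

/* Returns 1 if the named step was recorded since the last reset, else 0. */
int trace_contains(const char *name)
{
    for (size_t i = 0; i < trace_len; i++)
        if (strcmp(trace[i], name) == 0)
            return 1;
    return 0;
}

/* In the real RTL this step ends the process; here it is only recorded. */
static void haltproc(void)      { record("haltproc"); }

/* Finalization shared by both paths. */
static void internal_exit(void) { record("internal_exit"); }

static void system_exit(void)   { record("system_exit"); haltproc(); }

/* Path taken by halt(): finalize, then terminate. */
void do_exit(void)  { record("do_exit"); internal_exit(); system_exit(); }

/* Path taken on library unload: finalize only, the process keeps running. */
void lib_exit(void) { record("lib_exit"); internal_exit(); }
```

In this model, do_exit() reaches haltproc while lib_exit() stops after internal_exit, which is the separation the comment describes as the intended behaviour.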

So it's not really clear anymore to me why the program from 0014958 terminated inside the library's unload routine without the patch. I'll have to debug it again.

Seth Grover

2010-09-23 16:32

reporter   ~0041289

Thanks for reverting the regression in 2.4.2.

From what I can see in GDB:

Breakpoint 7, _FPC_SHARED_LIB_HALTPROC () at ./i386/si_dll.inc:78
(gdb) bt
#0 _FPC_SHARED_LIB_HALTPROC () at ./i386/si_dll.inc:78
#1 0xf7ff336e in ?? () from /lib/ld-linux.so.2
#2 0xf7ff3e07 in ?? () from /lib/ld-linux.so.2
#3 0xf7fb7ca4 in dlclose_doit (handle=0x809c020) at dlclose.c:37
#4 0xf7fee2f6 in ?? () from /lib/ld-linux.so.2
#5 0xf7fb809c in _dlerror_run (operate=<value optimized out>, args=<value optimized out>) at dlerror.c:164
#6 0xf7fb7cda in __dlclose (handle=0xf7e08120) at dlclose.c:48
#7 0x08048375 in main () at tw14958b.pp:27

It looks like ld-linux.so also calls haltproc as part of dlclose.

Seth Grover

2010-09-23 16:36

reporter   ~0041290

Perhaps we need two "haltprocs", one called by System_exit internally which does actually halt the process, and one to be called by ld-linux.so in dlclose which does not?
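
To make the suggestion concrete, here is a rough C sketch of what two separate entry points could look like. It only illustrates the idea and is not how the RTL implements it (or the eventual fix): `haltproc_on_unload`, `haltproc_on_halt`, `set_terminator` and `dry_run_terminator` are hypothetical names, and the terminator is replaceable only so the sketch can be exercised without ending the process.

```c
#include <stdlib.h>

/* How the process gets terminated. Defaults to exit(); replaceable so the
   sketch can be tried out without actually ending the process. */
static void (*terminate_process)(int) = exit;

static int finalize_count = 0;
static int requested_exit_code = -1;

void set_terminator(void (*fn)(int)) { terminate_process = fn; }
int finalizations(void)              { return finalize_count; }
int requested_code(void)             { return requested_exit_code; }

/* Dry-run terminator: remembers the exit code instead of exiting. */
void dry_run_terminator(int code)    { requested_exit_code = code; }

/* Shared cleanup, run on both paths. */
static void run_finalization(void)   { finalize_count++; }

/* Hypothetical entry point for the dynamic linker (dlclose):
   clean up and return, so the host program continues. */
void haltproc_on_unload(void)
{
    run_finalization();
}

/* Hypothetical entry point for halt / unhandled exception:
   clean up, then really end the process. */
void haltproc_on_halt(int exit_code)
{
    run_finalization();
    terminate_process(exit_code);
}
```

With `set_terminator(dry_run_terminator)` in place, calling `haltproc_on_unload()` runs the cleanup without requesting termination, while `haltproc_on_halt(217)` runs the cleanup and then asks for the process to end with that code.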

Jonas Maebe

2010-09-23 16:51

manager   ~0041291

> Thanks for reverting the regression in 2.4.2.

I didn't even realise it had been merged (the bug report still only lists 2.5.1 as the version it's fixed in, and that's also what I thought you were talking about on the mailing list).

> It looks like ld-linux.so also calls haltproc as part of dlclose.

It's possible that some functions in the backtrace are missing due to a stack frame optimisation in the ld-linux.so.2 code (that often affects the next stack frame too).

> Perhaps we need two "haltprocs", one called by System_exit
> internally which does actually halt the process, and one to be
> called by ld-linux.so in dlclose which does not?

As mentioned in my previous comment, normally that is more or less the situation that we /should/ have right now: halt calls system_exit which calls haltproc, while ld-linux.so should call lib_exit which does not call system_exit and hence not haltproc. Somewhere something's going wrong however.

Seth Grover

2010-09-24 21:49

reporter   ~0041303

A clarification, please:

> I've reverted the merge of that revision to 2.4.x, so 2.4.2 final will
> not contain it.

I can see that you reverted it in fixes_2_4, but hasn't the fixes branch already diverged from the 2.4.2 release? 2.4.2 has already been tagged with release candidates...

Jonas Maebe

2010-09-24 21:57

manager   ~0041304

> 2.4.2 has already been tagged with release candidates...

A tag is not a branch. There is no "branch from which 2.4.2 will be tagged" to merge the change to. And 2.4.2rc1 has been tagged and some binaries have already been built from it, so it cannot be merged there either since that would result in some platforms including that change and some not. It can only be included in 2.4.2rc2 (if there is such a release) or 2.4.2 final.

Seth Grover

2010-11-19 23:16

reporter   ~0043282

Since you reverted the fix for issue 14184 should it be reopened? I'd do it but I don't think I've got the authority. :)

Seth Grover

2010-11-19 23:16

reporter   ~0043283

Sorry, I mean issue 14958.

Jonas Maebe

2010-11-20 12:24

manager   ~0043297

I think the fact that this issue is still open is fine. I could resolve this one and open the other one, but that wouldn't change much.

Seth Grover

2011-08-26 17:07

reporter   ~0051160

I just verified this in the fixes_2_6 branch: my example from this issue and the example from issue 14958 both behave correctly. Thanks.

Issue History

Date Modified Username Field Change
2010-09-10 22:17 Seth Grover New Issue
2010-09-22 18:18 Seth Grover Note Added: 0041277
2010-09-23 13:14 Jonas Maebe FPCOldBugId => 0
2010-09-23 13:14 Jonas Maebe Description Updated
2010-09-23 13:58 Jonas Maebe Note Added: 0041283
2010-09-23 16:32 Seth Grover Note Added: 0041289
2010-09-23 16:36 Seth Grover Note Added: 0041290
2010-09-23 16:51 Jonas Maebe Note Added: 0041291
2010-09-24 21:49 Seth Grover Note Added: 0041303
2010-09-24 21:57 Jonas Maebe Note Added: 0041304
2010-11-19 23:16 Seth Grover Note Added: 0043282
2010-11-19 23:16 Seth Grover Note Added: 0043283
2010-11-20 12:24 Jonas Maebe Note Added: 0043297
2010-11-20 12:24 Jonas Maebe Relationship added related to 0014958
2010-11-24 16:33 Jonas Maebe Fixed in Revision => 16418
2010-11-24 16:33 Jonas Maebe Status new => resolved
2010-11-24 16:33 Jonas Maebe Fixed in Version => 2.5.1
2010-11-24 16:33 Jonas Maebe Resolution open => fixed
2010-11-24 16:33 Jonas Maebe Assigned To => Jonas Maebe
2011-03-04 10:43 Jonas Maebe Relationship added related to 0018831
2011-08-26 17:07 Seth Grover Status resolved => closed
2011-08-26 17:07 Seth Grover Note Added: 0051160