Hello All -
We have a legacy C program running non-stop on one of our servers, often with several instances running at once. Fairly regularly, one of the instances will stop writing to the log file and just hang. I then have to 'kill' it myself.
When I attach gdb to one of the hung processes and enter the 'where' command, I invariably get something like the following:
#0 0x009f0402 in ?? ()
#1 0x00bdf1ce in __lll_mutex_lock_wait () from /lib/libc.so.6
#2 0x00b86abf in _L_mutex_lock_1965 () from /lib/libc.so.6
#3 0x00000000 in ?? ()
Does anyone recognise this? I'm sure it's indicative of a bug in the app but amn't sure how to track it down. Any suggestions would be very welcome.
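That backtrace shows the process waiting on a mutex that never becomes free. The usual causes are:
- a process (or thread) exited without releasing the lock
- a process (or thread) has the lock, and is stuck waiting for something else.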
You could create a debug wrapper of the mutex to see who currently has the lock when everything else is stuck waiting for it.
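A rough sketch of what such a wrapper could look like, assuming the locks in question are pthread mutexes that the application itself locks. The names `dbg_mutex_t`, `dbg_mutex_lock` and `dbg_mutex_unlock` are made up for this example and are not part of any library. The wrapper only records the file and line of the most recent successful lock, so it cannot see locks taken inside libc.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical wrapper type: a mutex plus a record of who holds it. */
typedef struct {
    pthread_mutex_t mutex;
    const char *owner_file;   /* source file of the current holder, or NULL */
    int owner_line;           /* source line of the current holder, or 0 */
} dbg_mutex_t;

#define DBG_MUTEX_INIT { PTHREAD_MUTEX_INITIALIZER, NULL, 0 }

/* Lock the mutex, then note where in the source it was taken. */
static int dbg_mutex_lock_at(dbg_mutex_t *m, const char *file, int line)
{
    int rc = pthread_mutex_lock(&m->mutex);
    if (rc == 0) {
        m->owner_file = file;
        m->owner_line = line;
    }
    return rc;
}

/* Clear the owner record while still holding the lock, then release it. */
static int dbg_mutex_unlock(dbg_mutex_t *m)
{
    m->owner_file = NULL;
    m->owner_line = 0;
    return pthread_mutex_unlock(&m->mutex);
}

/* Call sites use this macro so the file and line are filled in for them. */
#define dbg_mutex_lock(m) dbg_mutex_lock_at((m), __FILE__, __LINE__)
```

When a process hangs, attaching gdb and printing the wrapper variable (its `owner_file` and `owner_line` fields) should show which call site took the lock last and never gave it back.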
Thanks for the reply Salem.
The app doesn't use mutexes deliberately. By that I mean there is no code specifically written to use mutexes, but I suppose they could be used internally by some external function.
Could you suggest how to create the debug wrapper you mentioned without knowing exactly where the problem is coming from?
If I could debug one of the frozen apps with gdb, it could help trace back where the problem occurs at least. Can anyone advise on how best to do that?