Linux Putting my children to sleep with futex!?



Abs
02-11-2009, 10:25 AM
Hey all, I'm not the most seasoned person writing in C, however I'd like to think I'm coming along :-) I'm just about finished with a program I've been writing for a little while now that is a basic little daemon, which listens for a connection, forks, hands the request to a child process, then goes back to listening.

In order to ensure that I don't get too many child processes I check a shared piece of memory (which gets incremented by the daemon when it forks, and decremented by the child when it finishes) to make sure I don't go over a certain number of children. After I detect that I already have, for example, 5 children running, the daemon sends a signal to all of the children to stop what they're doing, decrement the shared memory counter and exit. All the while the daemon is waiting for the children to stop.

Everything works great until I run my little load testing script that basically hammers my daemon with several thousand requests as fast as it possibly can.

The problem is that after my main big load test there is sometimes a child process that is in a "sleeping" state (and never got to decrement the counter, so the daemon waits forever). Doing an strace (following forks) on the daemon during the load test, when the child goes to sleep and even after the child is sleeping, I find the last call made was a futex FUTEX_WAIT. I'm not using futexes anywhere in my code and am protecting my shared memory with very basic spinlocks. So it appears my OS is arbitrarily picking one of my children and putting it to sleep by making it call futex.

Has anyone ever heard of this, or does anyone know if there is a way to keep it from happening? I've even looked at different gcc options but am not seeing anything that appears to relate to this.

I spent most of yesterday googling and all I found was information about what a futex is and how it works, but nothing about Linux putting certain procs to sleep... something somewhere is trying to be way too helpful. Any ideas?

Codeplug
02-11-2009, 11:10 AM
>> and am protecting my shared memory with very very basic spinlocks.
Sounds like you're not using proper synchronization. A Posix or SVR4 semaphore would be a natural choice for limiting the number of children.

If you would like to post a minimal, but complete, example of your forking and shm access, we can point out anything that's "not right".

gg

Abs
02-11-2009, 11:25 AM
Sure thing.

Here is my code snippet that does the checking and forking etc... My scontext is a struct with two integers: children and updating.


if (strcasecmp(context.throttle_procs,"on") == 0 && scontext->children+1 > context.int_throttle_procs_max) {
    sprintf(buf,"%s Max processes(%d) reached, sending SIGUSR1 signal to children",context.ident,context.int_throttle_procs_max);
    log_message(context.log_file,buf,'e');
    kill(0,SIGUSR1);
    while (scontext->children != 0) {
        usleep(250000);
        //Wait for procs to exit before continuing
    }
}
if ((pid=fork()) >= 0) {
    //**--This is the Child Process--**//
    if (pid == 0) {
        cpid=getpid();
        sprintf(context.ident,"child[%d]:",cpid);
        context.child=scontext->children+1;
        dispatcher(client_sockfd,hostname);
        sprintf(buf,"%s process #%d exiting",context.ident,context.child);
        log_message(context.log_file,buf,'d');
        //Protect critical section with simple spinlock//
        while (scontext->updating == TRUE) {
            //spin waiting for lock
            sprintf(buf,"%s process #%d waiting for lock.",context.ident,context.child);
            log_message(context.log_file,buf,'d');
            usleep(200000+context.child);
        }
        scontext->updating = TRUE;
        scontext->children--;
        //unlock critical section//
        scontext->updating = FALSE;
        close(client_sockfd);
        exit(0);
    }
    //**--This is the Parent Process--**//
    else {
        //Protect critical section with simple spinlock//
        while (scontext->updating == TRUE) {
            //spin waiting for lock
            sprintf(buf,"%s process #%d waiting for lock.",context.ident,context.child);
            log_message(context.log_file,buf,'d');
            usleep(199000);
        }
        scontext->updating = TRUE;
        scontext->children++;
        //unlock critical section//
        scontext->updating = FALSE;
        sprintf(buf,"%s Sent to child(%d) process #%d",context.ident,pid,scontext->children);
        log_message(context.log_file,buf,'d');
        close(client_sockfd);
    }
}


When a SIGUSR1 is received my signal handler checks to see if it is the daemon process or not... if it's not (i.e., it's a child) then it runs this function:


void sig_handle_force_exit(void) {
    char buf[KB];
    struct shrd_context *scontext;

    context.running = FALSE;
    close_all_fd();
    scontext = (struct shrd_context *)get_shrd_context(context.shm_id);
    while (scontext->updating == TRUE) {
        //spin waiting for lock
        sprintf(buf,"%s process #%d waiting for lock.",context.ident,context.child);
        log_message(context.log_file,buf,'d');
        usleep(250000+context.child);
    }
    scontext->updating = TRUE;
    scontext->children--;
    //unlock critical section//
    scontext->updating = FALSE;
    det_shrd_context(scontext);
    sprintf(buf,"%s process #%d forced exit.",context.ident,context.child);
    log_message(context.log_file,buf,'w');
    exit(-1);
}


Thank you so much for the eyes. I know it's probably not the cleanest code to look at, and it will probably make you cringe, but like I said... I'm still pretty green :-) Thanks

brewbuck
02-11-2009, 11:28 AM
>> and am protecting my shared memory with very very basic spinlocks.
Sounds like you're not using proper synchronization. A Posix or SVR4 semaphore would be a natural choice for limiting the number of children.

Well, strictly speaking, a spinlock is perfectly fine to protect a shared resource; it just chows down on CPU while it does so.

Abs
02-11-2009, 11:31 AM
Right, from what I've read, since incrementing/decrementing a counter is such a small and fast operation, a spinlock is ideal here, as it doesn't require any context switches and can actually be faster than having to put a proc to sleep, wake it up, and resume. But that assumes something else doesn't come along and do it for you :-/

brewbuck
02-11-2009, 11:35 AM
Futexes are used internally by the C library to implement certain kinds of wait states. They are a Linux-specific feature which is meant to be used as a building block for more friendly synchronization objects.

I can't see what in your code might be using a futex. But your "spinlocks" are not actual spinlocks, since they are not atomic test-and-set. I imagine you have a simple deadlock caused by not using proper locks.

A proper spinlock is fairly simple, but I won't show one because it's not what you should be using here. Better options would be pthread mutexes or UNIX semaphores.

Abs
02-11-2009, 11:42 AM
brewbuck, thank you for your comments... Since I'm doing everything with fork, I would need to do a considerable rewrite to get this working using threads instead, so I'd like to avoid pthreads. Would you recommend a Unix semaphore then? And can you tell me why you recommend against using a spinlock in this situation? The criteria you're basing this on would help me make a better decision in the future.

Codeplug
02-11-2009, 01:09 PM
What you have is multi-threaded. Each thread just happens to be in a separate process.

>> //unlock critical section//
>> scontext->updating = FALSE;
This does not provide any kind of meaningful (working) synchronization - even on a single core processor. Under Posix, there are strict rules about how multiple threads can access the same memory location:
http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap04.html#tag_04_10

You also need to make sure you're calling safe functions within your signal handler:
http://www.opengroup.org/onlinepubs/000095399/functions/xsh_chap02_04.html#tag_02_04_03 (towards the bottom of 2.4.3)

As for a solution - the easiest thing to do would be to use a single Posix semaphore with a count of 1 as your "critical section" provider - instead of the unsafe 'updating' variable.
http://www.opengroup.org/onlinepubs/009695399/functions/sem_init.html
So to "enter the critical section" you would call sem_wait(). To "leave the critical section" you would call sem_post().

Now that you have a proper synchronization object, you can use it to protect all "access" to shared memory. If you need to read the value of 'children', you:
[enter CS]
[read a copy]
[exit CS]
And likewise for increment and decrement.
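In rough code that's something along these lines (a sketch only, assuming a sem_t member named mod_sem in your shared struct and <semaphore.h> included):


sem_init(&scontext->mod_sem, 1, 1);      /* once at startup: pshared=1 so it works across processes, initial count 1 */

sem_wait(&scontext->mod_sem);            /* enter CS */
int cur_children = scontext->children;   /* read a copy */
sem_post(&scontext->mod_sem);            /* leave CS */
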

Now, if there is shared memory which is read-only (never modified after threads are created), then synchronization isn't necessary - you can read away. All other accesses must use proper synchronization.

>> context.child=scontext->children+1;
Keep in mind that even with proper synchronization there is no guarantee that 'context.child' will be unique per child.

gg

Abs
02-11-2009, 01:54 PM
Thank you so much for those links, I'm still going through them, and am going to take a stab at implementing it with a semaphore. I'll post back with my results. Another question though... With my loop that waits for the counter to come back down to 0, since it doesn't modify the value, does it also need to be protected?

And yes, I know that setting context.child=scontext->children+1 doesn't guarantee it to be unique per child; I had put that in place for debugging so I could see which pid was running as which child number in my log. That does bring up another question for me though... when I make that assignment, that would also need to be within the CS, to guarantee I get the proper value, right?

Codeplug
02-11-2009, 02:18 PM
All reads and writes ("accesses") should be protected.

Instead of polling for 0 in the parent, you could create an additional semaphore, let's call it "lastchild", with an initial value of 0. The parent would then sem_wait(lastchild) to wait for all the children to exit. The last child to exit (scontext->children == 1) would then sem_post(lastchild), unblocking the parent, letting the parent know that the last child is going down.
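Roughly (a sketch only; semaphore names assumed):


sem_init(&scontext->lc_sem, 1, 0);      /* process-shared, initial value 0 */

/* parent, after sending SIGUSR1 to the children: */
sem_wait(&scontext->lc_sem);            /* blocks until the last exiting child posts */

/* each child, while exiting (inside the mod_sem critical section): */
scontext->children--;
if (scontext->children == 0)
    sem_post(&scontext->lc_sem);        /* wake the waiting parent */
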

>> when I make that assignment, that would also need to be within the CS
Yeah, since that constitutes an access to a shared memory location. You could just use a local variable in the parent to communicate a "child number" to the child.

gg

Abs
02-11-2009, 06:27 PM
Well, it's interesting... I'm now using semaphores to protect the shared memory and I'm still getting the same behavior as before, where for some reason one of the child processes goes to sleep... this time, however, using strace, it appears that the last syscall called is


futex(0x7f21e73005d4, FUTEX_WAIT_PRIVATE, 2, NULL

Instead of the regular FUTEX_WAIT.

When I attach to the child with gdb and do a backtrace, it looks like it's running a function that gdb can't find (which makes sense, as it looks like the process is being put to sleep by some other process (maybe the OS?)). Here's the output from the gdb backtrace:


GNU gdb (GDB; openSUSE 11.1) 6.8.50.20081120-cvs
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
For bug reporting instructions, please see:
<http://bugs.opensuse.org/>.
Attaching to process 27973
Reading symbols from /home/miscem/dev/app_status/app_status...done.
Reading symbols from /lib64/librt.so.1...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00007f21e7089e6e in ?? () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f21e7089e6e in ?? () from /lib64/libc.so.6
#1 0x00007f21e703e9ed in ?? () from /lib64/libc.so.6
#2 0x00007f21e703e7a6 in ?? () from /lib64/libc.so.6
#3 0x00007f21e703cd00 in ctime_r () from /lib64/libc.so.6
#4 0x00000000004020be in log_message (filename=0x6092cc "/var/log/app_status", message=0x7fffef7282c0 "child[27973]: 10.41.1.187: Recieved SIGUSR1, Max procs reached!?", t=101 'e')
at includes/cust_utils.h:223
#5 0x0000000000404c35 in signal_handler (sig=10) at includes/init_utils.h:561
#6 <signal handler called>
#7 0x00007f21e703c7b0 in ?? () from /lib64/libc.so.6
#8 0x00007f21e703e832 in ?? () from /lib64/libc.so.6
#9 0x00007f21e703cd00 in ctime_r () from /lib64/libc.so.6
#10 0x00000000004020be in log_message (filename=0x6092cc "/var/log/app_status", message=0x7fffef728c50 "child[27973]: 10.41.1.187: dispatcher: request: \"GET /app HTTP/1.0\"", t=100 'd')
at includes/cust_utils.h:223
#11 0x0000000000405f20 in dispatcher (clientfd=6, hostname=0x7fffef72a4c0 "10.41.1.187") at includes/http_utils.h:319
#12 0x0000000000406574 in main (argc=1, argv=0x7fffef72aab8) at app_status.c:151
(gdb)


One interesting thing I noticed is that when I compiled it as a 32-bit executable, it didn't have the problem nearly as often. But once it did, the gdb backtrace showed this:


GNU gdb (GDB; openSUSE 11.1) 6.8.50.20081120-cvs
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
For bug reporting instructions, please see:
<http://bugs.opensuse.org/>.
Attaching to process 10695
Reading symbols from /home/miscem/dev/app_status/app_status...done.
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
0xffffe430 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe430 in __kernel_vsyscall ()
#1 0xf7e90e93 in ?? () from /lib/libc.so.6
#2 0xf7e3e44b in ?? () from /lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)


That looks dubious. Also, here are my modified code snippets... maybe I missed something, or I'm not implementing the semaphores correctly?

Added this for the creation of the semaphores... I removed the int "updating" from the shared context struct I was using before and added two sem_t types to it, called mod_sem and lc_sem.


//--Create Semaphore(s)--//
if (sem_init(&scontext->mod_sem,1,1) == 0) {
    if (sem_init(&scontext->lc_sem,1,0) != 0) {
        sprintf(buf,"%s sem_init(lc_sem): %s",context.ident,strerror(errno));
        log_message(context.log_file,buf,'e');
        return 1;
    }
}
else {
    sprintf(buf,"%s sem_init(mod_sem): %s",context.ident,strerror(errno));
    log_message(context.log_file,buf,'e');
    return 1;
}


Here is the updated snippet showing the checking for the number of child processes, the fork, increment and decrement.


sem_wait(&scontext->mod_sem); //Enter CS
cur_children = scontext->children; //Get copy of val
sem_post(&scontext->mod_sem); //Leave CS
if (strcasecmp(context.throttle_procs,"on") == 0 && cur_children+1 > context.int_throttle_procs_max) {
    sprintf(buf,"%s Max processes(%d) reached, sending SIGUSR1 signal to children",context.ident,context.int_throttle_procs_max);
    log_message(context.log_file,buf,'e');
    kill(0,SIGUSR1);
    sem_wait(&scontext->lc_sem); //Wait for last child to signal no more children
}
if ((pid=fork()) >= 0) {
    //**--This is the Child Process--**//
    if (pid == 0) {
        cpid=getpid();
        sprintf(context.ident,"child[%d]:",cpid);
        sem_wait(&scontext->mod_sem); //Enter CS
        context.child=scontext->children+1;
        sem_post(&scontext->mod_sem); //Leave CS
        dispatcher(client_sockfd,hostname);
        sprintf(buf,"%s process #%d exiting",context.ident,context.child);
        log_message(context.log_file,buf,'d');
        sem_wait(&scontext->mod_sem); //Enter CS
        scontext->children--;
        sem_post(&scontext->mod_sem); //Leave CS
        det_shrd_context(scontext);
        close(client_sockfd);
        exit(0);
    }
    //**--This is the Parent Process--**//
    else {
        sem_wait(&scontext->mod_sem); //Enter CS
        scontext->children++;
        sprintf(buf,"%s Sent to child(%d) process #%d",context.ident,pid,scontext->children);
        sem_post(&scontext->mod_sem); //Leave CS
        log_message(context.log_file,buf,'d');
        close(client_sockfd);
    }
}


And here is the function that gets called by the children on a SIGUSR1


void sig_handle_force_exit(void) {
    char buf[KB];
    struct shrd_context *scontext;

    context.running = FALSE;
    close_all_fd();
    scontext = (struct shrd_context *)get_shrd_context(context.shm_id);
    sem_wait(&scontext->mod_sem); //Enter CS
    scontext->children--;
    if (scontext->children == 0) {
        sprintf(buf,"%s I'm the last child, signaling....",context.ident);
        log_message(context.log_file,buf,'d');
        sem_post(&scontext->lc_sem); //Signal daemon waiting for last child
    }
    sem_post(&scontext->mod_sem); //Leave CS
    det_shrd_context(scontext);
    sprintf(buf,"%s process #%d forced exit.",context.ident,context.child);
    log_message(context.log_file,buf,'w');
    exit(-1);
}


Thanks again for taking the time to help me as I learn these things. I've already learned a ton from what's been posted.

brewbuck
02-11-2009, 06:27 PM
brewbuck, thank you for your comments... Since I'm doing everything with fork, I would need to do a considerable rewrite to get this working using threads instead, so I'd like to avoid pthreads. Would you recommend a Unix semaphore then? And can you tell me why you recommend against using a spinlock in this situation? The criteria you're basing this on would help me make a better decision in the future.

On Linux at least, pthread mutexes can be shared between processes. You have to initialize the mutex within a shared memory segment (something you already have), and set the attributes with:



pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);


As far as I know, you can't currently do this with condition variables, only mutexes.
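A rough sketch of setting one up, assuming a pthread_mutex_t member (here called mtx) added to the shared struct, <pthread.h> included, and linking with -lpthread:


pthread_mutexattr_t attr;
pthread_mutexattr_init(&attr);
pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
pthread_mutex_init(&scontext->mtx, &attr);   /* done once, before forking */
pthread_mutexattr_destroy(&attr);

/* later, in parent or child: */
pthread_mutex_lock(&scontext->mtx);
scontext->children++;
pthread_mutex_unlock(&scontext->mtx);
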

Codeplug
02-11-2009, 08:18 PM
You're still trying to do way too much in your signal handler. The easiest (and safest) thing to do is to boil down your signal handler to one line:
> g_time_to_die = 1;
Where 'g_time_to_die' is of type "volatile sig_atomic_t". Then dispatcher() just needs to poll it every so often.
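In code that would look roughly like this (a sketch; the handler name is made up):


static volatile sig_atomic_t g_time_to_die = 0;

void handle_sigusr1(int sig) {
    (void)sig;
    g_time_to_die = 1;      /* setting a sig_atomic_t flag is all the handler does */
}

/* ...and dispatcher() checks the flag periodically: */
if (g_time_to_die) {
    /* clean up and return so the child can exit normally */
}
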

Another option is to use a different IPC mechanism for communicating "time to die" to your children. If dispatcher() spends most of its time in a poll() or select(), then an additional socket or a pipe could be added to the fd_set/pollfd so that when the parent writes to it, the children know to die. This approach could be expanded to send other "commands" to all or individual children.
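A rough sketch of the pipe variant (fd names made up; needs <poll.h> and <unistd.h>):


int death_pipe[2];
pipe(death_pipe);                       /* created by the parent before forking */

/* child side, inside dispatcher(): */
struct pollfd fds[2];
fds[0].fd = client_sockfd;  fds[0].events = POLLIN;
fds[1].fd = death_pipe[0];  fds[1].events = POLLIN;
if (poll(fds, 2, -1) > 0 && (fds[1].revents & POLLIN)) {
    /* parent wrote to the pipe: clean up and exit */
}

/* parent side, instead of (or alongside) the signal: */
write(death_pipe[1], "x", 1);
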

gg

Abs
02-12-2009, 12:31 PM
Is there an easy way to share a volatile sig_atomic_t between processes? If I have to add it to my shared memory segment, then I would still need to attach to the shared memory and update the value. And those functions aren't listed as async-safe functions to use in a signal handler... which may be part of my problem in the first place... I think you're right that the problem lies in my signal handler.

For anyone else reading I also ran across these two links that were very informative for me, complete with examples:

https://www.securecoding.cert.org/confluence/display/seccode/SIG30-C.+Call+only+asynchronous-safe+functions+within+signal+handlers

https://www.securecoding.cert.org/confluence/display/seccode/SIG31-C.+Do+not+access+or+modify+shared+objects+in+signal+handlers

Abs
02-12-2009, 12:40 PM
Never mind, I just realized I don't need to share that between processes, as the signal handler only cares about the currently running process. Maybe I'll forget the whole decrement-on-forced-exit... so that once the SIGUSR1 is sent to the children, they just call _exit or maybe abort (both of which are async-safe). Then the daemon sets the counter back to 0 and assumes that the children exited.

I don't know if I really want to use poll or select because I want them to actually get interrupted in whatever they're doing to just go away.

Does anyone foresee any problem with this, or am I doing yet another dumb thing?

Abs
02-12-2009, 12:45 PM
On Linux at least, pthread mutexes can be shared between processes. You have to initialize the mutex within a shared memory segment (something you already have), and set the attributes with:



pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);


As far as I know, you can't currently do this with condition variables, only mutexes.

Interesting, I didn't know you could set a mutex to span multiple processes... I always thought the pthread library was specific to the threads located in a single process. But I guess now that I think about the use of sem_init etc., which can span processes while being part of the pthread library, it makes sense. Thanks

brewbuck
02-12-2009, 01:12 PM
Interesting, I didn't know you could set a mutex to span multiple processes... I always thought the pthread library was specific to the threads located in a single process. But I guess now that I think about the use of sem_init etc., which can span processes while being part of the pthread library, it makes sense. Thanks

Well, consider what a pthread_mutex actually is. It looks kind of like this:



struct pthread_mutex_t
{
    volatile sig_atomic_t spinlock;
    int lock_count;
    int wait_count;
    wait_queue_head *waiting_threads;
};


When a thread or process attempts to lock the mutex, it first gains the spinlock. Then it checks the lock count -- if the lock is unlocked, it increments the count, then unlocks the spinlock. On the other hand, if the lock was locked, the process puts itself on the wait queue, increments the wait count, then unlocks the spinlock while simultaneously sleeping. On Linux, that last step is achieved with a futex.

When the process holding the mutex unlocks it, it first locks the spin lock, then decrements the lock_count. If the lock_count becomes zero, it checks the wait count (while still holding the spinlock). If it's greater than zero, it dequeues all the processes on the wait queue and wakes them up, sets the wait_count to zero, decrements the lock_count, then unlocks the spinlock.

Being able to do this across processes depends on having a method of placing other processes on the wait queue, and telling them to wake up. On Linux, this is done with futexes. Not all implementations of pthreads support inter-process mutexes, but Linux does.

Codeplug
02-12-2009, 01:50 PM
>> Does anyone forsee any problem with this...
Not sure what "this" is at this point :)

You can continue to use signals, just replace the signal handler with "g_time_to_die = 1". Then the controlling loop in dispatcher() would be "while (!g_time_to_die)", or something like that.

If you just call _exit() in your signal handler, that's not a "clean" way to die (no atexit handlers are called, resources aren't manually released, etc...). A clean termination would be ideal.

gg

Abs
02-12-2009, 06:43 PM
Yeah... sorry about the vagueness of "this". I was referring to just having the signal handler make the child call _exit when it receives the signal, and then resetting the shared counter back to 0.

I know it would be better if they exited normally, but my dispatcher isn't in a loop. Each request that comes in is its own process and then exits... So the dispatcher right now just takes the request (as the child), determines what kind of request it is, calls the correct function to handle the request, and that's about it. So short of checking g_time_to_die at various points in all the functions called after my dispatcher, I'm not seeing another way to truly interrupt and have the children exit cleanly.

I was also unaware of the atexit function (thanks, that will come in useful later on), and so am not calling any other functions on normal exit. I did notice that _exit closes open file descriptors etc., which should be enough for what the children are doing.
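(For reference, registering a handler is just a one-liner; a sketch, with the handler name made up -- note that the handler runs on exit() or return from main, but not on _exit() or abort():


#include <stdlib.h>

static void cleanup(void) {
    /* e.g. detach shared memory, flush/close the log */
}

int main(void) {
    atexit(cleanup);
    /* ... */
    return 0;   /* cleanup() runs here */
}

)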

As of now I've been able to successfully load the crap out of my hobbled-together code with no crashes/problems at all (knock on wood). I may still use a semaphore to wait for the max processes to back down so there is an open slot, and if it waits too long then send the kill with the assumption that all of the currently running children are hosed. But that's for later.

Thank you both for all of the great information and insight into what was happening. I don't think I ever would have thought it was a problem with my signal handler. Thanks again.