Thread: Processes in uninterruptible sleep

  1. #1
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445

    Processes in uninterruptible sleep

    I am having a periodic issue with a server program: it goes into uninterruptible sleep (D state) for no reason that I can discern. I am running the release version of openSUSE 10.3, with kernel version 2.6.22.5-31-default x86-64. My server does basic file, socket, and MySQL I/O, and the MySQL server does not go into D state when my program does. My program is a typical multi-process Unix-type server that forks to handle client connections.

    The interesting thing is that the server runs on 3 ports to handle connections from various locations with differing network and firewall configurations, and only one port (port 80, but not HTTP) goes into D state. The other two continue to function normally. Stranger yet, the parent and ALL of its children go into D state at the same time.

    I've been googling for over an hour looking for known bugs in the openSUSE 10.3 default kernel and haven't found anything useful. Do any of you know of anything that might cause a parent process and all of its children to go into D state simultaneously on this system? I'm just looking for more things to rule out before I start looking at hardware.

    I know it's customary to show source code that exhibits the problem, but since I can't reproduce the problem on command, I don't know if source code would really be useful in this case.
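
    For reference, the state letter that ps and top report is the third field of /proc/<pid>/stat on Linux. Below is a minimal sketch in C of reading it, so that something like a watchdog could log the moment the parent and its children flip to D. The function names stat_state and pid_state are made up for this example and are not from the server being discussed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the one-letter state from the text of /proc/<pid>/stat.
 * The layout is "pid (comm) state ppid ...". comm may itself contain
 * spaces or parentheses, so we look for the LAST ')' instead of
 * splitting on whitespace. Returns '?' if the text is malformed. */
char stat_state(const char *stat_text)
{
    assert(stat_text != NULL);
    const char *close = strrchr(stat_text, ')');
    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '?';
    return close[2];
}

/* Read /proc/<pid>/stat and return the state letter, or '?' on any
 * error (for example, the process has already exited). */
char pid_state(long pid)
{
    char path[64], buf[1024];
    snprintf(path, sizeof path, "/proc/%ld/stat", pid);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    buf[n] = '\0';
    return stat_state(buf);
}
```

    Calling pid_state() on the parent's PID and on each child in a loop, and logging a timestamp when the letter becomes 'D', would at least pin down when the hang starts.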

  2. #2

  3. #3
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    I think this is usually considered to be a hardware I/O issue. It happens when a process is blocked inside the kernel, reading from or writing to some hardware (e.g., disk or the network card), and gets no reply.

    I guess that implies something could be broken -- you need to resolve what kind of hardware access is causing that to happen.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  4. #4
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Codeplug View Post
    Anything interesting in "dmesg"?

    gg
    Yeah, for sure check the logs, like /var/log/messages and /var/log/kern.log or whatever they are called. If you are not getting any message about it, you may (probably) want to raise the kernel and system logging level (klogd/syslogd).
    Last edited by MK27; 02-08-2010 at 12:04 PM.

  5. #5
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    Quote Originally Posted by Codeplug View Post
    Anything interesting in "dmesg"?
    Lots of firewall messages but nothing else. Is there a way to turn off firewall logging in the dmesg output, at least temporarily?
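
    One rough way to read the log with the firewall noise out of the way, without touching the firewall configuration, is to filter the text by the firewall's log prefix. This is a minimal sketch in C; drop_tagged_lines is a hypothetical helper, and the tag to pass (I'm assuming something like "SFW2-" for SuSEfirewall2) has to be checked against what the lines in your own dmesg output actually contain.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy lines from `in` to `out`, skipping every line that contains
 * `tag`. Returns the number of lines skipped. Lines longer than the
 * buffer are processed in pieces, which is fine for a quick look at
 * kernel log output but is not a general-purpose filter. */
long drop_tagged_lines(FILE *in, FILE *out, const char *tag)
{
    assert(in != NULL && out != NULL && tag != NULL);
    char line[4096];
    long skipped = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        if (strstr(line, tag) != NULL)
            skipped++;          /* firewall line (by assumption): drop it */
        else
            fputs(line, out);   /* anything else passes through */
    }
    return skipped;
}
```

    Called as drop_tagged_lines(stdin, stdout, tag) from main, this can sit at the end of a pipe from dmesg.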

  6. #6
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    Quote Originally Posted by MK27 View Post
    Yeah, for sure check the logs, like /var/log/messages and /var/log/kern.log or whatever they are called.
    The /var/log/messages file covering the last day it happened contains a lot of logged events from the firewall about SYN flooding. I'm pretty sure these are false positives, but I have no way to know for sure, since my server operates on port 80 but does not actually use HTTP, as the firewall may be expecting.

    So perhaps the firewall cuts off the connection but doesn't notify the kernel that the socket is being closed. Is this even possible? Is it possible to configure the firewall to handle port 80 as a raw port instead of expecting HTTP (if it even enforces protocols at all)?

  7. #7
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Are you using semaphores?
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  8. #8
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    Quote Originally Posted by brewbuck View Post
    Are you using semaphores?
    No. I am not using any form of IPC, except for the TCP sockets and whatever libmysqlclient uses to talk to the server.

  9. #9
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    "Alt+SysRQ+t" may give a clue by dumping a stack trace of the D-state processes.

    If you only care about a solution (instead of "why"), then your time may be better spent trying to reproduce the issue in the latest stable kernel (2.6.32.7).
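
    The same SysRq dump can be requested without a keyboard by writing the command letter to /proc/sysrq-trigger as root, which helps if the box is only reachable over ssh. A minimal sketch in C, assuming the kernel was built with magic SysRq support; sysrq_send is a name invented for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Write a single SysRq command letter to the trigger file.
 * Returns 0 on success, -1 on failure (typically: not running as
 * root, or the trigger file does not exist on this kernel). */
int sysrq_send(const char *trigger_path, char command)
{
    assert(trigger_path != NULL);
    FILE *fp = fopen(trigger_path, "w");
    if (fp == NULL)
        return -1;
    int ok = (fputc(command, fp) != EOF);
    if (fclose(fp) != 0)        /* the write may only fail at flush time */
        ok = 0;
    return ok ? 0 : -1;
}
```

    sysrq_send("/proc/sysrq-trigger", 't') should put the task dump in the kernel log, where dmesg can read it.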

    gg

  10. #10
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    Quote Originally Posted by Codeplug View Post
    "Alt+SysRQ+t" may give a clue by dumping a stack trace of the D-state processes.

    If you only care about a solution (instead of "why"), then your time may be better spent trying to reproduce the issue in the latest stable kernel (2.6.32.7).

    gg
    If I build a custom kernel, will I need to rebuild my "world" as well? I remember having to do this when I have updated kernels on machines before.

    Also, openSUSE 10.3 is not supported anymore, and I'd like to try using the kernel package from the 11.2 distribution (Linux 2.6.31.5). Can you see any problems I might face while doing this? Is it even something I should consider doing?

  11. #11
    {Jaxom,Imriel,Liam}'s Dad Kennedy's Avatar
    Join Date
    Aug 2006
    Location
    Alabama
    Posts
    1,065
    I have seen this before in a system that didn't have enough memory (64 MB) and NO SWAP. The issue was that the kernel thread sd was attempting to allocate memory for a file I/O -- there was no memory left, however, and the driver handled it poorly. The only thing that ever worked for me was to kill that process and start it over again. The real fix came later when I upgraded to much more memory (this era 2GB). I wonder if you are having similar issues?
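
    Free memory and swap are quick to rule in or out from /proc/meminfo. A minimal sketch in C of pulling one field out of that file's text; meminfo_kb is a hypothetical helper name, and what counts as "too low" is left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Look up one field (e.g. "MemFree" or "SwapFree") in text laid out
 * like /proc/meminfo ("Key:   value kB", one per line) and return its
 * value in kB, or -1 if the field is not present or not numeric. */
long meminfo_kb(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);
    size_t klen = strlen(key);
    const char *line = text;
    while (line != NULL && *line != '\0') {
        /* The key must be followed by ':' so "Mem" does not match "MemFree". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;
            if (sscanf(line + klen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');  /* advance to the next line, if any */
        if (line != NULL)
            line++;
    }
    return -1;
}
```

    Read /proc/meminfo into a buffer and call meminfo_kb(buf, "MemFree") and meminfo_kb(buf, "SwapFree") to see whether the box is anywhere near the situation described above.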

  12. #12
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    Quote Originally Posted by Kennedy View Post
    I have seen this before in a system that didn't have enough memory (64 MB) and NO SWAP.
    I have 16GB of memory and 32GB of swap, so this probably doesn't apply.

    The issue was that the kernel thread sd was attempting to allocate memory for a file I/O -- there was no memory left, however, and the driver handled it poorly. The only thing that ever worked for me was to kill that process and start it over again.
    Which I can't do, because it's in the uninterruptible sleep state.

    The real fix came later when I upgraded to much more memory (this era 2GB). I wonder if you are having similar issues?
    Thanks for trying, but I don't think our situations are similar enough to consider this as a possibility.

  13. #13
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Elkvis View Post
    The /var/log/messages file covering the last day it happened contains a lot of logged events from the firewall about SYN flooding. I'm pretty sure these are false positives, but I have no way to know for sure, since my server operates on port 80 but does not actually use HTTP, as the firewall may be expecting.
    If you are getting a bunch of SYN requests on port 80 and your server doesn't deal with http, don't you think this could be a problem? It seems to me this is a very bad port to pick -- it's already created these complications for you, real or imagined -- and if you have any choice at all, choose another one.

    Quote Originally Posted by Elkvis View Post
    If I build a custom kernel, will I need to rebuild my "world" as well? I remember having to do this when I have updated kernels on machines before.

    Also, openSUSE 10.3 is not supported anymore, and I'd like to try using the kernel package from the 11.2 distribution (Linux 2.6.31.5). Can you see any problems I might face while doing this? Is it even something I should consider doing?
    Generally, building a new kernel does not mean having to change anything else, presuming you know what you are doing. It's safe to try anyway, since if it doesn't work out you can just go back to using your old one (which kernel boots is determined by the bootloader). It can be a very tedious and boring procedure though: the config changes slightly with each version, meaning you may not be able to just swap in your old .config -- last time I downloaded a new one it took me at least two hours just to go through xconfig making sure everything was set appropriately. Unlike the distro kernels, I don't believe any effort is made to provide a useful "default" configuration with the source, and xconfig et al. do not do anything automatically. Configuring the kernel is a sure cure for insomnia.

    If you think a new kernel will help maybe first try the newer distro package. If you already have a 2.6 kernel, though, you may as well stick with what you've got.

    One thing I would definitely do is get on the user mailing list for the firewall and present your case there; hopefully someone will have some more pertinent advice.
    Last edited by MK27; 02-08-2010 at 03:09 PM.

  14. #14
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by MK27 View Post
    It can be a very tedious and boring procedure though: the config changes slightly with each version, meaning you may not be able to just swap in your old .config -- last time I downloaded a new one it took me at least two hours just to go through xconfig making sure everything was set appropriately. Unlike the distro kernels, I don't believe any effort is made to provide a useful "default" co
    That's what "make oldconfig" is for. It re-parses your older .config file, prompts only for options that are new in the kernel source you are building, and (hopefully) sanitizes it enough so that make xconfig can deal with it.

  15. #15
    Registered User jeffcobb's Avatar
    Join Date
    Dec 2009
    Location
    Henderson, NV
    Posts
    875
    To turn it another way, if you did not have to recompile driver/app X to install it, you probably won't have to after another kernel upgrade. For example back in the day you had to rebuild nVidia drivers or Alsa drivers after a kernel update. Now when you have a repository as large as Debian/Ubuntu it is no longer necessary. This is not a guarantee or an absolute but it has been a LOOOONG time since I have had to rebuild a driver after a kernel update.

    Now, before anyone jumps on my case: I am not trying to start a distro war or anything, but I do want to say this about kernel rebuilding. The Debian way is IMHO the safest and easiest for those who are new to it, unsure about kernel configuration, etc. Many moons ago I submitted a Debian How-To to Linux Laptops that used kernel rebuilding to support some of the hardware on a Vaio. You can look here to see how easy it is: JBCobb.net, "The difference between “Don’t wanna” and “Can’t”". Yeah, I know it's dated now, but the process of building the kernel and the help that Debian gives you here is what I am referring to.

    Oh and I agree 1000% with MK: you really ought to rethink the wisdom of sticking non-HTTP services on a port universally recognized as HTTP. It's about the only thing Windows and UNIX agree on...
    C/C++ Environment: GNU CC/Emacs
    Make system: CMake
    Debuggers: Valgrind/GDB
