Processes in uninterruptible sleep



Elkvis
02-08-2010, 10:48 AM
I am having a periodic issue with a server program, where it will go into uninterruptible sleep (D state) for no reason that I can discern. I am running the release version of openSUSE 10.3, with kernel version 2.6.22.5-31-default x86-64. My server does basic file, socket, and MySQL I/O, and the MySQL server does not go into D state when my program does. My program is a typical multi-process Unix-type server that forks to handle client connections.

The interesting thing is that the server runs on 3 ports to handle connections from various locations with differing network and firewall configurations, and only one port (port 80, but not HTTP) goes into D state. The other two continue to function normally. Even stranger, the parent and ALL of its children go into D state at the same time. I've been googling for over an hour looking for known bugs in the openSUSE 10.3 default kernel, and haven't found anything useful. Do any of you know of anything that might cause a parent process and all of its children to go into D state simultaneously on this system? I'm just looking for more things to rule out before I start looking at hardware.

I know it's customary to show source code that exhibits the problem, but since I can't reproduce the problem on command, I don't know if source code would really be useful in this case.

Codeplug
02-08-2010, 11:36 AM
Anything interesting in "dmesg"?

gg

MK27
02-08-2010, 11:59 AM
I think this is usually considered to be a hardware I/O issue. It will happen when the kernel is reading from or writing to some hardware (e.g., disk or the network card) and gets no reply.

I guess that implies something could be broken -- you need to resolve what kind of hardware access is causing that to happen.
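
To see which processes are stuck at a given moment, one option is to read the state field from /proc/<pid>/stat. The following is a minimal sketch, assuming a Linux /proc filesystem; the function names `parse_stat` and `d_state_processes` are made up for illustration and are not part of any library.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split around the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    found = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as "self" or "net"
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the file was unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

Calling `d_state_processes()` on the affected machine while the problem is occurring should list the parent and its children if they really are all in D state.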

MK27
02-08-2010, 12:01 PM
Codeplug wrote: Anything interesting in "dmesg"?

Yeah, for sure check the logs, like /var/log/messages and /var/log/kern.log or whatever they are called on your system. You will probably want to adjust the kernel and system logging level through klogd/syslogd if you are not getting any messages about it.

Elkvis
02-08-2010, 12:15 PM
Codeplug wrote: Anything interesting in "dmesg"?

Lots of firewall messages, but nothing else. Is there a way to turn off firewall logging in the dmesg output, at least temporarily?
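
Short of changing the firewall's logging configuration, the noise can at least be filtered out when reading the log. Here is a minimal sketch, assuming the firewall lines carry the "SFW2-" prefix that SuSEfirewall2 normally uses; that marker and the function name `drop_firewall_lines` are assumptions, so adjust the marker to whatever actually appears in your output.

```python
# Assumption: firewall log lines contain this marker (SuSEfirewall2 style).
FIREWALL_MARKER = "SFW2-"


def drop_firewall_lines(lines, marker=FIREWALL_MARKER):
    """Return only the lines that do not contain the firewall marker."""
    return [line for line in lines if marker not in line]
```

This only hides the firewall entries from view, for example when applied to the saved output of dmesg. It does not stop the firewall from logging them.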

Elkvis
02-08-2010, 01:03 PM
MK27 wrote: Yeah, for sure check the logs, like /var/log/messages and /var/log/kern.log or whatever they are called.

The /var/log/messages file that included the last day that it happened contained a lot of logged events from the firewall about SYN flooding. I'm pretty sure these are false positives, but I have no way to know for sure, since my server operates on port 80 but does not actually use HTTP, which the firewall may be expecting.

So perhaps the firewall cuts off the connection but doesn't notify the kernel that the socket is being closed. Is this even possible? Is it possible to configure the firewall to handle port 80 as a raw port instead of expecting HTTP (if it even enforces protocols at all)?
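
One way to check whether the SYN flood warnings are false positives is to count half-open connections on port 80 per remote address. The sketch below parses lines in the /proc/net/tcp format, where state code 03 means SYN_RECV. It assumes IPv4 and a little-endian host such as x86-64, and the names `hex_to_ipv4` and `half_open_by_peer` are made up for illustration.

```python
import socket
import struct
from collections import Counter

SYN_RECV = "03"  # TCP state code as printed in /proc/net/tcp


def hex_to_ipv4(hex_addr):
    """Convert an address such as '0100007F' to '127.0.0.1'.

    Assumes a little-endian host, where /proc/net/tcp prints the
    address bytes in reverse order.
    """
    return socket.inet_ntoa(struct.pack("<I", int(hex_addr, 16)))


def half_open_by_peer(lines, port=80):
    """Count SYN_RECV entries per remote address for one local port."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or ":" not in fields[1]:
            continue  # header line or malformed entry
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port != port or fields[3] != SYN_RECV:
            continue
        counts[hex_to_ipv4(fields[2].split(":")[0])] += 1
    return counts
```

Feeding it the lines of /proc/net/tcp while the warnings are being logged would show whether a few remote addresses account for most of the half-open connections, which is what a real flood from a small set of hosts would tend to look like.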

brewbuck
02-08-2010, 01:16 PM
Are you using semaphores?

Elkvis
02-08-2010, 01:23 PM
brewbuck wrote: Are you using semaphores?

No. I am not using any form of IPC, except for the TCP sockets and whatever libmysqlclient uses to talk to the server.

Codeplug
02-08-2010, 01:28 PM
"Alt+SysRQ+t" may give a clue by dumping a stack trace of the D-state processes.

If you only care about a solution (instead of "why"), then your time may be better spent trying to reproduce the issue in the latest stable kernel (2.6.32.7).

gg
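
If there is no keyboard attached to the server, the same dump can usually be requested by writing the key letter to /proc/sysrq-trigger as root, with the output going to the kernel log. The sketch below also reads /proc/<pid>/wchan, which names the kernel function a sleeping process is waiting in. It assumes the kernel was built with magic SysRq support, and the function names `request_task_dump` and `wait_channel` are made up for illustration.

```python
import os


def request_task_dump(trigger_path="/proc/sysrq-trigger"):
    """Ask the kernel for the same task dump as Alt+SysRq+T (needs root)."""
    with open(trigger_path, "w") as f:
        f.write("t")


def wait_channel(pid, proc_root="/proc"):
    """Return the contents of /proc/<pid>/wchan, or None if unavailable."""
    try:
        with open(os.path.join(proc_root, str(pid), "wchan")) as f:
            return f.read().strip() or None
    except OSError:
        return None
```

After `request_task_dump()` runs, the stack traces should appear in dmesg and /var/log/messages. `wait_channel(pid)` is a lighter check that can be run for each stuck process without flooding the log.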

Elkvis
02-08-2010, 01:57 PM
Codeplug wrote: If you only care about a solution (instead of "why"), then your time may be better spent trying to reproduce the issue in the latest stable kernel (2.6.32.7).

If I build a custom kernel, will I need to rebuild my "world" as well? I remember having to do this when I updated kernels on machines before.

Also, openSUSE 10.3 is not supported anymore, and I'd like to try using the kernel package from the 11.2 distribution (Linux 2.6.31.5). Can you see any problems I might face while doing this? Is it even something I should consider doing?

Kennedy
02-08-2010, 02:10 PM
I have seen this before in a system that didn't have enough memory (64 MB) and NO SWAP. The issue was that the kernel thread sd was attempting to allocate memory for a file I/O -- there was no memory left, however, and the driver handled it poorly. The only thing that ever worked for me was to kill that process and start it over again. The real fix came later when I upgraded to much more memory (this era 2GB). I wonder if you are having similar issues?

Elkvis
02-08-2010, 02:21 PM
Kennedy wrote: I have seen this before in a system that didn't have enough memory (64 MB) and NO SWAP.

I have 16GB of memory and 32GB of swap, so this probably doesn't apply.

Kennedy wrote: The issue was that the kernel thread sd was attempting to allocate memory for a file I/O -- there was no memory left, however, and the driver handled it poorly. The only thing that ever worked for me was to kill that process and start it over again.

Which I can't do, because it's in the uninterruptible sleep state.

Kennedy wrote: The real fix came later when I upgraded to much more memory (this era 2GB). I wonder if you are having similar issues?

Thanks for trying, but I don't think our situations are similar enough to consider this as a possibility.

MK27
02-08-2010, 03:03 PM
Elkvis wrote: The /var/log/messages file that included the last day that it happened contained a lot of logged events from the firewall about SYN flooding. I'm pretty sure these are false positives, but I have no way to know for sure, since my server operates on port 80 but does not actually use HTTP, as the firewall may be expecting.

If you are getting a bunch of SYN requests on port 80 and your server doesn't deal with HTTP, don't you think this could be a problem? It seems to me this is a very bad port to pick -- it's already created these complications for you, real or imagined -- and if you have any choice at all, choose another one.


Elkvis wrote: If I build a custom kernel, will I need to rebuild my "world" as well? I remember having to do this when I have updated kernels on machines before. Also, openSUSE 10.3 is not supported anymore, and I'd like to try using the kernel package from the 11.2 distribution (Linux 2.6.31.5). Can you see any problems I might face while doing this? Is it even something I should consider doing?

Generally, building a new kernel does not mean having to change anything else, presuming you know what you are doing. It's safe to try anyway, since if it doesn't work out you can just go back to using your old one (this is determined by the bootloader). It can be a very tedious and boring procedure, though: the config changes slightly with each version, meaning you may not be able to just swap in your old .config. Last time I downloaded a new one, it took me at least two hours just to go through xconfig making sure everything was set appropriately. Unlike the distro kernels, I don't believe any effort is made to provide a useful "default" configuration with the source, and xconfig et al. do not do anything automatically. Configuring the kernel is a sure cure for insomnia.

If you think a new kernel will help, maybe first try the newer distro package. If you already have a 2.6 kernel, though, you may as well stick with what you've got.

One thing I would definitely do is get on the user mailing list for the firewall and present your case there. Hopefully someone will have some more pertinent advice.

brewbuck
02-08-2010, 03:25 PM
MK27 wrote: It can be a very tedious and boring procedure though, the config changes slightly with each version, meaning you may not be able to just swap in your old .config -- last time I downloaded a new one it took me at least two hours just to go through xconfig making sure everything was set appropriately.

That's what "make oldconfig" is for. It re-parses your older .config file and (hopefully) sanitizes it enough so that make xconfig can deal with it.

jeffcobb
02-08-2010, 03:30 PM
To turn it another way, if you did not have to recompile driver/app X to install it, you probably won't have to after another kernel upgrade. For example back in the day you had to rebuild nVidia drivers or Alsa drivers after a kernel update. Now when you have a repository as large as Debian/Ubuntu it is no longer necessary. This is not a guarantee or an absolute but it has been a LOOOONG time since I have had to rebuild a driver after a kernel update.

Now before anyone jumps on my case, I am not trying to start a distro war or anything, but I do want to say this about kernel rebuilding: the Debian way is IMHO the safest and easiest for those who are new to it, unsure about kernel configuration, etc. Many moons ago I submitted a Debian How-To to Linux Laptops that used kernel rebuilding to support some of the hardware on a Vaio. You can look here to see how easy it is: The difference between “Don’t wanna” and “Can’t” (http://jbcobb.net/?p=132). Yeah, I know it's dated now, but the process of building the kernel and the help that Debian gives you here is what I am referring to.

Oh and I agree 1000% with MK: you really ought to rethink the wisdom of sticking non-HTTP services on a port universally recognized as HTTP. It's about the only thing Windows and UNIX agree on...

MK27
02-08-2010, 04:15 PM
brewbuck wrote: That's what "make oldconfig" is for. It re-parses your older .config file and (hopefully) sanitizes it enough so that make xconfig can deal with it.

Well I learned something useful today. :)

Elkvis
02-08-2010, 04:41 PM
jeffcobb wrote: Oh and I agree 1000% with MK: you really ought to rethink the wisdom of sticking non-HTTP services on a port universally recognized as HTTP. It's about the only thing Windows and UNIX agree on...

I am definitely moving in that direction, but the system has been in place for 2 years in this configuration, and this only started happening within the last few months.

MK27
02-08-2010, 05:32 PM
Elkvis wrote: I am definitely moving in that direction, but the system has been in place for 2 years in this configuration, and this only started happening within the last few months.

Well, take this with a lot of salt: unfortunately the entire thread got lost in the last cboard DB crash, but recently Yarin posted a bunch of IPs that he'd collected using "denyhosts" in only a few weeks. These were IPs that tried and failed more than 100 times to log into the ssh port on his new VPS (which is to say, somewhere that is presumably of no interest to anyone).

It was a pretty good list, and when I traced some of the IPs, they'd all been acquired through one of two services (one in Australia and one in Holland). I don't actually think they are malevolent, but let's entertain the idea that they might be and that these people are just port scanning at random as much as possible looking for different opportunities (such as a lucky ssh hit).

Now, one of the big goals of such a person would be a successful DoS attack, which can be done with SYN flooding. Another big goal would be to not get caught. So (here's where you can reach for the salt, because it's all just conjecture) if I had a small zombie net or whatever and was trying to test it out, I might just go around somewhat randomly looking to see if my SYN requests on port 80 weren't receiving ACKs or any kind of response at all -- this might indicate the server has successfully been overwhelmed.

And if you set up a non-HTTP server on port 80, that's what it might appear to be from that perspective (a stressed-out server that DoS's easily). So if this just started happening, maybe your IP is now on someone's "experiment here" list.

Kind of far fetched. Some missing links in it. Anyway, get on the mailing list for the firewall and ask about your problem. You should NOT be just ignoring that.

jeffcobb
02-08-2010, 05:45 PM
Pretty much what MK said; but if you are providing a non-HTTP service on a standard HTTP port you are *asking* for problems that you could easily side-step by moving it to another, non-standard port. I am not saying this is the cause of your current problems, just that it is likely to cause you problems down the road. Standards are standards for a reason. It would be like providing a non-email service on your SMTP port; you can do it, but you will get Sam and Joe Spammer slamming your server trying to find a way to make it a relay (spoofing)...

brewbuck
02-08-2010, 06:04 PM
Making the symptom go away is not the same thing as diagnosing the problem. You could move to a different port, but how do you know some fault is not still lurking? I think you should try to figure out what's happening -- using the sysrq control key combo to figure out where in the kernel the thing is hanging seems like an obvious first step.