Thread: program segfaults without dumping a core

  1. #1
    Registered User
    Join Date
    Mar 2005
    Posts
    3

    program segfaults without dumping a core

    Howdy --

    I'm looking for tips on how to get my program to dump a core file when it crashes. The program is an SMTP server of about 20k lines of code. It currently forks itself 12 times and then spawns 64 threads in each process. Each thread then handles a separate connection. The system is a Dell PowerEdge 1650 running Red Hat Enterprise Linux 3 with kernel 2.4.21-20.ELsmp.

    Before anyone states the obvious, let me run down what I've tried already. I've enabled core dumps in the bash shell with the command 'ulimit -c unlimited', after which my settings look like this:

    [root@jean root]# ulimit -a
    core file size (blocks, -c) unlimited
    data seg size (kbytes, -d) unlimited
    file size (blocks, -f) unlimited
    max locked memory (kbytes, -l) 4
    max memory size (kbytes, -m) unlimited
    open files (-n) 1024
    pipe size (512 bytes, -p) 8
    stack size (kbytes, -s) 10240
    cpu time (seconds, -t) unlimited
    max user processes (-u) 7168
    virtual memory (kbytes, -v) unlimited

    I have also tried enabling core dumps in code using the 'setrlimit' function call:

    Code:
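    #include <string.h>        /* memset */
    #include <sys/resource.h>  /* getrlimit, setrlimit, RLIMIT_CORE */
    
    /* Raise the soft core-file size limit to unlimited.
     * Returns 1 on success, 0 on failure. slog() is the application's logger. */
    int enable_coredumps(void) {
    	
    	int state;
    	struct rlimit rlim;
    	
    	memset(&rlim, 0, sizeof(rlim));
    
    	/* Read the current limits so the hard limit (rlim_max) is preserved. */
    	state = getrlimit(RLIMIT_CORE, &rlim);
    
    	if (state) { 
    		slog("(main:enable_coredumps) Could not get the kernel options. getrlimit = %i", state);
    		return 0;
    	}
    	
    	/* Only the soft limit changes; setrlimit fails if the hard limit is lower
    	 * and the process is not privileged. */
    	rlim.rlim_cur = RLIM_INFINITY;
    
    	state = setrlimit(RLIMIT_CORE, &rlim);
    	if (state) { 
    		slog("(main:enable_coredumps) Could not set kernel options for core dumping. setrlimit = %i", state);
    		return 0;
    	}
    
    	return 1;
    }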
    So far nothing has worked.

    Does anyone have any ideas? My first theory is that something about the size of my stack prevents the kernel from dumping the core. My second theory is that the problem results from the way I start the application: as root, binding to port 25, and then calling setuid to switch to another user. I then chdir to that user's home directory, so in theory the process can no longer access the directory where the app was started. To try to rule this theory out, I've started the app from that user's home directory, to no avail.
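
The second theory can be tested directly. On Linux, a process that changes its effective user ID is marked non-dumpable by the kernel, and a non-dumpable process produces no core file whatever RLIMIT_CORE says. The sketch below re-enables the flag with prctl(PR_SET_DUMPABLE) and checks that the current directory, where the kernel writes the core by default, is writable. The helper names restore_dumpable and cwd_is_writable are hypothetical and not part of the original program; the assumption is that they are called after the setuid and chdir calls.

```c
#include <stdio.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Hypothetical helper: call after the process has dropped privileges.
 * Returns 1 if the process is dumpable afterwards, 0 otherwise. */
int restore_dumpable(void) {
    /* Changing the effective UID clears the dumpable flag; set it again. */
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) != 0) {
        perror("prctl(PR_SET_DUMPABLE)");
        return 0;
    }
    /* Read the flag back to confirm the kernel accepted the change. */
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0) == 1;
}

/* Hypothetical helper: returns 1 if the current working directory is
 * writable by the (possibly unprivileged) process, 0 otherwise. */
int cwd_is_writable(void) {
    return access(".", W_OK) == 0;
}
```

If restore_dumpable() returns 1 and cwd_is_writable() returns 1 after the privilege drop, the user-ID switch is probably not what is suppressing the core file.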

    Alternatively, has anyone written a SIGSEGV handler that uses the ptrace function to record where the fault occurred? (Remember, this is a multi-threaded app, so in theory the parent thread could catch the SIGSEGV, and then ptrace all of the child threads before exiting, though I haven't actually tried to implement this.)

    I have run this program extensively in valgrind and can find no trace of a bug. However, I cannot run the production server inside valgrind because of the number of simultaneous connections. This application tends to have between 100 and 500 simultaneous connections, and when running under valgrind, it doesn't appear to accept the connections fast enough.

    Any help would be greatly appreciated.

    Thanks in advance,

    Ladar

  2. #2
    Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    So do any normal programs you write dump core?
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Mar 2005
    Posts
    3

    Core dumps

    Yes. I've written normal programs that dump cores. I've also gotten this program to dump its core on my test box with only 8 threads. I just can't get it to dump core when I have 32-64 threads per process.

  4. #4
    Registered User
    Join Date
    Mar 2005
    Posts
    3
    A quick follow-up: I cut the number of threads to 16 and raised the number of processes to 32. I also turned off the chuid (user-switching) code. When I did this, I got core dumps, but without full stack traces.

    Even without the trace, it appears the code was crashing deep inside the SSL library. I've since upgraded my SSL library from 0.9.7e to 0.9.7f.

  5. #5
    Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    I think this is a question for a more hard-core Linux developers' forum.

    This is really a beginner C and C++ board, with lots of newbies and not that many wizards who know Linux inside out.
