Thread: Debugging problem

  1. #1
    ... kermit
    Join Date: Jan 2003 | Posts: 1,534

    Debugging problem

    I have been fiddling with an open source project (The Dillo browser) and have run into a bit of a brick wall. The browser segfaults on start up. I could bug the devs and figure this out, but I am hesitant to do so for at least a couple of reasons:

    1. They are a small group, and are busy enough porting the browser to a different version of FLTK.
    2. I suspect this may well be an environment issue (more later), but I see this as a challenge to learn something new and interesting.


    So let's see.. The software builds fine, but segfaults every time. Following is a debug session, with a core file loaded, a program run, and a backtrace. I am hoping someone will be able to see something here that I can't, and offer some advice. As an aside, all of my gdb experience has been with simpler issues up to this point.

    Code:
    drs@mintyd ~/tmp/work/dillo-1.3 $ gdb src/dillo /tmp/core 
    Reading symbols from /home/drs/tmp/work/dillo-1.3/src/dillo...done.
    [New Thread 13327]
    
    warning: Can't read pathname for load map: Input/output error.
    Reading symbols from /usr/lib/libjpeg.so.62...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib/libjpeg.so.62
    Reading symbols from /lib/libpng12.so.0...(no debugging symbols found)...done.
    Loaded symbols for /lib/libpng12.so.0
    Reading symbols from /usr/lib/libXext.so.6...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib/libXext.so.6
    Reading symbols from /usr/lib/libXft.so.2...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib/libXft.so.2
    Reading symbols from /usr/lib/libfontconfig.so.1...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib/libfontconfig.so.1
    Reading symbols from /usr/lib/libXinerama.so.1...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib/libXinerama.so.1
    Reading symbols from /lib/libpthread.so.0...Reading symbols from /usr/lib/debug/lib/libpthread-2.11.2.so...done.
    done.
    Loaded symbols for /lib/libpthread.so.0
    Reading symbols from /lib/libdl.so.2...Reading symbols from /usr/lib/debug/lib/libdl-2.11.2.so...done.
    done.
    Loaded symbols for /lib/libdl.so.2
    Reading symbols from /usr/lib/libX11.so.6...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib/libX11.so.6
    Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib/libz.so.1
    Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib/libstdc++.so.6
    Reading symbols from /lib/libm.so.6...Reading symbols from /usr/lib/debug/lib/libm-2.11.2.so...done.
    done.
    Loaded symbols for /lib/libm.so.6
    Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
    Loaded symbols for /lib/libgcc_s.so.1
    Reading symbols from /lib/libc.so.6...Reading symbols from /usr/lib/debug/lib/libc-2.11.2.so...done.
    done.
    Loaded symbols for /lib/libc.so.6
    Reading symbols from /usr/lib/libfreetype.so.6...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib/libfreetype.so.6
    Reading symbols from /usr/lib/libXrender.so.1...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib/libXrender.so.1
    Reading symbols from /usr/lib/libexpat.so.1...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib/libexpat.so.1
    Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib/ld-2.11.2.so...done.
    done.
    Loaded symbols for /lib64/ld-linux-x86-64.so.2
    Reading symbols from /usr/lib/libxcb.so.1...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib/libxcb.so.1
    Reading symbols from /usr/lib/libXau.so.6...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib/libXau.so.6
    Reading symbols from /usr/lib/libXdmcp.so.6...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib/libXdmcp.so.6
    Reading symbols from /lib/libnss_files.so.2...Reading symbols from /usr/lib/debug/lib/libnss_files-2.11.2.so...done.
    done.
    Loaded symbols for /lib/libnss_files.so.2
    Reading symbols from /lib/libnss_dns.so.2...Reading symbols from /usr/lib/debug/lib/libnss_dns-2.11.2.so...done.
    done.
    Loaded symbols for /lib/libnss_dns.so.2
    Reading symbols from /lib/libresolv.so.2...Reading symbols from /usr/lib/debug/lib/libresolv-2.11.2.so...done.
    done.
    Loaded symbols for /lib/libresolv.so.2
    Reading symbols from /usr/lib/libXcursor.so.1...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib/libXcursor.so.1
    Reading symbols from /usr/lib/libXfixes.so.3...(no debugging symbols found)...done.
    Loaded symbols for /usr/lib/libXfixes.so.3
    Reading symbols from /usr/lib/gconv/ISO8859-1.so...Reading symbols from /usr/lib/debug/usr/lib/gconv/ISO8859-1.so...done.
    done.
    Loaded symbols for /usr/lib/gconv/ISO8859-1.so
    Core was generated by `src/dillo'.
    Program terminated with signal 11, Segmentation fault.
    #0  0x00007f819376390a in __snprintf (s=<value optimized out>, maxlen=<value optimized out>, format=<value optimized out>) at snprintf.c:39
    39      snprintf.c: No such file or directory.
            in snprintf.c
    (gdb) run
    Starting program: /home/drs/tmp/work/dillo-1.3/src/dillo 
    [Thread debugging using libthread_db enabled]
    paths: Cannot open file '/home/drs/.dillo/keysrc'
    paths: Using /usr/local/etc/dillo/keysrc
    dillo_dns_init: Here we go! (threaded)
    Enabling cookies as from cookiesrc...
    Nav_open_url: new url='http://www.dragcave.net/'
    [New Thread 0x7ffff4edb700 (LWP 13335)]
    Dns_server [0]: www.dragcave.net is 38.101.121.19
    [Thread 0x7ffff4edb700 (LWP 13335) exited]
    Connecting to 38.101.121.19
    Nav_open_url: new url='http://dragcave.net/'
    Capi_filters_test: ALLOW from 'www.dragcave.net' to 'dragcave.net'
    [New Thread 0x7ffff4edb700 (LWP 13336)]
    Dns_server [0]: dragcave.net is 38.101.121.19
    [Thread 0x7ffff4edb700 (LWP 13336) exited]
    Connecting to 38.101.121.19
    [cookies dpi]: dragcave.net GETTING: 
    
    Program received signal SIGSEGV, Segmentation fault.
    0x00007ffff5d0990a in __snprintf (s=<value optimized out>, maxlen=<value optimized out>, format=<value optimized out>) at snprintf.c:39
    39      snprintf.c: No such file or directory.
            in snprintf.c
    (gdb) bt
    #0  0x00007ffff5d0990a in __snprintf (s=<value optimized out>, maxlen=<value optimized out>, format=<value optimized out>) at snprintf.c:39
    #1  0x2e0000000040c2f8 in ?? ()
    #2  0x00206e6f67002e2e in ?? ()
    #3  0x0000000000716b70 in ?? ()
    #4  0x00000000007f5210 in ?? ()
    #5  0x00000000007f5210 in ?? ()
    #6  0x00000000007b0420 in ?? ()
    #7  0x0000000000000023 in ?? ()
    #8  0x0000000000000023 in ?? ()
    #9  0x0000000000475ba8 in Fl_Window::label(char const*, char const*) ()
    #10 0x00000000007f5cc0 in ?? ()
    #11 0x0000000000000004 in ?? ()
    #12 0x0000000000000043 in ?? ()
    #13 0x0000000000420acf in DilloHtml::bugMessage (this=<value optimized out>, format=<value optimized out>) at html.cc:146
    #14 0x000000000040d005 in a_UIcmd_set_page_title (bw=0x43, label=0x420acf "SH\211\373", <incomplete sequence \366\207\230>) at uicmd.cc:1156
    #15 0x0000000000420af0 in Html_tag_close_title (html=0x7dc190, TagIdx=<value optimized out>) at html.cc:1650
    #16 0x0000000000421bf4 in Html_tag_cleanup_to_idx (html=0x7dc190, new_idx=<value optimized out>) at html.cc:1315
    #17 Html_tag_cleanup_at_close (html=0x7dc190, new_idx=<value optimized out>) at html.cc:1360
    #18 0x0000000000423f10 in Html_process_tag (html=0x7dc190, buf=<value optimized out>, bufsize=<value optimized out>, Eof=0) at html.cc:3553
    #19 Html_write_raw (html=0x7dc190, buf=<value optimized out>, bufsize=<value optimized out>, Eof=0) at html.cc:3818
    #20 0x00000000004246c4 in DilloHtml::write (this=<value optimized out>, Buf=<value optimized out>, BufSize=<value optimized out>, Eof=<value optimized out>) at html.cc:552
    #21 0x0000000000424716 in Html_callback (Op=<value optimized out>, Client=0x7b7a50) at html.cc:3712
    #22 0x000000000041512a in Cache_process_queue (entry=0x7b7840) at cache.c:1181
    #23 0x000000000041593e in a_Cache_process_dbuf (Op=<value optimized out>, 
        buf=0x7be7b0 "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=iso-8859-1\r\nCache-Control: private, max-age=2, no-cache=Set-Cookie, must-revalidate, proxy-revalidate\r\nContent-Encoding: gzip\r\nVary: Accept-Encoding\r\n"..., buf_size=2044, Url=<value optimized out>) at cache.c:878
    #24 0x0000000000417436 in a_Capi_ccc (Op=2, Branch=2, Dir=1, Info=0x7b75e0, Data1=0x7b0880, Data2=0x4a725d) at capi.c:776
    #25 0x000000000041287c in a_Chain_fcb (Op=<value optimized out>, Info=<value optimized out>, Data1=<value optimized out>, Data2=<value optimized out>) at chain.c:113
    #26 0x0000000000437762 in Dpi_parse_token (Op=<value optimized out>, Branch=<value optimized out>, Dir=<value optimized out>, Info=<value optimized out>, Data1=<value optimized out>, 
        Data2=<value optimized out>) at dpi.c:220
    #27 Dpi_process_dbuf (Op=<value optimized out>, Branch=<value optimized out>, Dir=<value optimized out>, Info=<value optimized out>, Data1=<value optimized out>, Data2=<value optimized out>)
        at dpi.c:339
    #28 a_Dpi_ccc (Op=<value optimized out>, Branch=<value optimized out>, Dir=<value optimized out>, Info=<value optimized out>, Data1=<value optimized out>, Data2=<value optimized out>)
        at dpi.c:735
    #29 0x000000000041287c in a_Chain_fcb (Op=<value optimized out>, Info=<value optimized out>, Data1=<value optimized out>, Data2=<value optimized out>) at chain.c:113
    #30 0x00000000004381b6 in a_IO_ccc (Op=2, Branch=<value optimized out>, Dir=1, Info=0x7b76a0, Data1=0x7b06c0, Data2=0x0) at IO.c:425
    #31 0x00000000004382e3 in IO_read (io=0x7b06c0) at IO.c:197
    #32 0x0000000000438337 in IO_callback (io=0x7b06c0) at IO.c:262
    #33 0x0000000000438454 in IO_fd_read_cb (fd=6, data=<value optimized out>) at IO.c:283
    #34 0x0000000000475470 in fl_wait(double) ()
    #35 0x0000000000457f91 in Fl::wait(double) ()
    ---Type <return> to continue, or q <return> to quit---
    #36 0x0000000000458005 in Fl::run() ()
    #37 0x0000000000407ec6 in main (argc=1, argv=0x7fffffffe888) at dillo.cc:431
    (gdb)
    That was compiled with -ggdb -O1. -ggdb -O2 makes no difference, but with -O0 the segfault goes away. I mentioned that I thought this might be an environment thing: I have successfully built this on an x86-64 openSUSE machine (11.3, desktop), but when I tried on an x86-64 laptop (openSUSE 11.4), I got this problem. I tried Fedora 14 (same laptop) and it built fine with the default settings. But with Linux Mint (Debian Edition), on the same laptop, I am back to the segfault. I am not sure what to do next, but I am hoping some of you can give me an idea of what to look at here.

  2. #2
    Frequently Quite Prolix dwks
    Join Date: Apr 2005 | Location: Canada | Posts: 8,057
    Well it's crashing in snprintf, so perhaps it's a buffer overrun. I did some searching through the dillo source, trying to find bugMessage() (the first function we can see besides snprintf). But it doesn't seem to be doing anything suspicious. dillo:src/html.cc

    So up one level in the stack trace, to a_UIcmd_set_page_title(). Unfortunately there is no line 1156 in the code I'm looking at in the mercurial repository, probably they've changed it around a bit. (Maybe fixed your bug . . . .) But here's the closest I could find: dillo:src/uicmd.cc Not very enlightening. But look at the parameters being passed to it:
    Code:
    a_UIcmd_set_page_title (bw=0x43, label=0x420acf "SH\211\373", <incomplete sequence \366\207\230>)
    That looks like a really weird title. Might not even be NULL-terminated. Maybe uninitialized?

    And one higher up the stack frame: dillo:src/html.cc
    Maybe html->InFlags is uninitialized? Or maybe the HTML document you're trying to look at doesn't have a <head> or a <title> element?

    I'd try telling the browser to open a specific HTML document, in case the default screen is broken. Maybe an extremely simple one you write yourself, with e.g.
    Code:
    <html>
    <head><title>Simple Title</title></head>
    <body>
    <p>The body.</p>
    </body>
    </html>
    If that doesn't work, I'd set breakpoints in a_UIcmd_set_page_title() or similar functions, try to see what's going on (if you're familiar with GDB). Or just hack the function so that it never sets the title at all!

    BTW: good GDB commands to know:
    • up: move up one stack frame (toward the caller)
    • down: move down one stack frame (toward the callee)
    • break wherever.c:line: set a breakpoint at that line
    • p var: print a variable


    My boredom has been alleviated. Thanks for the interesting problem to think about, and good luck debugging . . . .
    dwk


  3. #3
    Officially An Architect brewbuck
    Join Date: Mar 2007 | Location: Portland, OR | Posts: 7,396
    Looks like uninitialized memory (more specifically, a string hasn't been null-terminated).

    Evidence:

    1. It goes away when you build for debug.
    2. It changes on different platforms.
    3. See the "label" parameter on the a_UIcmd_set_page_title() call. It looks like junk.
    4. It crashes inside __snprintf(), which suggests it flew off the end of a buffer.

    My wild-ass guess is that a string is not getting null-terminated somewhere. It works in debug because memory tends to be 0 in debug, and you get the null terminator by accident. Another possibility is that there's a fixed-length buffer somewhere, and some code copies a string which is too long to fit, so it stops copying when the buffer is full. The resulting buffer has no null terminator (but in debug, there happens to be a zero at the next byte, so everything looks fine).
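
    To make the fixed-length buffer case concrete, here is a minimal sketch. The helper names are made up for illustration and none of this is Dillo code. It only shows that strncpy() adds no terminator when the source has as many characters as the buffer or more, and checks for the terminator without ever reading past the buffer.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not from Dillo: true if buf[0..n) contains a '\0'.
static bool has_terminator(const char *buf, std::size_t n) {
    return std::memchr(buf, '\0', n) != nullptr;
}

// Copies src into an 8-byte buffer the way the suspect pattern would and
// reports whether the result is terminated. strncpy() stops after n bytes
// and writes no '\0' if src has n or more characters.
static bool bounded_copy_is_terminated(const char *src) {
    char buf[8];
    std::memset(buf, 'X', sizeof buf);   // stand-in for leftover stack junk
    std::strncpy(buf, src, sizeof buf);
    return has_terminator(buf, sizeof buf);
}

// Same copy with the usual fix: leave room and terminate explicitly.
static bool fixed_copy_is_terminated(const char *src) {
    char buf[8];
    std::memset(buf, 'X', sizeof buf);
    std::strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    return has_terminator(buf, sizeof buf);
}
```

    With a short source such as "short" the first version happens to be fine, because strncpy() pads the rest of the buffer with zeros. With "much too long" it leaves eight bytes and no terminator, and anything that later treats the buffer as a string (a "%s" in snprintf(), for instance) keeps reading until it hits a zero byte somewhere else.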

    I'd run it through Valgrind and see what it says.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  4. #4
    ... kermit
    Join Date: Jan 2003 | Posts: 1,534
    Hey guys,

    Thanks for having a look at this. If you want to see the code I am working with, have a look here (1.3 port).

    dwks: I was pretty sure this was not an issue with <head>/<title> tags missing (I checked on a page that validated on W3), but I tried your simple html page as well. No luck, but it did give me something else to look at. I am gonna go fiddle with some breakpoints, and see what I can see.

    Brewbuck: I'll go and try Valgrind.

    Edit: I think I am on to something:

    Code:
    void a_UIcmd_set_page_title(BrowserWindow *bw, const char *label)
      1143{
      1144   const int size = 128;
      1145   char title[size];
      1146
      1147   if (a_UIcmd_get_bw_by_widget(BW2UI(bw)->tabs()->wizard()->value()) == bw) {
      1148      // This is the focused bw, set window title
      1149      if (snprintf(title, size, "Dillo: %s", label) >= size) {
      1150         uint_t i = MIN(size - 4, 1 + a_Utf8_end_of_char(title, size - 8));
      1151         snprintf(title + i, 4, "...");
      1152      }
      1153      BW2UI(bw)->window()->copy_label(title);
      1154      BW2UI(bw)->window()->redraw_label();
      1155   }
      1156   BW2UI(bw)->tabs()->set_tab_label(BW2UI(bw), label);
      1157}
    When I check the value of size, just before the call to snprintf, it is -128.
    Last edited by kermit; 04-25-2011 at 06:08 PM.

  5. #5
    Registered User
    Join Date: Nov 2010 | Location: Long Beach, CA | Posts: 5,909
    You're definitely seeing stack corruption, hence all the strangeness in the back trace (??, bizarre addresses). Overflowing a local buffer/array is by far the most likely, but one caveat: just because the last functions we recognize are DilloHtml::bugMessage and a_UIcmd_set_page_title doesn't mean that's where the overflow is. It could be the overflow started in stack frame 3, but overflowed all the way up to frame 14. In situations like this, I prefer to debug a live process. You can put a sleep statement at the beginning of main, and do something like gdb program_name process_id. Let it run with no breakpoints and do whatever you do to make it crash. Usually, GDB catches the seg fault before it actually blows out your call stack, so you can switch back to your GDB window and see exactly where you were right before the seg fault. That backtrace may be far more useful.
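
    The "sleep at the beginning of main" idea could look roughly like the sketch below. The function name is hypothetical, and getpid() and sleep() are POSIX calls rather than standard C++, so this assumes a Linux or other Unix-like build.

```cpp
#include <cstdio>
#include <unistd.h>   // getpid(), sleep(): POSIX, not standard C++

// Hypothetical helper: print the process ID and pause, so there is time
// to attach a debugger from another terminal before anything interesting
// happens. Returns the PID that was printed.
static pid_t pause_for_debugger(unsigned seconds) {
    const pid_t pid = getpid();
    std::fprintf(stderr, "pid %ld: waiting %u s for a debugger\n",
                 static_cast<long>(pid), seconds);
    sleep(seconds);
    return pid;
}
```

    Calling something like pause_for_debugger(30) as the first statement of main() leaves a window to run gdb program_name process_id in a second terminal with the printed PID, then type continue and reproduce the crash.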

    One of the more obscure tricks in GDB that I find useful is for printing arrays that have degraded to pointers due to function calls. You can do print *array@10 to print the first 10 elements of array. You may also be interested in the following for making printing of structures and arrays a little easier on the eyes:
    set print array
    set print pretty
    set print elements <n>
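
    The reason print *array@10 is needed at all is that an array passed to a function arrives as a bare pointer, so neither the callee nor the debugger knows its length. A small sketch of that decay, with made-up function names:

```cpp
#include <cstddef>

// Hypothetical demo functions, not Dillo code.

// The caller passes an array, but the callee only receives a pointer:
// sizeof reports the size of the pointer, not of the original array.
static std::size_t size_seen_by_callee(const int *values) {
    return sizeof values;
}

// Taking the array by reference keeps its full type, so the length
// survives the call.
template <std::size_t N>
static std::size_t length_by_reference(const int (&)[N]) {
    return N;
}

// True when the pointer version has lost the length and the reference
// version has kept it.
static bool decay_loses_length() {
    int values[10] = {0};
    return size_seen_by_callee(values) == sizeof(int *)
        && length_by_reference(values) == 10;
}
```

    Stopped inside a function like size_seen_by_callee(), a plain print values shows only an address, while print *values@10 shows the ten elements the caller actually passed.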

  6. #6
    Registered User
    Join Date: Dec 2007 | Posts: 2,675
    If it's consistently trashing the size variable you could put a write watch on that variable too, I think.

  7. #7
    Officially An Architect brewbuck
    Join Date: Mar 2007 | Location: Portland, OR | Posts: 7,396
    Quote Originally Posted by kermit View Post
    When I check the value of size, just before the call to snprintf, it is -128.
    -128 is the value you would see if you tried to store 128 into a signed char. Of course, size is an int, not a signed char. But still, very weird. Also, it's const, so it might not actually exist as a real variable. You MIGHT just be seeing debugger weirdness (not printing the value of the constant properly).
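
    As a quick check of the signed char remark, here is a minimal sketch with a made-up helper name. It assumes an 8-bit, two's-complement signed char, which is what x86-64 Linux uses; before C++20 the result of this conversion is implementation-defined.

```cpp
// Hypothetical helper: squeeze an int through a signed char and widen it
// back to int. Values outside -128..127 do not survive the round trip.
static int through_signed_char(int value) {
    const signed char narrow = static_cast<signed char>(value);
    return narrow;
}
```

    Under that assumption, through_signed_char(128) gives -128, because the bit pattern 0x80 is the most negative value an 8-bit signed type can hold, while 127 passes through unchanged.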

    You say this happens BEFORE the call to snprintf(). What value is stored there BEFORE the call to a_UIcmd_get_bw_by_widget()? I don't know what a_UIcmd_get_bw_by_widget() does, but it doesn't sound like a function that would corrupt the stack.

    If the call to snprintf() is overflowing, it should drop into that if() block. Does it get there?
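
    For reference, the truncation check works because snprintf() returns the length the complete string would have had, not the number of bytes it actually stored. The sketch below is a simplified stand-in for the title-building step, not the Dillo function: the name make_title and the smaller buffer are assumptions, and it handles ASCII only, where the original calls a_Utf8_end_of_char() to avoid splitting a multi-byte character.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for the title-building step; not the Dillo code.
// Formats "Dillo: <label>" into a fixed buffer and, if snprintf() reports
// truncation, overwrites the tail with "...".
static std::string make_title(const char *label) {
    const int size = 32;               // the original uses 128
    char title[size];

    // snprintf() writes at most size - 1 characters plus '\0' and returns
    // the length the untruncated result would have had.
    const int wanted = std::snprintf(title, size, "Dillo: %s", label);
    if (wanted >= size) {
        // Truncated: replace the last three characters with an ellipsis.
        std::snprintf(title + size - 4, 4, "...");
    }
    return std::string(title);
}
```

    A short label comes back unchanged after the "Dillo: " prefix; a long one comes back as 31 characters ending in "...". Either way, snprintf() trusts that label points at a terminated string, so a junk label pointer can crash inside snprintf() before the return value is ever compared with size.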

    I'd also try stepping into window()->copy_label(title) and see what's going on in there.

  8. #8
    ... kermit
    Join Date: Jan 2003 | Posts: 1,534
    Code:
    void a_UIcmd_set_page_title(BrowserWindow *bw, const char *label)
      1143{
      1144   const int size = 128;
      1145   char title[size];
      1146   /* At this point size==-128 (though I have seen it as just junk numbers
                 * as well). Could be anduril462 was on to something perhaps? 
                 */
      1147   if (a_UIcmd_get_bw_by_widget(BW2UI(bw)->tabs()->wizard()->value()) == bw) {
      1148      // This is the focused bw, set window title
      1149      if (snprintf(title, size, "Dillo: %s", label) >= size) { // <-- It dies here
      1150         uint_t i = MIN(size - 4, 1 + a_Utf8_end_of_char(title, size - 8));
      1151         snprintf(title + i, 4, "...");
      1152      }
      1153      BW2UI(bw)->window()->copy_label(title);
      1154      BW2UI(bw)->window()->redraw_label();
      1155   }
      1156   BW2UI(bw)->tabs()->set_tab_label(BW2UI(bw), label);
      1157}
    Anyway, it's late, and I need sleep. I will have another look tomorrow. Valgrind showed a few "Mismatched free() / delete / delete []" errors and some "Conditional jump or move depends on uninitialised value(s)". I will have a look at those later as well.
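
    For context on the first kind of Valgrind report: it appears when memory is released with a different family of function than the one that allocated it. The sketch below uses made-up names and is not taken from Dillo; it only shows each allocation paired with its matching release.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical example, not Dillo code: copies text three times, each
// with a different allocator, and releases each copy the matching way.
static std::string copy_three_ways(const char *text) {
    const std::size_t n = std::strlen(text) + 1;   // include the '\0'

    char *from_malloc = static_cast<char *>(std::malloc(n));
    if (from_malloc == nullptr) return std::string();
    std::memcpy(from_malloc, text, n);

    char *from_new_array = new char[n];
    std::memcpy(from_new_array, text, n);

    // unique_ptr<char[]> releases with delete[] automatically.
    std::unique_ptr<char[]> owned(new char[n]);
    std::memcpy(owned.get(), text, n);

    std::string result =
        std::string(from_malloc) + from_new_array + owned.get();

    std::free(from_malloc);       // malloc() is paired with free()
    delete[] from_new_array;      // new[] is paired with delete[]
    return result;
}
```

    Swapping any of those releases, for example calling delete on from_malloc or free() on from_new_array, is the kind of pairing the report describes.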
    Last edited by kermit; 04-25-2011 at 08:32 PM.

  9. #9
    Officially An Architect brewbuck
    Join Date: Mar 2007 | Location: Portland, OR | Posts: 7,396
    Quote Originally Posted by kermit View Post
    Anyway, it's late, and I need sleep. I will have another look tomorrow. Valgrind showed a few "Mismatched free() / delete / delete []" errors and some "Conditional jump or move depends on uninitialised value(s)". I will have a look at those later as well.
    If you're like me, you might have the answer in your head when you wake up.

    That "Conditional jump or move depends on uninitialised value(s)" may be your problem.

    Memcheck does not detect overruns of stack arrays, but Valgrind's experimental ptrcheck tool can sometimes find them. If you suspect it's stack-related, try running ptrcheck instead:

    http://valgrind.org/docs/manual/pc-manual.html
