Thread: Having probs with segmentation faults on fclose()? Possible work-around

  1. #1
    Registered User
    Join Date
    Jul 2012
    Location
    Michigan U.P.
    Posts
    20

    Hello: The 'Net has helped me so much, time to try and return the favor.

    I spent many hours debugging a file-reading program that kept aborting with segmentation faults. With debugging tools I narrowed it down to an apparent conflict between malloc'ed memory and fclose(). (Searching the 'Net on this set of circumstances even turned up reports of the problem being absent on other *NIX systems but present on Linux.)

    There comes a time when you cut your losses and try a different approach. While perhaps not the optimum, the example below shows what changes I made to the existing code (now commented out).

    The basis of the change is to replace fopen, fread/getc, fclose with open, read, close (lower-level calls).

    If the reader needs more back-story on this program, including more info on variables not defined in this snippet, feel free to contact me.

    FWIW, the program being created is intended to take .eml, .msg and .txt files (the .txt files contain the .eml format) and extract certain header lines (To:, From:, Subject:, Date:, Cc:) for importing into a MySQL database. The end result should be an archive of email files sortable on the headers. The entirety of each email file is also loaded into the DB. This way, many thousands of emails can be hard-archived to .7z, .rar or .zip, and the emails in the DB can still be searched if needed.

    Best, GnS.

    Code:
    int blockread()
    {
    int      fhandle;
    /* int      inint; */
    int      close_ret;
    int      read_errno;
    int      retval;
    int      getmem_flag;
    long     n;
    long     total;
    char     *bufptr;     /* byte cursor: read() counts bytes, not longs */
    /* FILE     *lfp; */
    
        getmem_flag = OK;
        errno       = 0;
        retval      = OK;
    
    /*     if ((lfp = fopen(pd.tmp, "rb")) != (FILE *) NULL) */
        fhandle = open(pd.tmp, O_RDONLY);
    
         if (fhandle != -1)   /* -1 is the error return from open() */
            {
            pd.wholefile = (long *) xmalloc(&getmem_flag);
            bufptr = (char *) pd.wholefile;
    
            if (getmem_flag != OK)
                {
    /*             fclose(lfp); */
                close_ret = close(fhandle);
                if (close_ret != 0)
                    fprintf(pd.log_fp, "\nclose() input file returned %d, %s.\n", close_ret,  strerror(errno));
    
                fprintf(pd.log_fp, "\nAborting read file %s.\n", pd.tmp);
                return(BAD);
                }
    
    /*         for (n = 0l; (inint = getc(lfp)) != EOF && n < pd.cur_fsize; n++)
                *bufptr++ = inint;  */
            /* read() may return fewer bytes than requested, so loop until the
               expected count has arrived, end of file is hit, or an error occurs */
            total      = 0;
            read_errno = 0;
            while (total < pd.cur_fsize)
                {
                n = read(fhandle, bufptr + total, pd.cur_fsize - total);
                if (n > 0)
                    total += n;
                else if (n == 0)
                    break;                  /* end of file */
                else if (errno != EINTR)
                    {
                    read_errno = errno;     /* save it before close() can change errno */
                    break;
                    }
                }
    
            /* fclose(lfp); */
            close_ret = close(fhandle);
            if (close_ret != 0)
                fprintf(pd.log_fp, "\nclose() input file returned %d, %s.\n", close_ret,  strerror(errno));
    
            /* if (n != pd.cur_fsize) */
            if ((total != pd.cur_fsize) || (read_errno != 0))
                {
                fprintf(pd.log_fp, "\nRead %s problem, expected %ld, got %ld bytes\n", pd.tmp, pd.cur_fsize, total);
                return(BAD);
                }
            }
        else
            {
            fprintf(pd.log_fp, "\nFail open %s, %s\n", pd.tmp, strerror(errno));
            retval = BAD;     /* a failed open() must not report success */
            }
    
        return(retval);
    }

  2. #2
    Registered User
    Join Date
    May 2010
    Posts
    4,608
    It would probably be best if you posted the smallest complete program that illustrates your problem. Trying to find segmentation faults using only snippets is very difficult.

    Edit:
    After looking closer at your code, I recommend that you stick with the standard C functions: fopen(), fclose(), fread(), etc. Also use malloc()/free() instead of xmalloc()/free(). Never cast NULL; if you get an error message because of your use of NULL, then you should be using something other than NULL. Don't cast the return value from malloc() in a C program.
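    As a rough illustration of that advice, here is a minimal sketch of reading a whole file with only standard C calls and an uncast malloc(). The function name slurp_file() and the 4096-byte starting size are assumptions made for the example; nothing here comes from the original program.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the thread): read a whole file into a
 * malloc'ed buffer using only standard C. On success it returns the buffer
 * and stores the byte count in *size; on any failure it returns NULL.
 * The caller releases the buffer with free(). */
unsigned char *slurp_file(const char *path, size_t *size)
{
    assert(path != NULL && size != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    size_t cap = 4096;                     /* arbitrary starting capacity */
    size_t len = 0;
    unsigned char *buf = malloc(cap);      /* no cast needed in C */
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    for (;;) {
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap)
            break;                         /* short read: EOF or error */

        /* Buffer is full: double it and keep reading. */
        unsigned char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            fclose(fp);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    int failed = ferror(fp);               /* distinguish error from EOF */
    if (fclose(fp) != 0)
        failed = 1;
    if (failed) {
        free(buf);
        return NULL;
    }

    *size = len;
    return buf;
}
```

    A caller would check the return value against NULL, use the first *size bytes, and free() the buffer when done. The data is treated as raw bytes, so it is not a C string.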

    Also have you run this program through your debugger? Your debugger should be able to tell you exactly where it detected the error and allow you to view the variables at the time of the crash to help determine the cause of the crash.

    Jim
    Last edited by jimblumberg; 07-07-2012 at 09:15 AM.

  3. #3
    Registered User
    Join Date
    Jul 2012
    Location
    Michigan U.P.
    Posts
    20
    JimBlumBerg: As it happens, "xmalloc()" is simply a suggested wrapper for "malloc()". Sticking with the "fxxxx()" file I/O calls was attempted, resulting in many hours of troubleshooting to no avail. Not that I'm a 'pro' C programmer, but I did use the debugging features of Code::Blocks plus "valgrind", and to my eyes never saw the exact location of the error.

    This project entails - as of this writing - main.c/fio.c/other.c/defines.h files. I guess I could have posted them all here, but as I mention in the first post, this is for brief example only. I'm glad to forward the entire source to anybody who's really fighting their project. If you are in that class, let me know and I'll send you all the code I have. Also, it's not finished, so the code may change.

  4. #4
    Registered User
    Join Date
    May 2010
    Posts
    4,608
    Quote Originally Posted by gandsnut View Post
    As it happens, "xmalloc()" is simply a suggested wrapper for "malloc()"
    By whom? Stick with the standard C features.

    Quote Originally Posted by gandsnut View Post
    I'm glad to forward the entire source to anybody who's really fighting their project.
    Why would anyone who is fighting their own project want to bother with code you are saying is broken?

    Quote Originally Posted by gandsnut View Post
    Sticking with the "fxxxx()" file I/O calls was attempted, resulting in many hours of troubleshooting to no avail. Not that I'm a 'pro' C programmer, but I did use the debugging features of Code::Blocks plus "valgrind", and to my eyes never saw the exact location of the error.
    As I said, stick to the standard C features, and post the smallest complete program that illustrates your problem. It doesn't seem like switching to the non-standard functions has changed much, unless you are now saying the program is working correctly.

    Jim

  5. #5
    Registered User
    Join Date
    Jul 2012
    Location
    Michigan U.P.
    Posts
    20
    Quote Originally Posted by jimblumberg View Post
    Why would anyone who is fighting their own project want to bother with code you are saying is broken?
    The coding created so far works fine. I'll test more extensively when the project nears completion.

    Quote Originally Posted by jimblumberg View Post
    As I said stick to the standard C features, and post the smallest complete program that illustrates your problem.
    Understood about your "stick to standard C features" opinion. This was the smallest segment of code I felt would illustrate the thread's focus.

    Quote Originally Posted by jimblumberg View Post
    It doesn't seem like switching to the non-standard functions has changed much, unless you are now saying the program is working correctly.
    It works correctly, but needs more code to accomplish the project goal.

    Perhaps - as a new poster to this Board - I don't know the finer points of posting etiquette here. But if this is the typical reception for people sharing a problem and THEIR solution, I won't be posting anything else, any time soon.

    All I've done is volunteered my info. Anybody can critique or recommend better. However, I don't appreciate my chops being busted. Anybody who doesn't like this info or finds it too problematic for them may want to just move along.

  6. #6
    laserlight (C++ Witch)
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,347
    It seems to me that the gist of this thread is the suggestion that if you are programming for Linux specifically and seem unable to fix a segmentation fault that you discovered is somehow due to "an apparent conflict between malloc'ed memory and fclose()", then consider if you should "replace fopen, fread/getc, fclose with open, read, close (lower-level calls)" as a last resort.

    Beyond that the example itself does not appear interesting. What would be more interesting is the code that resulted in the segfault, e.g., maybe it turns out that keeping the standard C functions would work if some other mistake was fixed. However, since it is apparently infeasible to post that, there really is not much else to talk about besides pointing out that perhaps the problem lies elsewhere, hence switching from standard C to a lower level API merely suppresses the symptoms of the problem, for now.
    Last edited by laserlight; 07-07-2012 at 10:49 AM.

  7. #7
    Registered User
    Join Date
    Jun 2011
    Posts
    4,508
    Quote Originally Posted by gandsnut View Post
    Perhaps - as a new poster to this Board - I don't know the finer points of posting etiquette here. But if this is the typical reception for people sharing a problem and THEIR solution, I won't be posting anything else, any time soon.

    All I've done is volunteered my info. Anybody can critique or recommend better. However, I don't appreciate my chops being busted. Anybody who doesn't like this info or finds it too problematic for them may want to just move along.
    Don't mistake blunt directness for rudeness. There really wasn't any "chop busting" here - trust me, you'll know it when you see it.

    There is a degree of attitude from time to time, but that's the nature of the beast. Don't let it get to you, and don't let it detract from the value of these forums. If you're serious about programming, then you will learn a lot here. But you would have to understand and accept the culture here, first.

  8. #8
    Registered User
    Join Date
    Jul 2012
    Location
    Michigan U.P.
    Posts
    20
    Quote Originally Posted by laserlight View Post
    ... are programming for Linux specifically and seem unable to fix a segmentation fault that you discovered is somehow due to "an apparent conflict between malloc'ed memory and fclose()", then consider if you should "replace fopen, fread/getc, fclose with open, read, close (lower-level calls)" as a last resort.
    You got it.

    Quote Originally Posted by laserlight View Post
    Beyond that the example itself does not appear interesting. What would be more interesting is the code that resulted in the segfault, e.g., maybe it turns out that keeping the standard C functions would work if some other mistake was fixed. However, since it is apparently infeasible to post that, there really is not much else to talk about besides pointing out that perhaps the problem lies elsewhere, hence switching from standard C to a lower level API merely suppresses the symptoms of the problem, for now.
    I couldn't agree more.

    When I racked up 7-8 hours debugging and researching, I went looking for alternatives. This is a one-off program for personal use. If Joe Blow finds it helpful and it saves his day, then I'm gratified. If Sam Gee has a better solution, to him I suggest he post his code.

    I titled this thread "possible work-around", not "the best solution using perfect coding practice". As I've mentioned, perhaps I'm not yet "with the program" on this message Board.

  9. #9
    Salem (and the hat of int overfl)
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    38,714
    > n = read(fhandle, bufptr, pd.cur_fsize);
    So I'm guessing from the text that you're going to treat this large block of memory as a string, and try and use various str... functions on it.

    In which case, you really ought to put a \0 on the end of this array (and allow space for it as well).

    Allocated memory is often \0 filled (the first time it is allocated by the OS), which can lead to things like this superficially working.
    But after a period of allocating and freeing, the local memory pool is filled with old junk, and your luck runs out when there are no longer any accidental \0's in the right place.
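    To make the point about the terminator concrete, here is a minimal sketch that allocates one byte more than it intends to read and writes the \0 itself, rather than relying on whatever the allocator happened to leave behind. The function name read_terminated() and its parameters are hypothetical; only open(), read() and close() come from the code under discussion.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): read up to `want` bytes from
 * `path` and return them in a buffer that is always '\0'-terminated.
 * The number of bytes actually read is stored in *got. Returns NULL on
 * failure; the caller frees the buffer. */
char *read_terminated(const char *path, size_t want, size_t *got)
{
    assert(path != NULL && got != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    char *buf = malloc(want + 1);          /* +1 leaves room for the '\0' */
    if (buf == NULL) {
        close(fd);
        return NULL;
    }

    size_t total = 0;
    while (total < want) {
        ssize_t n = read(fd, buf + total, want - total);
        if (n > 0) {
            total += (size_t)n;            /* read() may deliver less than asked */
        } else if (n == 0) {
            break;                         /* end of file */
        } else if (errno != EINTR) {
            free(buf);                     /* real error, not just a signal */
            close(fd);
            return NULL;
        }
    }
    close(fd);

    buf[total] = '\0';                     /* explicit terminator, never accidental */
    *got = total;
    return buf;
}
```

    The terminator only makes the str... functions safe against running off the end. If the file itself contains \0 bytes, those functions will still stop early, so the count in *got remains the reliable length.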

    Another useful debug tool would be https://en.wikipedia.org/wiki/Electric_Fence
    This causes traps as soon as you run off the end of your allocation, or when you try to access something you freed earlier on.

  10. #10
    Registered User
    Join Date
    Jul 2012
    Location
    Michigan U.P.
    Posts
    20
    Quote Originally Posted by Salem View Post
    Another useful debug tool would be https://en.wikipedia.org/wiki/Electric_Fence This causes traps as soon as you run off the end of your allocation, or when you try to access something you freed earlier on.
    I saw this, but since it was quick to add a 'valgrind' plug-in to Code::Blocks, I didn't research EF further...

    Quote Originally Posted by Salem View Post
    > n = read(fhandle, bufptr, pd.cur_fsize);
    So I'm guessing from the text that you're going to treat this large block of memory as a string, and try and use various str... functions on it.
    Sort of. Your question strikes at the heart of the project. Believe me, I'm open to constructive suggestions.

    Problem: MANY thousands of emails in the form of .eml, .msg and .txt files. I want them migrated into a database so I can delete or .ZIP the raw files and recover disk space. In the bargain, the database will provide search access to the emails' content.

    Observation: .eml / .msg / .txt (containing an email message) all have To:, From:, Subject:, Date:, and (frequently) Cc: "fieldnames" with a data string following:

    To: Joe Blow <jb@aol.com>
    From: Sally Stuff <sstuff@yahoo.com>
    Subject: Barbeque next Saturday @ 2pm, Frank's barn
    Date: June 30, 2012
    Cc: Wilma Blue <wilmablue@msn.com>

    Typically, in an .eml file, there's all sorts of additional info that sometimes is very important to the message, other times is incidental. Then there's the body of the email, which could be text alone, embedded documents or graphics, or something else.

    There's little standardization for line order in an .eml file. The 5 headers above rarely appear next to each other.

    My Approach (for good or bad): Write a 'C' program that does the following:

    1) Opens/scans a directory that has .eml, .msg or .txt files,
    2) Opens each file and seeks out the 5 headers, and parses out the related data,
    3) Closes the FILE stream that fopen() returned for the file just scanned, and opens a new file descriptor using "open()" with intent to "read()" the entire file into a block of memory,
    4) At this stage, all the content of the .eml AND the 5 headers are in hand, and are inserted into a row in a MySQL database,
    5) Repeat 2) - 4) until the directory is all scanned.
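    Step 2) could be sketched roughly as below. This is not the program's actual code: the names scan_headers(), header_names, NUM_HEADERS and VALUE_MAX are invented for the example. It assumes the five headers sit in the header section that ends at the first blank line, matches names case-insensitively with POSIX strncasecmp(), keeps only the first occurrence of each, and ignores folded (continuation) header lines and lines longer than its 1023-byte line buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strncasecmp() (POSIX) */

#define NUM_HEADERS 5
#define VALUE_MAX   512

/* Index order of values[] filled in by scan_headers(). */
static const char *const header_names[NUM_HEADERS] = {
    "To:", "From:", "Subject:", "Date:", "Cc:"
};

/* Hypothetical helper (not from the thread): copy the text following each
 * of the five header names into values[i]. Returns how many of the five
 * were found, or -1 if the file cannot be opened. */
int scan_headers(const char *path, char values[NUM_HEADERS][VALUE_MAX])
{
    assert(path != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    int seen[NUM_HEADERS] = { 0 };
    for (int i = 0; i < NUM_HEADERS; i++)
        values[i][0] = '\0';

    char line[1024];
    int found = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';    /* strip the line ending */
        if (line[0] == '\0')
            break;                             /* blank line ends the headers */

        for (int i = 0; i < NUM_HEADERS; i++) {
            size_t len = strlen(header_names[i]);
            if (!seen[i] && strncasecmp(line, header_names[i], len) == 0) {
                const char *v = line + len;
                while (*v == ' ' || *v == '\t')
                    v++;                       /* skip blanks after the colon */
                snprintf(values[i], VALUE_MAX, "%s", v);
                seen[i] = 1;
                found++;
                break;
            }
        }
    }

    fclose(fp);
    return found;
}
```

    Because the scan stops at the first blank line, a line in the message body that happens to start with "To:" is never mistaken for a header.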

    For my needs, I don't need to parse out the attachments/message body/email header garbage. Once in the MySQL database, the emails are archived and can be searched based on the 5 headers.

    Say I want to find emails from Sally Stuff before a certain email send date. Those elements are fields in a database, and can be queried specifically. Then, if I need to see the content of the email, the MySQL DB holds the entire email in a LONGBLOB field.

    The biggest challenge here is vacuuming a whole .eml file into one C program memory assignment. If the .eml file has an embedded graphic, it could have all variety of byte values, like CR/LF/EOF etc.

    As such, delimited file formats like CSV won't work, because the delimiters can also appear in the data. How does one import data to the database without delimiters?

    I felt a string (of char) was not going to work based on the above. Therefore, I've taken to malloc() a memory block equal to the size of the .eml file, of type "int".

    Presumably, an .eml file of 126,029 bytes (each stored in an 'int') will import to the database's LONGBLOB field without any problem.
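    On the delimiter question: one possible approach, sketched here as an assumption rather than as what the program does, is to keep the file as raw bytes in an unsigned char buffer with an explicit length (one byte per byte, not one 'int' per byte) and encode it as a MySQL hexadecimal literal of the form X'...'. Every byte becomes two hex digits, so no byte value can collide with a delimiter. The function name blob_to_hex_literal() is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not from the thread): turn `len` raw bytes into a
 * newly allocated string of the form X'0A1B...' that can be placed in an
 * SQL statement as a binary value. Returns NULL if allocation fails.
 * The caller frees the result. */
char *blob_to_hex_literal(const unsigned char *data, size_t len)
{
    static const char digits[] = "0123456789ABCDEF";

    assert(data != NULL || len == 0);

    /* X + ' + two digits per byte + ' + '\0' */
    char *out = malloc(2 * len + 4);
    if (out == NULL)
        return NULL;

    char *p = out;
    *p++ = 'X';
    *p++ = '\'';
    for (size_t i = 0; i < len; i++) {
        *p++ = digits[data[i] >> 4];       /* high nibble */
        *p++ = digits[data[i] & 0x0F];     /* low nibble */
    }
    *p++ = '\'';
    *p = '\0';
    return out;
}
```

    The literal is twice the size of the data, which matters for large attachments and for the server's packet size limit. The MySQL C API also offers mysql_real_escape_string() and prepared statements with bound parameters, which avoid building the literal by hand.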

    ========

    I'm nearing the final step of adding the embedded SQL code to insert the row data. In a different C program for testing, I can open, read and write simple data to/from MySQL. Next to do is to merge the .eml file-reading/parsing, and the database access coding.

    Is this the right or wrong way to accomplish the goal? Well, it's the way I've decided to go. Respectfully, if somebody comes back with a "dude, you're doing it all wrong" response, it does me no good. If somebody offers "hey, I know code that will accomplish this", I'll look at the suggestion right away.

    When the program works to my satisfaction, I'll release it publicly for anybody, because I have searched for weeks for this kind of solution.

    Best, GnS.

  11. #11
    Registered User
    Join Date
    Dec 2007
    Posts
    2,676
    It sounds like you're working with mbox format, which is well-defined. In addition, the contents of these files should be textual; all images, documents, etc. should be encoded within the MIME parts of the message. I'm not sure I would read an entire file into memory at once unless time was of the essence and memory were freely-available; better to grab one message at a time, process, and continue to the next.

    Speaking as one who has done this a few times, in different programming languages.

  12. #12
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    I'm very curious: I'm a hardline Linux programmer, and I've never heard about an issue with dynamic memory allocation and fclose(). It sounds like something a Microsoft employee might claim: FUD or just noise. Care to provide any details? Or did you do your 'net search using Bing only?

    I have only seen fclose() segfault because the application garbled the internal state of the FILE structure. That has nothing to do with fclose() itself; it is just an innocent victim. The real problem is usually a buffer overrun much earlier in the code.

    Other than that, I'm along the same lines as rags_to_riches, except that I suggest reading each header and body separately.

    Consider something like the following signature. I'm using weird indentation to hopefully make it easier to read:
    Code:
    #ifndef   MAILBOX_H
    #define   MAILBOX_H
    
    typedef  struct mailbox   mailbox;
    
    mailbox *mailbox_open    (const char *filename);
    
    int      mailbox_close   (mailbox *mailbox);
    
    int      mailbox_headers (mailbox *mailbox,
                              char   **data,
                              size_t  *size);
    
    int      mailbox_body    (mailbox *mailbox,
                              char   **data,
                              size_t  *size);
    
    #endif /* MAILBOX_H */
    The mailbox_headers() function reads the headers for the next message in the mailbox stream. In some cases -- mainly damaged archives and filesystems -- the files may contain embedded NUL bytes, so I recommend using a char pointer and length as above.

    If the message is not interesting, a followup mailbox_headers() call will skip the message body, and get the headers for the next message.

    To get the message body, one would just need to call mailbox_body() after calling mailbox_headers(). (Due to the format and likely implementation, calling mailbox_body() repeatedly would most likely just return an empty message body.)

    There are a number of ways one might implement the above interface in Linux.

    The simplest choice is POSIX.1-2008 getline(), which reads arbitrary-length lines, allocating/reallocating the buffer as needed. It takes about 400 to 500 lines to fully implement the above. It is, relatively speaking, a bit slow however.
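    As a rough idea of the getline() approach, the sketch below collects one header section (every line up to the first blank line) into a single growing buffer. It is a simplified stand-in for part of what mailbox_headers() would do, working on a plain FILE * instead of the opaque mailbox type; the name read_header_block() is invented for the example, and message-boundary detection is left out.

```c
#define _POSIX_C_SOURCE 200809L   /* for getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper (not from the thread): read lines from `in` until a
 * blank line or end of file, concatenating them into *data with the total
 * length in *size. The blank line itself is consumed but not stored.
 * Returns 0 on success, -1 on error. The caller frees *data, which is
 * NULL when no header lines were read. */
int read_header_block(FILE *in, char **data, size_t *size)
{
    assert(in != NULL && data != NULL && size != NULL);

    char   *line  = NULL;     /* getline() allocates and grows this */
    size_t  cap   = 0;
    char   *block = NULL;
    size_t  used  = 0;
    ssize_t n;

    while ((n = getline(&line, &cap, in)) > 0) {
        if (line[0] == '\n' || (line[0] == '\r' && line[1] == '\n'))
            break;                          /* blank line: end of headers */

        char *tmp = realloc(block, used + (size_t)n + 1);
        if (tmp == NULL) {
            free(block);
            free(line);
            return -1;
        }
        block = tmp;
        memcpy(block + used, line, (size_t)n);  /* length-based: survives NUL bytes */
        used += (size_t)n;
        block[used] = '\0';
    }
    free(line);

    if (ferror(in)) {
        free(block);
        return -1;
    }

    *data = block;
    *size = used;
    return 0;
}
```

    After the call, the stream is positioned at the first line of the message body, so the same stream can be handed to whatever reads or skips the body.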

    I would personally use unistd.h low-level I/O. It is fast and works for all situations. Memory mapping is slightly faster and easier to implement, but only works for normal files; on 32-bit architectures the map size is limited, and working around that by windowing the data produces quite complicated code. I frequently use memory mapping, but on 64-bit architectures only; windowing is just not worth the effort.
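    For comparison, mapping a whole regular file read-only might look like the sketch below. The name map_file() is hypothetical, and the sketch deliberately refuses anything that is not a regular file, as well as empty files, since a zero-length mapping is not allowed. No windowing is attempted, so on a 32-bit system a large file may simply fail to map.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): map `path` read-only and
 * return a pointer to its contents, storing the length in *size.
 * Returns NULL on failure. Release with munmap((void *)ptr, size). */
const unsigned char *map_file(const char *path, size_t *size)
{
    assert(path != NULL && size != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || !S_ISREG(st.st_mode) || st.st_size == 0) {
        close(fd);              /* not a regular file, or nothing to map */
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *size = (size_t)st.st_size;
    return p;
}
```

    The mapped bytes are not '\0'-terminated, so any parsing over them has to be bounded by the returned size.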

    With a bit of extra effort, you could add xz decoding support directly into the implementation (see here for reference). That would allow you to keep the original mailboxes archived using xz -9 mailbox, and the library would internally decompress the data when read. (For tar and zip archives, I recommend using a shell script frontend instead, which decompresses the archives to standard output, fed to the actual application. Much easier to maintain in the long run.)

    If you find the low-level details a bit too arcane, I might be able to whip up a Linux implementation if you're seriously interested.
