Reading a Linux mailbox in C

  1. #1
    Registered User
    Join Date
    Sep 2009
    Posts
    5

    Reading a Linux mailbox in C

    This has kind of frustrated me for a few hours of searching. We currently have a PHP script that checks a mailbox for new messages and parses those out, after which we do some processing with the data. In PHP it's super simple to use the mail function to do this. However, for scalability in the next stage of this project I am going to convert the email reader to a C daemon that just reads the mail and launches a child PHP process to handle the message data.

    I am having a hard time finding how to emulate the functionality of the PHP mail function. It seems that nothing in C exists that is simple and easy to use. I have found the UW IMAP C client (UW IMAP software, IMAP Information Center), but I am not sure this is what I am looking for.

    Any insight or help into the matter is greatly appreciated!

  2. #2
    Registered User
    Join Date
    Sep 2009
    Posts
    5
    Anyone parse emails in C before?

    This question is stumping me, and seems to be stumping the members here as well.

    Thanks!

  3. #3
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Yes, there are a slew of Perl modules for this too. AFAIK, there is no such thing as a "Linux mailbox". I use the mbox format, which may be the most common, but it is not unique to Linux (i.e., don't bark up the wrong tree).

    I do not think that link you posted will be of much help to you.

    However, if you look at a mailbox it is actually one big text file, so this should not be hard to do just by scrutinizing its structure, which is mostly defined by the mail headers present in each message. After all, that is all PHP and Perl have to work with -- they are just parsing a text file.
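As a rough illustration of "it is one big text file", here is a minimal sketch that counts the messages in an mbox file. It assumes the classic mbox convention that each message starts with a line beginning with "From " (with a trailing space) that is either the first line of the file or directly follows an empty line. The function name mbox_count_messages is made up for this example, and real mailboxes have more corner cases than this handles.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count messages in an mbox stream. Assumption: a message starts at a
 * line beginning with "From " that is the first line of the file or
 * directly follows an empty line. */
long mbox_count_messages(FILE *fp)
{
    char line[1024];
    long count = 0;
    int prev_blank = 1;     /* start of file counts as "after a blank line" */
    int at_line_start = 1;  /* 0 while reading the rest of an over-long line */

    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        size_t len = strlen(line);

        if (at_line_start) {
            if (prev_blank && strncmp(line, "From ", 5) == 0)
                count++;
            /* accept both "\n" and "\r\n" as an empty line */
            prev_blank = (line[0] == '\n' ||
                          (line[0] == '\r' && line[1] == '\n'));
        }
        /* the next chunk starts a new line only if this one ended in '\n' */
        at_line_start = (len > 0 && line[len - 1] == '\n');
    }
    return count;
}
```

To use it, open the mailbox with fopen() in read mode and pass the FILE pointer; the same loop is where you would start collecting each message's headers instead of just counting.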
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  4. #4
    a_capitalist_story
    Join Date
    Dec 2007
    Posts
    2,641
    You mention IMAP. Is it an IMAP server to which you're connecting? You don't seem clear on that. What is the PHP functionality you're using to do this now?

    The PHP mail function *sends* mail, by the way; it's not a function that reads mail, so I'm a little confused about what you're doing.

  5. #5
    Registered User
    Join Date
    Sep 2009
    Posts
    5
    Quote Originally Posted by rags_to_riches View Post
    You mention IMAP. Is it an IMAP server to which you're connecting? You don't seem clear on that. What is the PHP functionality you're using to do this now?

    The PHP mail function *sends* mail, by the way; it's not a function that reads mail, so I'm a little confused about what you're doing.
    imap_open is the function I use to connect to the mailbox, imap_check to get info about the mailbox, imap_fetch_overview to read the mail, imap_headerinfo to get all the information, and imap_fetchstructure to see if an image is in the message. Sorry, I misspoke in the original post.

    I just need to write a service that loops through incoming mail on a certain mailbox and finds emails with valid images in the message. I will then spawn a PHP script to process each image.
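For the "spawn a PHP script" half of that plan, a minimal sketch using the POSIX fork/execvp/waitpid calls might look like the following. The helper name run_child is invented for this example, and it simply waits for the child; a real daemon would probably want to handle EINTR and reap children asynchronously.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (looked up in PATH) with the given arguments and wait
 * for it to finish. Returns the child's exit status, or -1 on failure. */
int run_child(char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);
    pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0) {
        execvp(argv[0], argv);  /* only returns if the exec failed */
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call could then pass something like { "php", "process_image.php", "/tmp/image.jpg", NULL } as argv, where the script name and path are placeholders, not anything from this thread.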

  6. #6
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by P3R3 View Post
    I just need to write a service that loops through incoming mail on a certain mailbox and finds emails with valid images in the message. I will then spawn a PHP script to process each image.
    Okay, so you are examining the box, and not connecting to the mail server, right? As I say, look at the box -- it is just a plain text file you can parse yourself.

    Here's a clue: messages that contain images will have a line like this:
    Code:
    Content-Type: multipart/mixed;
    And the image part of the message will be introduced like this:

    Code:
    Content-Type: image/jpeg;
    There are some complex considerations you will probably have to learn as you go along -- this is NOT a totally trivial task. BTW, if you see stuff like this:

    Code:
    /9j/4SIERXhpZgAATU0AKgAAAAgADAEPAAIAAAAGAAAAngEQAAIAAAAVAAAApAESAAMAAAABAAEA
    AAESAAMAAAABAAEAAAEaAAUAAAABAAAAugEbAAUAAAABAAAAwgEoAAMAAAABAAIAAAExAAIAAAAO
    AAAAygEyAAIAAAAUAAAA2AE8AAIAAAAQAAAA7AITAAMAAAABAAEAAIdpAAQAAAABAAAA/AAAB3BD
    YW5vbgBDYW5vbiBQb3dlclNob3QgQTUxMAAAALQAAAABAAAAtAAAAAEAAFF1aWNrVGltZSA3LjQA
    MjAwODowMjowNyAxMDoyODoxNQBNYWMgT1MgWCAxMC40LjkAAB6CmgAFAAAAAQAAAmqCnQAFAAAA
    AQAAAnKQAAAHAAAABDAyMjCQAwACAAAAFAAAAnqQBAACAAAAFAAAAo6RAQAHAAAABAECAwCRAgAF
    That is image data.
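A minimal sketch of spotting the header line that introduces an image part, assuming you already have the message one line at a time. Header names and MIME types are case-insensitive, so the comparison ignores case. The function names are made up for this example, and a full parser would also have to handle folded (continued) header lines and the multipart boundary strings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive test that `line` begins with `prefix`. */
static int starts_with_nocase(const char *line, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*line) != tolower((unsigned char)*prefix))
            return 0;
        line++;
        prefix++;
    }
    return 1;
}

/* Return 1 if `line` is a Content-Type header announcing an image part. */
int is_image_content_type(const char *line)
{
    const char *p;

    assert(line != NULL);
    if (!starts_with_nocase(line, "Content-Type:"))
        return 0;
    p = line + 13;              /* skip the 13 characters of "Content-Type:" */
    while (*p == ' ' || *p == '\t')
        p++;
    return starts_with_nocase(p, "image/");
}
```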

  7. #7
    Registered User
    Join Date
    Sep 2009
    Posts
    5
    I was hoping there would be some class available to do this, looks like I am out of luck?

  8. #8
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by P3R3 View Post
    I was hoping there would be some class available to do this, looks like I am out of luck?
    You are, if you haven't found one. It looks like I actually started a thread of this sort a while ago:

    jpeg data in mbox files

    You should read that if you intend to do this. There is an important point about how image data in email is encoded in base64, which is not how it is stored in a normal image file.
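To make the base64 point concrete, here is a minimal decoder sketch. It skips characters outside the base64 alphabet (such as the line breaks between the encoded lines) and stops at '=' padding. The names b64_value and b64_decode are invented for this example; it does no validation beyond that, so treat it as a starting point only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map one base64 character to its 6-bit value, or -1 if it is not one. */
static int b64_value(int c)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const char *p;

    if (c == '\0')
        return -1;
    p = strchr(alphabet, c);
    return p ? (int)(p - alphabet) : -1;
}

/* Decode base64 text from `in` into `out` (capacity `cap` bytes).
 * Returns the number of bytes written, or (size_t)-1 if `out` is too small. */
size_t b64_decode(const char *in, unsigned char *out, size_t cap)
{
    unsigned long acc = 0;  /* bit accumulator; only the low bits are used */
    int bits = 0;           /* number of not-yet-emitted bits in acc */
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0' && *in != '='; in++) {
        int v = b64_value((unsigned char)*in);
        if (v < 0)
            continue;       /* skip newlines and other non-alphabet bytes */
        acc = (acc << 6) | (unsigned long)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (n == cap)
                return (size_t)-1;
            out[n++] = (unsigned char)((acc >> bits) & 0xFF);
        }
    }
    return n;
}
```

Decoding the first four characters of the sample data earlier in the thread, "/9j/", gives the bytes FF D8 FF, which is how a JPEG file begins.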

    But I would imagine most scripting languages (e.g., Perl, Python, VB) have a quick and simple way to do this (I ended up writing a script in Perl that rips all the JPEG images out of an mbox and saves them to files in a separate directory; it is about 50 lines).

    Of course, PHP and Perl are both written in C; they are just saving you from a lot of labour-intensive activity. You may want to reconsider whether you have to use C here. Although PHP may not be feasible, I am pretty sure that in the end the C version would not have much advantage over the Perl version. If you are working on Windows, VBScript could probably do the job.

    On the other hand, if your C skills are up to it, it is still just a matter of parsing the box properly. Slightly tedious, but you will probably learn a bunch of stuff about mail message formats (MIME) in the process, and how to work with image data.

  9. #9
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,161
    Long ago when I wrote an AI system for stopping spam email, I used the UW C client and found it to be perfectly sufficient for my task. I'd stick with that unless you hit some limitation.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  10. #10
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by brewbuck View Post
    Long ago when I wrote an AI system for stopping spam email, I used the UW C client and found it to be perfectly sufficient for my task. I'd stick with that unless you hit some limitation.
    Okay, I looked at that and it seems to me you are talking about a filter that connects to the mail server, and not a program that goes through mail that has already been delivered and placed in a box.

    [later] Alright, it does refer to "Mailbox Access Functions" in http://www.washington.edu/imap/docum...ernal.txt.html. But IMO those functions are few and limited. I'd observe that the documentation ("REVISED: 19 August 1996") may pre-date multi-part mail. Certainly, I don't see any indication that this is a consideration in the API, so I doubt it will make this task much easier (but it at least does provide functions to access individual messages in the box).
    Last edited by MK27; 09-25-2009 at 01:55 PM.

  11. #11
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,161
    Quote Originally Posted by MK27 View Post
    multi-part mail. Certainly, I don't see any indication that this is a consideration in the API, so I doubt it will make this task much easier (but it at least does provide functions to access individual messages in the box).
    After some good old recollection, I remember that the reason I used C client was because the mbox format, although it looks pretty simple, actually has some really strange cases that do occur in practice. I spent several days trying to implement a reliable mbox parser, but whenever I threw a real-world mbox file at it there was always something goofy. After that I decided to just use a tried-and-true implementation, even if it was rather limited.
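One well-known wrinkle of this kind, which may or may not be what is being referred to here, is "From " quoting: because a line starting with "From " marks a new message, software that writes mbox files often escapes such lines inside message bodies by putting '>' in front. A minimal sketch of undoing that, assuming the "mboxrd" variant of the convention (any run of '>' followed by "From " had one '>' added); the function name is made up, and which variant a given mailbox uses is something you would have to check.

```c
#include <assert.h>
#include <string.h>

/* Undo mboxrd-style quoting in place: if `line` is one or more '>'
 * followed by "From ", drop the first '>'. Returns 1 if changed. */
int mboxrd_unquote_line(char *line)
{
    size_t i = 0;

    assert(line != NULL);
    while (line[i] == '>')
        i++;
    if (i == 0 || strncmp(line + i, "From ", 5) != 0)
        return 0;
    /* shift the string left by one character, including the NUL */
    memmove(line, line + 1, strlen(line + 1) + 1);
    return 1;
}
```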

  12. #12
    a_capitalist_story
    Join Date
    Dec 2007
    Posts
    2,641
    I suppose it really comes down to this: the PHP IMAP implementation is most likely just a wrapper around the UW IMAP C API. What are you buying by rewriting this in C? How is the scalability improved by using a C daemon, versus a cron job that uses the PHP functions?

    If you can truly justify the work for the scalability issue, then fine, but I would bet the time spent would not justify the meager gains. If this is just about "purity," or not-invented-here syndrome, do yourself a favor: don't be a language snob. If you've got something that works, just stick with it!
