Parsing using strtok() and sscanf()


  1. #1
    Registered User
    Join Date: Nov 2007, Posts: 96

    I am trying to parse a very large string one line at a time and then parse each individual line. I know the format of the first line, which is "HTTP/1.1 200 OK", and I can successfully use strtok() to read in that line. However, when I try to use sscanf() to parse this line further, I get a segfault. Here is my code:

    Code:
          dummy = strtok(message, "\n");
          fprintf(stdout, "%s\n", dummy);
          sscanf(dummy, "%*s %s %*s", code);
          fprintf(stdout, "%s\n", code);
    Or is there an easier way to do this?

  2. #2
    tabstop (and the Hat of Guessing)
    Join Date: Nov 2007, Posts: 14,185
    What are you trying to accomplish here? What is message? Are you really only using strtok to remove the \n? And what do you expect to happen?

  3. #3
    Registered User
    Join Date: Nov 2007, Posts: 96
    I was just trying to read in one line at a time and then parse that line. Is strtok() removing the '\n' and causing sscanf() to be unable to parse it? I would just like to parse each line of an HTTP response and assign the status code, server information, content length, and so on.

  4. #4
    tabstop (and the Hat of Guessing)
    Join Date: Nov 2007, Posts: 14,185
    If you don't know why you're using strtok, why should we?

    What strtok does is skip any leading characters from the set given in its second argument, then find the next character from that set and replace it with a string-terminating \0. So the first \n in the string gets replaced with \0 and a pointer to the front of the string is returned.
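To make that concrete, here is a small sketch that splits a writable buffer into lines with strtok. The name split_lines is hypothetical, not something from the thread, and the example assumes CRLF line endings as in a real HTTP response. Every returned pointer points into the original buffer; nothing is copied.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split a writable buffer into lines in place.
 * strtok() replaces the delimiter that ends each token with '\0' and
 * returns a pointer into buf, so buf itself is modified.
 * Runs of delimiters are treated as one, so empty lines disappear. */
size_t split_lines(char *buf, char **lines, size_t max)
{
    size_t n = 0;
    /* The first call gets the buffer; later calls pass NULL to continue. */
    char *tok = strtok(buf, "\r\n");

    while (tok != NULL && n < max) {
        lines[n++] = tok;
        tok = strtok(NULL, "\r\n");
    }
    return n;
}
```

Because strtok treats consecutive delimiters as a single separator, the blank line that separates the HTTP headers from the body vanishes, so this alone cannot tell where the headers end.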

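    As far as I can see, that won't cause a segfault unless message is a string literal (possible, based on what you're saying), since strtok writes into the string it is given. However, it doesn't go and get any new data from anywhere. And sscanf won't segfault just because the input doesn't match; if there aren't three words in dummy to match on, it simply returns a lower count (or EOF). It can crash, though, if code doesn't point to writable memory large enough to hold the string it copies.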
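One likely cause of the crash, assuming code is declared as a bare char *code (as in the variable list posted later in the thread) and never pointed at any storage: sscanf then copies characters through an uninitialized pointer. A minimal sketch with a real buffer and a field width; the name get_status_code is illustrative only:

```c
#include <stdio.h>

/* Pull the status code out of a status line such as "HTTP/1.0 200 OK".
 * code must point to real, writable storage of at least 4 bytes:
 * sscanf copies characters into it, and %3s stops it from writing
 * more than 3 characters plus the terminating '\0'.
 * Returns 1 on success, 0 if the line did not match. */
int get_status_code(const char *line, char code[4])
{
    /* %*s skips the protocol version; suppressed conversions are not
     * counted in sscanf's return value, so success means exactly 1. */
    return sscanf(line, "%*s %3s", code) == 1;
}
```

Checking the return value also covers the case where the line holds fewer words than expected.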

  5. #5
    Registered User
    Join Date: Nov 2007, Posts: 96
    What function would you recommend that I use to parse this in such a way?

  6. #6
    tabstop (and the Hat of Guessing)
    Join Date: Nov 2007, Posts: 14,185
    Given that you haven't yet stated what you want to do in any way at all, I have no idea what I would use to parse your input.

    If all you want is the "middle" of the line, as your sscanf statement somewhat suggests, then I would use fgets to read the line and sscanf to get the string out of the middle; lather, rinse, repeat.
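A sketch of that fgets/sscanf loop, assuming the data is available as a FILE * (a file, stdin, or similar). The name print_second_words is hypothetical and the buffer sizes are arbitrary:

```c
#include <stdio.h>

/* Read a stream line by line with fgets() and use sscanf() to pull the
 * second whitespace-separated word out of each line (the "middle" of a
 * status line such as "HTTP/1.0 200 OK"). Prints each word found and
 * returns how many lines yielded one. */
int print_second_words(FILE *fp)
{
    char line[1024];
    char word[64];
    int found = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* %*s skips the first word; %63s leaves room for the '\0'. */
        if (sscanf(line, "%*s %63s", word) == 1) {
            printf("%s\n", word);
            ++found;
        }
    }
    return found;
}
```

Lines with fewer than two words, such as the blank line before the body, are skipped because sscanf does not return 1 for them.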

  7. #7
    Registered User
    Join Date: Nov 2007, Posts: 96
    I already have the input read into a string, so I don't think fgets will work here, since it reads from a stream. Is there another function that works like fgets but reads from a char *?
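Standard C has no fgets counterpart that reads from a char *, but one is short to write. The following is a hypothetical helper (sgets is not a library function) that copies one line at a time out of a string and advances a cursor; it assumes lines end in \n or \r\n. On POSIX systems, fmemopen() can alternatively wrap a buffer in a FILE * so that fgets itself works.

```c
#include <string.h>

/* Hypothetical fgets()-style reader for a string (not a standard
 * function). Copies the next line from *pos into buf (at most size-1
 * characters, without the trailing \n or \r\n), advances *pos past the
 * whole line, and returns buf, or NULL once the input is exhausted.
 * Unlike fgets, an over-long line is truncated and the rest skipped. */
char *sgets(char *buf, size_t size, const char **pos)
{
    const char *p = *pos;
    size_t len, n;

    if (p == NULL || *p == '\0' || size == 0)
        return NULL;

    len = strcspn(p, "\n");            /* length of the current line */
    n = len;
    if (n > 0 && p[n - 1] == '\r')     /* drop the CR of a CRLF ending */
        --n;
    if (n > size - 1)                  /* truncate to fit the buffer */
        n = size - 1;
    memcpy(buf, p, n);
    buf[n] = '\0';

    *pos = p + len + (p[len] == '\n'); /* step over the newline, if any */
    return buf;
}
```

Unlike strtok, this leaves the original string untouched and returns an empty line for the blank line that ends the HTTP headers, so the caller can tell where the body starts.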

    What I would like to do is take the input of:

    HTTP/1.0 200 OK
    Date: Fri, 31 Dec 1999 23:59:59 GMT
    Content-Type: text/html
    Content-Length: 1354

    and parse that so:

    code = 200
    date = Fri, 31 Dec 1999 23:59:59 GMT
    Content-Type = text/html
    Content-Length = 1354
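Each header line has the form Name: value, so one way to split a single line is a scanset conversion in sscanf. This is a sketch; split_header and the buffer sizes are illustrative choices:

```c
#include <stdio.h>

/* Split one header line such as "Content-Length: 1354" into its name
 * and value:
 *   %63[^:]      reads up to 63 characters that are not ':' (the name)
 *   ":"          then requires the colon itself
 *   " "          skips any whitespace after the colon
 *   %255[^\r\n]  reads the rest of the line (the value)
 * Returns 1 if both parts were found, 0 otherwise (e.g. for the status
 * line, which has no colon). */
int split_header(const char *line, char name[64], char value[256])
{
    return sscanf(line, "%63[^:]: %255[^\r\n]", name, value) == 2;
}
```

Colons inside the value (as in the time of day in Date) are kept, because the scanset for the value only stops at a line ending. The numeric Content-Length value can then be converted with strtol().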

  8. #8
    tabstop (and the Hat of Guessing)
    Join Date: Nov 2007, Posts: 14,185
    Can we maybe get some context here? What does message contain when you print it out, and what do you want to get out of the message itself?

  9. #9
    Registered User
    Join Date: Nov 2007, Posts: 96
    Here is the message that I store in my message variable when I receive it from the server:

    Code:
    HTTP/1.0 200 OK
    Cache-Control: private, max-age=0
    Date: Thu, 12 Feb 2009 19:58:51 GMT
    Expires: -1
    Content-Type: text/html; charset=ISO-8859-1
    Set-Cookie: PREF=ID=6269c8db0cc2c042:TM=1234468731:LM=1234468731:S=pkuulnOa71dLbRRN; expires=Sat, 12-Feb-2011 19:58:51 GMT; path=/; domain=.google.com
    Server: gws
    
    <html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><title>Google</title><script>var _gjwl=location;function _gjuc(){var a=_gjwl.hash;if(a.indexOf("&q=")>0||a.indexOf("#q=")>=0){a=a.substring(1);if(a.indexOf("#")==-1){for(var c=0;c<a.length;){var d=c;if(a.charAt(d)=="&")++d;var b=a.indexOf("&",d);if(b==-1)b=a.length;var e=a.substring(d,b);if(e.indexOf("fp=")==0){a=a.substring(0,c)+a.substring(b,a.length);b=c}else if(e=="cad=h")return 0;c=b}_gjwl.href="search?"+a+"&cad=h";return 1}}return 0};
    window._gjuc && location.hash && _gjuc();</script><style>body,td,a,p,.h{font-family:arial,sans-serif}.h{color:#36c;font-size:20px}.q{color:#00c}.ts td{padding:0}.ts{border-collapse:collapse}#gbar{height:22px;padding-left:2px}.gbh,.gbd{border-top:1px solid #c9d7f1;font-size:1px}.gbh{height:0;position:absolute;top:24px;width:100%}#gbi,#gbs{background:#fff;left:0;position:absolute;top:24px;visibility:hidden;z-index:1000}#gbi{border:1px solid;border-color:#c9d7f1 #36c #36c #a2bae7;z-index:1001}#guser{padding-bottom:7px !important}#gbar,#guser{font-size:13px;padding-top:1px !important}@media all{.gb1,.gb3{height:22px;margin-right:.73em;vertical-align:top}#gbar{float:left}}.gb2{display:block;padding:.2em .5em}a.gb1,a.gb2,a.gb3{color:#00c !important}.gb2,.gb3{text-decoration:none}a.gb2:hover{background:#36c;color:#fff !important}</style><script>window.google={kEI:"e3-USffgKpa4sgPfxPisBw",kEXPI:"17259,19634",kHL:"en"};
    google.y={};google.x=function(e,g){google.y[e.id]=[e,g];return false};window.gbar={};(function(){var b=window.gbar,f,h;b.qs=function(a){var c=window.encodeURIComponent&&(document.forms[0].q||"").value;if(c)a.href=a.href.replace(/([?&])q=[^&]*|$/,function(i,g){return(g||"&")+"q="+encodeURIComponent(c)})};function j(a,c){a.visibility=h?"hidden":"visible";a.left=c+"px"}b.tg=function(a){a=a||window.event;var c=0,i,g=window.navExtra,d=document.getElementById("gbi"),e=a.target||a.srcElement;a.cancelBubble=true;if(!f){f=document.createElement(Array.every||window.createPopup?"iframe":"div");f.frameBorder="0";f.src="#";d.parentNode.appendChild(f).id="gbs";if(g)for(i in g)d.insertBefore(g[i],d.firstChild).className="gb2";document.onclick=b.close}if(e.className!="gb3")e=e.parentNode;do c+=e.offsetLeft;while(e=e.offsetParent);j(d.style,c);f.style.width=d.offsetWidth+"px";f.style.height=d.offsetHeight+"px";j(f.style,c);h=!h};b.close=function(a){h&&b.tg(a)}})();</script></head><body bgcolor=#ffffff text=#000000 link=#0000cc vlink=#551a8b alink=#ff0000 onload="document.f.q.focus();if(document.images)new Image().src='/images/nav_logo3.png'" topmargin=3 marginheight=3><div id=gbar><nobr><b class=gb1>Web</b> <a href="http://images.google.com/imghp?hl=en&tab=wi" onclick=gbar.qs(this) class=gb1>Images</a> <a href="http://maps.google.com/maps?hl=en&tab=wl" onclick=gbar.qs(this) class=gb1>Maps</a> <a href="http://news.google.com/nwshp?hl=en&tab=wn" onclick=gbar.qs(this) class=gb1>News</a> <a href="http://www.google.com/prdhp?hl=en&tab=wf" onclick=gbar.qs(this) class=gb1>Shopping</a> <a href="http://mail.google.com/mail/?hl=en&tab=wm" class=gb1>Gmail</a> <a href="http://www.google.com/intl/en/options/" onclick="this.blur();gbar.tg(event);return !1" class=gb3><u>more</u> <small>▼</small></a><div id=gbi> <a href="http://video.google.com/?hl=en&tab=wv" onclick=gbar.qs(this) class=gb2>Video</a> <a href="http://groups.google.com/grphp?hl=en&tab=wg" onclick=gbar.qs(this) 
class=gb2>Groups</a> <a href="http://books.google.com/bkshp?hl=en&tab=wp" onclick=gbar.qs(this) class=gb2>Books</a> <a href="http://scholar.google.com/schhp?hl=en&tab=ws" onclick=gbar.qs(this) class=gb2>Scholar</a> <a href="http://finance.google.com/finance?hl=en&tab=we" onclick=gbar.qs(this) class=gb2>Finance</a> <a href="http://blogsearch.google.com/?hl=en&tab=wb" onclick=gbar.qs(this) class=gb2>Blogs</a> <div class=gb2><div class=gbd></div></div> <a href="http://www.youtube.com/?hl=en&tab=w1" onclick=gbar.qs(this) class=gb2>YouTube</a> <a href="http://www.google.com/calendar/render?hl=en&tab=wc" class=gb2>Calendar</a> <a href="http://picasaweb.google.com/home?hl=en&tab=wq" onclick=gbar.qs(this) class=gb2>Photos</a> <a href="http://docs.google.com/?hl=en&tab=wo" class=gb2>Documents</a> <a href="http://www.google.com/reader/view/?hl=en&tab=wy" class=gb2>Reader</a> <a href="http://sites.google.com/?hl=en&tab=w3" class=gb2>Sites</a> <div class=gb2><div class=gbd></div></div> <a href="http://www.google.com/intl/en/options/" class=gb2>even more &raquo;</a></div> </nobr></div><div class=gbh style=left:0></div><div class=gbh style=right:0></div><div align=right id=guser style="font-size:84%;padding:0 0 4px" width=100%><nobr><a href="/url?sa=p&pref=ig&pval=3&q=http://www.google.com/ig%3Fhl%3Den%26source%3Diglk&usg=AFQjCNFA18XPfgb7dKnXfKz7x7g1GDH1tg">iGoogle</a> | <a href="https://www.google.com/accounts/Login?continue=http://www.google.com/index.html&hl=en">Sign in</a></nobr></div><center><br clear=all id=lgpd><a href="/search?q=charles+darwin&hl=en&ct=charlesdarwin_09&oi=ddle"><img src=/logos/charlesdarwin_09.gif width=320 height=130 border=0 alt="Charles Darwin's 200th Birthday" title="Charles Darwin's 200th Birthday"></a><br><br><form action="/search" name=f><table cellpadding=0 cellspacing=0><tr valign=top><td width=25%>&nbsp;</td><td align=center nowrap><input name=hl type=hidden value=en><input type=hidden name=ie value="ISO-8859-1"><input autocomplete="off" 
maxlength=2048 name=q size=55 title="Google Search" value=""><br><input name=btnG type=submit value="Google Search"><input name=btnI type=submit value="I'm Feeling Lucky"></td><td nowrap width=25%><font size=-2>&nbsp;&nbsp;<a href=/advanced_search?hl=en>Advanced Search</a><br>&nbsp;&nbsp;<a href=/preferences?hl=en>Preferences</a><br>&nbsp;&nbsp;<a href=/language_tools?hl=en>Language Tools</a></font></td></tr></table></form><br><font size=-1><font color=red>New!</font> Explore the ocean in <a href="/aclk?sa=L&ai=CPYLtYn-USYXSAaKaNK_yiawGo6vgfIXRkr8Kwdmc2RMQASDBVFDLnZi__v____8BYMkGqgQJT9CV1bFTnwho&num=1&sig=AGiWqtwbSmBjb8sxEvdOcoQ2DdyPAYLIIA&q=http://earth.google.com/ocean/">Google Earth 5.0</a></font><br><br><br><font size=-1><a href="/intl/en/ads/">Advertising&nbsp;Programs</a> - <a href="/services/">Business Solutions</a> - <a href="/intl/en/about.html">About Google</a></font><p><font size=-2>&copy;2009 - <a href="/intl/en/privacy.html">Privacy</a></font></p></center></body><script>if(google.y)google.y.first=[];window.setTimeout(function(){var xjs=document.createElement('script');xjs.src='/extern_js/f/CgJlbhICdXMgACswCjgNLCswDjgELCswGDgDLA/oTKXc0xdkmY.js';document.getElementsByTagName('head')[0].appendChild(xjs)},0);google.y.first.push(function(){google.ac.i(document.f,document.f.q,'','')})</script><script>function _gjp() {!(location.hash && _gjuc()) && setTimeout(_gjp, 500);}window._gjuc && _gjp();</script></html>
    All I care about from this is filling in the variables I have declared:

    Code:
    char *code;
    char *server;
    char *lastModified;
    char *date;
    char *contentType;
    int contentlength;
    char *body;
    I want to assign these values from the header lines of the HTTP response if those headers exist; otherwise I set them to unknown.

    Code:
    HTTP/1.0 200 OK
    Cache-Control: private, max-age=0
    Date: Thu, 12 Feb 2009 19:58:51 GMT
    Expires: -1
    Content-Type: text/html; charset=ISO-8859-1
    Set-Cookie: PREF=ID=6269c8db0cc2c042:TM=1234468731:LM=1234468731:S=pkuulnOa71dLbRRN; expires=Sat, 12-Feb-2011 19:58:51 GMT; path=/; domain=.google.com
    Server: gws
    So from this I would want to assign:

    code to 200
    server to gws
    lastModified to unknown
    contentLength to unknown
    etc.

    My body variable would contain the rest (the <html>), and my message variable currently holds the entire response.
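One possible sketch of that whole job, assuming CRLF or LF line endings and copying each value into fixed-size arrays. The struct http_info, its field sizes, and using -1 for an unknown content length are all assumptions for illustration, not the poster's actual code. Header names are matched case-insensitively (HTTP header names are case-insensitive), and body is set to the text after the first blank line:

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the fields named in the thread. */
struct http_info {
    char code[8];
    char server[128];
    char lastModified[128];
    char date[128];
    char contentType[128];
    long contentLength;      /* -1 stands for "unknown" */
    const char *body;        /* points into the original message */
};

/* If line (len chars, no line ending) starts with "name:", compared
 * case-insensitively, return a pointer to the value after the colon and
 * any spaces or tabs; otherwise return NULL. */
static const char *header_value(const char *line, size_t len, const char *name)
{
    size_t n = strlen(name), i;

    if (len <= n || line[n] != ':')
        return NULL;
    for (i = 0; i < n; ++i)
        if (tolower((unsigned char)line[i]) != tolower((unsigned char)name[i]))
            return NULL;
    line += n + 1;
    while (*line == ' ' || *line == '\t')
        ++line;
    return line;
}

/* Copy [start, end) into dst, truncating to fit and always terminating. */
static void copy_field(char *dst, size_t size, const char *start, const char *end)
{
    size_t n = (size_t)(end - start);

    if (n >= size)
        n = size - 1;
    memcpy(dst, start, n);
    dst[n] = '\0';
}

void parse_response(const char *msg, struct http_info *info)
{
    const char *line = msg;

    strcpy(info->code, "unknown");
    strcpy(info->server, "unknown");
    strcpy(info->lastModified, "unknown");
    strcpy(info->date, "unknown");
    strcpy(info->contentType, "unknown");
    info->contentLength = -1;
    info->body = NULL;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        const char *end, *v;

        if (len > 0 && line[len - 1] == '\r')
            --len;                      /* ignore the CR of a CRLF ending */
        end = line + len;
        if (len == 0) {                 /* blank line: the body follows */
            info->body = eol ? eol + 1 : end;
            break;
        }
        if (line == msg)                /* status line, e.g. "HTTP/1.0 200 OK" */
            sscanf(line, "%*s %3s", info->code);
        else if ((v = header_value(line, len, "Server")) != NULL)
            copy_field(info->server, sizeof info->server, v, end);
        else if ((v = header_value(line, len, "Last-Modified")) != NULL)
            copy_field(info->lastModified, sizeof info->lastModified, v, end);
        else if ((v = header_value(line, len, "Date")) != NULL)
            copy_field(info->date, sizeof info->date, v, end);
        else if ((v = header_value(line, len, "Content-Type")) != NULL)
            copy_field(info->contentType, sizeof info->contentType, v, end);
        else if ((v = header_value(line, len, "Content-Length")) != NULL)
            info->contentLength = strtol(v, NULL, 10);

        if (eol == NULL)
            break;
        line = eol + 1;
    }
}
```

Because the message is only read, this also works when message is const or a string literal; headers the sketch does not know about (Cache-Control, Set-Cookie, and so on) are simply skipped.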

  10. #10
    a_capitalist_story
    Join Date: Dec 2007, Posts: 2,652
    Here is an example program:
    Code:
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    const char *HEADERS = "HTTP/1.0 200 OK\r\n"
    "Cache-Control: private, max-age=0\r\n"
    "Date: Thu, 12 Feb 2009 19:50:54 GMT\r\n"
    "Expires: -1\r\n"
    "Content-Type: text/html; charset=ISO-8859-1\r\n"
    "Set-Cookie: cookie_data; expires=Sat, 12-Feb-2011 19:50:54 GMT; "
    "path=/; domain=.google.com\r\n"
    "Server: gws\r\n"
    "Connection: Close\r\n\r\n";
    
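    int main(void)
    {
       char *hdrs[8] = { 0 };
       const char *start = HEADERS, *end = NULL, *eoh = NULL;
       unsigned int i = 0, j = 0;
       eoh = strstr(HEADERS, "\r\n\r\n");   /* blank line ending the headers */
       /* Stop at the blank line, at the end of the input, or when hdrs is full. */
       while (i < sizeof hdrs / sizeof hdrs[0] &&
              (end = strstr(start, "\r\n")) != NULL && end > start)
       {
          hdrs[i] = malloc(end - start + 1);
          if (hdrs[i] == NULL)
             break;
          strncpy(hdrs[i], start, end - start);   /* copy the line without its CRLF */
          hdrs[i][end - start] = 0;
          ++i;
          if (end == eoh)
             break;
          start = end + strlen("\r\n");
       }
    
       for (; j < i; ++j)
       {
          printf("%u - %s\n", j, hdrs[j]);   /* %u: j is unsigned */
          free(hdrs[j]);
       }
       return 0;
    }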

  11. #11
    tabstop (and the Hat of Guessing)
    Join Date: Nov 2007, Posts: 14,185
    Alright. So, I would guess the idea would be to read a word at a time; if the word matches some magic key, then the next word/the rest of the line gets copied to your variable. Are you planning to have the char *s point into the message body itself, or allocate new memory for them?

  12. #12
    Registered User
    Join Date: Nov 2007, Posts: 96
    If by the message body you mean the actual HTML code, I just have a char * that points to it and holds the entire HTML, and that's it for that. The only other items I care about parsing are the header lines of the HTTP response.

  13. #13
    tabstop (and the Hat of Guessing)
    Join Date: Nov 2007, Posts: 14,185
    Quote (originally posted by NuNn):
    If by the message body you mean the actual HTML code, I just have a char * that points to it and holds the entire HTML, and that's it for that. The only other items I care about parsing are the header lines of the HTTP response.
    That's not really the question, no. You have char * variables code, server, lastModified, etc. Are you planning to just make those point to where the data already lives, or to get new memory and make a copy? If the former, you can strtok your way through the thing (on spaces too, not just on newlines), watching for your keywords, assuming the data is not const; remember that after the first call you should pass NULL as the first parameter of strtok. If the latter, you can use strstr to find your keywords and go from there.
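For the first option, here is a sketch that leaves the data where it is. It assumes message is a writable buffer, uses strstr to cut the headers off from the body first (so strtok never wanders into the HTML), then strtoks word by word as described, grabbing the rest of the line after each keyword. The struct http_view and parse_in_place are hypothetical names, and keyword matching here is case-sensitive:

```c
#include <string.h>

/* Hypothetical result type: every pointer refers into the caller's
 * buffer (or to the shared "unknown" string), so nothing is copied. */
struct http_view {
    char *code;
    char *server;
    char *lastModified;
    char *date;
    char *contentType;
    char *contentLength;   /* still text; convert with strtol() if needed */
    char *body;
};

/* Parse msg in place; msg must be writable (not a string literal)
 * because strtok() overwrites delimiters with '\0'. */
void parse_in_place(char *msg, struct http_view *v)
{
    static char unknown[] = "unknown";   /* shared default: do not modify */
    char *eoh = strstr(msg, "\r\n\r\n"); /* end of the header block */
    char *tok;

    v->code = v->server = v->lastModified = unknown;
    v->date = v->contentType = v->contentLength = unknown;
    v->body = NULL;

    if (eoh != NULL) {
        *eoh = '\0';          /* keep strtok out of the HTML body */
        v->body = eoh + 4;    /* skip the "\r\n\r\n" */
    }

    tok = strtok(msg, " \r\n");          /* "HTTP/1.x" */
    if (tok == NULL)
        return;
    tok = strtok(NULL, " \r\n");         /* the status code */
    if (tok == NULL)
        return;
    v->code = tok;

    /* Walk the rest word by word, always passing NULL to continue. */
    while ((tok = strtok(NULL, " \r\n")) != NULL) {
        char **field = NULL;

        if (strcmp(tok, "Server:") == 0)
            field = &v->server;
        else if (strcmp(tok, "Last-Modified:") == 0)
            field = &v->lastModified;
        else if (strcmp(tok, "Date:") == 0)
            field = &v->date;
        else if (strcmp(tok, "Content-Type:") == 0)
            field = &v->contentType;
        else if (strcmp(tok, "Content-Length:") == 0)
            field = &v->contentLength;

        if (field != NULL) {
            /* The value is the rest of the line, spaces included. */
            char *val = strtok(NULL, "\r\n");
            if (val == NULL)
                break;        /* nothing left after the keyword */
            *field = val;
        }
    }
}
```

Since strtok keeps hidden state between calls, this must not be interleaved with another strtok loop, and every pointer in the result becomes invalid once message is freed or reused.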

  14. #14
    Registered User
    Join Date: Nov 2007, Posts: 96
    I figured out a way that I believe works pretty effectively. Thank you for your time and help, though, tabstop and rags_to_riches. Have a great one!


