Thread: String Search

  1. #1
    Registered User
    Join Date
    Aug 2005
    Posts
    56

    String Search

    Hello, I am writing code that finds a certain string but stops when it needs to. For example, when searching for a URL within an HTML source file, I want to find "http://" and then end the string at the closing quotation mark.

    Say I want to find all the URLs in the HTML source. There are so many different ones that I can't search for one specific URL, because that would only match that URL, and I want all of them. So I would search for "http://" and have the match stop at the closing quote of the URL; then the search would keep going. All I can get so far is the "http://" match, after which the output just keeps going. I don't understand how to accomplish what I am trying to do.


    I have...

    Code:
    while(fgets(buffer,MAXCHAR,fp1))
      {
         if( (a = strstr(buffer,text)) != NULL)
           {
           match++;      //inc. match count
           printf("The URL is %s\n", buffer);
           }
       }
    "text" is the string I'm looking for. However, I am having trouble figuring out how to read the characters that follow the text it finds, and then stop at the closing quotation mark of the URL.

    (sorry I'm saying the same thing over and over again.)

    I appreciate any help, it would be really great.

  2. #2
    Registered User Jaqui's Avatar
    Join Date
    Feb 2005
    Posts
    416
    what you are actually searching for is ' " " '

    the " not the http://www.somedomain.whatever

    search for the opening quote, when it finds it, extract all text until it finds the closing quote.
    then, from that point in the file, search until next opening quote.

    but exclude img src= unless you also want the url of the images linked.
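The loop described above could be sketched roughly as follows. This is a minimal sketch, not a full HTML scanner: `next_quoted` is a hypothetical helper name, it only understands double quotes, and it reports where the quoted text starts and how long it is rather than copying it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds the next double-quoted run at or after s.
 * On success it stores the run's start and length and returns a pointer
 * just past the closing quote, which is where the next search should
 * resume. Returns NULL when no complete pair of quotes remains. */
const char *next_quoted(const char *s, const char **start, size_t *len)
{
    const char *open = strchr(s, '"');
    const char *close;

    if (open == NULL)
        return NULL;
    close = strchr(open + 1, '"');
    if (close == NULL)
        return NULL;

    *start = open + 1;
    *len = (size_t)(close - *start);
    return close + 1;
}
```

A caller would loop with something like `while ((p = next_quoted(p, &start, &len)) != NULL) printf("%.*s\n", (int)len, start);`, so each search resumes right after the previous closing quote.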

  3. #3
    Registered User
    Join Date
    Aug 2005
    Posts
    56
    Actually, searching for the quotes was also one of my ideas, but my problem is that I don't know how to code it. I don't even know how to represent a quote, because surely " " " is not valid. I'd have to use ASCII character number 34, but I don't know how to do this. Any help would be great.

  4. #4
    Registered User Tonto's Avatar
    Join Date
    Jun 2005
    Location
    New York
    Posts
    1,465
    "\""

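For reference, a quote can be written two ways in C: as the character constant `'"'` (no escape needed) or inside a string literal as `"\""`. On ASCII systems both have the value 34. The sketch below uses the character form with `strchr`; `count_quotes` is a hypothetical name chosen only for illustration.

```c
#include <string.h>

/* Hypothetical helper: counts the double-quote characters in s.
 * The character constant '"' needs no escape; inside a string
 * literal the same character is written as \" instead. */
size_t count_quotes(const char *s)
{
    size_t n = 0;

    for (s = strchr(s, '"'); s != NULL; s = strchr(s + 1, '"'))
        n++;
    return n;
}
```

An even count is a quick sanity check that every opening quote in a line has a closing partner.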
  5. #5
    Registered User
    Join Date
    Aug 2005
    Posts
    56
    Oh, I see. Thanks. So I would search for \". But now I don't understand the code for extracting the string between the quotes. What I want to do is harvest this URL and store it in an array of strings. I can't figure out how to do this in code.

  6. #6
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    You don't need to use strstr, since you're searching for a string that's one char long. Try strchr.
    Code:
    const char *p, *end, *s = "<img src=\"hippo.png\" />";
    
    p = strchr(s, '"');                 /* opening quote */
    if (p != NULL) {
        end = strchr(++p, '"');         /* closing quote */
        while (end != NULL && p != end) putchar(*p++);
    }
    The output for that should be
    Code:
    hippo.png
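If the text should be kept rather than printed, the same two `strchr` calls can feed a `memcpy` into a caller-supplied buffer. A minimal sketch, assuming a hypothetical helper named `copy_quoted` and assuming that silently truncating over-long text is acceptable:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the text between the first pair of double
 * quotes in s into out, truncated to fit outsz bytes including the '\0'.
 * Returns 1 on success, 0 if s has no complete pair of quotes. */
int copy_quoted(const char *s, char *out, size_t outsz)
{
    const char *p, *end;
    size_t len;

    if (outsz == 0 || (p = strchr(s, '"')) == NULL)
        return 0;
    p++;                                /* first char after opening quote */
    end = strchr(p, '"');
    if (end == NULL)
        return 0;

    len = (size_t)(end - p);
    if (len >= outsz)
        len = outsz - 1;                /* leave room for the terminator */
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

With `char name[64];`, calling `copy_quoted(s, name, sizeof name)` on the string from the snippet above leaves `hippo.png` in `name` as an ordinary C string.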

  7. #7
    Registered User
    Join Date
    Aug 2005
    Posts
    56
    Okay, thank you. Yes, this method works. However, instead of printing the characters with putchar(), how can I assign them to a string?

    So say we use the last example of "hippo.png."

    Would I have to load the characters into an array to make it a string, or can this be done with pointers? The URL should be put into a string and then printed from there, because I want an array of strings that are each assigned a number. Also, will this method work when there is more than one quoted string? Say the file contains many URLs within quotes. I want to take ALL of these strings and store them in an array.

    I have this all planned out in my head, but I need some help with actually coding it. I appreciate any help.

  8. #8
    End Of Line Hammer's Avatar
    Join Date
    Apr 2002
    Posts
    6,231
    You do realise that you said "I want" 5 times in that last post? It makes it hard for the reader to understand exactly what you're asking for! You need to think about your requirements first, then work out how to achieve them. Clarity is key!
    When all else fails, read the instructions.
    If you're posting code, use code tags: [code] /* insert code here */ [/code]

  9. #9
    Registered User
    Join Date
    Aug 2005
    Posts
    56
    All right, this is what I want. I am creating a random link hitter. It downloads the HTML file for the web page it starts with, searches that file for URL strings (which are in quotes), and stores each one in an array of strings. Because the strings are then numerically indexed, I can use the rand() function to select a random index in the array. The selected URL is then opened and the process repeats.

    I already have the downloading and the file opening working. The part that confuses me, especially in code, is how to put the characters between the quotes into a variable. I guess that because it is a string, it will be copied into a character array one character at a time, so the array of URL strings will have to be multidimensional. Once I get that part down, the rest is easy. It is probably a lot simpler than I think, but I just can't seem to get it right in the code. Help would be great!

  10. #10
    End Of Line Hammer's Avatar
    Join Date
    Apr 2002
    Posts
    6,231
    >>it searches through the html file for URL strings (which are in quotes)
    If you're talking about any old web page, then no, they're not guaranteed to be in quotes.
    A link could be coded like <a href=link.htm>
    Also, quotes aren't the only criterion you need to think about. The URL is the value of the href attribute of an anchor tag, so it starts right after href=. This means your code needs to be smarter than simply looking for the next quote in the text.
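To illustrate the point, here is a rough sketch that anchors on href= and accepts both quoted and unquoted values. `next_href` is a hypothetical helper, and it assumes lowercase "href=" with no spaces around the equals sign, so it is nowhere near a real HTML parser.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the value of the next href= attribute at or
 * after s into out (truncated to fit outsz) and returns a pointer just
 * past that value, or NULL if none is found.
 * Assumption: lowercase "href=" with no spaces around '='. */
const char *next_href(const char *s, char *out, size_t outsz)
{
    const char *p = strstr(s, "href=");
    const char *next;
    size_t len;

    if (p == NULL || outsz == 0)
        return NULL;
    p += 5;                                 /* skip past href= */

    if (*p == '"' || *p == '\'') {
        char quote = *p++;                  /* quoted: ends at matching quote */
        const char *close = strchr(p, quote);
        if (close == NULL)
            return NULL;
        len = (size_t)(close - p);
        next = close + 1;
    } else {
        len = strcspn(p, " \t\r\n>");       /* unquoted: ends at space or > */
        next = p + len;
    }

    if (len >= outsz)
        len = outsz - 1;                    /* leave room for the terminator */
    memcpy(out, p, len);
    out[len] = '\0';
    return next;
}
```

Calling it repeatedly with the returned pointer walks through every link in the text, whether it was written as href=link.htm or href="link.htm".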

    My suggestion: create a variable that represents the current "state", that is, what you're currently looking for. It's hard to explain (especially late at night!), so here's an example of what I mean. It doesn't work to the level of detail required to properly parse HTML, but it might give you a bit of an idea.

    Code:
    #include <stdio.h>
    
    #define INURL 1
    #define NOT_INURL 0
    
    int main(void)
    {
      const char *html = "this is a <a href=\"link.htm\">link</a>.";
      char url[BUFSIZ];
      char *url_p;
      const char *p;
      int  State;
      
      puts (html);
      
      for (State = NOT_INURL, p = html, url_p = url;
           *p; 
           p++)
      {
        if (*p == '"')
        {
          /* Each quote toggles between inside and outside a quoted value */
          State = !State;
          printf ("Now %sin url\n", State == INURL ? "" : "not ");
          continue;
        }
        /* Copy only while inside quotes, leaving room for the terminator */
        if (State == INURL && url_p < url + sizeof url - 1)
        {
          *url_p = *p;
          url_p++;
        }
      }
      
      *url_p = '\0';
      
      puts (url);
      
      return(0);
    }
    /*
     * All quoted text ends up in the single url buffer;
     * copying stops silently if the buffer fills up.
     */
    I'd also suggest you research how to properly use multi-dimensional arrays before trying to integrate them into your program. I don't know how much you understand, so forgive this simple example:
    Code:
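    #include <stdio.h>
    #define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
    
    int main(void)
    {
      /* Each row holds exactly 3 chars with no room for '\0',
         so the rows are plain char arrays, not C strings. */
      char list[2][3] = { "abc", "efg"};
      size_t i, j;
      
      for (i = 0; i < ARRAY_SIZE(list); i++)
        for (j = 0; j < ARRAY_SIZE(list[i]); j++)
          putchar (list[i][j]);
      putchar ('\n');
      
      return(0);
    }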
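Applied to the link list discussed in this thread, a two-dimensional char array with room for the terminator in each row might look like the sketch below. The names `url_list`, `url_list_add` and `url_list_pick` and both size limits are made up for illustration; they are assumptions, not anything from the original program.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_URLS    64      /* illustrative limits, not from the thread */
#define MAX_URL_LEN 256

struct url_list {
    char   items[MAX_URLS][MAX_URL_LEN];   /* one row per stored string */
    size_t count;                          /* rows currently in use */
};

/* Appends a copy of url. Returns 1 on success, or 0 if the list is
 * full or the string (plus its '\0') does not fit in one row. */
int url_list_add(struct url_list *list, const char *url)
{
    size_t len = strlen(url);

    if (list->count >= MAX_URLS || len >= MAX_URL_LEN)
        return 0;
    memcpy(list->items[list->count], url, len + 1);
    list->count++;
    return 1;
}

/* Returns a randomly chosen entry, or NULL if the list is empty.
 * rand() % count is slightly biased, which should not matter here. */
const char *url_list_pick(const struct url_list *list)
{
    if (list->count == 0)
        return NULL;
    return list->items[(size_t)rand() % list->count];
}
```

The list should start with `count` set to 0 (a `static` or zero-initialized struct does that), and calling `srand()` once at program start, for example with the current time, keeps the picks from repeating on every run.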