-
String Search
Hello, I am writing code that will find a certain string but stop when it needs to. For example, say I am searching for URLs within an HTML source file: I want to find "http://" and then end the string at the closing quotation mark.
I want to find all the URLs in the HTML source, but there are so many different ones that I can't just search for one particular URL; that would only find that URL, and I want them all. So I would search for "http://", have it stop at the closing quote of the URL, and then keep going. All I can get is the "http://", and then it keeps going past the end of the URL. I don't understand how to accomplish what I am trying to do.
I have...
Code:
while (fgets(buffer, MAXCHAR, fp1))
{
    if ((a = strstr(buffer, text)) != NULL)
    {
        match++; // inc. match count
        printf("The URL is %s\n", buffer);
    }
}
"text" is the string I'm looking for. However, I am having trouble figuring out how to read the characters after the text it finds, and then stop at the closing quotation mark of the URL.
(sorry I'm saying the same thing over and over again.)
I appreciate any help, it would be really great.
-
What you are actually searching for is the quote character ("), not http://www.somedomain.whatever.
Search for the opening quote; when you find it, extract all the text until you reach the closing quote.
Then, from that point in the file, search for the next opening quote.
But exclude img src= unless you also want the URLs of linked images.
-
Actually, searching for the quotes was one of my ideas too, but my problem is that I don't know how to code it. I don't even know how to represent a quote, because surely """ is not valid. I'd have to use ASCII character 34, but I don't know how to do that. Any help would be great.
-
-
Oh, I see, thanks. So I would search for \". But now I don't understand the code for extracting the string between the quotes. What I want to do is harvest this URL and store it in an array of strings. I can't figure out how to pull this off in code.
-
You don't need to use strstr, since you're searching for a string that's one char long. Try strchr.
Code:
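char *p, *end, *s = "<img src=\"hippo.png\" />";
p = strchr(s, '"');              /* find the opening quote */
end = strchr(++p, '"');          /* step past it, then find the closing quote */
while (p != end) putchar(*p++);  /* print the characters in between */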
The output for that should be:
hippo.png
-
Okay, thank you. Yeah, this method works. However, now what I want is to assign the characters to a string instead of printing them out with putchar(). How can I do that?
So say we use the last example of "hippo.png".
Would I have to load the characters into an array to make it a string, or can this be done with pointers? What I want is for the URL to be put into a string and then printed out from there, because what I want is an array of strings that are each assigned a number. Also, will this method work if I want more than one string between quotes? Say that in the file there are many URLs within quotes. I want to take ALL of these strings and stuff them into an array.
I have this all planned out in my head, but I need some help with actually coding it. I appreciate any help.
-
You do realise that you said "I want" 5 times in that last post? It makes it hard for the reader to understand exactly what you're asking for! ;) You need to think about your requirements first, then work out how to achieve them. Clarity is key!
-
Alright, this is what I want. I am creating a random link hitter. It downloads the HTML file for the web page it starts with. Then it searches through the HTML file for URL strings (which are in quotes) and stores them in an array of strings. Because the strings are numerically indexed, I can use the rand() function to randomly select an index in the array. The URL selected from the array is then opened, and the process repeats. I already have the downloading and the file opening working. The parts that confuse me, especially in code, are how to put the characters between the quotes into a variable. (I guess because each URL is a string, it will be put into a character array one character at a time, so the array of URL strings will have to be two-dimensional, I think.) Once I get that part down, the rest is easy. I think it is a lot simpler than I am making it, but I just can't seem to get it right in the code. Help would be great!
-
>>it searches through the html file for URL strings (which are in quotes)
If you're talking about any old web page, then no, they're not guaranteed to be in quotes.
A link could be coded like <a href=link.htm>
Also, quotes aren't the only criterion you need to think about. The URL is the value of the href attribute of an anchor tag, so your code needs to be smarter than simply looking for the next quote in the text.
My suggestion: create a variable that represents the current "state", i.e. what you're looking for. It's hard to explain (especially late at night!), so here's an example of what I mean. It doesn't work to the level of detail required to properly parse HTML, but it might give you a bit of an idea.
Code:
#include <stdio.h>

#define INURL     1
#define NOT_INURL 0

int main(void)
{
  const char *html = "this is a <a href=\"link.htm\">link</a>.";
  char url[BUFSIZ];
  char *url_p;
  const char *p;
  int State;

  puts(html);

  for (State = NOT_INURL, p = html, url_p = url; *p; p++)
  {
    if (*p == '"')
    {
      /* each quote toggles between "inside" and "outside" the URL */
      State = !State;
      printf("Now %s url\n", State == INURL ? "in" : "not in");
      continue;
    }

    /* copy characters while inside quotes, leaving room for '\0' */
    if (State == INURL && url_p < url + sizeof url - 1)
    {
      *url_p = *p;
      url_p++;
    }
  }

  *url_p = '\0';
  puts(url);

  return 0;
}
I'd also suggest you research how to properly use multi-dimensional arrays before trying to integrate them into your program. I don't know how much you understand, so forgive this simple example:
Code:
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

int main(void)
{
  /* rows are exactly 3 chars: no room for '\0', so print char by char */
  char list[2][3] = { "abc", "efg" };
  size_t i, j;

  for (i = 0; i < ARRAY_SIZE(list); i++)
    for (j = 0; j < ARRAY_SIZE(list[i]); j++)
      putchar(list[i][j]);
  putchar('\n');
  return 0;
}