I have a program that can now find text within quotes in an HTML file (special thanks to Hammer). Here is what I ended up with:
Code:
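#include <stdio.h>
#include <stdlib.h>
#define SIZE 500000
#define MAX_URLS 150
#define MAX_LEN 512
#define INURL 1
#define NOT_INURL (!INURL)
/* static storage: these buffers are too large to sit safely on the stack */
static char html[SIZE];
static char url[MAX_URLS][MAX_LEN];
int main(void)
{
    size_t i, n;
    int b = 0, d = 0;   /* b = current string, d = position within it */
    int State = NOT_INURL;
    char c;
    FILE *fp = fopen("c:\\blah.html", "r");
    if (fp == NULL)
    {
        perror("c:\\blah.html");
        return EXIT_FAILURE;
    }
    /* fread does not null-terminate, so remember how many bytes were read */
    n = fread(html, sizeof(char), SIZE, fp);
    fclose(fp);
    for (i = 0; i < n && b < MAX_URLS; i++)
    {
        c = html[i];
        if (c == '"')
        {
            if (State == INURL)
            {
                /* closing quote: terminate this string and start the next at index 0 */
                url[b][d] = '\0';
                printf("%s\n", url[b]);
                b++;
                d = 0;
            }
            State = !State;
            continue;
        }
        /* inside quotes: store the character, leaving room for the terminator */
        if (State == INURL && d < MAX_LEN - 1)
        {
            url[b][d] = c;
            d++;
        }
    }
    printf("There are %d matches\n\n\n\n\n", b);
    return 0;
}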
However, this method doesn't work for URLs specifically; it picks up every string inside quotes. Do any of you have ideas on how I can single out just the URLs? I was thinking of first searching for a quote and then, only if the next four characters were "http", continuing to stuff characters into the array. Alternatively, I could collect all the quoted text first and then search the array for entries starting with "http", but I don't know how to throw entries out of an array and have everything reordered, short of building a new array and discarding the old one, which seems inefficient. I would really appreciate it if someone could throw some code ideas at me for this dilemma.
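
To make the two ideas concrete, here is a rough sketch of how each might look. The names extract_urls, keep_http_only, MAX_URLS and MAX_LEN are made up for illustration, and the sketch assumes every URL of interest sits between double quotes and starts with "http" (which also matches "https"). Treat it as a starting point, not a finished solution.

```c
#include <stdio.h>
#include <string.h>

#define MAX_URLS 150   /* hypothetical limits, matching the url array above */
#define MAX_LEN  512

/* Idea 1: while scanning, keep a quoted string only if it starts with "http".
   html must be null-terminated. Returns the number of strings stored in out. */
int extract_urls(const char *html, char out[][MAX_LEN], int max)
{
    int n = 0;
    const char *p = html;

    while (n < max && (p = strchr(p, '"')) != NULL)
    {
        const char *start = p + 1;             /* first char after the opening quote */
        const char *end = strchr(start, '"');  /* matching closing quote */
        size_t len;

        if (end == NULL)
            break;                             /* unbalanced quote: stop scanning */
        len = (size_t)(end - start);
        if (len < MAX_LEN && strncmp(start, "http", 4) == 0)
        {
            memcpy(out[n], start, len);
            out[n][len] = '\0';
            n++;
        }
        p = end + 1;                           /* resume after the closing quote */
    }
    return n;
}

/* Idea 2: filter an already filled array in place. Kept entries are copied
   down over rejected ones, so no second array is needed.
   Returns the new number of entries. */
int keep_http_only(char strs[][MAX_LEN], int count)
{
    int r, w = 0;   /* r = read index, w = write index (w <= r always) */

    for (r = 0; r < count; r++)
    {
        if (strncmp(strs[r], "http", 4) == 0)
        {
            if (w != r)
                strcpy(strs[w], strs[r]);      /* different rows, so no overlap */
            w++;
        }
    }
    return w;
}
```

With the first approach, the non-URL strings never reach the array, so nothing has to be thrown out afterwards. The second approach shows that removing entries doesn't require a new array: each kept entry is copied at most once, into a slot at or before its old position.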