I want to make a list of all the links in an index.html page. All I want is the URL, and for now I'll live with the assumption that it is surrounded by double quotes. Here's what I have so far:
```c
#include <stdio.h>   /* FILE, fopen, fgets, printf */
#include <stdlib.h>
#include <string.h>
#include <regex.h>

#define MAX_STRING_SIZE 1024

int main()
{
    int rc;
    regex_t * myregex = calloc(1, sizeof(regex_t));
    regmatch_t matches[3];
    FILE *fp;
    char line[MAX_STRING_SIZE];

    if(myregex == NULL)
        return 1;

    fp = fopen("/var/tmp/index.html", "r");
    if(fp == NULL)
        return 1;

    rc = regcomp(myregex, "href\\s*=\\s*(\")*(.*?\")([^\"]+)", REG_EXTENDED);
    if(rc != 0)
    {
        fprintf(stderr, "regcomp failed\n");
        return 1;
    }

    while(fgets(line, MAX_STRING_SIZE, fp) != NULL)
    {
        if(regexec(myregex, line, 3, matches, 0) == 0)
        {
            printf("String: %s\n", line + matches[2].rm_so);
        }
    }

    regfree(myregex);   /* release what regcomp allocated */
    free(myregex);
    fclose(fp);
    return 0;
}
```

My output shows the URL, but it also shows the rest of the line. I thought I was telling it to stop at the first double quote after the URL, but no matter what I try, I keep getting the rest of the line.
Thank you in advance for helping out.


CornedBee