Originally Posted by anduril462
Now, pick a HTTP library. libcurl comes to mind, it's very efficient, stable, widely used and well documented (and free).
Only a few minutes ago I started writing a web spider in C to download entire forums, using libcurl. I still need to parse the pages and search for links. I think there are C libraries (written years ago) for parsing HTML, but webpages nowadays are made up of HTML, CSS, JavaScript and god knows what else. Can anyone recommend a good C library? I'll probably just write the parsing code myself, since that's likely quicker than learning some new library; a rough sketch of the link extraction I have in mind follows the download code below.
Code:
#include <curl/curl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    CURL *myHandle;
    CURLcode result;
    FILE *file;

    /* libcurl errors do not set errno, so report them with
       curl_easy_strerror() rather than perror(). */
    if (curl_global_init(CURL_GLOBAL_ALL) != CURLE_OK)
    {
        fprintf(stderr, "Error: curl_global_init failed\n");
        exit(EXIT_FAILURE);
    }

    if ((file = fopen("webpage.html", "wb")) == NULL)
    {
        perror("Error");  /* fopen does set errno */
        exit(EXIT_FAILURE);
    }

    if ((myHandle = curl_easy_init()) == NULL)
    {
        fprintf(stderr, "Error: curl_easy_init failed\n");
        exit(EXIT_FAILURE);
    }

    if ((result = curl_easy_setopt(myHandle, CURLOPT_URL,
                                   "http://cboard.cprogramming.com/")) != CURLE_OK)
    {
        fprintf(stderr, "Error: %s\n", curl_easy_strerror(result));
        exit(EXIT_FAILURE);
    }

    /* With no CURLOPT_WRITEFUNCTION set, libcurl fwrite()s the response
       body straight into this FILE *. */
    if ((result = curl_easy_setopt(myHandle, CURLOPT_WRITEDATA, file)) != CURLE_OK)
    {
        fprintf(stderr, "Error: %s\n", curl_easy_strerror(result));
        exit(EXIT_FAILURE);
    }

    if ((result = curl_easy_perform(myHandle)) != CURLE_OK)
    {
        fprintf(stderr, "Error: %s\n", curl_easy_strerror(result));
        exit(EXIT_FAILURE);
    }

    curl_easy_cleanup(myHandle);
    curl_global_cleanup();
    fclose(file);
    puts("Webpage downloaded successfully to webpage.html");
    return 0;
}
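For the link extraction itself, I may try libxml2, which ships an HTML parser built for real-world tag soup. Below is a minimal sketch of what I have in mind, not a finished implementation: it assumes the page was already saved to webpage.html by the program above, it only looks at <a href="..."> attributes, and the file name parse.c in the build line is just a placeholder.

Code:
#include <libxml/HTMLparser.h>
#include <libxml/tree.h>
#include <stdio.h>
#include <stdlib.h>

/* Recursively walk the parsed tree and print the href attribute of
   every <a> element. */
static void find_links(xmlNode *node)
{
    xmlNode *cur;
    for (cur = node; cur != NULL; cur = cur->next)
    {
        if (cur->type == XML_ELEMENT_NODE &&
            xmlStrcasecmp(cur->name, (const xmlChar *)"a") == 0)
        {
            xmlChar *href = xmlGetProp(cur, (const xmlChar *)"href");
            if (href != NULL)
            {
                printf("%s\n", (const char *)href);
                xmlFree(href);
            }
        }
        find_links(cur->children);
    }
}

int main(void)
{
    /* RECOVER keeps the parser going through broken markup; NOERROR and
       NOWARNING silence the complaints it would otherwise print. */
    htmlDocPtr doc = htmlReadFile("webpage.html", NULL,
                                  HTML_PARSE_RECOVER | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING);
    if (doc == NULL)
    {
        fprintf(stderr, "Error: could not parse webpage.html\n");
        exit(EXIT_FAILURE);
    }
    find_links(xmlDocGetRootElement(doc));
    xmlFreeDoc(doc);
    xmlCleanupParser();
    return 0;
}

The download program builds with just -lcurl; this one should build with something like gcc parse.c $(xml2-config --cflags --libs). One caveat either way: links that JavaScript generates at runtime never appear in the downloaded HTML, so no HTML parser will find them.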