I'm trying to read text from a webpage, but I don't know how...
Something like:
Code:
void main(){ printf("%s", TextFromWebpage); }
I'm trying to read it from http://www.something.com, so it should output "Something".
Any help?
OK, so what do you ACTUALLY want to do?
Say we have an HTML document at http://www.something.com/index.html that contains:
Code:
<h1>Sometext</h1>
<a href='http://www.something.com/blah.html'>Somelink</a>
Do you want it to say:
Code:
Sometext Somelink
Or something else?
--
Mats
Last edited by matsp; 10-09-2007 at 06:08 AM. Reason: Make the HTML stand out using code tags.
Compilers can produce warnings - make the compiler programmers happy: Use them!
Please don't PM me for help - and no, I don't do help over instant messengers.
Check out www.something.com...
I want it to show exactly "Something.".
And what it comes from is this:
Code:
<html><head><title>Something.</title></head> <body>Something. </body> </html>
To get "Something" extracted out of that, you will need to remove all the tags in the angle brackets, and you should be left with the text "Something".
Of course, that is only half the problem, the other problem is to get the text from www.something.com to your application. You will need to either use an external program (such as wget) or write your own "download from the web" functions. The latter is not HARD to do, but not immediately trivial either.
--
Mats
Use DispHelper (a library). Download it, and add the DispHelper.h and DispHelper.c files to your project. Then use the following code (this is straight from the examples section of the library):
Code:
// The includes and the prototype below were added so the excerpt is complete;
// the header name follows the file name given in this post.
#include "DispHelper.h"
#include <iostream>
#include <string>
using namespace std;

void DownloadWebPage(LPCTSTR szURL);   // defined after main()

int main(void)
{
    CDhInitialize init;
    dhToggleExceptions(TRUE);

    cout << "Running DownloadWebPage sample..." << endl;
    DownloadWebPage(TEXT("http://www.something.com"));

    cin.get();
    return 0;
}

void DownloadWebPage(LPCTSTR szURL)
{
    CDispPtr objHTTP;
    CDhStringA szResponse, szStatus;

    try
    {
        dhCheck( dhCreateObject(L"MSXML2.XMLHTTP", NULL, &objHTTP) );

        // Open a GET request; the final FALSE asks for a synchronous call.
        dhCheck( dhCallMethod(objHTTP, L".Open(%S, %T, %b)", L"GET", szURL, FALSE) );
        dhCheck( dhCallMethod(objHTTP, L".Send") );

        dhCheck( dhGetValue(L"%s", &szStatus, objHTTP, L".StatusText") );
        cout << "Status: " << szStatus << endl;

        // The response body as text, i.e. the raw HTML of the page.
        dhCheck( dhGetValue(L"%s", &szResponse, objHTTP, L".ResponseText") );
        cout << szResponse << endl;
    }
    catch (string errstr)
    {
        cerr << "Fatal error details:" << endl << errstr << endl;
    }
}
And what error is that?
--
Mats
I'd use libcurl and implement some sort of state-machine (to parse the HTML).
Code:
dhCheck( dhCallMethod(objHTTP, L".Open(%S, %T, %b)", L"GET", szURL, FALSE) );
dhCheck( dhCallMethod(objHTTP, L".Send") );
That would be one of the two lines above. How about a printout to figure out which of the two it is?
--
Mats
I can't help (I didn't write it, I just used the given code). It worked for me, however. Remember to add DispHelper.c to your project.
This mightn't work (depending on what IDE you're using), but here's something I posted a while ago: "using ifstream from a webpage??"
It's not a great solution, really. As zac said, use libcurl (http://curl.haxx.se). It's not hard to use.
My output:
Code:
Enter the URL: http://www.google.com
<html><head><meta http-equiv="content-type" content="text/html; charset=UTF-8"><title>Google</title><style>body,td,a,p,. h{font-family:arial,sans-serif}.h{font-size:20px}.h{color:#3366cc}.q{color:#00c}.ts td{padding:0}.ts{border-collapse:col lapse}</style><script> <!-- window.google={kEI:"F5f-RqS6E464xAGH4aAc",kEXPI:"17259,17735",kHL:"nn"};function sf(){document.f.q.focus();} window.clk=function(b,c,d,e,f,g){if(document.images){var a=encodeURIComponent||escape;(new Image).src="/url?sa=T"+(c?"&o i="+a(c):"")+(d?"&cad="+a(d):"")+"&ct="+a(e)+"&cd="+a(f)+(b?"&url="+a(b.replace(/#.*/,"")).replace(/\+/g,"%2B"):"")+"&ei =F5f-RqS6E464xAGH4aAc"+g}return true};// --> </script> </head><body bgcolor=#ffffff text=#000000 link=#0000cc vlink=#551a8b alink=#ff0000 onload="sf();if(document.images){new Image().src='/images/nav_logo3.png'}" topmargin=3 marginheight=3><div align=right id=guser style="font-size:84%;padding: 0 0 4px" width=100%><nobr><a href="https://www.google.com/accounts/Login?continue=http://www.google.com/&hl=nn">Logg p├Ñ </a></nobr></div><center><br clear=all id=lgpd><img alt="Google" height=110 src="/intl/nn_ALL/images/logo.gif" width=334 ><br><br><form action="/search" name=f><style>#lgpd{display:none}</style><script defer><!-- function qs(el){if(window.RegExp&&window.encodeURIComponent){var ue=el.href,qe=encodeURIComponent(document.f.q.value);if (ue.indexOf("q=")!=-1){el.href=ue.replace(new RegExp("q=[^&$]*"),"q="+qe);}else{el.href=ue+"&q="+qe;}}return 1;} //--> </script><table border=0 cellspacing=0 cellpadding=4><tr><td nowrap><font size=-1><b>Veven</b> <a class=q href="http://images.google.com/imghp?oe=UTF-8&hl=nn&tab=wi" onclick="return qs(this)">Bilete</a> &nb sp; <a class=q href="http://groups.google.com/grphp?oe=UTF-8&hl=nn&tab=wg" onclick="return qs(this)">Grupper</a>&nb sp; <a class=q href="/dirhp?oe=UTF-8&hl=nn&tab=wd" onclick="return qs(this)">Katalog</a> &nb sp; <!--"/*"/*--><font size=-1><a 
class=q onClick='return window.qs?qs(this):1' href='http://127.0.0.1:4664/&s=OyHX 5sj3H8T6QlFLBCw2ZKLLBP0'>Desktop</a></font> </font></td></tr></table><table cellpadding=0 cellspa cing=0><tr valign=top><td width=25%> </td><td align=center nowrap><input name=hl type=hidden value=nn><input maxlen gth=2048 name=q size=55 title="Google-s├╕k" value=""><br><input name=btnG type=submit value="Google-s├╕k"><input name=bt nI type=submit value="Beint fram!"></td><td nowrap width=25%><font size=-2> <a href=/advanced_search?hl=nn>Ut vida s├╕k</a><br> <a href=/preferences?hl=nn>Innstillingar</a><br> <a href=/language_tools?hl=nn>S pr├Ñkverkty</a></font></td></tr></table></form><br><br><font size=-1><a href="/intl/no/about.html">Alt om Google</a> - <b><a href=http://www.google.ie/>G├Ñ til Google Ireland</a></b><span id=hp style="behavior:url(#default#homepage)"></spa n><script><!-- (function() {var a="http://www.google.com/",b=document.getElementById("hp"),c=b.isHomePage(a);_rptHp=function(){(new Ima ge).src="/gen_204?sa=X&ct=mgyhp&cd="+(b.isHomepage(a)?1:0)};if(!c){document.write('<p><a href=/mgyhp.html onClick=docume nt.getElementById("hp").setHomepage("'+a+'");_rptHp();>Gjer Google til startsida di!</a>')};})();//--> </script></font><p><font size=-2>©2007 Google</font></p></center></body></html>