Thread: Reading text from webpage

  1. #1
    Registered User
    Join Date
    Jul 2007
    Posts
    61

    Reading text from webpage

    I'm trying to read text from a webpage, but I don't know how...

    something like
    Code:
    #include <stdio.h>
    int main(void){
             /* TextFromWebpage is a placeholder for the downloaded text */
             printf("%s", TextFromWebpage);
             return 0;
    }
    I'm trying to read it from http://www.something.com, so it should output "Something".
    Any help?

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Ok, so what do you ACTUALLY want to do?

    Say we have an HTML document at http://www.something.com/index.html that contains
    Code:
    <h1>Sometext</h1>\n<a href='http://www.something.com/blah.html'>Somelink</a>
    do you want it to say
    Code:
    Sometext
    Somelink
    ?

    Or something else?

    --
    Mats
    Last edited by matsp; 10-09-2007 at 06:08 AM. Reason: Make the HTML stand out using code tags.
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Registered User
    Join Date
    Jul 2007
    Posts
    61
    Check out www.something.com...
    I want it to show exactly "Something.".

  4. #4
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    And the source that comes from is this:
    Code:
    <html><head><title>Something.</title></head>
    <body>Something.
    </body>
    </html>
    To extract "Something." from that, you need to remove all the tags in angle brackets (and skip the <title> contents, which would otherwise be left behind too); you should then be left with the text "Something.".
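
    As a minimal sketch of the tag-removal step, assuming plain C and a caller-supplied output buffer: the function name strip_tags is made up for illustration, and it does nothing clever about comments, scripts or entities. It only drops whatever sits between '<' and '>'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies html into out, dropping everything
   between '<' and '>'. Writes at most out_size - 1 characters plus a
   terminating NUL and returns the number of characters written. */
size_t strip_tags(const char *html, char *out, size_t out_size)
{
    size_t n = 0;
    int in_tag = 0;            /* 1 while we are between '<' and '>' */

    assert(html != NULL && out != NULL);
    if (out_size == 0)
        return 0;
    for (; *html != '\0'; ++html) {
        if (*html == '<')
            in_tag = 1;        /* a tag starts: stop copying */
        else if (*html == '>' && in_tag)
            in_tag = 0;        /* the tag ends: resume copying */
        else if (!in_tag && n + 1 < out_size)
            out[n++] = *html;  /* ordinary text: keep it */
    }
    out[n] = '\0';
    return n;
}
```

    Run on the page source shown above, this keeps the <title> text as well as the body text, so "Something." comes out twice; skipping the <head> section would need extra logic.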

    Of course, that is only half the problem; the other half is getting the text from www.something.com into your application. You will need to either use an external program (such as wget) or write your own "download from the web" functions. The latter is not HARD to do, but it is not trivial either.
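
    The external-program route could look roughly like the sketch below. It assumes a POSIX system (popen/pclose; Windows compilers usually offer _popen instead) and that wget is installed. The helper name read_command_output is invented for this example.

```c
#define _POSIX_C_SOURCE 200809L   /* makes popen/pclose visible in strict modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: runs cmd through the shell and returns everything
   it wrote to stdout as a malloc'd, NUL-terminated string.
   The caller frees the result. Returns NULL on failure. */
char *read_command_output(const char *cmd)
{
    FILE *pipe;
    char *data = NULL;
    size_t len = 0, cap = 0, got;
    char chunk[4096];

    assert(cmd != NULL);
    pipe = popen(cmd, "r");
    if (pipe == NULL)
        return NULL;
    while ((got = fread(chunk, 1, sizeof chunk, pipe)) > 0) {
        if (len + got + 1 > cap) {            /* grow, leaving room for NUL */
            size_t new_cap = (len + got + 1) * 2;
            char *grown = realloc(data, new_cap);
            if (grown == NULL) {
                free(data);
                pclose(pipe);
                return NULL;
            }
            data = grown;
            cap = new_cap;
        }
        memcpy(data + len, chunk, got);
        len += got;
    }
    pclose(pipe);
    if (data == NULL)
        return calloc(1, 1);                  /* no output: empty string */
    data[len] = '\0';
    return data;
}
```

    A call such as read_command_output("wget -q -O - http://www.something.com") would then hand back the page source, ready for tag removal. The command goes through the shell, so only pass URLs you trust.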

    --
    Mats

  5. #5
    Registered User mikeman118's Avatar
    Join Date
    Aug 2007
    Posts
    183
    Use DispHelper (a library). Download it, and add the DispHelper.h and DispHelper.c files to your project. Then use the following code (this is straight from the examples section of the library):
    Code:
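    // Assumes DispHelper.h and DispHelper.c are part of the project.
    #include "DispHelper.h"
    #include <iostream>
    #include <string>
    using namespace std;

    void DownloadWebPage(LPCTSTR szURL);  // defined below; declared here so main can call it

    int main(void)
    {
    	CDhInitialize init;
    	dhToggleExceptions(TRUE);
    
    	cout << "Running DownloadWebPage sample..." << endl;
    	DownloadWebPage(TEXT("http://www.something.com"));
    	cin.get();
    	return 0;
    }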
    void DownloadWebPage(LPCTSTR szURL)
    {
    	CDispPtr objHTTP;
    	CDhStringA szResponse, szStatus;
    
    	try
    	{
    		dhCheck( dhCreateObject(L"MSXML2.XMLHTTP", NULL, &objHTTP) );
    		dhCheck( dhCallMethod(objHTTP, L".Open(%S, %T, %b)", L"GET", szURL, FALSE) );
    		dhCheck( dhCallMethod(objHTTP, L".Send") );
    
    		dhCheck( dhGetValue(L"%s", &szStatus, objHTTP, L".StatusText") );
    		cout << "Status: " << szStatus << endl;
    
    		dhCheck( dhGetValue(L"%s", &szResponse, objHTTP, L".ResponseText") );
    		cout << szResponse << endl;
    	}
    	catch (string errstr)
    	{
    		cerr << "Fatal error details:" << endl << errstr << endl;
    	}
    }

  6. #6
    Registered User
    Join Date
    Jul 2007
    Posts
    61
    It's giving me an error.

  7. #7
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    And what error is that?

    --
    Mats

  8. #8
    Woof, woof! zacs7's Avatar
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    I'd use libcurl and implement some sort of state-machine (to parse the HTML).
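
    For the parsing half, one possible shape for such a state machine is sketched here. This is only a guess at what is meant: three states (text, tag, comment), with the name html_to_text made up for illustration. The download itself (for example with libcurl) is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three states of this (assumed) parser. */
enum parse_state { ST_TEXT, ST_TAG, ST_COMMENT };

/* Hypothetical helper: copies the text outside tags and comments from
   html into out (at most out_size - 1 characters plus NUL) and returns
   the number of characters written. */
size_t html_to_text(const char *html, char *out, size_t out_size)
{
    enum parse_state state = ST_TEXT;
    size_t n = 0;
    const char *p;

    assert(html != NULL && out != NULL);
    if (out_size == 0)
        return 0;
    for (p = html; *p != '\0'; ++p) {
        switch (state) {
        case ST_TEXT:
            if (strncmp(p, "<!--", 4) == 0) {
                state = ST_COMMENT;
                p += 3;                 /* skip the rest of "<!--" */
            } else if (*p == '<') {
                state = ST_TAG;
            } else if (n + 1 < out_size) {
                out[n++] = *p;          /* visible text */
            }
            break;
        case ST_TAG:
            if (*p == '>')
                state = ST_TEXT;
            break;
        case ST_COMMENT:
            if (strncmp(p, "-->", 3) == 0) {
                state = ST_TEXT;
                p += 2;                 /* skip the rest of "-->" */
            }
            break;
        }
    }
    out[n] = '\0';
    return n;
}
```

    The separate comment state matters because a '>' inside a comment would otherwise end the "tag" early. Script and style contents that are not wrapped in comments still come through as text; skipping those would need more states.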

  9. #9
    Registered User
    Join Date
    Jul 2007
    Posts
    61
    Quote Originally Posted by matsp
    And what error is that?

    --
    Mats
    Code:
    Fatal error details:
    Member: .Send
    Function: CallMethod
    Error In: InvokeArray
    Error: Cannot find the specified resource.
    Code: 800c0005
    Source: msxml3.dll

  10. #10
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Code:
    		dhCheck( dhCallMethod(objHTTP, L".Open(%S, %T, %b)", L"GET", szURL, FALSE) );
    		dhCheck( dhCallMethod(objHTTP, L".Send") );
    That would be one of the two lines above. How about a printout to figure out which of the two it is?

    --
    Mats

  11. #11
    Registered User mikeman118's Avatar
    Join Date
    Aug 2007
    Posts
    183
    I can't help (I didn't write it, I just used the given code). It worked for me, however. Remember to add DispHelper.c to your project.

  12. #12
    The superhaterodyne twomers's Avatar
    Join Date
    Dec 2005
    Location
    Ireland
    Posts
    2,273
    This mightn't work (depending on what IDE you're using), but here's something I posted a while ago: http://cboard.cprogramming.com/showp...57&postcount=9
    It's not a great solution really. As zac said, use libcurl (http://curl.haxx.se). It's not hard to use.

    My output:
    Code:
    Enter the URL: http://www.google.com
    <html><head><meta http-equiv="content-type" content="text/html; charset=UTF-8"><title>Google</title><style>body,td,a,p,.
    h{font-family:arial,sans-serif}.h{font-size:20px}.h{color:#3366cc}.q{color:#00c}.ts td{padding:0}.ts{border-collapse:col
    lapse}</style><script>
    <!--
    window.google={kEI:"F5f-RqS6E464xAGH4aAc",kEXPI:"17259,17735",kHL:"nn"};function sf(){document.f.q.focus();}
    window.clk=function(b,c,d,e,f,g){if(document.images){var a=encodeURIComponent||escape;(new Image).src="/url?sa=T"+(c?"&o
    i="+a(c):"")+(d?"&cad="+a(d):"")+"&ct="+a(e)+"&cd="+a(f)+(b?"&url="+a(b.replace(/#.*/,"")).replace(/\+/g,"%2B"):"")+"&ei
    =F5f-RqS6E464xAGH4aAc"+g}return true};// -->
    </script>
    </head><body bgcolor=#ffffff text=#000000 link=#0000cc vlink=#551a8b alink=#ff0000 onload="sf();if(document.images){new
    Image().src='/images/nav_logo3.png'}" topmargin=3 marginheight=3><div align=right id=guser style="font-size:84%;padding:
    0 0 4px" width=100%><nobr><a href="https://www.google.com/accounts/Login?continue=http://www.google.com/&hl=nn">Logg på
    </a></nobr></div><center><br clear=all id=lgpd><img alt="Google" height=110 src="/intl/nn_ALL/images/logo.gif" width=334
    ><br><br><form action="/search" name=f><style>#lgpd{display:none}</style><script defer><!--
    function qs(el){if(window.RegExp&&window.encodeURIComponent){var ue=el.href,qe=encodeURIComponent(document.f.q.value);if
    (ue.indexOf("q=")!=-1){el.href=ue.replace(new RegExp("q=[^&$]*"),"q="+qe);}else{el.href=ue+"&q="+qe;}}return 1;}
    //-->
    </script><table border=0 cellspacing=0 cellpadding=4><tr><td nowrap><font size=-1><b>Veven</b>&nbsp;&nbsp;&nbsp;&nbsp;<a
     class=q href="http://images.google.com/imghp?oe=UTF-8&hl=nn&tab=wi" onclick="return qs(this)">Bilete</a>&nbsp;&nbsp;&nb
    sp;&nbsp;<a class=q href="http://groups.google.com/grphp?oe=UTF-8&hl=nn&tab=wg" onclick="return qs(this)">Grupper</a>&nb
    sp;&nbsp;&nbsp;&nbsp;<a class=q href="/dirhp?oe=UTF-8&hl=nn&tab=wd" onclick="return qs(this)">Katalog</a>&nbsp;&nbsp;&nb
    sp;&nbsp;<!--"/*"/*--><font size=-1><a class=q onClick='return window.qs?qs(this):1' href='http://127.0.0.1:4664/&s=OyHX
    5sj3H8T6QlFLBCw2ZKLLBP0'>Desktop</a></font>&nbsp;&nbsp;&nbsp;&nbsp;</font></td></tr></table><table cellpadding=0 cellspa
    cing=0><tr valign=top><td width=25%>&nbsp;</td><td align=center nowrap><input name=hl type=hidden value=nn><input maxlen
    gth=2048 name=q size=55 title="Google-søk" value=""><br><input name=btnG type=submit value="Google-søk"><input name=bt
    nI type=submit value="Beint fram!"></td><td nowrap width=25%><font size=-2>&nbsp;&nbsp;<a href=/advanced_search?hl=nn>Ut
    vida søk</a><br>&nbsp;&nbsp;<a href=/preferences?hl=nn>Innstillingar</a><br>&nbsp;&nbsp;<a href=/language_tools?hl=nn>S
    pråkverkty</a></font></td></tr></table></form><br><br><font size=-1><a href="/intl/no/about.html">Alt om  Google</a> -
    <b><a href=http://www.google.ie/>Gå til Google Ireland</a></b><span id=hp style="behavior:url(#default#homepage)"></spa
    n><script><!--
    (function() {var a="http://www.google.com/",b=document.getElementById("hp"),c=b.isHomePage(a);_rptHp=function(){(new Ima
    ge).src="/gen_204?sa=X&ct=mgyhp&cd="+(b.isHomepage(a)?1:0)};if(!c){document.write('<p><a href=/mgyhp.html onClick=docume
    nt.getElementById("hp").setHomepage("'+a+'");_rptHp();>Gjer Google til startsida di!</a>')};})();//-->
    </script></font><p><font size=-2>&copy;2007 Google</font></p></center></body></html>
