Thread: HTML Parsing in C to get body content

  1. #1
    Registered User
    Join Date
    Nov 2014
    Posts
    5

    Question HTML Parsing in C to get body content

    OK, so I have used libcurl to get a web page's source and saved it in memory in C. I want to get the body content's name. For example, if the web page source contained,

    Code:
    <body content = "cboarding-itskhaledmd user-1">

    I want my program to return,

    Code:
    cboarding-itskhaledmd user-1


    by going through the whole page source and returning only the body content. If the content doesn't exist, it should report that. Can somebody please help me?

    How can I achieve this in C? I have searched for it. I know of XML parsers (libxml), but could somebody please give me code with an example of exactly what I want to do? I have been looking for it everywhere. It has to be in C; I really need it.
    I hope I have made my question clear.

  2. #2
    Registered User
    Join Date
    Nov 2012
    Posts
    1,393
    Quote Originally Posted by Khaled Mohammad
    I know of XML parsers (libxml), but could somebody please give me code with an example of exactly what I want to do?
    If you want a general solution, I would first choose a parsing library. libcurl only downloads the page; an HTML parser such as HTML Tidy (libtidy) is well suited to the parsing you are describing. The curl distribution includes an example of how to download a page with libcurl and then parse it with tidy.

    By the way, your example is a bit confusing. You said you want the body content, but in your example you show an attribute of the body tag named content. Normally, body content is something like this:

    Code:
    <body>
       <p>Hi this is a paragraph.
        <p>&amp; this is another paragraph&#33;&#33;
    </body>
    Given that body content, what do you want your program to return?

  3. #3
    Registered User
    Join Date
    Nov 2014
    Posts
    5
    Quote Originally Posted by c99tutorial
    If you want a general solution, I would first choose a parsing library. libcurl only downloads the page; an HTML parser such as HTML Tidy (libtidy) is well suited to the parsing you are describing. The curl distribution includes an example of how to download a page with libcurl and then parse it with tidy.

    By the way, your example is a bit confusing. You said you want the body content, but in your example you show an attribute of the body tag named content. Normally, body content is something like this:

    Code:
    <body>
       <p>Hi this is a paragraph.
        <p>&amp; this is another paragraph!!
    </body>
    Given that body content, what do you want your program to return?

    I actually want the attribute of the body tag named content. Sorry if I confused you, but that attribute's value is what I am after. Please help. Thanks.

  4. #4
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Khaled Mohammad
    I actually want the attribute of the body tag named content. Sorry if I confused you, but that attribute's value is what I am after. Please help. Thanks.
    c99tutorial's advice still applies:
    Quote Originally Posted by c99tutorial
    If you want a general solution, I would first choose a parsing library. libcurl only downloads the page; an HTML parser such as HTML Tidy (libtidy) is well suited to the parsing you are describing. The curl distribution includes an example of how to download a page with libcurl and then parse it with tidy.
    Quote Originally Posted by Khaled Mohammad
    But could somebody please give me code with an example of exactly what I want to do?
    You should make a best effort and demonstrate that you did so, e.g., by posting the code that you tried and telling us how it does not work.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way
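
    As a starting point for the kind of attempt asked for above, here is a minimal hand-rolled sketch in plain C, not a general HTML parser. It assumes the page is already in memory as a NUL-terminated string (for example, the buffer filled by a libcurl write callback). The names body_content_attr and ieq are hypothetical helpers made up for this illustration. The scan does not skip comments or script blocks and does not decode entities such as &amp;, so a real parser such as libtidy or libxml2 remains the safer choice for arbitrary pages.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first n bytes of a match the
   lowercase literal b, ignoring case. Stops safely at a NUL in a. */
static int ieq(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != b[i])
            return 0;
    return 1;
}

/* Hypothetical helper: copies the value of the "content" attribute of the
   first <body ...> tag in html into out (truncated to fit outsz bytes).
   Returns 1 if the attribute was found, 0 otherwise. */
int body_content_attr(const char *html, char *out, size_t outsz)
{
    const char *p = html;

    if (outsz == 0)
        return 0;

    /* Find "<body" followed by whitespace, '>' or '/'. */
    for (; *p; p++) {
        if (*p == '<' && ieq(p + 1, "body", 4) &&
            (isspace((unsigned char)p[5]) || p[5] == '>' || p[5] == '/'))
            break;
    }
    if (*p == '\0')
        return 0;               /* no <body> tag at all */
    p += 5;                     /* skip past "<body" */

    /* Walk the attributes one by one until the tag closes. */
    for (;;) {
        while (isspace((unsigned char)*p) || *p == '/')
            p++;
        if (*p == '\0' || *p == '>')
            return 0;           /* end of tag: attribute not present */

        /* Attribute name runs up to whitespace, '=', '>' or '/'. */
        const char *name = p;
        while (*p && !isspace((unsigned char)*p) &&
               *p != '=' && *p != '>' && *p != '/')
            p++;
        size_t nlen = (size_t)(p - name);

        while (isspace((unsigned char)*p))
            p++;

        /* Optional value: quoted with " or ', or unquoted. */
        const char *val = p;
        size_t vlen = 0;
        if (*p == '=') {
            p++;
            while (isspace((unsigned char)*p))
                p++;
            if (*p == '"' || *p == '\'') {
                char quote = *p++;
                val = p;
                while (*p && *p != quote)
                    p++;
                vlen = (size_t)(p - val);
                if (*p == quote)
                    p++;
            } else {
                val = p;
                while (*p && !isspace((unsigned char)*p) && *p != '>')
                    p++;
                vlen = (size_t)(p - val);
            }
        }

        if (nlen == 7 && ieq(name, "content", 7)) {
            if (vlen >= outsz)
                vlen = outsz - 1;   /* truncate to fit the buffer */
            memcpy(out, val, vlen);
            out[vlen] = '\0';
            return 1;
        }
    }
}
```

    Called on the example from the first post, body_content_attr(page, buf, sizeof buf) returns 1 and leaves cboarding-itskhaledmd user-1 in buf. A return value of 0 covers the "content doesn't exist" case, which the caller can then report.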
