Thread: Reading PDF and DOC files

  1. #1
    Registered User
    Join Date
    May 2010
    Posts
    1

    Reading PDF and DOC files

    I should have posted this question to this forum first. Anyway, I am working on a Visual C++ program which needs to read the contents of a PDF or DOC file. What libraries do I need to use for this task?

    Any pointers or info appreciated.

    Thank you in advance.

    K

  2. #2
    Registered User
    Join Date
    Jan 2009
    Location
    Australia
    Posts
    375
    If you can't find any libraries simply by searching, your best bet is to look at the source of Free or Open Source projects that already do these things.

    AbiWord is a word processor (I'm pretty sure it supports DOC), and its source is readily available on the website.

    Not sure of any open source PDF readers (I think Foxit Reader is just freeware/shareware?), but Google seems to return some results, so you could take a look there.

  3. #3
    Registered User
    Join Date
    Dec 2007
    Posts
    214
    On PDF:

    I doubt you will find anything free. You can buy an SDK from Adobe: Adobe - PDF Library SDK

    You can get a free PDF reference http://www.adobe.com/devnet/acrobat/...erence_1-7.pdf , which is now superseded by ISO 32000-1, or you can buy ISO 32000-1 itself: http://www.iso.org/iso/iso_catalogue...csnumber=51502

  4. #4
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    It depends on which version of Visual Studio you have (and how much money you paid for it).
    Something old (or free) might not have a bunch of class libraries that make it easier.

    If you want to start at the beginning with bashing your own code, try
    Microsoft Office Binary (doc, xls, ppt) File Formats
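    When writing a parser by hand, a common first step is to tell the two formats apart by their leading bytes. The following is a minimal sketch, assuming the .doc files are legacy binary Word documents stored as OLE compound files (signature D0 CF 11 E0 A1 B1 1A E1) and the PDF files begin with "%PDF-". The names DocKind, classify_header and classify_file are hypothetical and do not come from any library:

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical classification of a document by its leading bytes.
enum class DocKind { Pdf, OleCompound, Unknown };

// Classify a buffer holding the first bytes of a file.
DocKind classify_header(const unsigned char* data, std::size_t size) {
    // Signature of an OLE compound file (legacy .doc, .xls, .ppt).
    static const unsigned char kOleMagic[8] = {0xD0, 0xCF, 0x11, 0xE0,
                                               0xA1, 0xB1, 0x1A, 0xE1};
    // A PDF file starts with "%PDF-" followed by a version such as 1.7.
    static const char kPdfMagic[5] = {'%', 'P', 'D', 'F', '-'};

    if (size >= sizeof kOleMagic &&
        std::memcmp(data, kOleMagic, sizeof kOleMagic) == 0) {
        return DocKind::OleCompound;
    }
    if (size >= sizeof kPdfMagic &&
        std::memcmp(data, kPdfMagic, sizeof kPdfMagic) == 0) {
        return DocKind::Pdf;
    }
    return DocKind::Unknown;
}

// Read up to 8 bytes from the file and classify them.
// A file that cannot be opened yields DocKind::Unknown.
DocKind classify_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    unsigned char header[8] = {};
    in.read(reinterpret_cast<char*>(header), sizeof header);
    return classify_header(header, static_cast<std::size_t>(in.gcount()));
}
```

    Calling classify_file on a path returns DocKind::Pdf, DocKind::OleCompound or DocKind::Unknown. The OLE signature only says the file is a compound file; .xls and .ppt share it, so finding the Word streams inside still requires the format documentation above. Some PDF files carry a few stray bytes before the header, which this strict check reports as Unknown.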
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  5. #5
    Registered User
    Join Date
    Jun 2008
    Posts
    62
    Quote Originally Posted by DaveH
    On PDF:

    I doubt you will find anything free. You can buy an SDK from Adobe: Adobe - PDF Library SDK

    You can get a free PDF reference http://www.adobe.com/devnet/acrobat/...erence_1-7.pdf , which is now superseded by ISO 32000-1, or you can buy ISO 32000-1 itself (ISO 32000-1:2008 - Document management -- Portable document format -- Part 1: PDF 1.7)
    Actually there are a couple of free PDF libraries. However, I don't know about doc.

    PoDoFo is the one I've liked the best, though GNUPdf looks like it might show some promise.

  6. #6
    Registered User
    Join Date
    Jun 2010
    Location
    beijing
    Posts
    23
    OpenOffice can open MS Word documents, so you can get OpenOffice's source code and study it.

  7. #7
    Registered User valaris
    Join Date
    Jun 2008
    Location
    RING 0
    Posts
    507
    Depending on what you want to do with these files, you may be able to get away with using the Word automation classes (I'm sure Adobe exports some classes for PDFs too). This way you can let the actual applications parse the file and also use their services to do whatever you need (spellcheck, display the document, modify it, etc.).
    Last edited by valaris; 06-25-2010 at 12:08 PM. Reason: Removed C# example. Totally didn't realize I navigated out of the C# Forum to Windows Forum.
