Thread: What types of files can C++ file processing handle?

  1. #1
    Registered User
    Join Date
    May 2006
    Posts
    57

    What types of files can C++ file processing handle?

    I'm just starting with C++ file processing, and I know that it can modify and create .txt files.

    Can C++ file processing handle Word documents (.doc)? Open Office documents? What other types of files can it handle? I know, for example, there are incompatibilities between Word and Open Office. So I wonder how many different file formats C++ file processing can handle, and if it can handle the differences between file types.
    Last edited by darsunt; 10-27-2008 at 04:33 PM. Reason: add info

  2. #2
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    C++ can read in any file you care to tell it to read in. It doesn't understand any file -- that's your job, as programmer.

  3. #3
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    C++ reads files on the byte level, or (under some circumstances) on the text level. With most formats, only the byte level is actually useful.
    Interpretation beyond that is up to you.
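
    To make the byte-level point concrete, here is a minimal sketch that reads any file as raw bytes and shows the first few in hex. The names read_all_bytes and hex_prefix are made up for this illustration, and returning an empty vector on failure is just one possible choice.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read an entire file as raw bytes. Returns an empty vector if the file
// cannot be opened (a real program would probably report the error).
std::vector<unsigned char> read_all_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    // istreambuf_iterator does unformatted reads, so no bytes are skipped.
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Render up to `count` leading bytes as lowercase hex separated by spaces.
std::string hex_prefix(const std::vector<unsigned char>& bytes, std::size_t count) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < bytes.size() && i < count; ++i) {
        if (i != 0) out += ' ';
        out += digits[bytes[i] >> 4];
        out += digits[bytes[i] & 0x0F];
    }
    return out;
}
```

    Calling hex_prefix(read_all_bytes("some_file"), 8) on a .doc, a .jpg and a .txt file gives three different byte patterns; what those bytes mean is exactly the part C++ leaves to you.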

    For example, let's look at Word vs OpenOffice.org Writer.

    Word started out using a proprietary binary format. This format changed slightly with more or less every version, with some bigger changes, most significantly when the OLE Storage format was introduced. To find out how to parse it, you have to either find some documentation on it, read code that already parses it (such as the import filters of various open-source projects), or reverse-engineer it yourself.
    I think Office 2003 introduced the Word XML format. It was an XML-based format that somehow reflected the same information. The format was short-lived - by the next version already, they had changed it yet again.
    The current state of the art is Office OpenXML (OOXML) Text. OOXML is a travesty of an ISO standard - basically, it's a ZIP-format file containing a number of XML files and possibly other files, such as embedded images. The newest Office uses a pre-standardization version of this format, which is not compatible with the standard.

    OpenOffice.org started out as StarOffice. There, it used the StarOffice formats. I don't know what their early formats were, but by the time of OOo 1.0, they used XML files in a ZIP container (where do you think MS stole that idea ...). They then took that idea and generalized it, working together with other companies and organizations (IBM, KDE, ...), resulting in the OpenDocument Format (ODF) standard. OpenDocument is also a ZIP container of XML and related files. However, the internal XML structure is completely incompatible with OOXML.
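
    As a rough illustration of telling these containers apart at the byte level, the sketch below checks a file's leading bytes against two well-known signatures: "PK\x03\x04" for a ZIP local file header and D0 CF 11 E0 A1 B1 1A E1 for an OLE compound file (the container behind the old binary .doc). The enum and function names are invented for this example.

```cpp
#include <algorithm>
#include <array>
#include <fstream>
#include <ios>
#include <string>

enum class Container { Zip, OleCompound, Unknown };

// Classify a file by its first bytes only. A missing or too-short file
// yields Unknown, because gcount() then reports fewer bytes than needed.
Container sniff_container(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::array<unsigned char, 8> head{};
    in.read(reinterpret_cast<char*>(head.data()),
            static_cast<std::streamsize>(head.size()));
    const std::streamsize got = in.gcount();

    static const unsigned char zip_magic[] = {0x50, 0x4B, 0x03, 0x04};
    static const unsigned char ole_magic[] = {0xD0, 0xCF, 0x11, 0xE0,
                                              0xA1, 0xB1, 0x1A, 0xE1};

    if (got >= 4 && std::equal(zip_magic, zip_magic + 4, head.begin()))
        return Container::Zip;
    if (got >= 8 && std::equal(ole_magic, ole_magic + 8, head.begin()))
        return Container::OleCompound;
    return Container::Unknown;
}
```

    Note the limits: a ZIP signature only says "this is a ZIP archive". Deciding whether it holds OOXML, ODF or something unrelated means opening the archive (with a zip library) and looking at the files inside.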

    By the way, both MS Office and OpenOffice.org are mostly written in C++. (As far as I know. Of course, you can't be entirely sure what MS Office is written in.)
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  4. #4
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Sidetracking a bit, but I thought a large portion of OpenOffice was actually Java-code.

    I think most of Office is written in C or C++, but most of it is compiled to intermediate form P-code rather than proper machine code.

    And yes, C++ can read ANY type of file. However, C++ as a language knows nothing about what the MEANING of the data in the file is. That is your task as a programmer. For some file-formats, you may be able to get libraries to interpret the format. For example, zip-files can be read with a zip library, jpeg-files with a jpeg library. Unfortunately, I'm not aware of any library that understands any of the proprietary forms of MS Office applications (.doc, .xls, etc).

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  5. #5
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Quote:
    Sidetracking a bit, but I thought a large portion of OpenOffice was actually Java-code.
    Yes, but all the important functionality is in C++. You can run it without a JRE.
    The exception is the database component. The underlying DB is written completely in Java.

  6. #6
    Registered User
    Join Date
    May 2006
    Posts
    57
    So basically anything is possible with C++ file processing, one just has to do the homework? Good news and bad news. Good: it can be done. Bad: it sounds like a lot of work.

    Better roll up the sleeves. I wonder if .NET has some shortcuts?

  7. #7
    Hardware Engineer
    Join Date
    Sep 2001
    Posts
    1,398
    Quote:
    I'm just starting with C++ file processing, and I know that it can modify and create .txt files.
    In C++, there are only two ways to read/write files: Text and binary. Almost everything is done in binary mode. (In this context, binary doesn't mean "base-2"... It means "raw data".)
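
    A small sketch of the difference, storing the same number both ways. The four function names are invented for this example, and the binary version simply dumps the in-memory bytes, so the result depends on the machine's byte order.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Text: the number is formatted as decimal digits, e.g. "1234567\n".
void write_text(const std::string& path, std::uint32_t value) {
    std::ofstream out(path);
    out << value << '\n';
}

std::uint32_t read_text(const std::string& path) {
    std::ifstream in(path);
    std::uint32_t value = 0;
    in >> value;  // parses the digits back into a number
    return value;
}

// Binary: the four bytes of the value are copied as they sit in memory.
void write_binary(const std::string& path, std::uint32_t value) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}

std::uint32_t read_binary(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::uint32_t value = 0;
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    return value;
}
```

    The text file is readable in any editor; the binary file is always exactly four bytes, but only makes sense to a program that knows it holds one 32-bit integer in this machine's byte order.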


    And, there are two ways to handle formatted files:

    1. The "easy way" is to find a library (a library of non-standard C++ functions) that handles the particular format. For example, the library function should be able to read the file header and data, and write that information into a structure, object, or array. And, it should be able to create a complete file (header & data) from data provided by your program.

    Usually, this is the best way to go, especially if you are dealing with a compressed format. In that case, the library function(s) should open the file and decompress it (or compress and save it).

    Usually, it's not really easy to use these libraries, because there may be lots of functions to sort through and learn, and the functions themselves can be complicated, requiring you to pass in pointers to structures, etc.

    I don't know anything about .DOC files... I would assume Microsoft has a library for Visual C++.


    2. Study the format specifications, and do it yourself. You can find format specs (and/or links) at wotsit.org. But, most of these formats are also complicated, with lots of variations & options.
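
    To give a feel for the do-it-yourself route, here is a sketch that pulls the image size out of a PNG header. PNG is used only as a convenient, publicly documented example (it is not a format discussed in this thread): the file starts with an 8-byte signature, followed by an IHDR chunk whose first two fields are the width and height as big-endian 32-bit integers. PngSize, read_be32 and parse_png_size are names made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct PngSize {
    std::uint32_t width;
    std::uint32_t height;
};

// Assemble a big-endian 32-bit integer from four bytes.
static std::uint32_t read_be32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
}

// Layout assumed here: bytes 0-7 signature, 8-11 chunk length,
// 12-15 the chunk type "IHDR", 16-19 width, 20-23 height.
// Returns false if the data is too short or does not look like a PNG.
bool parse_png_size(const std::vector<unsigned char>& data, PngSize& out) {
    static const unsigned char signature[8] = {0x89, 'P', 'N', 'G',
                                               0x0D, 0x0A, 0x1A, 0x0A};
    if (data.size() < 24) return false;
    if (std::memcmp(data.data(), signature, 8) != 0) return false;
    if (std::memcmp(data.data() + 12, "IHDR", 4) != 0) return false;
    out.width = read_be32(data.data() + 16);
    out.height = read_be32(data.data() + 20);
    return true;
}
```

    This covers only the first 24 bytes of one format. A real reader also has to walk the remaining chunks, check CRCs and decompress the pixel data, which is why the library route is usually the better one.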

  8. #8
    Banned master5001's Avatar
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    Could you not have tried whatever you want to do without asking?

    Example:
    Code:
    #include "my_class.txt"
    #include "my_derived_class.py"
    Those work...

    Code:
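    gcc -x c++ -c my_class.py -o my_class.o
    Works too, as long as -x c++ names the language explicitly. gcc and g++ otherwise do some legwork based on the extension (e.g. gcc compiles a .cpp file as C++), and a file with an extension they don't recognize is handed to the linker rather than compiled.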

  9. #9
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    What the hell, master5001? What's this got to do with anything?

  10. #10
    3735928559
    Join Date
    Mar 2008
    Location
    RTP
    Posts
    838
    Some IDEs have objects for dealing with MS Office documents. I've never used them, but they are there.
