C Board  

Old 08-03-2007, 03:43 AM   #1
Registered User
 
Join Date: Sep 2006
Posts: 98
Reading Microsoft Word documents

How can I read Word documents using C#? I use StreamReader for plain text files, but I don't know how to read Word documents. Does anyone know how to do this?
Mavix
Old 08-03-2007, 03:54 AM   #2
Kernel hacker
 
Join Date: Jul 2007
Location: Farncombe, Surrey, England
Posts: 15,686
The file format produced by Word (a .doc file) is a binary format that contains all sorts of "extra data" beyond the basic text that is "the real document".

I would guess there are libraries available to read it, but I'm not sure.

The easiest solution is perhaps to save the document as plain text or in a "mostly text" format (RTF, for example).
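To illustrate the point about .doc being a binary container rather than text: files written by Word 97 through 2003 (which includes Office 2000) are OLE2 compound files, and those begin with a fixed 8-byte signature. The sketch below only checks for that signature; it does not parse the document. The function names `has_ole2_signature` and `file_is_ole2` are made up for this example, and it assumes the file comes from one of those Word versions.

```c
#include <stdio.h>
#include <string.h>

/* Signature that opens every OLE2 compound file, the container
   used by binary .doc files written by Word 97 through 2003. */
static const unsigned char OLE2_MAGIC[8] = {
    0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1
};

/* Hypothetical helper: returns 1 if the buffer starts with the
   OLE2 signature, otherwise 0. */
int has_ole2_signature(const unsigned char *buf, size_t len)
{
    return len >= sizeof OLE2_MAGIC &&
           memcmp(buf, OLE2_MAGIC, sizeof OLE2_MAGIC) == 0;
}

/* Hypothetical helper: returns 1 if the file at `path` looks like an
   OLE2 container, 0 if it does not, and -1 if it cannot be opened. */
int file_is_ole2(const char *path)
{
    unsigned char head[8];
    size_t got;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    got = fread(head, 1, sizeof head, f);
    fclose(f);
    return has_ole2_signature(head, got);
}
```

A file saved as RTF or plain text fails this check, which is a quick way to tell whether a plain text reader has any chance of working on it.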

--
Mats
matsp
Old 08-03-2007, 07:11 AM   #3
Anti-Poster
 
Join Date: Feb 2002
Posts: 1,241
There's a COM Interop object available if you have Word installed. Take a look through Google for "Word interop" or something similar.
__________________
Rule #1: Every rule has exceptions
pianorain
Old 08-03-2007, 11:24 AM   #4
Registered User
 
Join Date: Sep 2006
Posts: 98
I have Word installed, but it's quite old (Office 2000). Isn't there some other way to read Word documents?
Mavix
Old 08-03-2007, 12:00 PM   #5
Kernel hacker
 
Join Date: Jul 2007
Location: Farncombe, Surrey, England
Posts: 15,686
Depends on what you want to do - what are you trying to achieve? If you can describe your end goal, we can almost certainly describe some way of getting there (or towards it) - but "just reading a Word document" isn't trivial, because the information in the file is stored in quite a complex manner (for example, if you enable "visible changes", both the previous text and the new text for multiple generations of the document may be kept in the document).

--
Mats
matsp
Old 08-03-2007, 01:57 PM   #6
Anti-Poster
 
Join Date: Feb 2002
Posts: 1,241
Another alternative is to use the COM IFilter interface provided by Microsoft Desktop Search. You'll lose all the font information, but you will at least be able to get the words out of the document. It may also be possible to use OpenOffice to programmatically open Word docs.

In any case, it's not a simple endeavor.
__________________
Rule #1: Every rule has exceptions
pianorain
Old 08-04-2007, 07:22 AM   #7
Registered User
 
Join Date: Sep 2006
Posts: 98
Quote:
Originally Posted by matsp
Depends on what you want to do - what are you trying to achieve? If you can describe your end goal, we can almost certainly describe some way of getting there (or towards it) - but "just reading a Word document" isn't trivial, because the information in the file is stored in quite a complex manner (for example, if you enable "visible changes", both the previous text and the new text for multiple generations of the document may be kept in the document).

--
Mats
All I want to do is extract the text from the Word document. Just the text, no formatting.
Mavix
Old 08-04-2007, 07:28 AM   #8
Kernel hacker
 
Join Date: Jul 2007
Location: Farncombe, Surrey, England
Posts: 15,686
Well, if you want to do that, you could try something like this:

Code:
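/* Fragment: needs <stdio.h> and <ctype.h>, and belongs inside a function. */
FILE *fin = fopen("something.doc", "rb");
FILE *fout = fopen("someelse.txt", "w");
int c;  /* int rather than char, so EOF can be told apart from data */
if (fin && fout) {
    while ((c = fgetc(fin)) != EOF) {
        if (isascii(c)) fputc(c, fout);  /* keep 7-bit ASCII bytes only */
    }
}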
This isn't meant to be complete, and I'm fairly sure you'll get some "garbage", but you probably can get the actual text out of it.
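One way to cut down on the "garbage" is to keep only runs of printable characters, much as the Unix `strings` tool does, since `isascii` also lets NUL and other control bytes through. The sketch below is a variation on the loop above, not a .doc parser. It assumes the text appears in the file either as 8-bit characters or as UTF-16LE (each character followed by a zero byte), which is common in Word 97 and later files but not guaranteed. The names `extract_runs`, `is_text`, `put`, and the `min_run` parameter are invented for this example.

```c
#include <stddef.h>

/* Printable 7-bit ASCII, plus tab. */
static int is_text(unsigned char c)
{
    return (c >= 0x20 && c <= 0x7E) || c == '\t';
}

/* Appends one char to out if there is room (one byte is kept for the NUL). */
static void put(char *out, size_t cap, size_t *pos, char c)
{
    if (*pos + 1 < cap)
        out[(*pos)++] = c;
}

/*
 * Copies runs of at least min_run text characters from in[0..n) to out,
 * one run per line. A run is either plain 8-bit text or UTF-16LE text
 * (each character followed by a zero byte). Returns the number of
 * characters stored, not counting the terminating NUL.
 */
size_t extract_runs(const unsigned char *in, size_t n,
                    char *out, size_t cap, size_t min_run)
{
    size_t i = 0, pos = 0;

    if (cap == 0)
        return 0;
    if (min_run == 0)
        min_run = 1;  /* a run must contain at least one character */

    while (i < n) {
        size_t j, k, count;

        /* Try a UTF-16LE run: text byte, zero byte, text byte, ... */
        j = i;
        count = 0;
        while (j + 1 < n && is_text(in[j]) && in[j + 1] == 0) {
            j += 2;
            count++;
        }
        if (count >= min_run) {
            for (k = i; k < j; k += 2)
                put(out, cap, &pos, (char)in[k]);
            put(out, cap, &pos, '\n');
            i = j;
            continue;
        }

        /* Otherwise try a plain 8-bit run. */
        j = i;
        while (j < n && is_text(in[j]))
            j++;
        if (j - i >= min_run) {
            for (k = i; k < j; k++)
                put(out, cap, &pos, (char)in[k]);
            put(out, cap, &pos, '\n');
            i = j;
            continue;
        }

        i++;  /* no usable run starts here; move on one byte */
    }
    out[pos] = '\0';
    return pos;
}
```

With the file's bytes loaded into a buffer, a `min_run` of around 4 drops most stray bytes, at the cost of also dropping very short paragraphs. Font names, field codes, and deleted text kept by change tracking will still show up, so the output needs a manual look.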

Alternatively, try the "Word Viewer":
Microsoft Word Viewer download site

It allows you to copy text out of a Word document, it's free, and you don't have to write a single line of code (and it will probably do a better job of sorting out what's what in your document too).

--
Mats
matsp
Old 08-04-2007, 07:39 AM   #9
Registered User
 
Join Date: Sep 2006
Posts: 98
Thanks for the help!
Mavix