How can I read Word documents using C#? I use StreamReader for plain text files, but I don't know how to read Word documents. Does anyone know how to do this?
The file format produced by Word (in a .doc file) is a binary format that contains all sorts of "extra data" beyond the basic text that is "the real document".
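To make the "binary format" point a bit more concrete, here is a minimal sketch that checks whether a file's first bytes carry the 8-byte signature of an OLE2 compound file, the container that Word 97-2003 era .doc files are stored in. The helper name looks_like_ole2 is made up for illustration, and a match only identifies the container (other Office formats use it too), not that the file is a Word document.

```c
#include <stddef.h>
#include <string.h>

/* Signature found at offset 0 of every OLE2 compound file. */
static const unsigned char OLE2_MAGIC[8] = {
    0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1
};

/* Hypothetical helper: returns 1 if the first len bytes of a file start
 * with the OLE2 signature, 0 otherwise (including when len is too short). */
int looks_like_ole2(const unsigned char *head, size_t len)
{
    return len >= sizeof OLE2_MAGIC &&
           memcmp(head, OLE2_MAGIC, sizeof OLE2_MAGIC) == 0;
}
```

To use it, fread the first 8 bytes of the file into a buffer and pass the number of bytes actually read as len.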
I would guess there are libraries available to read it, but I'm not sure.
The easiest solution is perhaps to save the document from Word as plain text or in a "mostly text" format (RTF, for example).
--
Mats
There's a COM Interop object available if you have Word installed. Take a look through Google for "Word interop" or something similar.
I have Word installed, but it's quite old (Office 2000). Isn't there some other way to read Word documents?
Depends on what you want to do. What are you trying to achieve? If you can describe your end goal, we can almost certainly suggest a way of getting there (or at least towards it). But "just reading a Word document" isn't trivial, because the information in the file is stored in quite a complex manner. For example, if you enable "visible changes", both the previous text and the new text for multiple generations of the document may be kept in the file.
--
Mats
Another alternative is to use the COM IFilter interface provided by Microsoft Desktop Search. You'll lose all the font information, but you will at least be able to get the words out of the document. It may also be possible to use OpenOffice to programmatically open Word docs.
In any case, it's not a simple endeavor.
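Well, if you want to do that, you could try an approach like this:
Code:
FILE *fin = fopen("something.doc", "rb");
FILE *fout = fopen("someelse.txt", "w");
int c; /* int, not char, so that EOF can be told apart from data */
while ((c = fgetc(fin)) != EOF) {
    if (isascii(c)) fputc(c, fout); /* keep 7-bit ASCII bytes only */
}
fclose(fin);
fclose(fout);
This isn't meant to be complete (it needs <stdio.h> and <ctype.h>, and it has no error checking), and I'm fairly sure you'll get some "garbage", but you can probably get the actual text out of it.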
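If the plain byte filter lets too much garbage through, a slightly more selective variant is sketched below. It is still a heuristic, not a parser: it keeps only runs of printable ASCII of a minimum length (much like the Unix strings tool) and skips NUL bytes, on the assumption that some of the text may be stored as 16-bit characters. The function name extract_text_runs, its parameters, and the idea of using a minimum run length are all made up for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Ends the current run: keeps it (adding a newline) if it is long enough,
 * otherwise rewinds the output position to discard it. */
static size_t close_run(char *out, size_t written, size_t run_len,
                        size_t min_run)
{
    if (run_len == 0)
        return written;            /* nothing collected, nothing to do */
    if (run_len >= min_run) {
        out[written] = '\n';       /* long enough: keep it, one run per line */
        return written + 1;
    }
    return written - run_len;      /* too short: drop the tentative bytes */
}

/* Hypothetical helper: copies runs of at least min_run printable ASCII
 * characters from buf into out, one run per line, and NUL-terminates out.
 * NUL bytes in buf are skipped, so "H\0i\0" collapses to "Hi".
 * out must have room for len + 2 bytes. Returns the number of bytes
 * written, not counting the terminating NUL. */
size_t extract_text_runs(const unsigned char *buf, size_t len,
                         size_t min_run, char *out)
{
    size_t written = 0;  /* bytes currently held in out */
    size_t run_len = 0;  /* length of the run being collected */

    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if (c == 0)
            continue;                         /* skip NUL padding */
        if (c < 0x80 && isprint(c)) {
            out[written++] = (char)c;         /* tentatively keep it */
            run_len++;
        } else {
            written = close_run(out, written, run_len, min_run);
            run_len = 0;
        }
    }
    written = close_run(out, written, run_len, min_run);
    out[written] = '\0';
    return written;
}
```

To try it on a file, read the whole file into a buffer with fread, allocate len + 2 bytes for out, and write the result with fputs. A minimum run length of around 4 is a reasonable starting point, but expect some non-text strings to survive.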
Alternatively, try the "Word Viewer":
Microsoft Word Viewer download site
It allows you to copy text out of a Word document, it's free, and you don't have to write a single line of code. It will probably also do a better job of sorting out what's what in your document.
--
Mats
Thanks for the help!