How can I read Word documents using C#? I use StreamReader for plain text files, but I don't know how to read Word documents. Does anyone know how to do this?
The file format produced by Word (in a .doc file) is a binary format that contains all sorts of "extra data" beyond the basic text that is "the real document".
I would guess there are libraries available to read it, but I'm not sure.
The easiest solution is perhaps to save the document as plain text or in a "mostly text" format (RTF, for example).
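To make the difference between the binary format and a "mostly text" format concrete, here is a minimal sketch in C that looks at the first few bytes of a file. It assumes the .doc file is an OLE compound file (which starts with the bytes D0 CF 11 E0 A1 B1 1A E1) and that an RTF file starts with `{\rtf`. The names `doc_kind` and `sniff_doc_kind` are made up for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names, introduced only for this sketch. */
enum doc_kind { DOC_UNKNOWN, DOC_OLE_BINARY, DOC_RTF };

/* Classify a file from its first n bytes (head). */
enum doc_kind sniff_doc_kind(const unsigned char *head, size_t n)
{
    /* Signature at the start of an OLE compound file (binary .doc). */
    static const unsigned char ole_magic[8] =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    if (n >= sizeof ole_magic && memcmp(head, ole_magic, sizeof ole_magic) == 0)
        return DOC_OLE_BINARY;
    if (n >= 5 && memcmp(head, "{\\rtf", 5) == 0)
        return DOC_RTF;          /* RTF is readable as ordinary text */
    return DOC_UNKNOWN;
}
```

Reading the first 8 bytes with fread() from a file opened in "rb" mode and passing them to this function is enough to tell whether a plain text reader has any chance of working.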
--
Mats
There's a COM Interop object available if you have Word installed. Take a look through Google for "Word interop" or something similar.
If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein
I have Word installed, but it's quite old (Office 2000). Isn't there some other way to read Word documents?
Depends on what you want to do. What are you trying to achieve? If you can describe your end goal, we can almost certainly describe some way of getting there (or towards it). But "just reading a Word document" isn't trivial, because the information in the file is stored in quite a complex manner. For example, if you enable change tracking ("visible changes"), both the previous text and the new text for multiple generations of the document may be kept in the file.
--
Mats
Another alternative is to use the COM IFilter interface provided by Microsoft Desktop Search. You'll lose all the font information, but you will at least be able to get the words out of the document. It may also be possible to use OpenOffice to programmatically open Word docs.
In any case, it's not a simple endeavor.
Well, if you want to do that, you could try something like this. It isn't meant to be complete, and I'm fairly sure you'll get some "garbage", but you can probably get the actual text out of it.
Code:
    /* needs <stdio.h> and <ctype.h>; c must be an int so EOF can be detected */
    FILE *fin = fopen("something.doc", "rb");
    FILE *fout = fopen("someelse.txt", "w");
    int c;
    while ((c = fgetc(fin)) != EOF) {
        if (isascii(c))
            fputc(c, fout);
    }
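A slightly more selective variant of the byte filter above might cut down on the garbage. This is only a sketch: it keeps runs of printable ASCII characters that reach a minimum length, and it skips a single zero byte after a printable character on the assumption that the text may be stored as UTF-16LE (two bytes per character). That assumption about the file contents is mine and is not verified here, and the name `extract_text_runs` is made up for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: copy runs of at least min_run printable ASCII
   characters from in to out, one run per line.
   Returns the number of runs written. */
size_t extract_text_runs(FILE *in, FILE *out, size_t min_run)
{
    char pending[64];        /* start of a run, held back until it is long enough */
    size_t len = 0;          /* printable characters seen in the current run */
    size_t runs = 0;
    int prev_printable = 0;  /* was the previous byte a printable character? */
    int c;

    if (min_run == 0 || min_run > sizeof pending)
        min_run = 4;         /* fall back to an arbitrary default */

    while ((c = fgetc(in)) != EOF) {
        if ((c >= 0x20 && c <= 0x7e) || c == '\t') {
            if (len < min_run) {
                pending[len++] = (char)c;
                if (len == min_run)
                    fwrite(pending, 1, len, out);  /* run is now long enough */
            } else {
                fputc(c, out);
                len++;
            }
            prev_printable = 1;
        } else if (c == 0 && prev_printable) {
            /* Assumed to be the high byte of a UTF-16LE character: skip it. */
            prev_printable = 0;
        } else {
            /* Any other byte ends the current run. */
            if (len >= min_run) {
                fputc('\n', out);
                runs++;
            }
            len = 0;
            prev_printable = 0;
        }
    }
    if (len >= min_run) {    /* flush a run that reaches the end of the file */
        fputc('\n', out);
        runs++;
    }
    return runs;
}
```

Called with the two FILE pointers from the snippet above and a min_run of 4, this behaves roughly like the Unix strings tool. It will still miss or mangle anything that is not plain ASCII, and it knows nothing about the document's structure.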
Alternatively, try the "Word Viewer":
Microsoft Word Viewer download site
It allows you to copy text out of a Word document. It's free, and you don't have to write a single line of code (and it's probably going to do a better job of sorting out what's what in your document too).
--
Mats
Thanks for the help!