stripping text from a word doc


  1. #1
    Registered User
    Join Date
    Apr 2002
    Posts
    7

    stripping text from a word doc

    I have to pull the text out of a Word doc. I know a little C, but I'm not sure I know enough to do this. I have a C DLL that a coworker wrote from my VB version, and it does strip out the text. However, it also picks up formatting bytes that happen to be alphanumeric, which won't work for my situation. Does anyone know how to identify the formatting characters in a Word doc so that they can be skipped while the text is stripped out? Here's the code for the DLL that I'm using right now. Any ideas or suggestions would be greatly appreciated. Thanks.

    Code:
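    #include <windows.h>
    #include <oleauto.h>	// SysAllocStringLen
    #include <fstream>
    #include <iostream>
    
    extern "C" LPWSTR __stdcall ConvertDocument(const char* pPath)
    {
    	const long kBufSize = 200000;
    	long i = 0;
    	char ch;
    	CHAR oStr[kBufSize];
    
    	// Standard headers replace <fstream.h>; std::ifstream never creates a
    	// missing file, so the old ios::nocreate flag is not needed.
    	std::ifstream tfile(pPath, std::ios::in | std::ios::binary);
    	if( tfile ) {
    		// Test get() itself so a failed read is never processed, and leave
    		// room for the two characters a CR expands to plus the terminator.
    		while ( (i < kBufSize - 2) && tfile.get( ch ) ) {
    			if( ((ch >= 'A')&&(ch <= 'Z')) || ((ch >= 'a')&&(ch <= 'z')) || ((ch >= '0')&&(ch <= '9')) ){
    				// Letters and digits are copied unchanged
    				oStr[i++] = ch;
    			}
    			else if(ch == 13){
    				// A lone CR becomes a CR LF pair
    				oStr[i++] = 13;
    				oStr[i++] = 10;
    			}
    			else if((ch == ' ')||(ch == '\t')){
    				oStr[i++] = ' ';
    			}
    			else if((ch == '.')||(ch == '?')||(ch == '!')||(ch == ';')||(ch == '(')||(ch == ')')||(ch == '{')||(ch == '}')||(ch == '[')||(ch == ']')||(ch == '`')||(ch == ':')||(ch == 39)){
    				// Punctuation (39 is the apostrophe) is replaced by a space
    				oStr[i++] = ' ';
    			}
    		}
    		tfile.close();	// No need for this really, ~ifstream closes the file
    	}
    	else {
    		std::cout << "ERROR: Cannot open file." << std::endl;
    	}
    	oStr[i] = '\0';
    
    	// oStr holds 8-bit characters, so it cannot simply be cast to LPWSTR.
    	// Widen it with MultiByteToWideChar into a BSTR of the right length.
    	int wlen = (i > 0) ? MultiByteToWideChar(CP_ACP, 0, oStr, (int)i, NULL, 0) : 0;
    	BSTR bsText = SysAllocStringLen(NULL, wlen);	// adds the terminating null
    	if( bsText && (wlen > 0) ){
    		MultiByteToWideChar(CP_ACP, 0, oStr, (int)i, bsText, wlen);
    	}
    	return bsText;	// the caller releases it with SysFreeString
    }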
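
    One possible starting point for separating text from formatting bytes is the heuristic used by tools like the Unix strings utility: keep only runs of several consecutive printable characters, on the theory that stray alphanumeric bytes inside binary formatting data rarely form long runs. The sketch below is not part of the original DLL and is not a real Word parser. The helper name extractTextRuns and the minRun threshold are hypothetical, and it assumes the document text is stored as plain 8-bit characters, which may not hold for every Word file.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original DLL): keep only runs of at
// least minRun consecutive printable bytes and join the kept runs with spaces.
// Assumption: the text is stored as 8-bit characters.
std::string extractTextRuns(const std::string& data, std::size_t minRun = 4)
{
    std::string out;
    std::string run;
    for (std::size_t k = 0; k <= data.size(); ++k) {
        // Treat the end of the buffer like a non-printable byte so that
        // the final run is flushed too.
        bool printable = (k < data.size()) &&
            std::isprint(static_cast<unsigned char>(data[k])) != 0;
        if (printable) {
            run += data[k];
        } else {
            if (run.size() >= minRun) {
                if (!out.empty()) out += ' ';
                out += run;
            }
            run.clear();
        }
    }
    return out;
}
```

    The file contents could be read into a std::string in binary mode and passed to this function before the result is widened for the caller. The threshold is a trade-off: a larger minRun drops more stray formatting bytes but also drops short words that sit between non-printable bytes.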

  2. #2
    End Of Line Hammer's Avatar
    Join Date
    Apr 2002
    Posts
    6,231
    Your code is C++ (or is it VC++?) ... this is the C board.

    Try the Windows or the C++ board for better luck.
    When all else fails, read the instructions.
    If you're posting code, use code tags: [code] /* insert code here */ [/code]
