Office access in C/C++ NOT VC++!! :)


  1. #1
    Registered User
    Join Date
    May 2005
    Location
    Toronto, ON
    Posts
    1

    Question Office access in C/C++ NOT VC++!! :)

    I am developing a simple Win32 application using Borland's bcc32 v5.5 free compiler.

    I need to be able to access a number of Word documents created using a VBA macro. I am using Win32 for this just for fun and to learn.

    However, I need to be able to open each Word document, extract the text (without formatting, etc.), and process the text using regular expressions.

    Can someone please point me to a resource or technology that addresses the Word document access problem using C/C++ and not VC++?

    Thank you,
    Paul Schofield

  2. #2
    Hardware Engineer
    Join Date
    Sep 2001
    Posts
    1,398
    You can get the .doc file specs at whatsit.org. If you study the spec for a couple of days, you can probably figure out how to strip the formatting shtuff out of the file, leaving only the ASCII. Here's a short excerpt from the spec:

    Text
    The text of the file starts at fib.fcMin. fib.fcMin is usually set to the next 128-byte boundary after the end of the FIB. The text in a Word document is ASCII text with the following restrictions (ASCII codes given in decimal):

    Paragraph ends are stored as a single Carriage Return character (ASCII 13). No other occurrences of this character sequence are allowed.
    Hard line breaks which are not paragraph ends are stored as ASCII 11. Other line break or word wrap information is not stored.
    Breaking hyphens are stored as ASCII 45 (normal hyphen code); Non-required hyphens are ASCII 31. Non-breaking hyphens are stored as ASCII 30.
    Non-breaking spaces are stored as 160. Normal spaces are ASCII 32.
    Page breaks and Section marks are ASCII 12 (normal form feed); if there's an entry in the section table, it's a section mark, otherwise it's a page break.
    Column breaks are stored as ASCII 14.
    Tab characters are ASCII 9 (normal).
    The field begin mark, which delimits the beginning of a field, is ASCII 19. The field end mark, which delimits the end of a field, is ASCII 21. The field separator, which marks the boundary between the preceding field code text and following field expansion text within a field, is ASCII 20. The field escape character is the '\' character, which also serves as the formula mark.
    The cell mark which delimits the end of a cell in a table row is stored as ASCII 7 and has the fInTable paragraph property set to fTrue (pap.fInTable == 1).
    The row mark which delimits the end of a table row is stored as ASCII 7 and has the fInTable paragraph property and fTtp paragraph property set to fTrue (pap.fInTable == 1 && pap.fTtp == 1).
    The following ASCII codes are treated as "special" characters when they have the character property special on (chp.fSpec == 1):
    [EDIT] -
    Do you control the VBA script? Can you save it as plain ASCII text?
    Last edited by DougDbug; 05-26-2005 at 01:52 PM.
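
The character codes in the excerpt above can be turned into plain text with a small filter. What follows is only a rough sketch under several assumptions: the caller has already read the bytes of the text stream (starting at fib.fcMin) into a string, the function name `word_text_to_plain` is made up for illustration, and the choices of what each mark becomes (newline, tab, dropped) are one possible convention rather than anything the spec prescribes. Since paragraph properties (pap.fInTable, pap.fTtp) are not consulted, cell marks and row marks both become tabs here. The code sticks to C++98 so that older compilers such as bcc32 5.5 have a chance of building it.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: maps raw Word text-stream bytes to plain text,
// following the ASCII codes listed in the spec excerpt.
std::string word_text_to_plain(const std::string& raw)
{
    std::string out;
    // One entry per open field: 1 while inside the field code
    // (between mark 19 and separator 20), 0 once in the field result.
    std::vector<char> fields;
    int hidden = 0;  // number of open fields still in their code part

    for (std::string::size_type i = 0; i < raw.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(raw[i]);

        if (c == 19) {                       // field begin
            fields.push_back(1);
            ++hidden;
            continue;
        }
        if (c == 20) {                       // field separator
            if (!fields.empty() && fields.back() == 1) {
                fields.back() = 0;
                --hidden;
            }
            continue;
        }
        if (c == 21) {                       // field end
            if (!fields.empty()) {
                if (fields.back() == 1) --hidden;
                fields.pop_back();
            }
            continue;
        }
        if (hidden > 0) continue;            // skip field code text

        switch (c) {
        case 13:                             // paragraph end
        case 11:                             // hard line break
        case 12:                             // page break or section mark
        case 14:                             // column break
            out += '\n';
            break;
        case 7:                              // cell or row mark (not distinguished)
            out += '\t';
            break;
        case 30:                             // non-breaking hyphen
            out += '-';
            break;
        case 31:                             // non-required hyphen: drop it
            break;
        case 160:                            // non-breaking space
            out += ' ';
            break;
        default:                             // tab (9), space (32), normal text
            out += static_cast<char>(c);
            break;
        }
    }
    return out;
}
```

Field code text (between marks 19 and 20) is dropped and only the field result is kept, so a page-number field contributes just the number it displays. Characters flagged as "special" through chp.fSpec are not handled, because that needs the character property tables and not only the text bytes.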
