Thread: Help is needed to read text from a file

  1. #1
    Registered User
    Join Date
    Sep 2008
    Posts
    26

    Hi all:

    I have been given a text file whose contents follow the pattern shown below:


    # VIEW OBJFILE = iiomsview16.V

    VIEW II_STD_HEADER_T

    # TYPE CNAME FBNAME COUNT FLAG SIZE NULL VALUE
    # ----- -------- --------- ----- ---- ---- ----------

    string hd_cd1 HEADER_CD 1 - 13 " "
    string hd_text HEADER_TEXT 1 - 41 " "



    I need to read the content of this text file and store it in a custom ADT. One of the rules for reading the file is that a line starting with a '#' character is treated as a comment and ignored.

    My plan is:
    Use the fgetc() function to read the characters from the file one by one.

    My question is:
    How do I detect the end of a line, a "tab" character, or the end of the file? I don't think they are listed in the ASCII chart.

    Can anyone help me please?

    Thank you

  2. #2
    Registered User C_ntua's Avatar
    Join Date
    Jun 2008
    Posts
    1,853
    The end of line is the '\n' character, the tab is the '\t' character and the end of file the EOF character. Test for those characters.
    You can also look up fgets() that reads a whole line. Or other functions.
    So if you want end of file you just do for example buffer[i] == EOF, newline buffer[i] == '\n'
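
The fgets() suggestion above could be sketched roughly as follows. This is only an illustration, not code from the thread: the names MAX_LINE, is_comment_line, is_blank_line and copy_data_lines are made up here, and it assumes every line fits in 256 bytes and that a comment's '#' is the very first character of the line.

```c
#include <stdio.h>

#define MAX_LINE 256  /* assumed upper bound on line length (hypothetical) */

/* Returns nonzero if the line is a comment, i.e. its first character is '#'. */
static int is_comment_line(const char *line)
{
    return line[0] == '#';
}

/* Returns nonzero if the line holds nothing but whitespace. */
static int is_blank_line(const char *line)
{
    while (*line == ' ' || *line == '\t' || *line == '\r' || *line == '\n')
        line++;
    return *line == '\0';
}

/* Copies every non-comment, non-blank line from in to out.
   Returns the number of lines copied. */
long copy_data_lines(FILE *in, FILE *out)
{
    char line[MAX_LINE];
    long copied = 0;

    /* fgets returns NULL at end of file or on a read error. */
    while (fgets(line, sizeof line, in) != NULL) {
        if (is_comment_line(line) || is_blank_line(line))
            continue;
        fputs(line, out);
        copied++;
    }
    return copied;
}
```

Calling copy_data_lines(fp, stdout) on an opened file would print only the data lines. A line longer than the buffer would be delivered in several pieces, so the buffer size matters for the comment test.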

  3. #3
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    end of file the EOF character
    Actually, EOF is not a character. It is the value fgetc returns to signal end of file or a read error, which is why fgetc returns int: the return value is either a character (an unsigned char value, typically 0 to 255) or EOF.
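
A minimal sketch of that point, assuming a hypothetical helper named count_newlines_and_tabs (not from the thread): the result of fgetc is stored in an int, compared against EOF, and only then tested against '\n' and '\t'.

```c
#include <stdio.h>

/* Reads fp one character at a time and counts newlines and tabs.
   Returns 0 on a clean end of file, -1 if a read error occurred. */
int count_newlines_and_tabs(FILE *fp, long *newlines, long *tabs)
{
    int ch;  /* int, not char, so that EOF stays distinct from every character */

    *newlines = 0;
    *tabs = 0;
    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\n')
            (*newlines)++;
        else if (ch == '\t')
            (*tabs)++;
    }
    /* EOF alone does not say why reading stopped; ferror tells the two apart. */
    return ferror(fp) ? -1 : 0;
}
```

If ch were declared as char, a byte with value 255 could compare equal to EOF on some platforms, or EOF might never be detected on others.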
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

  4. #4
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Is this illustrative?

    Code:
    #include <stdio.h>
    
    int main() {
            int chr;
            FILE *fstRO = fopen("/tmp/thefile.txt", "r");
            while ((chr=fgetc(fstRO)) != -1) {
                    if (chr == '#') while (1) if (chr=fgetc(fstRO) == '\n') {
                                    chr=fgetc(fstRO);
                                    if (chr == '#') continue;
                                    else break;}
                    printf("%c", chr);
            }   
           fclose(fstRO);
    }
    someone else might help you with the "Abstract Data Type" stuff

    ps. in this "EOF" is represented by the "-1" from fgetc
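
On the "Abstract Data Type" side of the question, one possible starting point is a struct with one member per column of the sample file (TYPE, CNAME, FBNAME, COUNT, FLAG, SIZE, NULL VALUE). Everything below is an assumption made for illustration: the struct name view_field, the function parse_field_line, the buffer sizes, and the guess that COUNT and SIZE are integers and that the null value is written in double quotes.

```c
#include <stdio.h>

/* Hypothetical record for one data line of the view file. */
struct view_field {
    char type[16];
    char cname[32];
    char fbname[32];
    int  count;
    char flag[8];
    int  size;
    char null_value[16];
};

/* Parses a line such as:  string hd_cd1 HEADER_CD 1 - 13 " "
   Returns 0 on success, -1 if the line does not have the expected shape. */
int parse_field_line(const char *line, struct view_field *f)
{
    int matched;

    f->null_value[0] = '\0';  /* stays empty if the quoted value is "" */
    /* Field widths keep sscanf from overflowing the arrays above. */
    matched = sscanf(line, "%15s %31s %31s %d %7s %d \"%15[^\"]\"",
                     f->type, f->cname, f->fbname,
                     &f->count, f->flag, &f->size, f->null_value);
    /* Six fields are required; the seventh is missing for an empty "". */
    return matched >= 6 ? 0 : -1;
}
```

A line like "VIEW II_STD_HEADER_T" fails the six-field check and would need its own handling.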
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Code:
                    if (chr == '#') while (1) if (chr=fgetc(fstRO) == '\n') {
                                    chr=fgetc(fstRO);
                                    if (chr == '#') continue;
                                    else break;}
    Huh?
    Why not just do:
    Code:
    if (chr == '#') 
        while (fgetc(fstRO) != '\n')   ; 
    else
       putchar(chr);
    There is certainly no need to use continue the way you do - just use
    Code:
    if (chr != '#' ) break;
    The loop will continue anyways.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  6. #6
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by matsp View Post
    Huh?
    Why not just do:
    Code:
    if (chr == '#') 
        while (fgetc(fstRO) != '\n')   ; 
    else
       putchar(chr);
    Aha -- I tested this BEFORE I posted it. The "if" trap needs to be before the putchar or printf to catch '#', AND it needs to catch the \n from the end of a comment. Your snippet above will 1) print this \n 2) also print a "#" if it starts on a line after a line starting with "#", then (having missed the trap) continue printing the entire comment.

    #2 (no pun!) is the reason for the continue, otherwise we should break.

    See what I mean?

  7. #7
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    And what will happen if the last line of the file begins with the # and does not have \n at the end?

  8. #8
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by vart View Post
    And what will happen if the last line of the file begins with the # and does not have \n at the end?
    yep. there should be a line like this after else break;

    Code:
    else if (chr == 0) break;
    ASCII decimal 0 = '\0' (the null terminator, the end of the line)

  9. #9
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    the null terminator, the end of the line
    Those are two different things:
    '\0' - C-string terminator
    '\n' - line terminator

    What is still missing is a check for the EOF condition.
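
To illustrate the '\0' versus '\n' distinction: a line read with fgets() normally ends with '\n' inside the buffer, followed by the '\0' that terminates the C string. A small sketch, using a hypothetical helper named strip_line_ending, that removes the line terminator while keeping the string terminator:

```c
#include <string.h>

/* Cuts the string at the first '\r' or '\n', if any.
   strcspn returns the length of the prefix containing neither character,
   so when no line ending is present this just rewrites the existing '\0'. */
void strip_line_ending(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}
```

Neither '\0' nor '\n' marks the end of the file; that is reported only through the return value of the read function.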

  10. #10
    Registered User
    Join Date
    Sep 2006
    Posts
    230
    Quote Originally Posted by MK27 View Post
    yep. there should be a line like this after else break;

    Code:
    else if (chr == 0) break;
    ASCII decimal 0 = '\0' (the null terminator, the end of the line)
    Better to just compare chr with '\0'. Makes it clearer, and more portable.
    I might not be a pro, but I'm usually right

  11. #11
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by vart View Post
    2 different things
    '\0' - C-string terminator
    '\n' - line terminator
    '\n' is certainly NOT a "line terminator", but nomenclature aside:

    Quote Originally Posted by vart View Post
    check for EOF condition
    from my post #4:
    in this "EOF" is represented by the "-1" from fgetc

  12. #12
    Registered User
    Join Date
    Sep 2008
    Posts
    26
    Thank you all. I have solved my problem, cheers everyone.

  13. #13
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    from my post #4:
    You have more than one fgetc call in the code, and only one of them is checked.

    Also, EOF could be any negative value; the standard does not require it to be -1.
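
One way to address both points is sketched below. This is not the code from post #4: strip_comment_lines is a hypothetical name and the structure is rewritten. Every fgetc result is compared against the EOF macro instead of -1, so a final comment line without a trailing '\n' ends the loop instead of spinning forever.

```c
#include <stdio.h>

/* Copies in to out, dropping every line whose first character is '#'. */
void strip_comment_lines(FILE *in, FILE *out)
{
    int ch;
    int at_line_start = 1;

    while ((ch = fgetc(in)) != EOF) {
        if (at_line_start && ch == '#') {
            /* Skip to the end of the comment line or the end of the file. */
            while ((ch = fgetc(in)) != EOF && ch != '\n')
                ;
            if (ch == EOF)
                break;
            continue;  /* the '\n' was consumed; the next char starts a line */
        }
        fputc(ch, out);
        at_line_start = (ch == '\n');
    }
}
```

A '#' that appears in the middle of a line is copied unchanged, since only a '#' in the first column marks a comment here.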
