Thread: Detecting end of LINE in text files

  1. #1
    Registered User
    Join Date
    Dec 2008
    Posts
    15

    Detecting end of LINE in text files

    Hello, I'm a new programmer in C (my first language). I've got a problem while reading a file. I'd like to know the length of a line in a file in order to use fseek later, but I can't find a way to do this. Using fscanf(f, "%c", &p) and printf("%d", p) I get the ASCII code of the character read, but it's a crazy number: sometimes it prints 0, sometimes 104, usually the code of the last "good" character read... I know that '\n' is 10 and a blank space is 32, so I've also tried something like if(p='\n') system("pause")... but nothing happens, so p must not be '\n'. I'd also like to know, if possible, the ASCII code for EOF. I know it's -1, but when we write while(fscanf(.....)!=EOF), how does fscanf know that it has reached the end of the file and return -1?

    Thanks in advance.

  2. #2
    Registered User C_ntua's Avatar
    Join Date
    Jun 2008
    Posts
    1,853
    Code:
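    int c, count = 0;
    /* c must be an int so that EOF can be told apart from any character */
    while ((c = getc(stdin)) != '\n' && c != EOF)
       count++;
    printf("%d", count);
    stdin is the standard input, that is, the console. count is incremented until c == '\n', that is, until the newline is reached (or until the input ends).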
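
    The same loop works on a file opened with fopen. Below is a minimal sketch, assuming the goal is the length of the current line; the function name line_length is made up for illustration and is not a library function.

```c
#include <stdio.h>

/* Count the characters on the current line of f, not counting the newline.
   Stops at '\n' or at end of file, whichever comes first.
   line_length is a hypothetical helper name, not a standard function. */
long line_length(FILE *f)
{
    long count = 0;
    int c;                      /* int, not char, so EOF can be told apart */

    while ((c = getc(f)) != EOF && c != '\n')
        count++;
    return count;
}
```

    Each call consumes one line, so calling it repeatedly gives the length of each line in turn; at the end of the file it simply returns 0.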

    When you print a character you do this:
    Code:
    printf("%c", p);
    %d prints the value as an integer instead, which for a character is its character code.

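    As for fscanf, it returns the number of input items it successfully converted and assigned, not the number of characters read. If it reaches the end of the file before converting anything, it returns EOF.
    How does it know? Every function that reads uses a FILE pointer, and the stream behind it keeps track of the current position in the file. Every time a character is read, the position advances to the next character. When a read is attempted and no characters are left, the stream's end-of-file indicator is set and the function returns EOF. EOF is not a character stored in the file; it is a special negative int return value (commonly -1).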
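
    To make that concrete, here is a small sketch of reading to the end of a file and then asking the stream why reading stopped. The name count_chars is my own, chosen for illustration; getc, feof and ferror are the standard calls.

```c
#include <stdio.h>

/* Read f to the end and return the number of characters read,
   or -1 if reading stopped because of an error rather than end of file.
   count_chars is a hypothetical helper name. */
long count_chars(FILE *f)
{
    long n = 0;
    int c;                  /* must be int: EOF is not a valid character value */

    while ((c = getc(f)) != EOF)
        n++;

    /* EOF only says "no character was returned"; ask the stream why. */
    if (ferror(f))
        return -1;
    return n;               /* at this point feof(f) is nonzero */
}
```

    Note that feof() only becomes true after a read has already failed, which is why the loop tests the return value of getc() rather than calling feof() first.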

    There is no point in using fscanf to read only one character. As shown above, getc() does this more simply.

    edit: stdin is the console. You can think of the console as a "file". A file is just a sequential stream of characters.

  3. #3
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Can you use stat in non *nix environments? I'm not sure. Anyway, stat can be used to get information from the filesystem about (!) files. It puts the information in a "struct stat" datatype. This includes more than just the size, you'll have to look at the stat documentation:
    Code:
    #include <stdio.h>
    #include <sys/stat.h>
    
    
    int main(int argc, char *argv[]) {
    	struct stat info;
    	if (argc<2) {puts("Filename required");return -1;}
    	if (stat(argv[1],&info)<0) {perror("stat() error");return -2;}
    	/* st_size is an off_t, not an int, so cast it to match the format */
    	printf("Size of %s: %lld bytes\n",argv[1],(long long)info.st_size);
    	return 0;
    }
    "info" is my name for the struct stat. "info.st_size" is the member of struct stat info that contains the file size in bytes. "perror" is for interpreting error code; try using an invalid filename and you'll see how it works.

    This method might be faster than the one suggested by C_ntua if the file is very big.

  4. #4
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    stat() is not part of the standard C library (it comes from POSIX), so it's not portable: it may exist on some operating systems other than *nix, but that is by no means guaranteed.

    You can do the same thing with:
    Code:
    /* Returns the length of the file in bytes, or -1 on error.
       Open the file in binary mode ("rb") for the result to be meaningful. */
    long filelen(FILE *f)
    {
         long curpos = ftell(f);        /* remember where we are */
         long len;
         if (curpos < 0 || fseek(f, 0, SEEK_END) != 0)
              return -1;
         len = ftell(f);                /* position at the end = length */
         fseek(f, curpos, SEEK_SET);    /* go back to where we were */
         return len;
    }
    Note however that on some systems the length in bytes and the number of characters you'd read with getc() or similar aren't the same thing: a newline stored in a file is, on some systems, two characters combined (CR+LF), whilst on other systems it's only one. When the file is read in text mode with getc() and friends, a newline is ALWAYS one character, so a file of 20 lines may yield 20 fewer characters than its actual size in bytes, but never more than the actual size.
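
    If the goal is to fseek back to the start of a line later, one way around the counting problem is to record what ftell() reports at each line start and pass that same value back to fseek() with SEEK_SET, rather than adding up characters. A rough sketch, where index_lines and its parameters are names invented for this example:

```c
#include <stdio.h>

/* Record where each line of f starts, using ftell() instead of counting bytes.
   Stores at most max offsets in starts[] and returns the number of lines seen,
   or -1 if ftell() fails. index_lines is a hypothetical helper name. */
long index_lines(FILE *f, long starts[], long max)
{
    long lines = 0;
    int c, at_line_start = 1;

    rewind(f);
    for (;;) {
        long pos = ftell(f);        /* position of the character about to be read */
        if (pos < 0)
            return -1;
        if ((c = getc(f)) == EOF)
            break;
        if (at_line_start) {
            if (lines < max)
                starts[lines] = pos;
            lines++;
        }
        at_line_start = (c == '\n');
    }
    return lines;
}
```

    Afterwards fseek(f, starts[i], SEEK_SET) puts you at the beginning of line i (counting from 0), whether the newlines on disk are one byte or two. Calling ftell() before every character is not fast, so treat this as an illustration rather than something tuned for big files.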

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  5. #5
    Registered User
    Join Date
    Aug 2008
    Location
    Belgrade, Serbia
    Posts
    163
    So, what getc() does, returning '\n' for CR+LF, is a bad thing because those two characters are counted separately in the actual size of the file.
    Vanity of vanities, saith the Preacher, vanity of vanities; all is vanity.
    What profit hath a man of all his labour which he taketh under the sun?
    All the rivers run into the sea; yet the sea is not full; unto the place from whence the rivers come, thither they return again.
    For in much wisdom is much grief: and he that increaseth knowledge increaseth sorrow.

  6. #6
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by hauzer View Post
    So, what getc() does, returning '\n' for CR+LF, is a bad thing because those two characters are counted separately in the actual size of the file.
    Yes, it's bad if you intend to count characters and jump around in the file using seek, but only then, and that's not a regular activity.


  7. #7
    Registered User
    Join Date
    Dec 2008
    Posts
    15
    Hi, and thanks for all your answers. I have tried many other ways, such as getc, but I get the same problem. Here is a link to a picture with my code explained and the console output. The file "data.txt" is new and empty (I haven't opened it), so it should contain just the end-of-line char. Here is the link: http://img177.imageshack.us/my.php?image=datagr9.jpg

    I've tried with other files, for example creating one, pressing Enter (new line) and closing it, and I successfully get the code read (10, which is '\n' in ASCII). I've also done it with a blank space (in that case I get 32, which is the space char), but I have never got the end-of-line char. I have also opened a file, put some words in it, used fseek to move to the position just after the last letter of the (only) line, and then scanned, but as happens in the picture, it shows the last "well read" char of the last word.
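
    The symptom described here (seeing the last "well read" character again) is what you would expect if the read itself failed: an empty file holds no characters at all, not even a newline, and when fscanf hits the end of the file it returns EOF and stores nothing, so the variable keeps its old value. A minimal sketch that checks the return value before trusting the variable; read_char is a name made up for this example:

```c
#include <stdio.h>

/* Try to read one character from f with fscanf.
   Returns 1 and stores the character in *out on success;
   returns 0 at end of file or on error, leaving *out untouched.
   read_char is a hypothetical helper name. */
int read_char(FILE *f, char *out)
{
    return fscanf(f, "%c", out) == 1;
}
```

    Only print the character when read_char() returns 1; when it returns 0 there was nothing left to read.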
