Thread: Finding the file size.

  1. #1
    Registered User
    Join Date
    Sep 2007
    Posts
    69

    Finding the file size.

    Hello,

    I would like to read a file and store it in an array. But I want to read the whole file, so when I specify the number of bytes to read, I want it to read the whole file. How could I go about reading the whole file instead of just a number of bytes? Thank you!

    (I'm trying to write something close to the "cat" command in Unix)

  2. #2
    Woof, woof! zacs7
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    Get the size of the file in bytes with fseek() and ftell(), allocate enough memory to hold the file, then read it in with fread().

    Or something like http://msdn2.microsoft.com/en-us/library/ms810613.aspx
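
    A minimal sketch of the fseek()/ftell()/fread() route, assuming a seekable regular file small enough to fit in memory. The name slurp_file is made up for illustration; it is not from any library.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole of `path` into a malloc'd,
 * NUL-terminated buffer. Returns NULL on failure; on success stores the
 * number of bytes actually read in *out_len. The caller frees the buffer.
 * Only works for seekable files (not pipes or most devices). */
char *slurp_file(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;

    /* Seek to the end to learn the size, then rewind to the start. */
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }

    char *buf = malloc((size_t)size + 1);   /* +1 for the terminator */
    if (!buf) {
        fclose(f);
        return NULL;
    }

    /* fread may return fewer bytes than requested; trust its result. */
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);

    buf[got] = '\0';
    *out_len = got;
    return buf;
}
```

    Usage would be along the lines of `char *text = slurp_file("main.c", &len);` followed by `free(text)` once you are done with it.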

  3. #3
    Cogito Ergo Sum
    Join Date
    Mar 2007
    Location
    Sydney, Australia
    Posts
    463
    Why not just keep reading until you hit EOF (!= EOF)?

  4. #4
    Registered User
    Join Date
    Sep 2007
    Posts
    69
    I wrote this but it refuses to work and gives me a segfault all the time.


    Code:
     int   fd;
      int   buffer_size;
      char  *buffer;
      int   i;
    
      i = 0;
      fd = open("main.c", O_RDONLY);
      if (fd == -1)
        {
          return (-1);
        }
      buffer_size = lseek(fd, 0, SEEK_END);
      buffer = malloc((buffer_size + 1) * sizeof(*buffer));
      read(fd, buffer, buffer_size);
    }

  5. #5
    Woof, woof! zacs7
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    Check the return value of malloc(), it could be failing. Why not get the source of 'cat' and take a look?
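
    As a guess rather than a diagnosis, two things stand out in the posted snippet: malloc() is never checked, and after lseek(fd, 0, SEEK_END) the file offset sits at the end of the file, so the following read() returns 0 bytes and the buffer stays uninitialised. A sketch that checks both, with a made-up helper name read_whole_fd:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: reads a seekable file into a malloc'd,
 * NUL-terminated buffer. Returns NULL on failure; on success stores the
 * number of bytes read in *out_size. The caller frees the buffer. */
char *read_whole_fd(const char *path, off_t *out_size)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    /* Find the size, then move the offset back to the start. */
    off_t size = lseek(fd, 0, SEEK_END);
    if (size == (off_t)-1 || lseek(fd, 0, SEEK_SET) == (off_t)-1) {
        close(fd);               /* not seekable, e.g. a pipe */
        return NULL;
    }

    char *buffer = malloc((size_t)size + 1);
    if (buffer == NULL) {        /* malloc can fail */
        close(fd);
        return NULL;
    }

    /* read() may return less than asked for, so loop until done. */
    off_t total = 0;
    while (total < size) {
        ssize_t n = read(fd, buffer + total, (size_t)(size - total));
        if (n <= 0)
            break;               /* error or unexpected end of file */
        total += n;
    }
    close(fd);

    buffer[total] = '\0';
    *out_size = total;
    return buffer;
}
```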

    I'd suggest using mmap'd files for this, it seems like a big waste of memory
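
    A rough sketch of the mmap idea, assuming a POSIX system and a regular file. The function name cat_mmap is invented here; it maps the file read-only and writes the mapping to a descriptor instead of copying it into a malloc'd buffer first.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps `path` read-only and writes its contents to
 * out_fd. Returns 0 on success, -1 on failure. */
int cat_mmap(const char *path, int out_fd)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {       /* mmap rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                   /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    /* write() may be partial, so keep going until everything is out. */
    size_t off = 0;
    int rc = 0;
    while (off < len) {
        ssize_t n = write(out_fd, data + off, len - off);
        if (n <= 0) {
            rc = -1;
            break;
        }
        off += (size_t)n;
    }

    munmap(data, len);
    return rc;
}
```

    Note that this still needs a file with a known size, so it would not work on pipes.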

  6. #6
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by +Azazel+ View Post
    Hello,

    I would like to read a file and store it in an array. But I want to read the whole file, so when I specify the number of bytes to read, I want it to read the whole file. How could I go about reading the whole file instead of just a number of bytes? Thank you!

    (I'm trying to write something close to the "cat" command in Unix)
    If you are actually trying to replicate cat, then I would think the following would do:
    Code:
    #include <stdio.h>

    int main(int argc, char **argv)
    {
       int i;
       char buffer[1000];
       for(i = 1; i < argc; i++) {
          FILE *f = fopen(argv[i], "r");
          if (!f) {
             perror(argv[i]);
             return 1;
          } else {
             /* copy the file to stdout, one line (or buffer-full) at a time */
             while(fgets(buffer, sizeof(buffer), f)) {
                fputs(buffer, stdout);
             }
             fclose(f);
          }
       }
       return 0;
    }
    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  7. #7
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > (I'm trying to write something close to the "cat" command in Unix)
    Except cat doesn't care about the size of the file (it has no need to). Indeed, cat works with devices and pipes where there is no concept of a file size.

    Allocating a massive block of memory just to store the file for a few seconds is just wrong.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  8. #8
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by Salem View Post
    > (I'm trying to write something close to the "cat" command in Unix)
    Except cat doesn't care about the size of the file (it has no need to). Indeed, cat works with devices and pipes where there is no concept of a file size.

    Allocating a massive block of memory just to store the file for a few seconds is just wrong.
    Not to mention that I have quite often done something like:
    Code:
    $> cat somefile | grep something | grep -v somethingelse
    where somefile may well be MUCH larger than the amount of RAM + Swap that my system has [because somefile is some type of logfile that has been running for ages, and built up to many gigabytes]. cat should cope with "cat /dev/hda1", where hda1 is a 250GB hard-disk [obviously it shouldn't be mounted, otherwise you may find it problematic]


  9. #9
    Registered User
    Join Date
    Sep 2007
    Posts
    69
    Big thanks.

    I modified the code a bit to get this:

    Code:
    #include <unistd.h>
    #include <fcntl.h>
    
    int     main(int argc, char **argv)
    {
      int   i;
      int   fd;
      char  buffer[1000];
    
      i = 1;
      if (argc <= 1)
        {
          my_putstr("Usage: Please enter 1 or more parameters");
          my_putstr("\n");
        }
      while (i < argc)
        {
          fd = open(argv[i], O_RDONLY);
          if (!fd)
            {
              my_putstr("Error reading file");
              my_putstr("\n");
            }
          else
            {
              while (read(fd, buffer, sizeof(buffer)))
                my_putstr(buffer);
            }
          i++;
        }
      return (0);
    }
    But at the end of each file I cat, after showing the whole file, it prints these symbols:

    Ï(ë¿¿(

    What could be causing this?

    Also, if the second file is smaller than the first, it shows not only the second file but parts of the first mixed in. Do I need to clear the buffer each time, or is it something else?
    Last edited by +Azazel+; 10-16-2007 at 06:56 AM.

  10. #10
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    read() doesn't append a \0.
    What you should be using is the return result of read() to tell you how many bytes to write.
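
    Something like the following, as a sketch: the buffer size passed to read() is only an upper limit, and the loop stops when read() returns 0 (end of file). The name copy_fd is made up for this example.

```c
#include <unistd.h>

/* Hypothetical helper: copies everything from in_fd to out_fd.
 * Returns 0 on success, -1 on a read or write error. */
int copy_fd(int in_fd, int out_fd)
{
    char buffer[1000];
    ssize_t n;

    /* read() returns how many bytes it actually stored (0 at end of file). */
    while ((n = read(in_fd, buffer, sizeof(buffer))) > 0) {
        ssize_t off = 0;
        /* Write exactly the n bytes that were read, no terminator needed. */
        while (off < n) {
            ssize_t w = write(out_fd, buffer + off, (size_t)(n - off));
            if (w <= 0)
                return -1;
            off += w;
        }
    }
    return n == 0 ? 0 : -1;
}
```

    In a cat-like main() you would call `copy_fd(fd, STDOUT_FILENO)` once per opened file. Because nothing is treated as a string, binary data with embedded zero bytes passes through untouched.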

  11. #11
    Registered User
    Join Date
    Sep 2007
    Posts
    69
    I don't really understand. read() returns the number of bytes read, but I first need to specify how many bytes I want to read (which seems like a problem to me). Would I choose a number at random? Say I read the first 50 bytes and write those bytes; like you said, read() doesn't append a '\0', so how do I know when to stop?

  12. #12
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    There are two problems here:
    1. You read 1000 bytes at a time, and then print them with my_putstr. Presumably, my_putstr prints a string, meaning the data passed to it has to be zero-terminated to mark the end of the string. Currently there is no code to zero-terminate the buffer, so printing continues beyond the end of the data until it finds some other zero in memory.

    2. If there are fewer than 1000 bytes [left] in the file, then you must NOT print beyond what has been read - the rest will just be whatever happens to be in your buffer, most likely previously read data. Since read gives you the number of bytes ACTUALLY read, it is pretty easy to just stick the zero at the end of the string.

    One should of course note that the original cat is not dealing with the content as strings - it is just one long binary stream. It is perfectly valid to do
    "cat /boot/vmlinuz|gzip -c - |od -x|grep -n 0x1F".
    If you read a binary file, [I presume] your my_putstr will stop printing the content of the file when it reaches a zero in the input - which is not how the original cat works.


  13. #13
    Registered User
    Join Date
    Sep 2007
    Posts
    69
    I understand, so is there any way to go around this?

  14. #14
    Registered User
    Join Date
    Sep 2007
    Posts
    69
    Ohhh, never mind. It works now.

    Here's the code:

    Code:
    #include <unistd.h>
    #include <fcntl.h>
    
    int     main(int argc, char **argv)
    {
      int   i;
      int   x;
      int   b;
      int   fd;
      char  buffer[1];
    
      i = 1;
      b = 1;
      x = 1;
      if (argc <= 1)
        {
          my_putstr("Usage: Please enter 1 or more parameters");
          my_putstr("\n");
        }
      while (i < argc)
        {
          fd = open(argv[i], O_RDONLY);
          if (!fd)
            {
              my_putstr("Error reading file");
              my_putstr("\n");
            }
          else
            {
              while (x)
                {
                  x = read(fd, buffer, sizeof(buffer));
                  lseek(fd, b, SEEK_SET);
                  my_putstr(buffer);
                  b++;
                }
            }
          i++;
        }
      return (0);
    }
    But it refuses to work for more than one parameter.
    Last edited by +Azazel+; 10-16-2007 at 08:20 AM.

  15. #15
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    That is TERRIBLY inefficient [and you shouldn't have a lseek, as that will not work on devices that don't have a concept of filesize (or otherwise can't be "seeked")].

    Your original approach is much better - just check how many bytes came back from read(), and set a zero in your string at that point [make your buffer 1001 bytes].
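
    Roughly like this, as a sketch. The my_putstr below is only a stand-in, since the real one was never posted; it is assumed to print a NUL-terminated string to stdout. The name put_fd_as_text is also invented.

```c
#include <string.h>
#include <unistd.h>

/* Stand-in for the thread's my_putstr(), whose source is not shown:
 * assumed to print a NUL-terminated string to standard output. */
static void my_putstr(const char *s)
{
    if (write(STDOUT_FILENO, s, strlen(s)) < 0) {
        /* write errors are ignored in this sketch */
    }
}

/* Hypothetical helper: prints the contents of fd as text, up to 1000
 * bytes per read. Returns 0 at end of file, -1 on a read error. */
int put_fd_as_text(int fd)
{
    char buffer[1001];           /* 1000 data bytes + 1 for the zero */
    ssize_t n;

    while ((n = read(fd, buffer, sizeof(buffer) - 1)) > 0) {
        buffer[n] = '\0';        /* terminate right after the bytes read */
        my_putstr(buffer);
    }
    return n == 0 ? 0 : -1;
}
```

    This only suits text: output from each chunk stops at the first zero byte in it, so binary files will come out truncated.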

