Thread: What do file descriptor error codes mean?

  1. #16
Salem
and the hat of int overfl
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,674
    Mmm, so it is.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  2. #17
    Registered User
    Join Date
    Sep 2007
    Posts
    26
    Hi,
    Quote Originally Posted by matsp View Post
    And you should DEFINITELY close the file after it's been opened.
So I do: syscall(__NR_close,fd);

    Quote Originally Posted by Salem View Post
    > for ( buffer_len=1; buffer[buffer_len] != '\0'; buffer_len++);
As said, this is strlen() without including string.h.

    Quote Originally Posted by matsp View Post
    read doesn't append a \0, so it's wrong to use strlen() to go looking for it.
    Since read() returns a result (which you're ignoring), perhaps you could pick that up and use that to determine how much to write.
    [..]
If it's a text file and there's more than 100 bytes in the file, it will fail to have a zero at the end. [And of course, the string could be EMPTY, in which case starting at 1 could give the wrong result.]
I can't follow that. I created a 182 kB text file and resized my buffers, and everything works fine. What do you mean by there not always being a \0 representing the EOF? What else would there be? Or how could I achieve this using syscalls? You say read() gives something back? I mean, how does strlen work:
    Code:
    static inline size_t strlen(const char * s)
    {
    int d0;
    register int __res;
    __asm__ __volatile__(
            "repne\n\t"
            "scasb\n\t"
            "notl %0\n\t"
            "decl %0"
            :"=c" (__res), "=&D" (d0)
            :"1" (s),"a" (0), "0" (0xffffffffu)
            :"memory");
    return __res;
    }
I don't totally understand this asm+C mixture, but the line starting with :"1" looks to me like it checks for either \a or \0, depending on whether it's a string or a file?


As for the why-question behind all this: I'm just playing around. I found a paper explaining how to use syscalls in kernel modules, so I wanted to learn how to invoke them from C and just play around with them a bit. Besides, calling the syscall directly should be faster, right?

    greets

  3. #18
Salem
and the hat of int overfl
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,674
    If you have a 10 byte file, then 10 bytes is all that will get copied to your buffer.

    The 11th character will NOT be a \0 or some representation of end of file or anything.

    The only way to determine how much data was written to the buffer is to look at the return result of the read syscall.
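For example (a minimal sketch of my own, not from this thread - it assumes fd is an already-open descriptor and uses the plain read()/write() wrappers from unistd.h rather than syscall()):
Code:
    char buffer[100];
    ssize_t n = read(fd, buffer, sizeof buffer);  /* bytes actually read: 0 at EOF, -1 on error */
    if (n > 0)
        write(1, buffer, n);                      /* write exactly the bytes that were read */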

> but the line starting with :"1" looks to me like it checks for either \a
    It's nothing of the sort, but I'm not going to explain the intricacies of the GNU assembler syntax. You'll need to google something for yourself.
    Suffice to say, what it does do is provide a kind of bridge between the C part of the code and the asm part of the code (where to read values from, and where to store them).
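(A rough illustration of that bridge - my own minimal sketch, not the kernel code above: in GCC extended asm, the lists after the colons bind C variables to operands, and a bare digit constraint like "1" or "0" just means "use the same register as that operand number" - it has nothing to do with \a or \0.)
Code:
    /* "=r"(out): the asm writes its result into some register, copied back to out.
       "r"(in):   the value of in is loaded into some register for the asm to read. */
    int in = 41, out;
    __asm__ ("leal 1(%1), %0" : "=r"(out) : "r"(in));
    /* out is now 42 */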
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #19
    Registered User
    Join Date
    Sep 2007
    Posts
    26
Okay, I did it now as suggested, plus a few other things, so I think there shouldn't be any errors now:
    Code:
#include </usr/include/linux/unistd.h>
#include </usr/include/asm/stat.h>

int main(int argc, char *argv[]) {

        int fd, len;
        struct stat mystat;

        /* open(argv[1], O_RDONLY) via the raw syscall */
        fd = syscall(__NR_open, argv[1], 00, 04000);
        if (fd == -1) {
                syscall(__NR_write, 1, "No such file\n", 13);
                return 1;
        }

        /* fstat() gives us the file size, which becomes the buffer size */
        syscall(__NR_fstat, fd, &mystat);
        char buffer[mystat.st_size];

        len = syscall(__NR_read, fd, buffer, mystat.st_size);
        syscall(__NR_close, fd);
        syscall(__NR_write, 1, buffer, len);

        return 0;
}
Thanks for all your help!

  5. #20
Salem
and the hat of int overfl
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,674
    > char buffer[mystat.st_size];
Variable length arrays, and declarations coming after the first statement in a block, are both outside the scope of C89.

    Also, if you have a file of many MB in size, you've just blown away the stack.
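(If you really do want the whole file in one buffer, the usual workaround is to allocate it on the heap instead of the stack - a minimal sketch, assuming the fstat() call from the program above has already filled in mystat and that <stdlib.h> is included for malloc():)
Code:
    char *buffer = malloc(mystat.st_size);        /* heap allocation, not a VLA on the stack */
    if (buffer == NULL)
            return 1;                             /* malloc can fail for huge files */
    len = syscall(__NR_read, fd, buffer, mystat.st_size);
    syscall(__NR_write, 1, buffer, len);
    free(buffer);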
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  6. #21
    Registered User
    Join Date
    Sep 2007
    Posts
    26
    Quote Originally Posted by Salem View Post
    > char buffer[mystat.st_size];
Variable length arrays, and declarations coming after the first statement in a block, are both outside the scope of C89.
    Really? I thought this would be a good way of reading files with unknown length.

Okay, I've now figured out that I have to use gcc -pedantic to get the kind of warnings you are talking about.

    Quote Originally Posted by Salem View Post
    Also, if you have a file of many MB in size, you've just blown away the stack.
I didn't realize that, because I thought there was a maximum size arrays could be, and beyond that it would simply stop allocating memory.

But what is then the best, or rather the correct, way of reading a file of unknown size? Getting its size and then reading it block by block?

    greets

  7. #22
QuantumPete
Technical Lead
    Join Date
    Aug 2007
    Location
    London, UK
    Posts
    894
    Quote Originally Posted by lilcoder View Post
But what is then the best, or rather the correct, way of reading a file of unknown size? Getting its size and then reading it block by block?
    Yep, pretty much. The standard way would be to read the file one line at a time, do your processing and then read the next line in.
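(For the line-at-a-time approach, the usual stdio version looks roughly like this - a sketch of my own using fgets() rather than the raw syscalls from this thread, with argv[1] assumed as the file name:)
Code:
    #include <stdio.h>

    FILE *fp = fopen(argv[1], "r");
    char line[BUFSIZ];

    if (fp != NULL) {
            while (fgets(line, sizeof line, fp) != NULL) {
                    /* process one line; fgets stops at '\n' or when the buffer is full */
                    fputs(line, stdout);
            }
            fclose(fp);
    }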

    QuantumPete
    "No-one else has reported this problem, you're either crazy or a liar" - Dogbert Technical Support
    "Have you tried turning it off and on again?" - The IT Crowd

  8. #23
    Registered User
    Join Date
    Sep 2007
    Posts
    26
    Quote Originally Posted by QuantumPete View Post
    Yep, pretty much. The standard way would be to read the file one line at a time, do your processing and then read the next line in.
But a line has no definite size, right? So a 10 MB file could have just one line and never contain a \n? I think I'm going to try to accomplish this now with sys_readahead:
    Populates the page cache with data from a file so that subsequent reads from that file will not block on disk I/O.

Arguments:
eax - 225 (syscall number)
ebx - file descriptor identifying the file which is to be read
ecx - starting offset from which data is to be read; this value is effectively rounded down to a page boundary, and bytes are read up to the next page boundary greater than or equal to (ecx+edx)
edx - number of bytes to be read
But it seems I can't access this page cache then; there's no variable I can pass to it, and readv also expects a fixed number of buffers, so this also seems useless for reading a file of unknown size.
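(From what I understand - my own sketch, not something from this thread - readahead() only populates the page cache; it never hands you the data. You still call read() afterwards, it just shouldn't block on the disk. Shown with the glibc wrapper rather than raw registers, since the 64-bit offset is awkward to pass through syscall() on 32-bit x86:)
Code:
    #define _GNU_SOURCE
    #include <fcntl.h>      /* readahead() */
    #include <unistd.h>     /* read()      */

    readahead(fd, 0, 65536);                 /* hint: pre-load the first 64 kB into the page cache */
    len = read(fd, buffer, sizeof buffer);   /* the data itself still arrives through read()       */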
    Last edited by lilcoder; 09-24-2007 at 05:31 AM.

  9. #24
Salem
and the hat of int overfl
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,674
    Use a fixed sized buffer, and read that amount.
    The constant BUFSIZ in stdio.h is a good example size.

Say, for the sake of argument, that the OS restricts the maximum working set of your process to 50MB and you try to read a 100MB file in one go. What you end up with is the first half of the file written back out to the swap file and the second half in memory.

Then when you come to write it out again, the first half is read back from swap and the second half is written to swap. And so on.

> But what is then the best, or rather the correct, way of reading a file of unknown size?
    Reading fixed sized buffers in a while loop, and detecting when end of file is reached.
    Code:
    char buff[BUFSIZ];
    int n;
    while ( (n=read(fd,buff,BUFSIZ)) > 0 ) write(1,buff,n);
> So a 10 MB file could have just one line and never contain a \n?
    I've seen XML files like this before now.
    Also binary files have no concept of 'line', but have some other structure instead.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  10. #25
    Registered User
    Join Date
    Sep 2007
    Posts
    26
    Yes the while loop is a good solution and works really well:
    Code:
#include </usr/include/linux/unistd.h>

#define BUFSIZE 4092

int main(int argc, char *argv[]) {

        char buffer[BUFSIZE];
        int fd, len;

        if (argc != 2) {
                syscall(__NR_write, 1, "Usage: macro [filename]\n", 24);
                return 0;
        }

        fd = syscall(__NR_open, argv[1], 00, 04000);

        if (fd == -1) {
                syscall(__NR_write, 1, "No such file\n", 13);
                return 0;
        }

        /* copy the file to stdout one BUFSIZE chunk at a time,
           using read()'s return value to know how much to write */
        while ((len = syscall(__NR_read, fd, buffer, BUFSIZE)) > 0)
                syscall(__NR_write, 1, buffer, len);

        syscall(__NR_close, fd);

        return 0;
}

  11. #26
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
And if you take your code and read a file of (say) 100MB, how long does it take, and how long does "cat inputfile" take? [inputfile obviously should be replaced by the name of your 100MB file.]

    Use "time", such as "time cat inputfile" to show you how fast it was.

    Just to see - I haven't tried it myself, I'm just curious.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  12. #27
    Registered User
    Join Date
    Sep 2007
    Posts
    26
Hi, okay, I used a ~60MB zip file for this test now.

    Code:
    time cat ind.zip
    [..]
    real   2m31.340s
    user   0m00.000s
    sys    2m30.089s
    
    time ./macro ind.zip
    [..]
    real   2m34.212s
    user   0m00.004s
    sys    2m33.358s
But I don't know how reliable this test is. Not very, or what do you think?

Here I took the best clock-tick count over 1000 tries, once for sys_write and once for printf. I think this is a better method than the one above:
    Code:
    #include </usr/include/linux/unistd.h>
    #include <stdio.h>
    #include </usr/src/linux-headers-2.6.20-16/include/asm-i386/msr.h>
    
    int main(void) {
    
       unsigned long ini, end, now, best, tsc;
       int i;
       char buffer[4];
    
    #define measure_time(code) \
       for (i = 0; i < 10000; i++) { \
          rdtscl(ini); \
             code; \
          rdtscl(end); \
          now = end - ini; \
          if (now < best) best = now; \
       }
    
       /* time rdtsc (i.e. no code) */
       best = ~0;
       measure_time( 0 );
       tsc = best;
    
   /* time the call under test (sys_write or printf) */
       best = ~0;
      // measure_time(  syscall(4,1,"test1\n",6) );
       measure_time(  printf("test2\n") );
    
       /* report data */
       printf("rdtsc: %li ticks\nsyscall():%liticks\n",
       tsc, best-tsc);
          
       return 0;
    }
    Now I get for the syscall_write:
    rdtsc: 192 ticks
    syscall():3396ticks
    and for printf:
    rdtsc: 192 ticks
    syscall():4092ticks
Which is the expected result, because calling the syscall directly should of course be faster than going through libc first, which then makes the syscall anyway.

  13. #28
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Well, my point was rather that cat is faster than your code (admittedly by only a fraction of the total time) - despite cat being portable and using C runtime library calls.

I think it would be fairer to use fputs() or fwrite() rather than printf() to compare the time for your application. The printf call is MUCH more complex than a simple write to a file - even if that is what it does in the end. For example, printf does essentially this:

    Code:
int printf(const char *fmt, ...) {
    ...
    while (*fmt) {
        if (*fmt == '%') {
            fmt++;
            switch (*fmt) {
                case 's':
                    // a bunch of code.
                    ...
                    break;
                case 'd':
                    // a bunch of code.
                    ...
                    break;
                ... // Lots more case labels.
                default:
                    break;
            }
        }
        else
        {
            // output the character *fmt
        }
        fmt++;
    }
}
    Also, if you use /dev/null as your output device, e.g. "time cat blah.zip > /dev/null", you don't have the scrolling of the screen to take into account. The same applies to printf & write of course.

As you've just proven, a big van with a load is slower than an ordinary car - but really, no one should be surprised by this.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  14. #29
    Registered User
    Join Date
    Sep 2007
    Posts
    26
fputs doesn't seem much faster than printf. Its best result was 4032 ticks over 1000 tries.

I'll redo the time test tonight with the X server shut down and no network connection, to rule out any irregularities.

  15. #30
Salem
and the hat of int overfl
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,674
With processor clock cycles now measuring under 1 ns and disk seek times still well above 1 ms, there are somewhere between 6 and 7 orders of magnitude of difference between them.
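(Worked out: 1 ms / 1 ns = 10^6, so a single disk seek costs on the order of millions of CPU clock cycles - more still once the seek is several milliseconds and the clock is well under a nanosecond.)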

    Tinkering with the code when you're dealing with physical devices just isn't going to work. You'll never get your 2 minutes down to say 5 seconds without changing your storage technology.

    For sure, you may see some improvement if you manage to get lucky and your small file is resident in some cache (or contiguous within the same cylinder on disk). But any sufficiently large file will always be playing catchup with the processor. As soon as you end up with a disk seek, that's it.

    Those awesome disk transfer times the ads throw at you (hundreds of MB/Sec) are only really sustainable when the heads don't move. It's the seek time which kills you.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
