Mmm, so it is.
If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
If at first you don't succeed, try writing your phone number on the exam paper.
Hi,
So do I: syscall(__NR_close,fd);
As I said, this is strlen without including string.h.
I can't follow that. I created a 182 KB text file, resized my buffers, and everything works fine. What do you mean by there not always being a \0 representing the EOF? What is there instead? And how could I achieve this using syscalls? You say read gives something back? I mean, this is how strlen works:
Code:
static inline size_t strlen(const char * s)
{
    int d0;
    register int __res;
    __asm__ __volatile__(
        "repne\n\t"
        "scasb\n\t"
        "notl %0\n\t"
        "decl %0"
        :"=c" (__res), "=&D" (d0)
        :"1" (s),"a" (0), "0" (0xffffffffu)
        :"memory");
    return __res;
}
I don't fully understand this asm+C mixture, but to me the line starting with :"1" looks like it checks for either \a or \0, depending on whether it's a string or a file?
@ the why-question behind all this: I'm just playing around. I found a paper explaining how to use syscalls in kernel modules, so I wanted to learn how to invoke them from C and play around with them a bit. Besides, calling the syscall directly should be faster, shouldn't it?
greets
If you have a 10 byte file, then 10 bytes is all that will get copied to your buffer.
The 11th character will NOT be a \0 or some representation of end of file or anything.
The only way to determine how much data was written to the buffer is to look at the return result of the read syscall.
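A minimal sketch of that point, assuming a POSIX system. The helper name read_once is made up for illustration. It fills the buffer with '#' before reading, so you can see that read() touches nothing beyond the count it returns:

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: pre-fill buf with '#', then do a single read().
 * Returns whatever read() returned: the number of bytes stored,
 * 0 at end of file, or -1 on error. No terminator is appended. */
ssize_t read_once(int fd, char *buf, size_t cap)
{
    assert(buf != NULL);
    memset(buf, '#', cap);
    return read(fd, buf, cap);
}
```

With 10 bytes available and a 16 byte buffer, the call returns 10 and buf[10] is still '#', not \0. If you want to treat the data as a C string, you have to store the terminator yourself at buf[n], and leave room for it.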
> but the line starting with :"1" looks for me like checking for whether \a
It's nothing of the sort, but I'm not going to explain the intricacies of the GNU assembler syntax. You'll need to google something for yourself.
Suffice to say, what it does do is provide a kind of bridge between the C part of the code and the asm part of the code (where to read values from, and where to store them).
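For what it's worth, here is a rough plain-C model of what that routine computes. It is a sketch, not the kernel's code, and the function name is made up. In the constraint lists, "=c" and "=&D" bind the two outputs to ECX and EDI, "a" (0) loads EAX with 0, and the digits "0" and "1" mean "use the same register as operand 0 or 1". So :"1" (s) just places s in EDI, and "0" (0xffffffffu) starts ECX at its maximum:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plain-C model of the repne/scasb strlen quoted earlier. */
size_t strlen_scasb_model(const char *s)
{
    uint32_t count = 0xffffffffu;      /* "0" (0xffffffffu): ECX starts at max */
    unsigned char byte;

    assert(s != NULL);
    do {
        byte = (unsigned char)*s++;    /* scasb: compare AL (0) with *EDI, advance EDI */
        count--;                       /* every repetition decrements ECX */
    } while (byte != 0 && count != 0); /* repne: repeat while not equal and ECX != 0 */

    count = ~count;                    /* notl: bytes scanned, terminator included */
    count--;                           /* decl: drop the terminator from the count */
    return count;
}
```

The only thing it ever looks for is a zero byte. Nothing in it knows about files or \a, which is why it is only safe on memory that really does contain a terminator.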
Okay, I've now done what was suggested plus a few other things, so I think there shouldn't be any errors left:
Code:
#include </usr/include/linux/unistd.h>
#include </usr/include/asm/stat.h>

int main(int argc, char *argv[1])
{
    int fd,len;
    struct stat mystat ;

    fd = syscall(__NR_open,argv[1],00,04000) ;
    if (fd==-1) {
        syscall(__NR_write,1,"No such file\n",13);
        return 1;
    }
    syscall(__NR_fstat,fd,&mystat);
    char buffer[mystat.st_size];
    len = syscall(__NR_read,fd,buffer,mystat.st_size);
    syscall(__NR_close,fd);
    syscall(__NR_write,1,buffer,len);
    return 0;
}
Thanks for all your help!
> char buffer[mystat.st_size];
Variable length arrays, and declarations in the middle of statements are both outside the scope of C89.
Also, if you have a file of many MB in size, you've just blown away the stack.
Really? I thought this would be a good way of reading files of unknown length.
Okay, I've now figured out that I have to use gcc -pedantic to get the kind of warnings you are talking about.
I didn't realize that, because I thought there was a maximum size an array could have, and beyond that it would simply stop allocating memory.
But then what is the best, or at least a more correct, way of reading a file of unknown size? Getting its size and then reading it block by block?
greets
"No-one else has reported this problem, you're either crazy or a liar" - Dogbert Technical Support
"Have you tried turning it off and on again?" - The IT Crowd
But a line has no definite size, has it? So a 10 MB file could have just one line and never contain \n? I think I'm going to try to accomplish this with sys_readahead now:
Populates the page cache with data from a file so that subsequent reads from that file will not block on disk I/O.
Arguments
eax 225
ebx File descriptor identifying the file which is to be read.
ecx Starting offset from which data is to be read. This value is effectively rounded down to a page boundary and bytes are read up to the next page boundary greater than or equal to (ecx+edx)
edx Number of bytes to be read.
But it seems I can't access this page cache afterwards: there is no variable I can pass to it. readv also expects a certain number of buffers, so that seems useless too for reading a file of unknown size.
Last edited by lilcoder; 09-24-2007 at 05:31 AM.
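A note on the confusion above: a read-ahead call only asks the kernel to warm its page cache, it never hands data back to the program. The sketch below uses posix_fadvise with POSIX_FADV_WILLNEED, the portable cousin of sys_readahead (a substitution, not what the post used). The helper name is made up, and the data still arrives through an ordinary read:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: give the kernel a read-ahead hint for the first
 * cap bytes, then fetch them with a normal pread(). */
ssize_t hint_then_read(int fd, char *buf, size_t cap)
{
    assert(buf != NULL);

    /* Advice only: no buffer is involved and no data comes back, so the
     * result is deliberately ignored here. */
    (void)posix_fadvise(fd, 0, (off_t)cap, POSIX_FADV_WILLNEED);

    /* The bytes themselves still have to be read explicitly. */
    return pread(fd, buf, cap, 0);
}
```

So the hint can at best make a later read faster. It does nothing for the unknown-size problem, which still comes down to how you call read.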
Use a fixed sized buffer, and read that amount.
The constant BUFSIZ in stdio.h is a good example size.
Say for argument that the OS restricts the max working set of your process to 50MB and you try to read a 100MB file in one go. What you end up with is the first half of the file written back out to the swap file and the second half in memory.
Then when you come to write it out again, the first half is read from swap and the 2nd half is written to swap. And so on.
> But what is then the best or better correct way of reading a file with unknown size?
Reading fixed sized buffers in a while loop, and detecting when end of file is reached.
Code:
char buff[BUFSIZ];
int n;
while ( (n=read(fd,buff,BUFSIZ)) > 0 )
    write(1,buff,n);
> So a 10 MB file could have just one line and never contain \n ?
I've seen XML files like this before now.
Also binary files have no concept of 'line', but have some other structure instead.
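One refinement worth knowing about: write() can accept fewer bytes than requested (pipes, sockets, signals), so a careful copy loop also loops on the write side. A sketch under POSIX assumptions follows. The names write_all and copy_fd are made up, and the 4096 byte chunk size is an arbitrary choice:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: call write() until all len bytes are out.
 * Returns 0 on success, -1 on error (errno is set by write). */
int write_all(int fd, const char *buf, size_t len)
{
    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            return -1;
        }
        buf += n;                  /* skip the part that was accepted */
        len -= (size_t)n;
    }
    return 0;
}

/* Copy everything from in_fd to out_fd in fixed-size chunks. */
int copy_fd(int in_fd, int out_fd)
{
    char buf[4096];
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted read: retry */
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) != 0)
            return -1;
    }
    return 0;                      /* read returned 0: end of file */
}
```

Memory use stays at one chunk no matter how large the file is, and the file never needs to contain a newline or a terminator.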
Yes, the while loop is a good solution and works really well:
Code:
#include </usr/include/linux/unistd.h>
#define BUFSIZE 4092

int main(int argc, char *argv[1])
{
    char buffer[BUFSIZE];
    int fd,len;

    if (argc!=2) {
        syscall(__NR_write,1,"Usage: macro [filename]\n",24);
        return 0;
    }
    fd = syscall(__NR_open,argv[1],00,04000) ;
    if (fd==-1) {
        syscall(__NR_write,1,"No such file\n",13);
        return 0;
    }
    while ((len=syscall(__NR_read,fd,buffer,BUFSIZE))>0) {
        syscall(__NR_write,1,buffer,len);
    }
    syscall(__NR_close,fd);
    return 0;
}
And if you take your code and read a file of (say) 100MB, how long does it take, and how long does "cat inputfile" take? [inputfile should obviously be replaced by the name of your 100MB file.]
Use "time", such as "time cat inputfile" to show you how fast it was.
Just to see - I haven't tried it myself, I'm just curious.
--
Mats
Compilers can produce warnings - make the compiler programmers happy: Use them!
Please don't PM me for help - and no, I don't do help over instant messengers.
Hi, okay, I used a ~60MB zip file for this test:
Code:
time cat ind.zip
[..]
real 2m31.340s
user 0m00.000s
sys 2m30.089s

time ./macro ind.zip
[..]
real 2m34.212s
user 0m00.004s
sys 2m33.358s
But I don't know how reliable this test is. Not very, or what do you think?
Here I took the best clock tick count over 1000 tries, once for sys_write and once for printf. I think this is a better method than the one above:
Code:
#include </usr/include/linux/unistd.h>
#include <stdio.h>
#include </usr/src/linux-headers-2.6.20-16/include/asm-i386/msr.h>

int main(void)
{
    unsigned long ini, end, now, best, tsc;
    int i;
    char buffer[4];

#define measure_time(code) \
    for (i = 0; i < 10000; i++) { \
        rdtscl(ini); \
        code; \
        rdtscl(end); \
        now = end - ini; \
        if (now < best) best = now; \
    }

    /* time rdtsc (i.e. no code) */
    best = ~0;
    measure_time( 0 );
    tsc = best;

    /* time an empty read() */
    best = ~0;
    // measure_time( syscall(4,1,"test1\n",6) );
    measure_time( printf("test2\n") );

    /* report data */
    printf("rdtsc: %li ticks\nsyscall():%liticks\n", tsc, best-tsc);
    return 0;
}
Now I get for the syscall write:
rdtsc: 192 ticks
syscall():3396ticks
and for printf:
rdtsc: 192 ticks
syscall():4092ticks
Which is the expected result, because calling the syscall directly should of course be faster than calling libc first, which then makes the syscall.
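For anyone repeating this without kernel headers, the same best-of-N idea can be sketched with clock_gettime, which reports nanoseconds rather than TSC ticks. This assumes a POSIX system. The names now_ns, best_of and count_call are made up, and count_call is just a stand-in workload:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current value of the monotonic clock in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: run fn(arg) `tries` times, return the fastest run. */
uint64_t best_of(void (*fn)(void *), void *arg, int tries)
{
    uint64_t best = UINT64_MAX;
    int i;

    assert(fn != NULL && tries > 0);
    for (i = 0; i < tries; i++) {
        uint64_t start = now_ns();
        fn(arg);
        uint64_t elapsed = now_ns() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Stand-in workload: counts how often it was called. */
void count_call(void *arg)
{
    ++*(int *)arg;
}
```

As in the program above, timing an empty workload first and subtracting that figure removes the cost of the measurement itself. Taking the minimum filters out runs that were disturbed by interrupts or scheduling, but it says nothing about typical behaviour under load.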
Well, my point was rather that cat is faster than your code (admittedly by only a fraction of the total time) - despite cat being portable and using C runtime library calls.
I think it would be fairer to use fputs() or fwrite() rather than printf() to compare the time for your application. The printf call is MUCH more complex than a simple write to a file - even if that is what it ends up doing - for example, printf does essentially this:
Code:
int printf(const char *fmt, ...)
{
    ....
    while(*fmt) {
        if (*fmt == '%') {
            fmt++;
            switch(*fmt) {
            case 's':
                // a bunch of code.
                ...
                break;
            case 'd':
                // a bunch of code.
                ...
                break;
            ... // Lots more case labels.
            default:
                break;
            }
        } else {
            // output char from *fmt
        }
        fmt++;
    }
}
Also, if you use /dev/null as your output device, e.g. "time cat blah.zip > /dev/null", you don't have the scrolling of the screen to take into account. The same applies to printf & write of course.
As you've just proven, a big van with a load is slower than an ordinary car - but there really should be no one who is surprised by this.
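To make the outline concrete, here is a toy formatter that handles only %s, %d and %%, writing into a caller-supplied buffer. It is a sketch to show the per-character work, not how any real libc implements printf, and the names put and mini_format are made up:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Store one character if it fits (leaving room for '\0'), always count it. */
static void put(char *out, size_t cap, size_t *len, char c)
{
    if (*len + 1 < cap)
        out[*len] = c;
    (*len)++;
}

/* Toy formatter: returns the length the full result would have. */
size_t mini_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;

    assert(out != NULL && cap > 0 && fmt != NULL);
    va_start(ap, fmt);
    for (; *fmt; fmt++) {
        if (*fmt != '%') {                 /* ordinary character */
            put(out, cap, &len, *fmt);
            continue;
        }
        fmt++;                             /* look at the conversion letter */
        if (*fmt == '\0')
            break;                         /* lone '%' at the end */
        switch (*fmt) {
        case 's': {
            const char *s = va_arg(ap, const char *);
            while (*s)
                put(out, cap, &len, *s++);
            break;
        }
        case 'd': {
            char digits[12];
            int n = 0;
            long long v = va_arg(ap, int);
            if (v < 0) {
                put(out, cap, &len, '-');
                v = -v;
            }
            do {                           /* digits come out in reverse */
                digits[n++] = (char)('0' + v % 10);
                v /= 10;
            } while (v != 0);
            while (n > 0)
                put(out, cap, &len, digits[--n]);
            break;
        }
        case '%':
            put(out, cap, &len, '%');
            break;
        default:                           /* unknown conversion: skipped */
            break;
        }
    }
    va_end(ap);
    out[len < cap ? len : cap - 1] = '\0';
    return len;
}
```

Even this cut-down version inspects every character of the format string and converts numbers digit by digit before anything reaches write, which is the overhead being compared against a bare syscall.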
--
Mats
fputs doesn't seem much faster than printf. Its best result was 4032 ticks over 1000 tries.
I'll redo the time-test this night with closed X-server and no network-connection to bypass any irregularities.
With processor clock times now measuring <1 ns and disk seek times still well above 1 ms, there is somewhere between 6 and 7 orders of magnitude of difference between them.
Tinkering with the code when you're dealing with physical devices just isn't going to work. You'll never get your 2 minutes down to say 5 seconds without changing your storage technology.
For sure, you may see some improvement if you manage to get lucky and your small file is resident in some cache (or contiguous within the same cylinder on disk). But any sufficiently large file will always be playing catchup with the processor. As soon as you end up with a disk seek, that's it.
Those awesome disk transfer times the ads throw at you (hundreds of MB/Sec) are only really sustainable when the heads don't move. It's the seek time which kills you.