Thread: Multiple Pages in a Txt File

  1. #16
    Deathray Engineer MacGyver's Avatar
    Join Date
    Mar 2007
    Posts
    3,210
    Quote Originally Posted by maxorator View Post
    If you replicate the efforts of the C library, then you will get faster results.
    Only if you really write it more efficiently than the C library implementation you are currently using. Given your skill, I wouldn't bank on you being better than those that develop compilers, but you know what they say about blind squirrels.

    Code:
    char* lpData=new char[4096];            // 4 KB staging buffer
    DWORD dwCount=0,dwWritten;
    HANDLE hFile=CreateFileA("a.txt", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,0, 0);
    if(hFile!=INVALID_HANDLE_VALUE){
        for(int i=0;i<10000;i++){
            if(dwCount>=4096){              // buffer full - flush it to disk
                WriteFile(hFile,lpData,dwCount,&dwWritten,0);
                dwCount=0;
            }
            lpData[dwCount]='*';
            dwCount++;
        }
        WriteFile(hFile,lpData,dwCount,&dwWritten,0);   // write whatever is left in the buffer
        CloseHandle(hFile);
    }
    delete[] lpData;
    I think this proves my point.

  2. #17
    Reverse Engineer maxorator's Avatar
    Join Date
    Aug 2005
    Location
    Estonia
    Posts
    2,318
    Quote Originally Posted by MacGyver View Post
    Only if you really write it more efficiently than the C library implementation you are currently using. Given your skill, I wouldn't bank on you being better than those that develop compilers, but you know what they say about blind squirrels.
    I know what I'm talking about. Having gone through the mass of disassembled CRT code (msvcrt.dll), I can guarantee you that doing it myself IS faster.

    Does anyone know a way to count CPU cycles?
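
    For reference, two common ways to do this on Windows are the __rdtsc intrinsic (a raw cycle counter) and QueryPerformanceCounter (a high-resolution timer). The sketch below is only a rough illustration and assumes MSVC or MinGW headers; rdtsc in particular is affected by frequency scaling and out-of-order execution, so treat the numbers as approximate.
    Code:
    #include <windows.h>
    #include <stdio.h>
    #ifdef _MSC_VER
    #include <intrin.h>      /* __rdtsc() for MSVC */
    #else
    #include <x86intrin.h>   /* __rdtsc() for GCC/MinGW */
    #endif
    
    int main(void){
        /* Raw timestamp counter: counts CPU cycles since reset. */
        unsigned long long c0=__rdtsc();
        /* ... code to measure ... */
        unsigned long long c1=__rdtsc();
        printf("approx cycles: %.0f\n",(double)(c1-c0));
    
        /* Alternative: a stable high-resolution tick count convertible to seconds. */
        LARGE_INTEGER freq,t0,t1;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&t0);
        /* ... code to measure ... */
        QueryPerformanceCounter(&t1);
        printf("elapsed: %f s\n",(double)(t1.QuadPart-t0.QuadPart)/freq.QuadPart);
        return 0;
    }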
    Last edited by maxorator; 08-25-2007 at 02:48 AM.
    "The Internet treats censorship as damage and routes around it." - John Gilmore

  3. #18
    Reverse Engineer maxorator's Avatar
    Join Date
    Aug 2005
    Location
    Estonia
    Posts
    2,318
    Code:
    void TestMyCode(){
        char* lpData=new char[4096];
        DWORD dwCount,dwWritten,dwTime;
        HANDLE hFile;
        dwTime=GetTickCount();
        for(int a=0;a<1000;a++){
            dwCount=0;
            hFile=CreateFileA("a.txt", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,0, 0);
            if(hFile!=INVALID_HANDLE_VALUE){
                for(int i=0;i<10000;i++){
                    if(dwCount>=4096){
                        WriteFile(hFile,lpData,dwCount,&dwWritten,0);
                        dwCount=0;
                    }
                    lpData[dwCount]='*';
                    dwCount++;
                }
                WriteFile(hFile,lpData,dwCount,&dwWritten,0);
                CloseHandle(hFile);
            }    
        }
        DisplayNumber(GetTickCount()-dwTime);
        delete[] lpData;
    }
    
    void TestCRT(){
        FILE *f;
        DWORD dwTime;
        dwTime=GetTickCount();
        for(int a=0;a<1000;a++){
            f=fopen("a.txt", "w");
            for(int i=0;i<10000;i++){
                fputc('*',f);
            }
            fclose(f);
        }
        DisplayNumber(GetTickCount()-dwTime);
    }
    The CRT spent 1250/1265/1266/1375 ticks and my code spent 453/656/438/453. But actually I cheated a little bit: I allocated the buffer once, while the CRT allocates one every time a file is opened and written to. Let's make things fairer:
    Code:
    void TestMyCode(){
        char* lpData;
        DWORD dwCount,dwWritten,dwTime;
        HANDLE hFile;
        dwTime=GetTickCount();
        for(int a=0;a<1000;a++){
            lpData=(char*)HeapAlloc(GetProcessHeap(),HEAP_NO_SERIALIZE,4096);
            dwCount=0;
            hFile=CreateFileA("a.txt", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,0, 0);
            if(hFile!=INVALID_HANDLE_VALUE){
                for(int i=0;i<10000;i++){
                    if(dwCount>=4096){
                        WriteFile(hFile,lpData,dwCount,&dwWritten,0);
                        dwCount=0;
                    }
                    lpData[dwCount]='*';
                    dwCount++;
                }
                WriteFile(hFile,lpData,dwCount,&dwWritten,0);
                CloseHandle(hFile);
            }
            HeapFree(GetProcessHeap(),HEAP_NO_SERIALIZE,lpData);
        }
        DisplayNumber(GetTickCount()-dwTime);
    }
    
    void TestCRT(){
        FILE *f;
        DWORD dwTime;
        dwTime=GetTickCount();
        for(int a=0;a<1000;a++){
            f=fopen("a.txt", "w");
            for(int i=0;i<10000;i++){
                fputc('*',f);
            }
            fclose(f);
        }
        DisplayNumber(GetTickCount()-dwTime);
    }
    Results: CRT - 1250/1234/1375/1344 and direct WinAPI - 437/531/453/437. It seems allocating memory takes essentially no time at all compared to writing to the hard drive.

    But it really does matter when the CRT code manages to take 2/3 of the execution time, even with a hard drive as slow as mine.

    My suggestion: if you're bored, write fast code.
    Last edited by maxorator; 08-25-2007 at 05:10 AM.
    "The Internet treats censorship as damage and routes around it." - John Gilmore

  4. #19
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    maxorator: computers only have to be fast enough. I doubt the CRT was designed with optimizing such minutiae in mind. And I don't think outputting a few thousand asterisks is a fair benchmark of anyone's code, especially when you consider that I/O is always a bottleneck. The do-nothing program you wrote will always be as slow as its I/O, and frankly humans can afford to spend their time in that particular area.

  5. #20
    Deathray Engineer MacGyver's Avatar
    Join Date
    Mar 2007
    Posts
    3,210
    I think my favorite part of his example is picking an arbitrary buffer size to allocate for his Win32 version while the CRT uses BUFSIZ. Yes, let's benchmark two pieces of code with different buffer sizes.
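
    One way to take the buffer size out of the equation entirely, whatever BUFSIZ or the implementation's real buffer happens to be, would be to hand the stream an explicit 4096-byte buffer with setvbuf before any I/O, so both versions are buffered identically. A minimal sketch of the idea:
    Code:
    #include <stdio.h>
    
    int main(void){
        static char buf[4096];                  /* same size as the Win32 version's buffer */
        FILE *f=fopen("a.txt","w");
        if(f){
            /* force full buffering with our own 4096-byte buffer,
               instead of whatever the implementation picks by default */
            setvbuf(f, buf, _IOFBF, sizeof buf);
            for(int i=0;i<10000;i++)
                fputc('*',f);
            fclose(f);
        }
        return 0;
    }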

    But still, regardless, I would rather trust the reliability of the C or C++ standard libraries than OS-specific code that has not been tested. Something that takes a few lines to write in C or C++ may take a lot longer to write against the Windows API, and it isn't portable the way the standard libraries are. You should pick your platform with a lot of thought behind it.

    If you just want to write a Windows-only program, and you're obsessed with the Windows API, then whatever. Go for it. If you're seeking portability, don't even bother with OS-specific functions.

  6. #21
    Reverse Engineer maxorator's Avatar
    Join Date
    Aug 2005
    Location
    Estonia
    Posts
    2,318
    Quote Originally Posted by MacGyver View Post
    I think my favorite part of his example is picking an arbitrary amount of data to allocate for his Win32 function while BUFSIZ is being used in the CRT. Yes, let's benchmark two pieces of code with different buffer sizes.
    Nope. The CRT uses 4096 as its buffer size.

    The CRT initializes the buffer counter to 4096 when the first write operation is done and then decrements it as characters are appended to the buffer. Consider this:
    Code:
    FILE *ff=fopen("test.txt","w");
    fputc('c',ff);
    //MSVCRT-specific (32-bit layout): the second member of the FILE structure is the counter; this prints 4095
    printf("%d",*((int*)ff+1));
    fclose(ff);
    Quote Originally Posted by MacGyver View Post
    But still, regardless, I would rather trust the reliability of the C or C++ libraries than O/S specific code that has not been tested. It takes a few lines to write in C or C++ something that may take a lot longer to write in the Windows API, and it's not portable as the standard libraries are. You should pick your choice of platform with lots of thought behind it.
    I deal with GUI code mostly, so that isn't portable anyway - unless I use something like Qt, which I don't. If I write something that I want to be portable, then I write special code for Windows and a CRT version as well. I really care about performance. I wouldn't like to make programs like VCExpress, which sometimes takes 20 seconds to open the help (I accidentally press F1 sometimes, and then the program hangs for about 20 seconds before it manages to display the help). In VCExpress there's always some delay whenever I open a dialog. Lol, I just pressed "Community->Ask a question" just for testing and it took about 10 seconds to open that MSDN site in Document Explorer.
    Last edited by maxorator; 08-25-2007 at 04:49 AM.
    "The Internet treats censorship as damage and routes around it." - John Gilmore

  7. #22
    Deathray Engineer MacGyver's Avatar
    Join Date
    Mar 2007
    Posts
    3,210
    Ah, correct. Another reason to hate Microsoft's implementation.

    For example, LCC (as well as every other implementation I've taken the time to look at) properly matches BUFSIZ to the actual file buffer size (as I think it should).
    Last edited by MacGyver; 08-25-2007 at 04:49 AM.

  8. #23
    Reverse Engineer maxorator's Avatar
    Join Date
    Aug 2005
    Location
    Estonia
    Posts
    2,318
    Well, I compile with MinGW and it imports from msvcrt.dll, as you probably noticed. So there is no separate GCC implementation? I don't think there's much difference between the buffer being 512 bytes or 4096. Actually, 512 would be slower...
    Last edited by maxorator; 08-25-2007 at 05:12 AM.
    "The Internet treats censorship as damage and routes around it." - John Gilmore

  9. #24
    Deathray Engineer MacGyver's Avatar
    Join Date
    Mar 2007
    Posts
    3,210
    Checking it out on a Solaris machine now, I see BUFSIZ at 1024, but 8192 is what is actually allocated.

    BUFSIZ is 512 for both MinGW and LCC; however, MinGW (since it links against MS's implementation) actually uses 4096. LCC uses 512, and apparently has its own implementation.

    I remember trying to check this out about two years ago on a Solaris machine... I don't remember many details about it, but I thought at the time that BUFSIZ matched the buffer size at 8192. Perhaps someone else had told me that while I was working on it.

    This is why it's not worth learning, memorizing, and trusting implementation details as fact. None of this is guaranteed once you move away from the compiler and system you're using. If you had tested your code on a compiler that didn't use Microsoft's implementation, the buffer sizes would have been different. I also don't believe there is any reason why MS won't change their buffer sizes in the future, thereby rendering your code incorrect at a later date. And assuming which member of the FILE struct comes in which order is rather silly; that can change with any implementation, especially since the Solaris implementation I have shows that the members may change depending on how it is compiled.
    Last edited by MacGyver; 08-25-2007 at 05:18 AM.

  10. #25
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Code:
    TestCRT             : Ticks=516
    TestCRT             : Ticks=593
    TestCRT             : Ticks=594
    TestCRT             : Ticks=688
    TestCRT2            : Ticks=547
    TestCRT2            : Ticks=562
    TestCRT2            : Ticks=1500
    TestCRT2            : Ticks=594
    TestMyCode          : Ticks=3562
    TestMyCode          : Ticks=3360
    TestMyCode          : Ticks=3797
    TestMyCode          : Ticks=3687
    TestMyCode2         : Ticks=438
    TestMyCode2         : Ticks=468
    TestMyCode2         : Ticks=563
    TestMyCode2         : Ticks=453

    Running on my machine, writing 10000 characters to a file, 1000 times over - except for TestMyCode, which is the code I originally wrote, writing one character at a time with WriteFile. That function only does 100 iterations of creating the file, so the ACTUAL comparable time is about 33,000-38,000 ticks.

    TestCRT2 has the same functionality (using a buffer and fwrite) as TestMyCode2 - the latter being Maxorator's original code. As you can see, the difference isn't very large.
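
    The TestCRT2 code itself isn't posted; a sketch of what a buffered-fwrite variant along those lines might look like, borrowing the 4096-byte buffer scheme and the DisplayNumber helper from the earlier posts (the exact details here are assumptions, not matsp's actual code):
    Code:
    void TestCRT2(){
        char buf[4096];                     // assumed: same 4 KB staging buffer as TestMyCode
        DWORD dwCount,dwTime;
        dwTime=GetTickCount();
        for(int a=0;a<1000;a++){
            dwCount=0;
            FILE *f=fopen("a.txt","w");
            if(f){
                for(int i=0;i<10000;i++){
                    if(dwCount>=4096){      // buffer full - hand it to fwrite in one call
                        fwrite(buf,1,dwCount,f);
                        dwCount=0;
                    }
                    buf[dwCount++]='*';
                }
                fwrite(buf,1,dwCount,f);    // write the remainder
                fclose(f);
            }
        }
        DisplayNumber(GetTickCount()-dwTime);
    }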

    With a smaller outer loop (100, 10 for TestMyCode) and a bigger output of 100000 stars, the results are:
    Code:
    TestCRT             : Ticks=172
    TestCRT             : Ticks=218
    TestCRT             : Ticks=204
    TestCRT             : Ticks=171
    TestCRT2            : Ticks=141
    TestCRT2            : Ticks=125
    TestCRT2            : Ticks=172
    TestCRT2            : Ticks=187
    TestMyCode          : Ticks=3094
    TestMyCode          : Ticks=3281
    TestMyCode          : Ticks=3782
    TestMyCode          : Ticks=3250
    TestMyCode2         : Ticks=125
    TestMyCode2         : Ticks=93
    TestMyCode2         : Ticks=110
    TestMyCode2         : Ticks=125
    Not much of a difference, except in the first case - which is exactly why I suggested that case as a pessimal one.

    --
    Mats
    Last edited by matsp; 08-25-2007 at 05:54 AM.

  11. #26
    Reverse Engineer maxorator's Avatar
    Join Date
    Aug 2005
    Location
    Estonia
    Posts
    2,318
    Code:
    //Outer 10000, inner 10000
    TestCRT        : 12797
    TestCRT        : 12859
    TestCRT        : 13094
    TestMyCode     : 4344
    TestMyCode     : 4032
    TestMyCode     : 4141
    //Outer 10000, inner 1000
    TestCRT        : 3532
    TestCRT        : 3265
    TestCRT        : 3484
    TestMyCode     : 2765
    TestMyCode     : 2610
    TestMyCode     : 2719
    //Outer 1000, inner 100000
    TestCRT        : 10578
    TestCRT        : 10625
    TestCRT        : 10625
    TestMyCode     : 1532
    TestMyCode     : 1329
    TestMyCode     : 1687
    //Outer 1, inner 100,000,000 (almost 100MB)
    TestCRT        : 12984
    TestCRT        : 10063
    TestCRT        : 10781
    TestMyCode     : 6641
    TestMyCode     : 3984
    TestMyCode     : 7547
    //Outer 10000, inner 1
    TestCRT        : 3141
    TestCRT        : 2813
    TestCRT        : 3000
    TestMyCode     : 2892
    TestMyCode     : 2617
    TestMyCode     : 2812
    File opening speed is about the same, but file writing speed is a different story.

    How often do you need to create thousands of basically empty files? Not too often. But how often do you need to create big files? Quite often.
    Last edited by maxorator; 08-25-2007 at 06:07 AM.
    "The Internet treats censorship as damage and routes around it." - John Gilmore

  12. #27
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by maxorator View Post
    I got 12797 vs 4344 when loops were 10000/10000. But when the inner loop is 1000 (then 10000/1000), then 3468 vs 2407.
    Obviously something in your system is less optimal than in mine - perhaps your machine has less cache (perhaps it's a Pentium 4 with a puny little 12K trace cache?), and the extra code executed makes the big difference. In my tests, the direct system calls and the CRT are pretty much equal until we start doing stupid amounts of system calls.

    --
    Mats

  13. #28
    Reverse Engineer maxorator's Avatar
    Join Date
    Aug 2005
    Location
    Estonia
    Posts
    2,318
    Look at the benchmarks I made. I have an AMD Athlon 64 X2 Dual-Core 3800+.
    Last edited by maxorator; 08-25-2007 at 06:11 AM.
    "The Internet treats censorship as damage and routes around it." - John Gilmore

  14. #29
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by maxorator View Post
    Look at the benchmarks I made. I have an AMD Athlon 64 X2 Dual-Core 3800+.
    Hmm. That's strange, because my machine is a single core Athlon64 3200+. And my disk isn't exactly the latest SATA drive or something like that, it's a 3-4 year old PATA 40GB Maxtor.


    --
    Mats

  15. #30
    Reverse Engineer maxorator's Avatar
    Join Date
    Aug 2005
    Location
    Estonia
    Posts
    2,318
    Quote Originally Posted by matsp View Post
    Hmm. That's strange, because my machine is a single core Athlon64 3200+. And my disk isn't exactly the latest SATA drive or something like that, it's a 3-4 year old PATA 40GB Maxtor.
    Mine is some old Samsung 40GB drive... a new, faster 320GB one arrives the day after tomorrow.

    Well, we can agree that in normal situations it is recommended to use the CRT, and that the direct Win32 API is good if you know exactly what you're doing, have a lot of time, and want high performance. But when dealing with GUI code, portability is already lost anyway, and then it only matters how much time you have available to spend on the project...
    Last edited by maxorator; 08-25-2007 at 06:41 AM.
    "The Internet treats censorship as damage and routes around it." - John Gilmore
