Platform SDK ioCompletionPort examples

This is a discussion on Platform SDK ioCompletionPort examples within the Windows Programming forums, part of the Platform Specific Boards category.

  1. #1
    Registered User
    Join Date
    Dec 2004
    Posts
    8

    Platform SDK ioCompletionPort examples

    There are two examples that use ioCompletionPorts in the platform SDK. One uses a single thread and the second, two threads. They both simply copy a file.

After compiling and trying them both, the version with two threads works fine but the single-thread version seems not to. It creates a copy of the same size but without all the contents. On closer investigation, it would seem that the Key "returned" from

    Code:
        while (PendingIO) {
            Success = GetQueuedCompletionStatus(IoPort,
                                                &NumberBytes,
                                                &Key,
                                                &CompletedOverlapped,
                                                INFINITE);
is never set to WriteKey and therefore the program just completes 20 reads and 20 writes. It then extends the output file to the same length as the input file, effectively filling the file with hex 00.

    (You'll only see this if the input file is bigger than 64KB*20).

    Does this example work for anyone else or is it just wrong? If so, can anyone correct it?

    ......\Microsoft SDK\Samples\winbase\IO\UnBufCpy\UnBufCp1.c


    Tim

  2. #2
    Registered User
    Join Date
    Dec 2004
    Posts
    95
    Strange. Can you post that C file so I can test it?

  3. #3
anonytmouse (Yes, my avatar is stolen)
    Join Date
    Dec 2002
    Posts
    2,544
    Congratulations, it seems that you've found a bug. Not a bad first post! I haven't tried it, but it looks like the problem is the misuse of the PendingIO count. We should increment the IO count every time we issue a read and decrement it every time a write completes. However, in the sample PendingIO is also being decremented when a read completes. It may not take a file of (64KB*20) to cause a problem. In theory, if all the reads completed before any writes completed, the loop and the program would exit with all the writes pending. This may be what is happening if you are not seeing any WriteKeys.

The fix should be simple: just comment out the --PendingIO; on line 477.
    Last edited by anonytmouse; 12-24-2004 at 07:05 AM.

  4. #4
    Registered User
    Join Date
    Dec 2004
    Posts
    8
    Beginner's luck I guess!

    Can I further pick your brains?

    I have a C program that matches 2 text files together and applies changes in one to the other.

The "master" file has 130m records and is 70GB big. The "updates" file is typically 50m records and 35GB big. Originally the program was written to be portable and the IO is carried out using the standard C functions (fgets etc.).

As the decision has been made that we'll be on Microsoft for the foreseeable future, I'm keen to explore how this program can be changed to make the most of IO. The program has a huge amount of work to do, so I know it'll never be fast, but I want it to be as fast as possible.

    The program simply reads a single fixed length record from the "master" file and reads the "updates" file until a match or matches are found, then performs some processing to apply the updates before writing out a new "master" file. The two files are sorted on a key.

The bottleneck is most definitely all that IO. Given that I'm running on Windows, how could I speed that up?

  5. #5
    Registered User
    Join Date
    Dec 2004
    Posts
    95
    Perhaps a thread pool servicing an IOCP, maintaining a steady number of pending overlapped ReadFile calls, i.e. basically keep the disk(s) working all the time.

    The size of the thread pool depends on your hardware. I assume you're running 2k+?

  6. #6
    Registered User
    Join Date
    Dec 2004
    Posts
    8
    Windows 2000 Advanced Server on a Quad Xeon HP ProLiant.

  7. #7
anonytmouse (Yes, my avatar is stolen)
    Join Date
    Dec 2002
    Posts
    2,544
    Sorry, lots of questions and not many answers.

    Whatever method is used, disk I/O is slow. Is it possible that you could reduce the amount of I/O required? Any reduction in I/O would give you a much larger speed improvement and would possibly retain portability.

    What is your profiling showing? For example, if your application is using 100% CPU, then improving I/O may give only minor improvements.

    >> The program simply reads a single fixed length record from the "master" file and reads the "updates" file until a match or matches are found, then performs some processing to apply the updates before writing out a new "master" file. The two files are sorted on a key. <<

    Can you elaborate on this process? What happens when you reach the next record in the master file? Do you have to read through the updates file again? Are you doing more than 105GB of total reads? Can you use an in-memory index?

    How long is the operation taking currently? What sort of improvement are you hoping for?
    Last edited by anonytmouse; 01-04-2005 at 02:28 PM.

  8. #8
    Registered User
    Join Date
    Dec 2004
    Posts
    8
    Given two input files each with a key and three other "fields":

    Code:
    Master:                  Updates:
    
    Key  Fld1 Fld2 Fld3      Key Fld1 Fld2 Fld3
    111     A    B    C      333    A    A    A
    222     A    B    C      444    Z    Z    Z
    444     A    B    C      777    A    B    C
    888     A    B    C
    The new master file output will be:
    Code:
    Key  Fld1 Fld2 Fld3
    111     A    B    C
    222     A    B    C
    333     A    A    A
    444     Z    Z    Z
    777     A    B    C
    888     A    B    C
As both files are sorted, the program will just trot through both files once - purely sequentially. If there are no updates for a master (111,222,888) then the master will just be written out unchanged. If there are matching updates (444), then the fields will be applied one by one according to some logic. Records which only appear on the updates file (333,777) are written out as new masters.

    So, total reads: 105GB, writes: c.75GB

Getting consistent profiling is proving tricky. On some runs, the function that performs the input file reads accounts for 75% of "func time". Sometimes it gets down to 50%. I suspect that this is due to the Windows file cache. Is there any way to disable this when using standard functions like fgets?

  9. #9
    Registered User
    Join Date
    Dec 2004
    Posts
    95
    You can disable buffering on a stream with setbuf() - pass a null pointer as the second argument. It probably won't give you much more information though - you know the disk is the bottleneck already. Hardware changes aside, basically I'd try to make sure you're getting the disk working all the time - i.e. while you're doing the processing, have pending reads (this is where the Win32 specific stuff comes in) waiting, so the disk is still reading away.

    You could have a single thread sitting on an IOCP, which would push the chunk of data read onto a list or stack (one for each file) for your processing thread(s) to process - typical threaded design. It doesn't do any calculations itself, so it can almost always be waiting on the IOCP. It'd check when the number of pending reads reached some "low watermark", and post more when it did - keeping the disk working.

    You might not want or need > 1 processing thread, because from your description it doesn't look like the calculations are going to take up much time, and the extra complexity+sync required to have multiple processing threads would render them a disadvantage.

    Anyway, I dunno if this would actually produce an improvement or not - would need more thought and some testing. It'd be a bit more complicated than the current version too, though not hugely.

  10. #10
Codeplug (Registered User)
    Join Date
    Mar 2003
    Posts
    4,646
    >> Is there anyway to disable this when using standard functions like fgets?
    >> You can disable buffering on a stream with setbuf()
    You definitely don't want to use standard C/C++ I/O for this type of application.

    >> I'd try to make sure you're getting the disk working all the time.
    Me too. You want the disk hardware to stay busy with reads/writes in the "background" while the processor(s) crunch data in the "foreground".

In Windows, having the disk hardware perform an operation while the CPU does something else is accomplished by using overlapped I/O. IOCP is a special form of overlapped I/O that I think overcomplicates things for what you need to accomplish. Using normal overlapped I/O should be sufficient.

In order to perform the most efficient reads and writes possible, you'll want to take a look at the FILE_FLAG_NO_BUFFERING flag in the CreateFile() API.

    Here's what I would do as a first attempt:
    Code:
        For each record (R) in Updated 
            If Master is EOF 
                Copy remainder of Updated 
                Exit For 
            End If 
            Copy Master data until you get to where R would/should be 
            Copy in R from Updated 
        End For
If your records are fixed in size, then you can really speed things up by using a binary search to find "where R would/should be" within the current set of Master records in memory.

    The fun part is deciding who does the reading, writing, and processing and how the threads communicate with one another. You'll also have to consider memory constraints when determining how much memory to commit towards read buffers.

    >> You might not want or need > 1 processing thread...
I agree. The writing of the resulting file has to be done serially and there isn't much opportunity for speedup by performing the actual work in parallel. To keep things simple, I would start off with four threads: Master reader, Updated reader, Result writer, and the processor thread.

    gg
    Last edited by Codeplug; 01-05-2005 at 08:48 PM.

  11. #11
    Registered User
    Join Date
    Dec 2004
    Posts
    8
    Quote Originally Posted by azteched
    You can disable buffering on a stream with setbuf()
    Just tried that and the profiler now reports that the file access (consistently) accounts for 91% of the time!!

  12. #12
anonytmouse (Yes, my avatar is stolen)
    Join Date
    Dec 2002
    Posts
    2,544
    Do you have to rewrite the entire master file? It seems that if you just updated the existing master file rather than copying it all, you would save 35GB of reads and 35GB of writes by skipping the records that are unchanged, which is a 40% reduction in total I/O. You would add new records to the end of the file and use an index file to keep an ordered index of the records. Database software uses a similar system. Speaking of which, have you considered using a database system?

The other thing to think about is reducing the intermixing of reads and writes to different files by buffering operations in memory. Reading 50MB of one file and then writing 50MB to another file should be faster than intermixing the two operations, unless they're on different disks.

    As azteched and Codeplug have mentioned, this is going to be a rather complicated task. Possibly, you should look for a circular buffer solution, rather than implementing the advanced I/O from scratch.

EDIT: For more detailed profiling information go to Control Panel->Administrative Tools->Performance. Click the Add button on the toolbar to add different counters. You will be interested in counters from the PhysicalDisk, Process (click your process in the instance column) and Processor objects, especially Avg. Disk Queue Length and % Disk Time from PhysicalDisk.
    Last edited by anonytmouse; 01-06-2005 at 11:49 AM.

  13. #13
    Registered User
    Join Date
    Dec 2004
    Posts
    8
    Quote Originally Posted by anonytmouse
    Speaking of which, have you considered using a database system?
The file that gets output from this program is actually then loaded into an Oracle table in a data warehouse. The "some logic" that I referred to earlier is actually very complicated and beyond single-transaction SQL, and PL/SQL (Oracle's procedural SQL) is far too slow -- at least for this.


    EDIT: I read somewhere that read requests smaller than 64KB are coalesced into 64KB requests by the OS. Filemon.exe from sysinternals.com backs this up. Looking at the stdio.h file it seems that the standard BUFSIZ is 512 bytes. I added a setvbuf statement to ensure that all files are buffered to 64KB and the profiler now puts reading and writing at 20% each. This improvement isn't displayed in release version code (at least with a small data sample) but it's interesting.
    Last edited by slimtim; 01-06-2005 at 02:21 PM.

  14. #14
anonytmouse (Yes, my avatar is stolen)
    Join Date
    Dec 2002
    Posts
    2,544
    Try adding 'S' to the fopen mode string:
    Code:
    fopen(filename, "rbS");
    This enables FILE_FLAG_SEQUENTIAL_SCAN:
    Quote Originally Posted by MSDN CreateFile
    Indicates that the file is to be accessed sequentially from beginning to end. The system can use this as a hint to optimize file caching. If an application moves the file pointer for random access, optimum caching may not occur; however, correct operation is still guaranteed.
    Specifying this flag can increase performance for applications that read large files using sequential access. Performance gains can be even more noticeable for applications that read large files mostly sequentially, but occasionally skip over small ranges of bytes.
    http://support.microsoft.com/kb/98756/EN-US/
    http://msdn.microsoft.com/library/de...c_._wfopen.asp

  15. #15
    Registered User
    Join Date
    Dec 2004
    Posts
    8
    Just an update:

The latest code uses the fopen switch for sequential scanning, uses setvbuf to set the buffers to 64KB and uses fread rather than fgets. A baseline run of the full process took 4hr58m and the new code took a mere 1hr41m. I'm currently testing the changes with lots of other programs.

    Thanks all for your help.
