Going from single-threaded to multi-threaded


  1. #1
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Katy, Texas
    Posts
    2,309

    Coming from an assembler background, where I have to manage *everything* with regard to thread dispatching, memory isolation, synchronization, etc., to a higher-level compiled language like C or C++, where the compiler takes care of several aspects of storage (the stack in C or otherwise in C++), I'm trying to get my head around what I need to worry about with regard to thread safety and multi-threaded I/O to the same file, and what I can take for granted in cases where the compiler will protect me.

    For example, in my single threaded app, I do this today:
    Code:
    open input & output files
    get an input buffer 
    get an output buffer 
    repeat:
    	read 10,000 (or last remaining portion thereof) records into input buffer
    	convert input records into output buffer
    	write the output buffer
    	loop to repeat until EOF 
    close both files
    My multi-threaded design, I'm thinking, will be like this:
    Code:
    main thread { 
    	open input & output files
    	create 4 converter threads via "init"
    	get 4 input buffers 
    	repeat { 
    		do until no input buffers available { 
    			read 10K records into buffer
    			pass buffer to a waiting converter task via "work"
    		}
    		wait for input buffer to be available 
    		loop to repeat until EOF 
    	}
    	wait for all converter tasks to finish 
    	tell converter tasks to shutdown via "term"
    	close both files
    }
    
    converter thread { 
    init: 
    	get an output buffer 
    	indicate ready for work 
    work:
    	convert the passed buffer data
    	write the data
    	mark input buffer free 
    term: 
    	free output buffer 
    	thread exit 
    }
    So my questions are, initially, in regards to C, on Unix or Windows, will this design fly? Can I overlap writing to the same file by 4 threads at once? Or, do I need a "writer thread" that is serialized which the converter threads will post when an output buffer is ready to be written?

    Thanks. More questions to come, I'm sure.

    Todd
    Mac and Windows cross platform programmer. Ruby lover.

    Quote of the Day
    12/20: Mario F.:I never was, am not, and never will be, one to shut up in the face of something I think is fundamentally wrong.

    Amen brother!

  2. #2
    CSharpener vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,484
    Because I/O has little impact on the CPU and puts the thread into a waiting state, I would make 1 thread that writes the result data to disk, and 4 threads that do the calculations and fill a queue that will be flushed to disk by the I/O thread.
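
    A minimal sketch of that layout, assuming POSIX threads. Every name here (queue_t, run_pipeline, QUEUE_CAP and so on) is hypothetical, the "conversion" is elided, and the queue carries block numbers rather than real output buffers. Error checking on the pthread calls is omitted for brevity.

```c
#include <pthread.h>
#include <stdio.h>

#define QUEUE_CAP 8        /* hypothetical queue depth */
#define NUM_CONVERTERS 4

/* Bounded queue of finished block numbers, shared by converters and the writer. */
typedef struct {
    int items[QUEUE_CAP];
    int head, count;
    int producers_left;    /* converter threads that have not finished yet */
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} queue_t;

static void queue_push(queue_t *q, int item) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)              /* wait for a free slot */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAP] = item;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Returns 1 with an item, or 0 once the queue is empty and every converter is done. */
static int queue_pop(queue_t *q, int *item) {
    int got = 0;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && q->producers_left > 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count > 0) {
        *item = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        got = 1;
        pthread_cond_signal(&q->not_full);
    }
    pthread_mutex_unlock(&q->lock);
    return got;
}

typedef struct { queue_t *q; int first, nblocks; } converter_arg_t;
typedef struct { queue_t *q; FILE *out; int written; } writer_arg_t;

/* Converter i handles blocks i, i+4, i+8, ...; the conversion itself is elided. */
static void *converter(void *p) {
    converter_arg_t *a = p;
    for (int b = a->first; b < a->nblocks; b += NUM_CONVERTERS)
        queue_push(a->q, b);
    pthread_mutex_lock(&a->q->lock);
    a->q->producers_left--;
    pthread_cond_broadcast(&a->q->not_empty);  /* wake the writer so it can notice the end */
    pthread_mutex_unlock(&a->q->lock);
    return NULL;
}

/* The only thread that touches the output file. */
static void *writer(void *p) {
    writer_arg_t *w = p;
    int b;
    while (queue_pop(w->q, &b)) {
        fprintf(w->out, "block %d\n", b);
        w->written++;
    }
    return NULL;
}

/* Runs the whole pipeline and returns how many blocks the writer wrote. */
int run_pipeline(int nblocks, FILE *out) {
    queue_t q = { .producers_left = NUM_CONVERTERS };
    pthread_t conv[NUM_CONVERTERS], wr;
    converter_arg_t args[NUM_CONVERTERS];
    writer_arg_t w = { &q, out, 0 };

    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_cond_init(&q.not_full, NULL);

    pthread_create(&wr, NULL, writer, &w);
    for (int i = 0; i < NUM_CONVERTERS; i++) {
        args[i] = (converter_arg_t){ &q, i, nblocks };
        pthread_create(&conv[i], NULL, converter, &args[i]);
    }
    for (int i = 0; i < NUM_CONVERTERS; i++)
        pthread_join(conv[i], NULL);
    pthread_join(wr, NULL);

    pthread_mutex_destroy(&q.lock);
    pthread_cond_destroy(&q.not_empty);
    pthread_cond_destroy(&q.not_full);
    return w.written;
}
```

    Calling run_pipeline(20, stdout) would print 20 "block N" lines in whatever order the converters finish. Because only the writer thread touches the file, no locking is needed around the output call itself.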
    The first 90% of a project takes 90% of the time,
    the last 10% takes the other 90% of the time.

  3. #3
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    A system-level write (such as write() in Linux/Unix or WriteFile() in Windows) should be thread-safe. If you use fwrite() or something similar, it may or may not be, depending on the C library. Either way, the write operation will in essence serialize your output, because the call to your write function waits for the write to finish. The exception is asynchronous write functionality, which both Linux [and most other Unix variants] and Windows support; with that, it becomes a bit harder to work out when and what to wait for to ensure the writes come out in the correct order.

    Having a separate writer thread that just accepts a packet to be written will be a "safer" method, and if you don't wait for the data to be finished writing until AFTER you have done the next packet for that particular thread [which requires 2 * numthreads number of buffers], you should be able to get pretty decent parallelism. (A huge portion of a disk-read or write is just waiting for the disk to move the head and spin round to the right place on the disk).

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  4. #4
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Katy, Texas
    Posts
    2,309
    The way my data is set up, it is OK to be out of order in the output file. If I read 20 blocks of 10,000 records each, and the blocks could be labeled A, B, C and so on, it's OK if they are written B, C, A, etc...

    Right now, I'm using (in C++) ifstream.read() and ofstream.write() for all I/O. In a completely stripped down version of my program, if I process 1 million or so records, with no data conversion, in the single threaded design, it takes about 2 seconds to run. With data conversion, it takes about 10 seconds. I'd like to improve upon this by parallelizing the I/O. (Yes, there are probably other things to look at and tune, but I want to tackle this parallelism.)

    I think my first task will be to set up some timers to establish a baseline for what the program is doing today. I'm keeping track of elapsed time (but only to 1-second granularity), and not even tracking CPU time.

    I also feel the single writer thread would be a safer design.

    I guess, too, that since write() is thread-safe, I could even open the file, determine its size, divide that by 4, and spawn 4 threads that basically operate like my single-threaded design today, but use file positioning and a predetermined number of records to process. (I can leverage this because the records are fixed length.)
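
    A rough sketch of that idea, assuming POSIX pread()/pwrite(), which take an explicit offset so the threads never share a file position. RECORD_SIZE, the slice_t type and the function names are all hypothetical, the conversion step is elided, and the sketch assumes output records are the same size as input records so each one can be written at the same offset. Real code would move many records per call rather than one.

```c
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORD_SIZE 64   /* hypothetical fixed record length in bytes */
#define NUM_WORKERS 4

typedef struct {
    int in_fd, out_fd;
    long first_record, num_records;
    int ok;              /* set to 0 by the worker on a short read or write */
} slice_t;

/* Give worker i of n its share; the first (total % n) workers get one extra record. */
void slice_for_worker(long total, int n, int i, long *first, long *count) {
    long base = total / n, extra = total % n;
    *count = base + (i < extra ? 1 : 0);
    *first = i * base + (i < extra ? i : extra);
}

/* Each worker reads and writes only inside its own range of the files. */
static void *convert_slice(void *p) {
    slice_t *s = p;
    char rec[RECORD_SIZE];
    s->ok = 1;
    for (long r = s->first_record; r < s->first_record + s->num_records; r++) {
        off_t off = (off_t)r * RECORD_SIZE;
        if (pread(s->in_fd, rec, RECORD_SIZE, off) != RECORD_SIZE) { s->ok = 0; break; }
        /* real code would convert rec here */
        if (pwrite(s->out_fd, rec, RECORD_SIZE, off) != RECORD_SIZE) { s->ok = 0; break; }
    }
    return NULL;
}

/* Returns 1 if every worker processed its whole slice, 0 otherwise. */
int convert_file_in_slices(int in_fd, int out_fd, long total_records) {
    pthread_t t[NUM_WORKERS];
    slice_t s[NUM_WORKERS];
    int ok = 1;

    for (int i = 0; i < NUM_WORKERS; i++) {
        s[i].in_fd = in_fd;
        s[i].out_fd = out_fd;
        slice_for_worker(total_records, NUM_WORKERS, i,
                         &s[i].first_record, &s[i].num_records);
        pthread_create(&t[i], NULL, convert_slice, &s[i]);
    }
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(t[i], NULL);
        ok = ok && s[i].ok;
    }
    return ok;
}
```

    With 10 records and 4 workers, slice_for_worker() hands out ranges of 3, 3, 2 and 2 records starting at records 0, 3, 6 and 8.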

  5. #5
    Ex scientia vera
    Join Date
    Sep 2007
    Posts
    478
    Quote Originally Posted by matsp View Post
    A system-level write (such as write() in Linux/Unix or WriteFile() in Windows) should be thread-safe. If you use fwrite() or something similar, it may or may not be, depending on the C library. Either way, the write operation will in essence serialize your output, because the call to your write function waits for the write to finish. The exception is asynchronous write functionality, which both Linux [and most other Unix variants] and Windows support; with that, it becomes a bit harder to work out when and what to wait for to ensure the writes come out in the correct order.

    Having a separate writer thread that just accepts a packet to be written will be a "safer" method, and if you don't wait for the data to be finished writing until AFTER you have done the next packet for that particular thread [which requires 2 * numthreads number of buffers], you should be able to get pretty decent parallelism. (A huge portion of a disk-read or write is just waiting for the disk to move the head and spin round to the right place on the disk).

    --
    Mats
    IIRC, I have read that CRT functions such as fwrite() are thread-safe if you create your threads with _beginthreadex() (or the non-extended _beginthread()) instead of the WinAPI equivalents.

  6. #6
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Posts
    22,915
    Spawning 4 threads to read or write on the same hard disk will most likely give you poorer performance than 1 reader and 1 writer, or perhaps even than a single thread, because the hard disk needs to seek back and forth all the time, which introduces latency.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  7. #7
    CSharpener vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,484
    Quote Originally Posted by IceDane View Post
    IIRC, I have read that CRT functions such as fwrite() are thread-safe if you create your threads with _beginthreadex() (or the non-extended _beginthread()) instead of the WinAPI equivalents.
    It depends on the C runtime library you are using. Because the C standard (C99 and earlier) says nothing about threads, it is up to the implementation to decide.

    Microsoft's multi-threaded runtime libraries do have a locking mechanism. Other implementations could skip this step, leaving it up to the application to do the locking as needed.
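
    Where the runtime's behaviour is unknown, application-level locking can be sketched as below, assuming POSIX threads. The locked_fwrite() name and the single global mutex are hypothetical choices for illustration. On POSIX systems the stdio functions already lock the FILE for each call, so a wrapper like this matters mainly for runtimes without that guarantee, or when several calls must stay together as one unit.

```c
#include <pthread.h>
#include <stdio.h>

/* One lock guarding the shared output stream (hypothetical: one lock per program). */
static pthread_mutex_t out_lock = PTHREAD_MUTEX_INITIALIZER;

/* Write one whole block under the lock so blocks from different threads never interleave. */
size_t locked_fwrite(const void *buf, size_t size, size_t count, FILE *out) {
    pthread_mutex_lock(&out_lock);
    size_t n = fwrite(buf, size, count, out);
    pthread_mutex_unlock(&out_lock);
    return n;   /* number of items written, exactly as fwrite() reports it */
}
```

    Each converter thread would call locked_fwrite() once per finished output buffer, which keeps every buffer contiguous in the file while still allowing the buffers to land in any order.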

  8. #8
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,308
    Quote Originally Posted by Todd Burch View Post
    In a completely stripped down version of my program, if I process 1 million or so records, with no data conversion, in the single threaded design, it takes about 2 seconds to run. With data conversion, it takes about 10 seconds. I'd like to improve upon this by parallelizing the I/O. (Yes, there are probably other things to look at and tune, but I want to tackle this parallelism. )
    It sounds like the conversion part of it takes about 8 seconds, but you'd rather optimise the part that takes 2 seconds.
    I would never consider using multiple threads to speed up disk IO. Writing to disk doesn't take any significant CPU time, since it's dwarfed by the IO time, so you aren't exactly going to gain um, anything at all really. You'd more than likely just introduce other problems. So maybe I'm misunderstanding.
    Also, before going the way of threading, it might be worth spending just a little more time looking into optimising the data conversion. After all, there are those of us out there with single core machines still, that won't get any speed benefit from a multi-threaded version.
    Would you like to post any of the code for others to try and optimise?
    Last edited by iMalc; 03-22-2008 at 02:32 PM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  9. #9
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    If you have a quad-core system, I guess you could (theoretically) reduce your conversion time to about 2 seconds too, but I doubt that the I/O time will be noticeably changed, as iMalc says.
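
    A rough worked estimate of that claim, using the figures quoted in this thread (about 2 seconds of I/O and about 8 seconds of conversion) and assuming the conversion splits perfectly across N = 4 cores with no synchronization overhead, which a real program is unlikely to achieve:

```latex
\begin{align*}
T_{\text{conv}}(N) &= \frac{8\,\text{s}}{N} = \frac{8\,\text{s}}{4} = 2\,\text{s} \\
T_{\text{total}} &\approx T_{\text{I/O}} + T_{\text{conv}}(N) = 2\,\text{s} + 2\,\text{s} = 4\,\text{s}
  && \text{(I/O and conversion not overlapped)} \\
T_{\text{total}} &\geq \max\bigl(T_{\text{I/O}},\, T_{\text{conv}}(N)\bigr) = 2\,\text{s}
  && \text{(I/O fully overlapped with conversion)}
\end{align*}
```

    So under these idealized assumptions the run would land somewhere between about 2 and 4 seconds, compared with about 10 seconds today.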

    --
    Mats

  10. #10
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Katy, Texas
    Posts
    2,309
    I do have a quad core system. I'm not looking to speed up the I/O as a goal for multi-threading. I'm looking to run the conversions in parallel, and as matsp says, take my 8 second conversion time and get that down to a theoretical 2 seconds by having 4 threads working on the problem. I should not have said "parallelizing the I/O", I should have said "parallelizing the conversion".

    I won't be posting any of the code that I have today, but while I implement this, I will happily make and provide real test cases with associated data files to illustrate process concepts and to run test cases. (For the data in my real test cases, I had to sign a HIPAA form to get access, and the code is (or will be) a commercial venture for me.)

    Thanks everyone. It will be fun to get this running in parallel.

    Todd

  11. #11
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,691
    >> single core machines ... that won't get any speed benefit from a multi-threaded version
    Since there is both CPU-bound and disk-bound work to do, speedup may be possible even on a single core machine, since work can be done while waiting on the disk.

    I would separate the threading based on disk-bound work vs. CPU-bound work, as others have mentioned.

    gg

  12. #12
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,308
    Quote Originally Posted by Codeplug View Post
    >> single core machines ... that won't get any speed benefit from a multi-threaded version
    Since there is both CPU-bound and disk-bound work to do, speedup may be possible even on a single core machine, since work can be done while waiting on the disk.

    I would separate the threading based on disk-bound work vs. CPU-bound work, as others have mentioned.
    Yeah, you're right, it may still speed it up a little due to waiting on the disk I/O. You still have 2 seconds of I/O and 8 seconds of CPU work though, and so won't get it under 8 seconds total on a single core machine.
    Doing asynchronous IO (which I haven't done on Windows before) could probably achieve the same effect.
    However you'll get more of a speedup by halving the CPU time it takes to do the conversions, with some optimisation effort.
