Thread: First venture into multithreading

  1. #1
    Registered User
    Join Date
    Jun 2009
    Posts
    2

    First venture into multithreading

    Hey everyone. I'm writing a word frequency analysis just to see how fast I can get it running. I've done just about everything I can with a sequential implementation, so I thought I might try putting the input on one thread (it reads the text to analyze from a file) and the frequency analysis on another, but I'm failing miserably. I'm using the boost thread library, and I'm quite positive it has something to do with user error.

    On the input thread, I'm reading words in, parsing them, and placing them into a queue shared between the two threads. The calculation thread is *supposed* to be pulling words from the queue and adding them to a search tree as the input thread adds them, but apparently that isn't happening - after adding threading, the program takes about 50x longer to run.

    Here are the input/output functions each respective thread uses (the bb global variable is the shared queue and words is the search tree):
    Code:
    void writer()
    {
      string s = bb.receive();
      while (s.length() > 0)
      {
        words.add(s);
        s = bb.receive();
      }
    }
    
    
    void reader()
    {
      string reader;
      string temp;
    
      while (file >> reader)
      {
        for (unsigned int i = 0; i < reader.length(); i++)
        {
          if (isalpha(reader[i]) || reader[i] == '\'' || reader[i] == '-')
            temp += tolower(reader[i]);
        }
    
        if (temp.length())
        {
          bb.send(temp);
        }
    
        temp.clear();
      }
    
      bb.send(temp);
    }
    Here's the declaration of the two threads:
    Code:
      boost::thread input(&reader);
      boost::thread output(&writer);
      input.join();
      output.join();
    ... and here's the code for the bounded buffer class that serves as the internal queue:
    Code:
    const int BUFFER_SIZE = 1000;
    
    class bounded_buffer : private boost::noncopyable
    {
     public:
      typedef boost::mutex::scoped_lock lock;
    
      bounded_buffer() { begin = end = buffered = 0; }
    
      void send(string s)
      {
        lock l(monitor);
        while (buffered == BUFFER_SIZE)
          buffer_not_full.wait(l);
    
        buffer[end] = s;
        end = (end + 1) % BUFFER_SIZE;
        ++buffered;
        buffer_not_empty.notify_one();
      }
    
      string receive()
      {
        lock l(monitor);
        while (buffered == 0)
          buffer_not_empty.wait(l);
    
        string s = buffer[begin];
        begin = (begin + 1) % BUFFER_SIZE;
        --buffered;
        buffer_not_full.notify_one();
        return s;
      }
    
     private:
      int begin, end, buffered;
      string buffer[BUFFER_SIZE];
      boost::condition buffer_not_full, buffer_not_empty;
      boost::mutex monitor;
    };
    Thanks!

  2. #2
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Your tasks are too simple. The overhead of the queue eats everything you could possibly hope to gain. In other words, your program spends most of its waiting.
    This is only an informed guess, of course. I didn't actually profile your program. However, it's a very good bet.

    Get a good lock-free queue implementation (Intel's TBB library has one) and see if it helps. I still don't think you can beat the single-threaded program, but at least it shouldn't be that much slower.

    If you want to optimize the single-threaded version further, asynchronous I/O could gain you something. But only for files that are larger than the block size, and even then only maybe. And of course, you lose portability. In fact, I have no idea how to do proper async I/O under Linux. (The reports on availability of POSIX AIO in Linux are spotty.)
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  3. #3
    Registered User
    Join Date
    Jun 2009
    Posts
    2
    I was hoping that wouldn't be why. Thanks for the quick answer.

    One more question: this attempt at multithreading was essentially a stab in the dark - I really had no idea what I was doing. Do you know of any good reading, tutorials, practice problems, etc. that I could look into?

  4. #4
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    No, sorry. I have no idea where I learned multi-threading from.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  5. #5
    Registered User
    Join Date
    Apr 2008
    Posts
    890
    Quote Originally Posted by thevolumebutton View Post
    One more question: this attempt at multithreading was essentially a stab in the dark - I really had no idea what I was doing. Do you know of any good reading, tutorials, practice problems, etc. that I could look into?
    Programming with POSIX Threads

  6. #6
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Threads Considered Harmful
    Debugging a normal sequential program is on a scale of 1 to 10.
    Debugging some thread problems places you well into the hundreds at a stroke.

    Any decently sized threaded program NEEDS a damn good design to go along with it. You can't just hack away at some code with threads in it, and hope it'll all work at the end.

    A lot of thread bugs are also Heisenbug's. That is, they usually show up under some extreme condition, and "disappear" as soon as you attempt to study them.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  7. #7
    Registered User
    Join Date
    Apr 2008
    Posts
    890
    Quote Originally Posted by Salem View Post
    "Poorly designed threaded programs, written by novices, are considered harmful" might have been a better title to that rant.

  8. #8
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    Quote Originally Posted by medievalelks View Post
    "Poorly designed threaded programs, written by novices, are considered harmful" might have been a better title to that rant.
    No, I believe the rant was aptly named. Even well designed threaded programs that are written by experts are a pain to debug.

  9. #9
    Registered User
    Join Date
    Apr 2008
    Posts
    890
    Quote Originally Posted by bithub View Post
    No, I believe the rant was aptly named. Even well designed threaded programs that are written by experts are a pain to debug.
    That article was written with a strong bias and little to back it up, other than contrived nightmare scenarios. You can find equally alarming FUD written about C++, OO, etc.

    Programming is hard, and in the wrong hands, any tool can seem harmful.

  10. #10
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    That article was written with a strong bias and little to back it up, other than contrived nightmare scenarios. You can find equally alarming FUD written about C++, OO, etc.
    If you've come up with a bulletproof way to avoid deadlocks and race conditions or you've come with a clever way of debugging them, then please share. I'm sure there are many programmers here who would appreciate your insight (myself included).

  11. #11
    Registered User
    Join Date
    Apr 2008
    Posts
    890
    There's no bulletproof way to provent bugs in general. Is Programming Considered Harmful?

  12. #12
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    There's no bulletproof way to provent bugs in general. Is Programming Considered Harmful?
    True, but you conveniently ignored the other side of the or in my sentence (notice that I even bolded it so that you wouldn't miss it). Debugging non multi-threaded bugs is fairly easy. This is not the case in threaded programs.

  13. #13
    Registered User
    Join Date
    May 2009
    Posts
    37
    Quote Originally Posted by bithub View Post
    If you've come up with a bulletproof way to avoid deadlocks and race conditions or you've come with a clever way of debugging them, then please share. I'm sure there are many programmers here who would appreciate your insight (myself included).
    I think you are overstating the problem a bit. If you think things through and try to keep communication between threads narrow and well defined, I don't see thread programming as such a hard task. I've been doing it for years on both Unix and Windows systems. Just use the KISS rule (Keep It Simple Stupid).

  14. #14
    Registered User
    Join Date
    Apr 2008
    Posts
    890
    Quote Originally Posted by SyntaxError View Post
    I think you are overstating the problem a bit. If you think things through and try to keep communication between threads narrow and well defined, I don't see thread programming as such a hard task. I've been doing it for years on both Unix and Windows systems. Just use the KISS rule (Keep It Simple Stupid).
    You mean you don't just randomly throw locks on everything in sight? ;-)

  15. #15
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    In simple projects I would agree with you 100%. Try working on an application that has close to 1 million lines of code, and see how often you run into problems that require you do things like:
    - share locks between objects
    - Have large individual lock coverages
    - Generic callbacks while obtaining a lock which can cause deadlocks across modules.

    A current line count on my project is 912015 lines of code, and a cat /proc/pid/status | grep Threads shows 605 threads atm. The last 3 years of my life has been spent debugging multi-threaded problems, and I don't feel like I've overstated the problem one bit

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Multithreading (flag stopping a thread, ring buffer) volatile
    By ShwangShwing in forum C Programming
    Replies: 3
    Last Post: 05-19-2009, 07:27 AM
  2. Client/Server and Multithreading
    By osal in forum Windows Programming
    Replies: 2
    Last Post: 07-17-2004, 03:53 AM
  3. Directional Keys - Useing in Console
    By RoD in forum C++ Programming
    Replies: 38
    Last Post: 10-06-2002, 04:42 PM
  4. FAQ: Directional Keys - Useing in Console
    By RoD in forum FAQ Board
    Replies: 38
    Last Post: 10-06-2002, 04:42 PM