Thread: Code::Blocks, g++, Boost

  1. #16
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    What is specs of the machine you are running things on Linux - multi-threading only gives advantages if there is spare processing capacity on more than one CPU.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  2. #17
    Kiss the monkey. CodeMonkey's Avatar
    Join Date
    Sep 2001
    Posts
    937
    Both systems are on a notebook with Intel Pentium Dual Core (2 cores clocked 1.6 GHz each). I notice that both cores max with two threads, while only one maxes with one thread (this is true on both Vista and Ubuntu). Yet on Vista this translates to a performance gain if a lot of processing has to be done. For Linux, I find it does not. I was just wondering what is the most probable cause. I'm thinking either the library builds or the compilers (MSVC++ vs GCC).
    "If you tell the truth, you don't have to remember anything"
    -Mark Twain

  3. #18
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    sounds ratehr strange - I'd expect to see the same factor improvement on Linux as on Windows. But without knowing what the code does and what is taking up it's time, it's pretty hard to say for sure.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  4. #19
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    I think only a profiler can answer this question.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  5. #20
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by CornedBee View Post
    I think only a profiler can answer this question.
    Or at least, that's ONE WAY to find out where the code is spending time. If the code isn't very large or complex, code inspection may also find out what is going on (at least to the point of having one or two possible candidates).

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  6. #21
    Kiss the monkey. CodeMonkey's Avatar
    Join Date
    Sep 2001
    Posts
    937

    Dump

    I should warn that this is ugly code. This is a "toy" program.
    Code:
    //Applies the method of relaxation to an array of floating point values
    //in order to approximate the solution to the wave equation, via a diffusion
    //equation, with boundary condition of zero at the edges.
    //    i.e. Applies a function to every member of an array (v), stores it in another
    //          array (lap), and then does v += lap 
    #include <iostream>
    #include <boost/thread/thread.hpp>
    #include <boost/thread/barrier.hpp>
    #include <cmath>     //sin()
    #include <cassert>
    
    #define DO_FOR(i_,beg_,end_) for(int (i_) = (beg_); (i_) < (end_); ++(i_))
    
    template<class T1, class T2>
    std::ostream & operator << (std::ostream & os, std::pair<T1,T2> & p)
    {
    	os << p.first << ',' << p.second;
    	return os;
    }
    
    inline void randomize(double * arg, int n)
    {
    	for(unsigned int i = 0; i < n; ++i) arg[i] = std::rand();
    }
    
    void print_it(double * v, unsigned int sz)
    {
            if(sz < 150)
    	{
    		DO_FOR(i,1,sz-1)
    			std::cout << ( v[i] / sin(pi/(sz-1)*i) ) << ' ';
    	}
    }
    
    typedef std::pair<int,int> interval;
    
    struct task
    {
    	void operator()()     //This is run per thread
    	{
    		for(unsigned long int t = 0; t < ntimes; ++t)
    		{
    			for(int i = iv.first; i < iv.second; ++i)
    				lap[i] = (v[i-1] + v[i+1] - 2.0*v[i]) / 2.0;
    			bar.wait();	//Important! Don't comment out!
    			for(int i = iv.first; i < iv.second; ++i)
    				v[i] += dt*lap[i];
    			bar.wait();
    		}
    	}
    	task(double * v_arg, double * lap_arg, unsigned long int ntimes_arg,
    		 double dt_arg, boost::barrier & bar_arg, interval iv_arg)
    		: v(v_arg), lap(lap_arg), ntimes(ntimes_arg), dt(dt_arg), bar(bar_arg), iv(iv_arg) {}
    private:
    	double * v;
    	double * lap;
    	unsigned long int ntimes;
    	double dt;
    	boost::barrier & bar;
    	interval iv;
    };
    
    int main()
    {
    	using namespace std;
    	const double pi = 2.0*asin(1.0);
    	double dt = 0.1;
    
    	cout << "Size: ";
    	unsigned long int sz;
    	cin >> sz;
    	cout << "Iterations: ";
    	unsigned long int count;
    	cin >> count;
    	cout << "Number of threads: ";
    	unsigned int nthreads;
    	cin >> nthreads;
    
    	//randomize v
    	double * v = new double[sz];
    	randomize(v,sz);
    	v[0] = v[sz-1] = 0.0;
    
    	//splits v into nthreads intervals
    	interval * thread_intervals = new interval[nthreads];
    	ldiv_t d = div(sz-2, static_cast<long>(nthreads));
    	unsigned int offset = 0;
    	DO_FOR(i, 0, nthreads)
    	{
    		unsigned int x = 1 + i*d.quot + offset;
    		if(d.rem-- > 0) ++offset;
    		thread_intervals[i] = interval(x, 1 + (i+1)*d.quot + offset);
    		if(i == nthreads-1) assert(1 + (i+1)*d.quot + offset == sz - 1);
    	}
    	assert(thread_intervals.size() == nthreads);
    
    	//Prints out intervals
    	DO_FOR(i,0,nthreads)
    		cout << thread_intervals[i] << endl;
    	cout << endl;
    
    	//uses multiple threads; stopwatch
    	double * lap = new double[sz];
    	boost::barrier bar(nthreads);
    	boost::thread_group threads;
    	clock_t time_with_threads;
    	clock_t beg = clock();
    	DO_FOR(i,0,nthreads)
    		threads.create_thread( task(v,lap,count,dt,bar,thread_intervals[i]) );
           threads.join_all();
    	clock_t end = clock();
    	cout << "Below are the results for " << nthreads << " threads, which took "
    		<< (time_with_threads = (end-beg)) << " ticks." << endl;
    
            print_it(v, sz);
    	cout << '\n' << endl;
    
    	//Randomize v again
    	randomize(v,sz);
    	v[0] = v[sz-1] = 0.0;
    
    	//Now do with just one thread; stopwatch
    	clock_t time_without_threads;
    	beg = clock();
    	DO_FOR(i,0,count)      //This is run once after threads finish, for comparison.
    	{
    		DO_FOR(k,1,sz-1)
    			lap[k] = (v[k+1] + v[k-1] - 2.0*v[k]) / 2.0;
    		DO_FOR(j,1,sz-1)
    			v[j] += dt*lap[j];
    	}
    	end = clock();
    	cout << "Below are the results for one thread, which took "
    		<< (time_without_threads = (end-beg)) << " ticks." << endl;
    
    	print_it(v, sz);
    	cout << endl;
    
    	delete [] lap;
    	delete [] v;
    	delete [] thread_intervals;
    
    	cout << "\nDone. Press enter to return." << endl;
    	cin.ignore();
    	cin.get();
    }
    The calls to barrier::wait() are most expensive. One can see how the size of the array would matter more than the number of iterations, since wait() is called twice per iteration regardless. Still, no matter how large the array is, on Linux I gain no time.
    Last edited by CodeMonkey; 12-16-2008 at 10:30 AM.
    "If you tell the truth, you don't have to remember anything"
    -Mark Twain

  7. #22
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Code:
    #define DO_FOR(i_,beg_,end_) for(int (i_) = (beg_); (i_) < (end_); ++(i_))
    Making our own version of the C language, by introducing new constructs for existing language features, are we?

    bar::wait() does what, exactly? Is it possible that this is implemented differently for Windows and Linux?

    Do I take it that each thread operates on the every nthread-th element in the lap array? If so, you'd have cache-coherence complications if the CPU's do not have fully shared caches [I know of no multicore CPU that has fully shared caches].

    Finally, clock is different between Windows and Linux - in Linux, if you have two threads that run at the same time, Linux will report twice the amount of CPU time - and that is what clock() returns. So you probably need to measure the actual time in seconds (gettimeofday() for example) to see how long it takes with one and many threads.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  8. #23
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    clock() works differently on Windows and Linux. I blame that, but I first need to check this.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  9. #24
    Kiss the monkey. CodeMonkey's Avatar
    Join Date
    Sep 2001
    Posts
    937
    Hm ok.

    1. bar::wait() uses a scoped mutex to halt a thread until a certain number (nthreads) have called it. I don't know the system-level implementation.

    2. Yes, each thread operates on a unique portion of the array. I'll look into cache-coherence (never heard of it).

    3. clock() is called before and after the threads run. Still, I'll try another function as you suggested.
    "If you tell the truth, you don't have to remember anything"
    -Mark Twain

  10. #25
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Stupid me. I can't validate this on a single-core machine For me, two threads actually take about twice as long as one.

    Here's the modified code, try it out.
    Code:
    //Applies the method of relaxation to an array of floating point values
    //in order to approximate the solution to the wave equation, via a diffusion
    //equation, with boundary condition of zero at the edges.
    #include <iostream>
    #include <boost/thread/thread.hpp>
    #include <boost/thread/barrier.hpp>
    #include <cmath>     //sin()
    #include <cassert>
    #include <sys/time.h>
    
    #define DO_FOR(i_,beg_,end_) for(int (i_) = (beg_); (i_) < (end_); ++(i_))
    
    const double pi = M_PI;
    
    template<class T1, class T2>
    std::ostream & operator << (std::ostream & os, std::pair<T1,T2> & p)
    {
            os << p.first << ',' << p.second;
            return os;
    }
    
    inline void randomize(double * arg, int n)
    {
            for(unsigned int i = 0; i < n; ++i) arg[i] = std::rand();
    }
    
    void print_it(double * v, unsigned int sz)
    {
            if(sz < 150)
            {
                    DO_FOR(i,1,sz-1)
                            std::cout << ( v[i] / sin(pi/(sz-1)*i) ) << ' ';
            }
    }
    
    typedef std::pair<int,int> interval;
    
    struct task
    {
            void operator()()     //This is run per thread
            {
                    for(unsigned long int t = 0; t < ntimes; ++t)
                    {
                            for(int i = iv.first; i < iv.second; ++i)
                                    lap[i] = (v[i-1] + v[i+1] - 2.0*v[i]) / 2.0;
                            bar.wait();     //Important! Don't comment out!
                            for(int i = iv.first; i < iv.second; ++i)
                                    v[i] += dt*lap[i];
                            bar.wait();
                    }
            }
            task(double * v_arg, double * lap_arg, unsigned long int ntimes_arg,
                     double dt_arg, boost::barrier & bar_arg, interval iv_arg)
                    : v(v_arg), lap(lap_arg), ntimes(ntimes_arg), dt(dt_arg), bar(bar_arg), iv(iv_arg) {}
    private:
            double * v;
            double * lap;
            unsigned long int ntimes;
            double dt;
            boost::barrier & bar;
            interval iv;
    };
    
    int main()
    {
            using namespace std;
            const double pi = 2.0*asin(1.0);
            double dt = 0.1;
    
            cout << "Size: ";
            unsigned long int sz;
            cin >> sz;
            cout << "Iterations: ";
            unsigned long int count;
            cin >> count;
            cout << "Number of threads: ";
            unsigned int nthreads;
            cin >> nthreads;
    
            //randomize v
            double * v = new double[sz];
            randomize(v,sz);
            v[0] = v[sz-1] = 0.0;
    
            //splits v into nthreads intervals
            interval * thread_intervals = new interval[nthreads];
            ldiv_t d = div(sz-2, static_cast<long>(nthreads));
            unsigned int offset = 0;
            DO_FOR(i, 0, nthreads)
            {
                    unsigned int x = 1 + i*d.quot + offset;
                    if(d.rem-- > 0) ++offset;
                    thread_intervals[i] = interval(x, 1 + (i+1)*d.quot + offset);
                    if(i == nthreads-1) assert(1 + (i+1)*d.quot + offset == sz - 1);
            }
    
            //Prints out intervals
            DO_FOR(i,0,nthreads)
                    cout << thread_intervals[i] << endl;
            cout << endl;
    
            //uses multiple threads; stopwatch
            double * lap = new double[sz];
            boost::barrier bar(nthreads);
            boost::thread_group threads;
            //clock_t time_with_threads;
            //clock_t beg = clock();
            struct timeval beg;
            gettimeofday(&beg, 0);
            DO_FOR(i,0,nthreads)
                    threads.create_thread( task(v,lap,count,dt,bar,thread_intervals[i]) );
            threads.join_all();
            //clock_t end = clock();
            struct timeval end;
            gettimeofday(&end, 0);
            long long llbeg = (beg.tv_sec * 1000000LL) + beg.tv_usec;
            long long llend = (end.tv_sec * 1000000LL) + end.tv_usec;
            long long time_with_threads;
            cout << "Below are the results for " << nthreads << " threads, which took "
                    << (time_with_threads = (llend-llbeg)) << " usecs." << endl;
    
            print_it(v, sz);
            cout << '\n' << endl;
    
            //Randomize v again
            randomize(v,sz);
            v[0] = v[sz-1] = 0.0;
    
            //Now do with just one thread; stopwatch
            //beg = clock();
            gettimeofday(&beg, 0);
            DO_FOR(i,0,count)      //This is run once after threads finish, for comparison.
            {
                    DO_FOR(k,1,sz-1)
                            lap[k] = (v[k+1] + v[k-1] - 2.0*v[k]) / 2.0;
                    DO_FOR(j,1,sz-1)
                            v[j] += dt*lap[j];
            }
            //end = clock();
            gettimeofday(&end, 0);
            llbeg = (beg.tv_sec * 1000000LL) + beg.tv_usec;
            llend = (end.tv_sec * 1000000LL) + end.tv_usec;
            long long time_without_threads;
            cout << "Below are the results for one thread, which took "
                    << (time_without_threads = (llend-llbeg)) << " usecs." << endl;
    
            print_it(v, sz);
            cout << endl;
    
            delete [] lap;
            delete [] v;
            delete [] thread_intervals;
    
            cout << "\nDone. Press enter to return." << endl;
            cin.ignore();
            cin.get();
    }
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  11. #26
    Kiss the monkey. CodeMonkey's Avatar
    Join Date
    Sep 2001
    Posts
    937
    I had modified my code exactly as you had. Thank you for taking the time. But yes, now I get a decidedly smaller number for two threads. Thanks.
    "If you tell the truth, you don't have to remember anything"
    -Mark Twain

  12. #27
    Kiss the monkey. CodeMonkey's Avatar
    Join Date
    Sep 2001
    Posts
    937
    Actually, the gain from two threads is rather more dramatic than on Vista. I think maybe program performance is like the stock market -- a lot of people know a lot about it, but nobody really has a clue. Or maybe just I don't have a clue. Thanks again for all of your help (everyone).
    "If you tell the truth, you don't have to remember anything"
    -Mark Twain

  13. #28
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by CodeMonkey View Post
    Actually, the gain from two threads is rather more dramatic than on Vista. I think maybe program performance is like the stock market -- a lot of people know a lot about it, but nobody really has a clue. Or maybe just I don't have a clue. Thanks again for all of your help (everyone).
    I _THINK_ I know a whole lot more about performance than I _THINK_ I know about the stock market. But you are right, the concept of both is pretty simple - in performance, we essentially want as few cycles as possible, and as many of the cycles as possible should "do something useful" - knowing how to get that out of the code can be trickier than the principle says, of course. The stock-market principle is also simple: Buy when prices are low, and sell when prices are high. Knowing when prices are low and when they are high is the tricky part (because you either buy before they bottom out, or on the up, and sell before the peak or after the peak - very hard to hit the peak EXACTLY).

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. library issues in CodeBlocks
    By everlearnin in forum C++ Programming
    Replies: 0
    Last Post: 06-02-2009, 08:44 PM
  2. Codeblocks crashes linux?
    By Shakti in forum Tech Board
    Replies: 1
    Last Post: 03-25-2009, 07:26 AM
  3. Boost Auto-Linking
    By Tonto in forum C++ Programming
    Replies: 5
    Last Post: 12-30-2007, 06:11 AM
  4. Replies: 2
    Last Post: 12-12-2007, 06:45 AM
  5. Integrating Boost with STLPort
    By Mario F. in forum Tech Board
    Replies: 1
    Last Post: 11-11-2006, 06:49 AM