Thread: Curious perfomance difference

  1. #1
    Kiss the monkey. CodeMonkey's Avatar
    Join Date
    Sep 2001
    Posts
    937

    Curious perfomance difference

    *edit*
    Whoops! You can disregard this post. It turns out that I was doing slightly different things with each container, leading to different optimization opportunities. Below is the fixed code, and, as expected, the disparities between g++ and icc are far more subtle.
    *edit*


    I've been using a 2-dimensional c-style array wrapper class array2d to do some computation. It's nice and fast. But I've been reading about optimization, and so I decided to try a valarray-based 2-dimensional container class.

    In compiling my comparison program with g++ set with -O3, I see that the valarray performs quite better in almost every situation.

    In compiling with icc set with -fast (and I'm on an Intel chip), I see that array2d is substantially faster.

    Also, the g++-compiled program is faster overall. These results surprise me. I was under the naive impression that Intel would burn it. To what might I attribute the disparities?

    I'm running Ubuntu 8.04 on an Intel T2060 @ 1.60 GHz / Core.

    *edit*
    For anyone who's interested in running the test on their machine, here's the code crammed into one file. Just make sure only one of USE_STD_TIME or USE_UNIX_TIME is defined.
    Code:
    
    //Uncomment one of the following:
    #define USE_UNIX_TIME
    //#define USE_STD_TIME
    
    
    //A minimal wrapper for a 2-dimensional array.
    
    #include <utility>
    #ifndef NDEBUG
    #include <iostream>
    #endif
    
    template<typename T, typename uint = std::size_t>
    class array2d
    {
      private:
        //array2d & operator = (array2d<T,uint> & other);
      protected:
    	T** data;
    	const uint w;
    	const uint h;
    	const uint sz;
      public:
        	typedef T data_type;
        	typedef uint size_type;
        	typedef std::pair<uint, uint> point2d;
    
    	void alloc(const T & fill = T());
    	array2d(uint width_, uint height_, const T & fill = T()) : w(width_), h(height_), sz(width_*height_)
    		{ alloc(fill); }
    	array2d & operator = (const array2d<T, uint> & other);
    	array2d(array2d<T,uint> & other) : w(other.width()), h(other.height()), sz(other.size())
    		{ alloc(); *this = other; }
    	array2d(const array2d<T,uint> & other) : w(other.width()), h(other.height()), sz(other.size())
    		{ alloc(); *this = other; }
    	~array2d();
    	const uint size() const { return sz; }
    	const uint width() const { return w; }
    	const uint height() const { return h; }
    	T & operator()(uint x, uint y) { return data[x][y]; }
    	const T & operator()(uint x, uint y) const { return data[x][y]; }
    	T & operator()(point2d p) { return (*this)(p.first, p.second); }
    };
    
    template<typename T, typename uint>
    void array2d<T,uint>::alloc(const T & fill)
    {
    	data = new T*[w];
    	for(uint i = 0; i < w; ++i)
    	{
    		data[i] = new T[h];
    		for(uint j = 0; j < h; ++j)
                		data[i][j] = fill;
    	}
    }
    
    template<typename T, typename uint>
    array2d<T,uint> & array2d<T,uint>::operator = (const array2d<T,uint> & other)
    {
    	#ifndef NDEBUG
    	std::cout << "!!!!!!!! WARNING array2d BEING COPIED !!!!!!!!" << std::endl;
    	#endif
    	assert(other.width() == w && other.height() == h);
    	for(uint i = 0; i < w; ++i)
    	{
    		for(uint j = 0; j < h; ++j)
    			data[i][j] = other(i,j);
    	}
    	return *this;
    }
    
    template<typename T, typename uint>
    array2d<T,uint>::~array2d()
    {
       if(data)
       {
    	for(uint i = 0; i < w; ++i)
    		delete [] data[i];
    	delete [] data;
    	data = 0;
       }
    }
    
    //Iterates f over a rectangular region indicated by the points
    //top_left_point and bottom_right_point
    template<typename T, typename uint, class ternary_func_t>
    inline void for_each(array2d<T,uint> & a, const typename array2d<T,uint>::point2d & top_left_point,
    		const typename array2d<T,uint>::point2d & bottom_right_point, ternary_func_t & f)
    {
    	for(uint i = top_left_point.first; i < bottom_right_point.first; ++i)
    	{
    		for(uint j = top_left_point.second; j < bottom_right_point.second; ++j)
    			f( i, j, a );   //notice the order of the arguments
    	}
    }
    
    template<typename T, typename uint, class ternary_func_t>
    inline void for_each(array2d<T,uint> & a, ternary_func_t & f)
    {
    	for(uint i = 0; i < a.width(); ++i)
    	{
    		for(uint j = 0; j < a.height(); ++j)
    			f( i, j, a );   //notice the order of the arguments
    	}
    }
    
    template<typename uint>
    inline std::pair<uint, uint> make_point(uint x, uint y)
    {
        return std::make_pair(x,y);
    }
    
    template<typename T, typename uint>
    inline std::ostream & operator << (std::ostream & os, typename array2d<T,uint>::point2d & p)
    {
        return os << p.first << '\t' << p.second;
    }
    
    
    //Interface for a simple stopwatch. I made my own since
    //std::clock() is screwy when multithreading on POSIX,
    //so I have an std implementation using clock() ("stopwatch_std.cpp")
    //and a linux implementation using gettimeofday() ("stopwatch_linux.cpp").
    //I could have just used boost's solution, since boost::thread already
    //uses it -- but that would have been smart.
    
    class stopwatch
    {
    	public:
    		void start();
    		void stop();
    		const bool running() const { return rng; }
    		const unsigned long long int read();
    		static const bool auto_start = true;
    		stopwatch(bool auto_start_ = !auto_start) : curr(0), data(0), rng(false)
    		{
    			initialize_data();
    			if(auto_start_ == auto_start) start();
    		}
    		~stopwatch();
    	private:
    		void initialize_data();
    		void calc_diff();
    		void* data;
    		unsigned long long int curr;
    		bool rng;
    };
    
    
    
    #ifdef USE_STD_TIME
    
    #include <ctime>
    #include <cassert>
    
    void stopwatch::start()
    {
    	if(rng || (data == 0))
    		return;
    	((std::clock_t*)data)[0] = std::clock();
    	rng = true;
    }
    
    void stopwatch::stop()
    {
    	if(!rng || (data == 0))
    		return;
    	((std::clock_t*)data)[1] = std::clock();
    	rng = false;
    	calc_diff();
    }
    
    void stopwatch::calc_diff()
    {
    	assert(!rng && (data != 0));
    	std::clock_t & beg = ((std::clock_t*)data)[0];
    	std::clock_t & end = ((std::clock_t*)data)[1];
    	curr = end - beg;
    }
    
    const unsigned long long int stopwatch::read()
    {
    	if(!rng) return curr;
    	std::clock_t now = std::clock();
    	std::clock_t & beg = ((std::clock_t*)data)[0];
    	return now - beg;
    }
    
    void stopwatch::initialize_data()
    {
    	assert(data == 0);
    	data = new std::clock_t[2];
    }
    
    stopwatch::~stopwatch()
    {
    	if(data)
    	{
    		delete [] (std::clock_t*)(data);
    		data = 0;
    	}
    }
    
    #endif
    
    
    #ifdef USE_UNIX_TIME
    
    #include <sys/time.h>
    #include <cassert>
    
    void stopwatch::start()
    {
    	if(rng || (data == 0))
    		return;
    	gettimeofday(&((timeval*)data)[0],0);
    	rng = true;
    }
    
    void stopwatch::stop()
    {
    	if(!rng || (data == 0))
    		return;
    	gettimeofday(&((timeval*)data)[1],0);
    	rng = false;
    	calc_diff();
    }
    
    void stopwatch::calc_diff()
    {
    	timeval & beg = ((timeval*)data)[0];
    	timeval & end = ((timeval*)data)[1];
    	curr = (end.tv_sec*100000LL+end.tv_usec) - (beg.tv_sec*100000LL+beg.tv_usec);
    }
    
    const unsigned long long int stopwatch::read()
    {
    	if(!rng) return curr;
    	timeval now;
    	timeval & beg = ((timeval*)data)[0];
    	gettimeofday(&now,0);
    	return (now.tv_sec*100000LL+now.tv_usec) - (beg.tv_sec*100000LL+beg.tv_usec);
    }
    
    void stopwatch::initialize_data()
    {
    	assert(data == 0);
    	data = new timeval[2];
    }
    
    stopwatch::~stopwatch()
    {
    	if(data)
    	{
    		delete [] (timeval*)(data);
    		data = 0;
    	}
    }
    
    #endif
    
    // Program to test slices and a simple N*M matrix class
    
    // pp 670-674 and 683-684
    
    // No guarantees offered. Constructive comments to [email protected]
    
    
    #include<iostream>
    #include<valarray>
    #include<algorithm>
    #include<numeric>	// for inner_product
    using namespace std;
    
    // forward declarations to allow friend declarations:
    template<class T> class Slice_iter;
    template<class T> bool operator==(const Slice_iter<T>&, const Slice_iter<T>&);
    template<class T> bool operator!=(const Slice_iter<T>&, const Slice_iter<T>&);
    template<class T> bool operator< (const Slice_iter<T>&, const Slice_iter<T>&);
    
    template<class T> class Slice_iter {
    	valarray<T>* v;
    	slice s;
    	size_t curr;	// index of current element
    
    	T& ref(size_t i) const { return (*v)[s.start()+i*s.stride()]; }
    public:
    	Slice_iter(valarray<T>* vv, slice ss) :v(vv), s(ss), curr(0) { }
    
    	Slice_iter end() const
    	{
    		Slice_iter t = *this;
    		t.curr = s.size();	// index of last-plus-one element
    		return t;
    	}
    
    	Slice_iter& operator++() { curr++; return *this; }
    	Slice_iter operator++(int) { Slice_iter t = *this; curr++; return t; }
    
    	T& operator[](size_t i) { return ref(i); }		// C style subscript
    	T& operator()(size_t i) { return ref(i); }		// Fortran-style subscript
    	T& operator*() { return ref(curr); }			// current element
    
    	friend bool operator==<>(const Slice_iter& p, const Slice_iter& q);
    	friend bool operator!=<>(const Slice_iter& p, const Slice_iter& q);
    	friend bool operator< <>(const Slice_iter& p, const Slice_iter& q);
    
    };
    
    
    template<class T>
    bool operator==(const Slice_iter<T>& p, const Slice_iter<T>& q)
    {
    	return p.curr==q.curr
    		&& p.s.stride()==q.s.stride()
    		&& p.s.start()==q.s.start();
    }
    
    template<class T>
    bool operator!=(const Slice_iter<T>& p, const Slice_iter<T>& q)
    {
    	return !(p==q);
    }
    
    template<class T>
    bool operator<(const Slice_iter<T>& p, const Slice_iter<T>& q)
    {
    	return p.curr<q.curr
    		&& p.s.stride()==q.s.stride()
    		&& p.s.start()==q.s.start();
    }
    
    
    //-------------------------------------------------------------
    
    
    
    // forward declarations to allow friend declarations:
    template<class T> class Cslice_iter;
    template<class T> bool operator==(const Cslice_iter<T>&, const Cslice_iter<T>&);
    template<class T> bool operator!=(const Cslice_iter<T>&, const Cslice_iter<T>&);
    template<class T> bool operator< (const Cslice_iter<T>&, const Cslice_iter<T>&);
    
    
    template<class T> class Cslice_iter 
    {
    	valarray<T>* v;
    	slice s;
    	size_t curr; // index of current element
    	const T& ref(size_t i) const { return (*v)[s.start()+i*s.stride()]; }
    public:
    	Cslice_iter(valarray<T>* vv, slice ss): v(vv), s(ss), curr(0){}
    	Cslice_iter end() const
    	{
    		Cslice_iter t = *this;
    		t.curr = s.size(); // index of one plus last element
    		return t;
    	}
    	Cslice_iter& operator++() { curr++; return *this; }
    	Cslice_iter operator++(int) { Cslice_iter t = *this; curr++; return t; }
    	
    	const T& operator[](size_t i) const { return ref(i); }
    	const T& operator()(size_t i) const { return ref(i); }
    	const T& operator*() const { return ref(curr); }
    
    	friend bool operator==<>(const Cslice_iter& p, const Cslice_iter& q);
    	friend bool operator!=<>(const Cslice_iter& p, const Cslice_iter& q);
    	friend bool operator< <>(const Cslice_iter& p, const Cslice_iter& q);
    
    };
    
    template<class T>
    bool operator==(const Cslice_iter<T>& p, const Cslice_iter<T>& q)
    {
    	return p.curr==q.curr
    		&& p.s.stride()==q.s.stride()
    		&& p.s.start()==q.s.start();
    }
    
    template<class T>
    bool operator!=(const Cslice_iter<T>& p, const Cslice_iter<T>& q)
    {
    	return !(p==q);
    }
    
    template<class T>
    bool operator<(const Cslice_iter<T>& p, const Cslice_iter<T>& q)
    {
    	return p.curr<q.curr
    		&& p.s.stride()==q.s.stride()
    		&& p.s.start()==q.s.start();
    }
    
    
    //-------------------------------------------------------------
    
    
    class Matrix {
    	valarray<long double>* v;	// stores elements by column as described in 22.4.5
    	size_t d1, d2;	// d1 == number of columns, d2 == number of rows
    public:
    	Matrix(size_t x, size_t y);		// note: no default constructor
    	Matrix(const Matrix&);
    	Matrix& operator=(const Matrix&);
    	~Matrix();
    	
    	size_t size() const { return d1*d2; }
    	size_t dim1() const { return d1; }
    	size_t dim2() const { return d2; }
    
    	Slice_iter<long double> row(size_t i);
    	Cslice_iter<long double> row(size_t i) const;
    
    	Slice_iter<long double> column(size_t i);
    	Cslice_iter<long double> column(size_t i) const;
    
    	long double& operator()(size_t x, size_t y);					// Fortran-style subscripts
    	const long double& operator()(size_t x, size_t y) const;
    	//long double operator()(size_t x, size_t y) const;
    
    	Slice_iter<long double> operator()(size_t i) { return column(i); }
    	Cslice_iter<long double> operator()(size_t i) const { return column(i); }
    
    	Slice_iter<long double> operator[](size_t i) { return column(i); }	// C-style subscript
    	Cslice_iter<long double> operator[](size_t i) const { return column(i); }
    
    	Matrix& operator*=(long double);
    
    	valarray<long double>& array() { return *v; }
    };
    
    
    inline Slice_iter<long double> Matrix::row(size_t i)
    {
    	return Slice_iter<long double>(v,slice(i,d1,d2));
    }
    
    inline Cslice_iter<long double> Matrix::row(size_t i) const
    {
    	return Cslice_iter<long double>(v,slice(i,d1,d2));
    }
    
    inline Slice_iter<long double> Matrix::column(size_t i)
    {
    	return Slice_iter<long double>(v,slice(i*d2,d2,1));
    }
    
    inline Cslice_iter<long double> Matrix::column(size_t i) const
    {
    	return Cslice_iter<long double>(v,slice(i*d2,d2,1));
    }
    
    Matrix::Matrix(size_t x, size_t y)
    {
    	// check that x and y are sensible
    	d1 = x;
    	d2 = y;
    	v = new valarray<long double>(x*y);
    }
    
    Matrix::~Matrix()
    {
    	delete v;
    }
    
    long double & Matrix::operator()(size_t x, size_t y)
    {
    	return column(x)[y];
    }
    
    const long double & Matrix::operator()(size_t x, size_t y) const
    {
    	return column(x)[y];
    }
    
    
    
    //-------------------------------------------------------------
    
    
    
    
    long double mul(const Cslice_iter<long double>& v1, const valarray<long double>& v2)
    {
    	long double res = 0;
    	for (size_t i = 0; i<v2.size(); i++) res+= v1[i]*v2[i];
    	return res;
    }
    
    
    valarray<long double> operator*(const Matrix& m, const valarray<long double>& v)
    {
    	if (m.dim1()!=v.size()) cerr << "wrong number of elements in m*v\n";
    
    	valarray<long double> res(m.dim2());
    	for (size_t i = 0; i<m.dim2(); i++) res[i] = mul(m.row(i),v);
    	return res;
    }
    
    
    // alternative definition of m*v
    
    //valarray<long double> operator*(const Matrix& m, valarray<long double>& v)
    valarray<long double> mul_mv(const Matrix& m, valarray<long double>& v)
    {
    	if (m.dim1()!=v.size()) cerr << "wrong number of elements in m*v\n";
    
    	valarray<long double> res(m.dim2());
    
    	for (size_t i = 0; i<m.dim2(); i++) {
    		const Cslice_iter<long double>& ri = m.row(i);
    		res[i] = inner_product(ri,ri.end(),&v[0],double(0));
    	}
    	return res;
    }
    
    
    
    valarray<long double> operator*(valarray<long double>& v, const Matrix& m)
    {
    	if (v.size()!=m.dim2()) cerr << "wrong number of elements in v*m\n";
    
    	valarray<long double> res(m.dim1());
    
    	for (size_t i = 0; i<m.dim1(); i++) {
    		const Cslice_iter<long double>& ci = m.column(i);
    		res[i] = inner_product(ci,ci.end(),&v[0],double(0));
    	}
    	return res;
    }
    
    Matrix& Matrix::operator*=(long double d)
    {
    	(*v) *= d;
    	return *this;
    }
    
    ostream& operator<<(ostream& os, Matrix& m)
    {
    	for(int y=0; y<m.dim2(); y++)
    	{
    		for(int x=0; x<m.dim1(); x++)
    			os<<m[x][y]<<"\t";
    		os << "\n";
    	}
    	return os;
    }
    
    
    //-------------------------------------------------------------
    
    #include <iostream>
    
    template<class container, class T>
    struct lap
    {
    	lap( container & laplacian_ ) : laplacian(laplacian_) {}
    	void operator()(size_t x, size_t y, const container & a)
    	{
    		laplacian(x,y) = ( a(x+1,y) + a(x-1,y) + a(x,y+1) + a(x,y-1) - 4.0 * a(x,y) );
    	}
    	private: 
    		container & laplacian;
    };
    
    template<class container, class T>
    struct relax
    {
    	relax( const container & laplacian_ , T dt_ ) : laplacian(laplacian_), dt(dt_) {}
    	void operator()(size_t x, size_t y, container & a)
    	{
    		a(x,y) = dt * laplacian(x,y);
    	}
    	private: 
    		const container & laplacian;
    		T dt;
    };
    
    int main()
    {
    	size_t n, w, h;
    	cout << "n: ";
    	cin >> n;
    	cout << "w: ";
    	cin >> w;
    	cout << "h: ";
    	cin >> h;
    	
    	//Do it with array2d
    	///////////////////////////////////////////////////////////////////////////
    	array2d<long double> a(w,h,1.0);
    	array2d<long double> alap(w,h,0.0);
    	
    	lap< array2d<long double>, long double > alapf(alap);
    	relax< array2d<long double>, long double > arelaxf(alap, 0.24);
    	
    	stopwatch sw(stopwatch::auto_start);
    	for(size_t i = 0; i < n; ++i)
    	{
    		for_each( a, array2d<long double>::point2d(1,1), array2d<long double>::point2d(w-2,h-2), alapf );
    		for_each( a, array2d<long double>::point2d(1,1), array2d<long double>::point2d(w-2,h-2), arelaxf );
    	}
    	sw.stop();
    	cout << "That took array2d<long double> " << sw.read() << " ticks." << endl;
    	/////////////////////////////////////////////////////////////////////////////
    	//end array2d
    	
    	//Do it with Matrix
    	////////////////////////////////////////////////////////////////////////////
    	Matrix m(w,h);
    	Matrix mlap(w,h);
    	
    	lap< Matrix, long double > mlapf(mlap);
    	relax< Matrix, long double > mrelaxf(mlap, 0.24);
    	
    	sw.start();
    	for(size_t i = 0; i < n; ++i)
    	{
    		for(int x = 1; x < w-2; ++x)
    		{
    			for(int y = 1; y < h-2; ++y)
    				mlapf(x,y,m);
    		}
    		for(int x = 1; x < w-2; ++x)
    		{
    			for(int y = 1; y < h-2; ++y)
    				mrelaxf(x,y,m);
    		}
    	}
    	sw.stop();
    	cout << "That took Matrix " << sw.read() << " ticks." << endl;
    	//////////////////////////////////////////////////////////////////////////
    	//end Matrix
    }
    Last edited by CodeMonkey; 06-13-2009 at 06:37 PM. Reason: whoops
    "If you tell the truth, you don't have to remember anything"
    -Mark Twain

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Code:
    #endif
    
    
    #ifdef USE_UNIX_TIME
    I personally would use #elseif in such case - but in this case I'd put the special case of Unix time first, then only use #else at the end, so that if there wasn't ANY definition, it defaults to using standard C time.

    Code:
    		static const bool auto_start = true;
    		stopwatch(bool auto_start_ = !auto_start) : curr(0), data(0), rng(false)
    		{
    			initialize_data();
    			if(auto_start_ == auto_start) start();
    		}
    The auto-start looks fairly complicated. And if we set auto_start to false, and pass in false, it will start, since auto_start_ is now true, and auto_start is false. Is that what you intended?

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Kiss the monkey. CodeMonkey's Avatar
    Join Date
    Sep 2001
    Posts
    937
    Ah. No -- auto_start is just a read-only indicator. Perhaps an enum would have been more clear. The idea is that the user passes auto_start into the initializer in order to start it automatically, and the user passes nothing in order not to automatically start it.
    And thanks for the preprocessor correction. I only put these several files into one very hastily.
    "If you tell the truth, you don't have to remember anything"
    -Mark Twain

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Difference Equations / Recurrence Relations
    By DavidP in forum A Brief History of Cprogramming.com
    Replies: 4
    Last Post: 10-05-2007, 10:26 AM
  2. What's the difference between var++ and ++var
    By ulillillia in forum C Programming
    Replies: 6
    Last Post: 05-31-2007, 02:27 AM
  3. Replies: 6
    Last Post: 08-26-2006, 11:09 PM
  4. Replies: 10
    Last Post: 11-06-2005, 09:29 PM
  5. What is the Difference between ANSI C and non-ANSI C ?
    By Unregistered in forum C Programming
    Replies: 1
    Last Post: 11-24-2001, 06:55 AM