Thread: Using multiThreads to speed up program

  1. #16
    Registered User
    Join Date
    May 2021
    Posts
    9
    Quote Originally Posted by Salem View Post
    You only need 1 bit per random number (you're only storing a true/false).
    So the whole RAND_MAX range would only need 256MB of memory.
    But the lookup is O(1) when you're done.

    By contrast, an array of just all the known primes upto RAND_MAX would contain approx 100M entries.
    Stored as 4-byte integers, this would come to 400MB.
    Since it is ordered, bsearch() is possible (in ~26 iterations per lookup) (it's O(log2(N))
    Hi again.
    So I used the sieve algorithm, initialized a bool array the size of RAND_MAX which is 2147483647
    and it is working.

    However, looping through the array to intialize with actual content is taking forever and makes my code even worse.
    The only time that it works in my favor is with huge numbers like
    55555555 randoms numbers. but anything smaller and it starts to lose effect.

    I tried using threads to initalize it simultaneously but results were the same .

    with the previous code I managed to beat my professor's code, but now he improved his code again , and I am back to the thinking board.

    He of course trust us to use the web in the look for answers.
    any ideas how to improve the sieves algorithm?

  2. #17
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Not without seeing your current code.

    Sieve of Eratosthenes - Wikipedia
    "Sieve of Eratosthenes: algorithm steps for primes below 121 (including optimization of starting from prime's square)."
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #18
    Registered User
    Join Date
    May 2021
    Posts
    9
    Quote Originally Posted by Salem View Post
    Not without seeing your current code.

    Sieve of Eratosthenes - Wikipedia
    "Sieve of Eratosthenes: algorithm steps for primes below 121 (including optimization of starting from prime's square)."
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <pthread.h>
    #include <math.h>
    
    
    #define MAX 2147483646
    char *primes;
    
    
    int *amount;
    int *num_of_threads;
    pthread_mutex_t lock;
    long sum = 0;
    long primeCounter = 0;
    
    
    int isPrime(int num)
    {
    	// if (primes[num - 1])
    	// 	return 1;
    
    
    	// else
    	// {
    		if (num <= 1)
    			return 0;
    		if (num <= 3){
    			primes[num-1]=1;
    			return 1;
    			}
    		if (num % 2 == 0 || num % 3 == 0)
    			return 0;
    		int sqr = sqrt(num);
    		for (int i = 5; i <= sqr; i = i + 6)
    		{
    			if (num % i == 0 || num % (i + 2) == 0)
    				return 0;
    		}
    		primes[num-1]=1;
    		return 1;
    	// }
    }
    
    
    void *func()
    {
    	long local_sum = 0, local_counter = 0, count = 0;
    	int limit = (*amount) / (*num_of_threads);
    	for (int i = 0; i < limit; ++i)
    	{
    		int random = rand();
    		if (isPrime(random))
    		{
    			local_sum += random;
    			local_counter++;
    			count++;
    			if (count == (*amount) / (*num_of_threads))
    				return NULL;
    		}
    	}
    	pthread_mutex_lock(&lock);
    	sum += local_sum;
    	primeCounter += local_counter;
    	if (count == (*amount))
    		return NULL;
    	pthread_mutex_unlock(&lock);
    	return NULL;
    }
    
    
    // void *func1()
    // {
    // 	// for (int i = 0; i < MAX * 0.25; i++)
    // 	// {
    // 	// 	primes[i] = 0;
    // 	// }
    
    
    // 	int limit = (sqrt(MAX) + 1) * 0.25;
    
    
    // 	for (long long i = 2; i < limit; i++)
    // 	{
    // 		if (primes[i - 1])
    // 		{
    // 			for (long long j = i * i; j <= MAX; j += i)
    // 			{
    // 				// pthread_mutex_lock(&lock);
    // 				primes[j - 1] = 0;
    // 				// pthread_mutex_unlock(&lock);
    // 			}
    // 		}
    // 	}
    
    
    // 	return NULL;
    // }
    
    
    // void *func2()
    // {
    // 	// for (int i = MAX * 0.25; i < MAX * 0.5; i++)
    // 	// {
    // 	// 	primes[i] = 0;
    // 	// }
    // 	int limit = (sqrt(MAX) + 1) * 0.5;
    
    
    // 	for (long long i = MAX * 0.25; i < limit; i++)
    // 	{
    // 		if (primes[i - 1])
    // 		{
    // 			for (long long j = i * i; j <= MAX; j += i)
    // 			{
    // 				pthread_mutex_lock(&lock);
    // 				primes[j - 1] = 0;
    // 				pthread_mutex_unlock(&lock);
    // 			}
    // 		}
    // 	}
    
    
    // 	return NULL;
    // }
    
    
    // void *func3()
    // {
    // 	// for (int i = MAX * 0.5; i < MAX * 0.75; i++)
    // 	// {
    // 	// 	primes[i] = 0;
    // 	// }
    
    
    // 	int limit = (sqrt(MAX) + 1) * 0.75;
    
    
    // 	for (long long i = MAX * 0.5; i < limit; i++)
    // 	{
    // 		if (primes[i - 1])
    // 		{
    // 			for (long long j = i * i; j <= MAX; j += i)
    // 			{
    // 				pthread_mutex_lock(&lock);
    // 				primes[j - 1] = 0;
    // 				pthread_mutex_unlock(&lock);
    // 			}
    // 		}
    // 	}
    
    
    // 	return NULL;
    // }
    
    
    // void *func4()
    // {
    // 	// for (int i = MAX * 0.75; i < MAX; i++)
    // 	// {
    // 	// 	primes[i] = 0;
    // 	// }
    
    
    // 	int limit = (sqrt(MAX) + 1);
    
    
    // 	for (long long i = MAX * 0.75; i < limit; i++)
    // 	{
    // 		if (primes[i - 1])
    // 		{
    // 			for (long long j = i * i; j <= MAX; j += i)
    // 			{
    // 				pthread_mutex_lock(&lock);
    // 				primes[j - 1] = 0;
    // 				pthread_mutex_unlock(&lock);
    // 			}
    // 		}
    // 	}
    // 	return NULL;
    // }
    
    
    int main(int argc, char *argv[])
    {
    	// primes = malloc((MAX + 1) * sizeof(char));
    	// primes[MAX]={0};
    
    
    	// pthread_t t1, t2, t3, t4;
    	// pthread_create(&t1, NULL, &func1, NULL);
    	// pthread_create(&t2, NULL, &func2, NULL);
    	// pthread_create(&t3, NULL, &func3, NULL);
    	// pthread_create(&t4, NULL, &func4, NULL);
    
    
    	// pthread_join(t1, NULL);
    	// pthread_join(t2, NULL);
    	// pthread_join(t3, NULL);
    	// pthread_join(t4, NULL);
    	// for (int i = 0; i < MAX; i++)
    	// {
    	// 	primes[i] = 0;
    	// }
    
    
    	/* Loop through a portion of the array (up to the square root of MAX). If
       * it's a prime, ensure all multiples of it are set to zero (false), as they
       * clearly cannot be prime.
       */
    	// int limit = sqrt(MAX) + 1;
    	// for (long long i = 2; i < limit; i++)
    	// {
    	// 	if (primes[i - 1])
    	// 	{
    	// 		for (long long j = i * i; j <= MAX; j += i)
    	// 		{
    	// 			primes[j - 1] = 0;
    	// 		}
    	// 	}
    	// }
    
    
    	long number_of_processors = sysconf(_SC_NPROCESSORS_ONLN);
    	num_of_threads = malloc(number_of_processors * sizeof(int));
    	*num_of_threads = number_of_processors;
    
    
    	//make sure enough arguments were received
    	if (argc != 3)
    	{
    		printf("Too few arguments ");
    		printf("USAGE: ./primeCalc <prime pivot> <num of random numbers>");
    		exit(0);
    	}
    	//receive arguments from user
    	int randomPivot = atoi(argv[1]);
    	int numOfRandomNumbers = atoi(argv[2]);
    	amount = malloc(numOfRandomNumbers * sizeof(int));
    	*amount = numOfRandomNumbers;
    
    
    	//creating thread
    	pthread_t threads[*num_of_threads];
    	srand(randomPivot);
    	//creating multiple threads , executing our function
    	for (int i = 0; i < *num_of_threads; ++i)
    	{
    		if (pthread_create(&threads[i], NULL, &func, NULL) != 0)
    		{
    			fprintf(stderr, "error: Cannot create thread # %d\n", i);
    			break;
    		}
    	}
    	for (int i = 0; i < *num_of_threads; ++i)
    	{
    		if (pthread_join(threads[i], NULL) != 0)
    		{
    			fprintf(stderr, "error: Cannot join thread # %d\n", i);
    		}
    	}
    
    
    	free(amount);
    	free(num_of_threads);
    	
    	pthread_mutex_destroy(&lock);
    	printf("%ld,%ld\n", sum, primeCounter);
    	// pthread_exit(NULL);
    	exit(0);
    }
    The code that is slashed is my Sieve attempt while using multithreads.

  4. #19
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Well that was interesting.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <limits.h>
    #include <math.h>
    #include <unistd.h>
    #include <pthread.h>
    
    #define TABLE_SIZE      ((INT_MAX/CHAR_BIT/sizeof(int)/2)+1)
    #define BITS_PER_ENTRY  (sizeof(int)*CHAR_BIT)
    
    // Halve the length of the table by only storing odd results
    // 2 is a special case.
    // If the bit is clear, it's a prime
    unsigned int table[TABLE_SIZE] = { 0 };
    
    pthread_mutex_t lock;
    long sum = 0;
    long primeCounter = 0;
    
    // map 1,3,5,7,9 etc to 0,1,2,3,4
    int valueToIndex(int value)
    {
      int result = value / 2;
      return result;
    }
    
    void setComposite(int composite)
    {
      if (composite % 2 == 0)
        return;
      int value = valueToIndex(composite);
      int pos = value / BITS_PER_ENTRY;
      int bit = value % BITS_PER_ENTRY;
      unsigned int mask = 1u << bit;
      table[pos] |= mask;
    }
    
    #ifdef USE_LOOKUP
    int isPrime(int number)
    {
      if (number <= 2)
        return 1;
      if (number % 2 == 0)
        return 0;
      int value = valueToIndex(number);
      int pos = value / BITS_PER_ENTRY;
      int bit = value % BITS_PER_ENTRY;
      unsigned int mask = 1u << bit;
      return (table[pos] & mask) == 0;
    }
    #else
    int isPrime(int num)
    {
      if (num <= 1)
        return 0;
      if (num <= 3)
        return 1;
      if (num % 2 == 0 || num % 3 == 0)
        return 0;
    
      int limit = sqrt(num);
      for (int i = 5; i <= limit; i = i + 6)
      {
        if (num % i == 0 || num % (i + 2) == 0)
          return 0;
      }
      return 1;
    }
    #endif
    
    int nextCandidatePrime(int p)
    {
      do {
        p += 2;                     // the next odd number
      } while (!isPrime(p));
      return p;
    }
    
    void createPrimeTable(int limit)
    {
      int s = sqrt(limit);
      for (int p = 3; p < s; p = nextCandidatePrime(p)) {
        int end = limit - p;
        for (int c = p * p; c < end; c += p) {
          setComposite(c);
        }
      }
    }
    
    typedef struct {
      unsigned int seed;
      int limit;
    } funcdata_st;
    
    void *func(void *p)
    {
      funcdata_st *info = (funcdata_st *) p;
      long local_sum = 0, local_counter = 0, count = 0;
      for (int i = 0; i < info->limit; ++i) {
        int random = rand_r(&info->seed); //!! NB - using rand_r !!
        if (isPrime(random)) {
          local_sum += random;
          local_counter++;
        }
      }
      pthread_mutex_lock(&lock);
      sum += local_sum;
      primeCounter += local_counter;
      pthread_mutex_unlock(&lock);
      return NULL;
    }
    
    int main(int argc, char *argv[])
    {
      //make sure enough arguments were received
      if (argc != 3) {
        printf("Too few arguments ");
        printf("USAGE: ./primeCalc <prime pivot> <num of random numbers>");
        exit(0);
      }
    #ifdef USE_LOOKUP
      createPrimeTable(INT_MAX);
      printf("Table built\n");
    #endif
    
      long number_of_processors = sysconf(_SC_NPROCESSORS_ONLN);
      int randomPivot = atoi(argv[1]);
      int numOfRandomNumbers = atoi(argv[2]);
    
      //creating thread
      pthread_t threads[number_of_processors];
      funcdata_st info[number_of_processors];
    
      pthread_mutex_init(&lock, NULL);
      for (int i = 0; i < number_of_processors; ++i) {
        info[i].seed = randomPivot + i; // makes no sense for all threads to generate the same sequence
        info[i].limit = numOfRandomNumbers / number_of_processors;
        if (pthread_create(&threads[i], NULL, func, &info[i]) != 0) {
          fprintf(stderr, "error: Cannot create thread # %d\n", i);
          break;
        }
      }
      for (int i = 0; i < number_of_processors; ++i) {
        if (pthread_join(threads[i], NULL) != 0) {
          fprintf(stderr, "error: Cannot join thread # %d\n", i);
        }
      }
    
      pthread_mutex_destroy(&lock);
      printf("%ld,%ld\n", sum, primeCounter);
      pthread_exit(NULL);
    }
    On my machine, making the sieve for all 32-bit ints takes about 9 seconds.
    It might be thread-able if it used bytes rather than bits to store whether a number was prime or not.

    Using the lookup table approach is useless for 1M primes.
    Code:
    $ gcc -DUSE_LOOKUP -O2 -g foo.c -lm -pthread
    $ time ./a.out 7 1000000
    Table built
    51378394789491,49097
    
    real	0m9.152s
    user	0m9.156s
    sys	0m0.024s
    
    $ gcc -O2 -g foo.c -lm -pthread
    $ time ./a.out 7 1000000
    51378394789491,49097
    
    real	0m0.400s
    user	0m3.076s
    sys	0m0.000s
    However, it really comes into it's own at 100M primes.
    Code:
    $ gcc -DUSE_LOOKUP -O2 -g foo.c -lm -pthread
    $ time ./a.out 7 100000000
    Table built
    5114402999725783,4891795
    
    real	0m9.884s
    user	0m15.204s
    sys	0m0.016s
    
    $ gcc -O2 -g foo.c -lm -pthread
    $ time ./a.out 7 100000000
    5114362197774199,4891775
    
    real	0m33.602s
    user	4m12.964s
    sys	0m0.092s
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  5. #20
    Registered User
    Join Date
    May 2021
    Posts
    9
    Quote Originally Posted by Salem View Post
    Well that was interesting.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <limits.h>
    #include <math.h>
    #include <unistd.h>
    #include <pthread.h>
    
    #define TABLE_SIZE      ((INT_MAX/CHAR_BIT/sizeof(int)/2)+1)
    #define BITS_PER_ENTRY  (sizeof(int)*CHAR_BIT)
    
    // Halve the length of the table by only storing odd results
    // 2 is a special case.
    // If the bit is clear, it's a prime
    unsigned int table[TABLE_SIZE] = { 0 };
    
    pthread_mutex_t lock;
    long sum = 0;
    long primeCounter = 0;
    
    // map 1,3,5,7,9 etc to 0,1,2,3,4
    int valueToIndex(int value)
    {
      int result = value / 2;
      return result;
    }
    
    void setComposite(int composite)
    {
      if (composite % 2 == 0)
        return;
      int value = valueToIndex(composite);
      int pos = value / BITS_PER_ENTRY;
      int bit = value % BITS_PER_ENTRY;
      unsigned int mask = 1u << bit;
      table[pos] |= mask;
    }
    
    #ifdef USE_LOOKUP
    int isPrime(int number)
    {
      if (number <= 2)
        return 1;
      if (number % 2 == 0)
        return 0;
      int value = valueToIndex(number);
      int pos = value / BITS_PER_ENTRY;
      int bit = value % BITS_PER_ENTRY;
      unsigned int mask = 1u << bit;
      return (table[pos] & mask) == 0;
    }
    #else
    int isPrime(int num)
    {
      if (num <= 1)
        return 0;
      if (num <= 3)
        return 1;
      if (num % 2 == 0 || num % 3 == 0)
        return 0;
    
      int limit = sqrt(num);
      for (int i = 5; i <= limit; i = i + 6)
      {
        if (num % i == 0 || num % (i + 2) == 0)
          return 0;
      }
      return 1;
    }
    #endif
    
    int nextCandidatePrime(int p)
    {
      do {
        p += 2;                     // the next odd number
      } while (!isPrime(p));
      return p;
    }
    
    void createPrimeTable(int limit)
    {
      int s = sqrt(limit);
      for (int p = 3; p < s; p = nextCandidatePrime(p)) {
        int end = limit - p;
        for (int c = p * p; c < end; c += p) {
          setComposite(c);
        }
      }
    }
    
    typedef struct {
      unsigned int seed;
      int limit;
    } funcdata_st;
    
    void *func(void *p)
    {
      funcdata_st *info = (funcdata_st *) p;
      long local_sum = 0, local_counter = 0, count = 0;
      for (int i = 0; i < info->limit; ++i) {
        int random = rand_r(&info->seed); //!! NB - using rand_r !!
        if (isPrime(random)) {
          local_sum += random;
          local_counter++;
        }
      }
      pthread_mutex_lock(&lock);
      sum += local_sum;
      primeCounter += local_counter;
      pthread_mutex_unlock(&lock);
      return NULL;
    }
    
    int main(int argc, char *argv[])
    {
      //make sure enough arguments were received
      if (argc != 3) {
        printf("Too few arguments ");
        printf("USAGE: ./primeCalc <prime pivot> <num of random numbers>");
        exit(0);
      }
    #ifdef USE_LOOKUP
      createPrimeTable(INT_MAX);
      printf("Table built\n");
    #endif
    
      long number_of_processors = sysconf(_SC_NPROCESSORS_ONLN);
      int randomPivot = atoi(argv[1]);
      int numOfRandomNumbers = atoi(argv[2]);
    
      //creating thread
      pthread_t threads[number_of_processors];
      funcdata_st info[number_of_processors];
    
      pthread_mutex_init(&lock, NULL);
      for (int i = 0; i < number_of_processors; ++i) {
        info[i].seed = randomPivot + i; // makes no sense for all threads to generate the same sequence
        info[i].limit = numOfRandomNumbers / number_of_processors;
        if (pthread_create(&threads[i], NULL, func, &info[i]) != 0) {
          fprintf(stderr, "error: Cannot create thread # %d\n", i);
          break;
        }
      }
      for (int i = 0; i < number_of_processors; ++i) {
        if (pthread_join(threads[i], NULL) != 0) {
          fprintf(stderr, "error: Cannot join thread # %d\n", i);
        }
      }
    
      pthread_mutex_destroy(&lock);
      printf("%ld,%ld\n", sum, primeCounter);
      pthread_exit(NULL);
    }
    On my machine, making the sieve for all 32-bit ints takes about 9 seconds.
    It might be thread-able if it used bytes rather than bits to store whether a number was prime or not.

    Using the lookup table approach is useless for 1M primes.
    Code:
    $ gcc -DUSE_LOOKUP -O2 -g foo.c -lm -pthread
    $ time ./a.out 7 1000000
    Table built
    51378394789491,49097
    
    real    0m9.152s
    user    0m9.156s
    sys    0m0.024s
    
    $ gcc -O2 -g foo.c -lm -pthread
    $ time ./a.out 7 1000000
    51378394789491,49097
    
    real    0m0.400s
    user    0m3.076s
    sys    0m0.000s
    However, it really comes into it's own at 100M primes.
    Code:
    $ gcc -DUSE_LOOKUP -O2 -g foo.c -lm -pthread
    $ time ./a.out 7 100000000
    Table built
    5114402999725783,4891795
    
    real    0m9.884s
    user    0m15.204s
    sys    0m0.016s
    
    $ gcc -O2 -g foo.c -lm -pthread
    $ time ./a.out 7 100000000
    5114362197774199,4891775
    
    real    0m33.602s
    user    4m12.964s
    sys    0m0.092s
    Thank you! This indeed matches my professor's code,
    which feels out of my current level. But it helped me understand better how threads work.

  6. #21
    Registered User
    Join Date
    Apr 2021
    Posts
    139
    For the sieve version, have you tried putting the table generator in a separate thread, and allowing the primes to wait on generation? Actually, maybe several threads? (If you're using bool entries you don't need to synchronize, since it's always storing a 1. If you're using a bit-array, you'd need to synchronize, which might make things too slow.)

    On the generator side, maybe add the candidates to a min-heap or something, one per nthreads, and let the threads keep checking the lowest candidate value against the progress made on prime generation. After all, you can short-circuit the test whenever you get a positive result.

  7. #22
    Registered User I C everything's Avatar
    Join Date
    Apr 2019
    Posts
    101
    Must you use rand or are you free to use own inlined random generator?
    And are you allowed write 64bit version?

    Salem,I made some prime number code that stores a zero int in array when it finds a prime,could that work with multiple threads?
    Last edited by I C everything; 05-17-2021 at 04:54 AM.
    you tell me you can C,why dont you C your own bugs?

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Run a program on multithreads and compare speed!
    By lordvader in forum C Programming
    Replies: 5
    Last Post: 06-16-2020, 11:01 PM
  2. Multithreads for windows
    By Jablan in forum C Programming
    Replies: 3
    Last Post: 03-09-2014, 05:38 AM
  3. Completion Port and Multithreads :: MFC
    By kuphryn in forum Windows Programming
    Replies: 0
    Last Post: 11-06-2002, 11:37 PM
  4. multithreads
    By JagWire in forum C Programming
    Replies: 1
    Last Post: 06-28-2002, 11:22 AM
  5. Multithreads & Pointer to bool :: MFC
    By kuphryn in forum Windows Programming
    Replies: 7
    Last Post: 06-26-2002, 10:50 AM

Tags for this Thread