Thread: Efficient large prime checker

  1. #1
    Registered User
    Join Date
    Sep 2012
    Posts
    9

    Efficient large prime checker

    I need to check large integers (up to 10 million) for primality. My usual way is as follows:

    Code:
    int prime[MAX] = {0}, i, j;
    
    for(i=2;i<MAX;i++)
       prime[i] = 1;
    
    for(i=2;i<MAX;i++)
       for(j=i+i;j<MAX;j+=i)
          prime[j] = 0;
    But I get "Segmentation Fault: 11" when I declare MAX to be 10 million. I don't understand the theory, but I presume this error is related to array capacity in C. So is there any way to increase the capacity? If not, what other methods can I use?

  2. #2
    Salem
    and the hat of int overfl
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    This is stored on the stack.
    Code:
    int main ( ) {
        int prime[MAX] = {0}, i, j;
        //// more code
        return 0;
    }

    This isn't stored on the stack.
    Code:
        static int prime[MAX] = {0}, i, j;
        //// more code
        return 0;
    It's not a C issue, but most implementations restrict the amount of stack space given to each process.
    For most desktop systems, the default stack size is somewhere between 1 and 8MB.

    > So is there any way to increase the capacity? If not, what other methods can I use?
    Using an int to store only 0 or 1 is very space inefficient.
    You could try an array of unsigned chars instead.
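
    For instance, a minimal sketch combining both points (static storage instead of the stack, plus one unsigned char per number instead of an int); the array name here is just for illustration:
    Code:
    #include <stdio.h>
    
    #define MAX 10000000   /* 10 million flags at one byte each is about 10 MB */
    
    /* static/global, so it does not live on the stack */
    static unsigned char composite[MAX];
    
    int main(void) {
        long i, j;
    
        for (i = 2; i < MAX; i++)
            if (!composite[i])                    /* i has not been crossed out yet */
                for (j = i + i; j < MAX; j += i)
                    composite[j] = 1;             /* cross out the multiples of i */
    
        /* composite[n] == 0 now means n is prime, for n >= 2 */
        printf("9999991 is %s\n", composite[9999991] ? "composite" : "prime");
        return 0;
    }
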
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by Salem View Post
    Using an int to store only 0 or 1 is very space inefficient.
    You could try an array of unsigned chars instead.
    And, if you really want, you can use the individual bits in the unsigned chars - and represent eight values per char. That way, MAX doesn't need to exceed 1.25 million. That is easily workable for any 32-bit system (assuming you resolve quotas and other concerns).
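
    To make that concrete, here is a minimal sketch of the bit-packing idea; the helper macros are made up for this example, not from any library:
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    #define LIMIT  10000000UL            /* sieve up to 10 million */
    #define BYTES  ((LIMIT + 7) / 8)     /* 8 flags per unsigned char: ~1.25 million bytes */
    
    /* illustrative helper macros for one bit per number */
    #define SET_BIT(a, n)   ((a)[(n) >> 3] |= (unsigned char)(1u << ((n) & 7)))
    #define TEST_BIT(a, n)  ((a)[(n) >> 3] &  (1u << ((n) & 7)))
    
    int main(void) {
        unsigned char *composite = calloc(BYTES, 1);
        unsigned long i, j;
    
        if (composite == NULL)
            return 1;
    
        for (i = 2; i < LIMIT; i++)
            if (!TEST_BIT(composite, i))
                for (j = i + i; j < LIMIT; j += i)
                    SET_BIT(composite, j);
    
        /* TEST_BIT(composite, n) is zero exactly when n (>= 2) is prime */
        printf("9999991 is %s\n", TEST_BIT(composite, 9999991UL) ? "composite" : "prime");
    
        free(composite);
        return 0;
    }
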
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  4. #4
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    First, add the limits.h header file to a program:

    Code:
    #include <limits.h>
    #include <stdio.h>
    
    int main(void) {
    
       printf("Note: These are your highest values:\n\t\t         for an unsigned long int: %lu\n   \
       and for an unsigned long long int: %llu \n",ULONG_MAX,ULLONG_MAX); 
    
       return 0;
    }
    Because testing for primality with numbers < 10 million is considered working with the lower primes.

    For testing the lower primes, you MIGHT be able to use the Sieve of Eratosthenes - without even bit packing any numbers. It's very easy and very fast - nothing is faster for small prime checking, actually.

    There is also a technique called "Wheel Factorization", which can be used to eliminate the great majority of the non-prime candidates up front (there's a small sketch at the end of this post). That would really cut the problem down to size.

    Run the above and report back what it says. Then we'll know what we're working with on your system.
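
    As for the wheel idea: a minimal sketch of a 2-3-5 wheel (names made up for this example), which only generates candidates that are not divisible by 2, 3, or 5, skipping roughly 73% of all numbers before any real primality work is done:
    Code:
    #include <stdio.h>
    
    int main(void) {
        /* After 2, 3 and 5, every prime lies on one of these gaps modulo 30. */
        static const int wheel[8] = { 4, 2, 4, 2, 4, 6, 2, 6 };
        unsigned long candidate = 7;
        int idx = 0, shown;
    
        /* Print the first 20 candidates the wheel produces.
           Note they are only *candidates*: 49 and 77 still need a real test. */
        for (shown = 0; shown < 20; shown++) {
            printf("%lu ", candidate);
            candidate += wheel[idx];
            idx = (idx + 1) % 8;
        }
        printf("\n");
        return 0;
    }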

  5. #5
    Registered User
    Join Date
    Sep 2012
    Posts
    9
    I thought my method above was the Sieve of Eratosthenes? Here's what I got.

    Code:
    Note: These are your highest values:                 for an unsigned long int: 18446744073709551615
                  and for an unsigned long long int: 18446744073709551615
    Btw, I just implemented grumpy's idea. It turns out it can handle up to around 60 million. The bad news is, I just realized that I need much more than that. Probably I'll need to check up to 200 million or so.

  6. #6
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    No need for him to report back. The standard actually gives a pretty good idea of what is expected from that code, Adak.

    The standard guarantees that ULONG_MAX is not smaller than 4294967295, which is easily sufficient to represent values up to 10 million. A bigger value than that usually means a 64-bit system. ULLONG_MAX might not be supported on some older compilers, though (support for the long long types only became standard in 1999).

    SIZE_MAX (the upper limit of a size_t, and therefore of the length of an array) is only guaranteed to be at least 65535. On 32-bit systems, though, it is a fair bet that SIZE_MAX will be roughly equal to that lower limit on ULONG_MAX. On a 16-bit system it probably won't be.
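
    For example, a quick way to check it on a given system (SIZE_MAX comes from stdint.h, and %zu is the C99 conversion for size_t):
    Code:
    #include <stdio.h>
    #include <stdint.h>
    
    int main(void) {
        printf("SIZE_MAX on this system: %zu\n", (size_t)SIZE_MAX);
        return 0;
    }
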
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  7. #7
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by johan.g1 View Post
    I thought my method above was the Sieve of Eratosthenes? Here's what I got.

    Code:
    Note: These are your highest values:                 for an unsigned long int: 18446744073709551615
                  and for an unsigned long long int: 18446744073709551615
    Btw, I just implemented grumpy's idea. It turns out it can handle up to around 60 million. The bad news is, I just realized that I need much more than that. Probably I'll need to check up to 200 million or so.
    Your code is a Sieve of Eratosthenes; it's just different from mine, and I only glanced at it. Does it work OK?

    Check your SIZE_MAX, and let's get a max size for your arrays. That's a HUGE range for an unsigned long int, btw - I'm shocked it's the same as for the unsigned long long int!

  8. #8
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I didn't have SIZE_MAX defined, but I did find out that my 64-bit Pelles C handles int arrays of 200 million in global space! On this compiler and system (Windows 7, 64-bit), size_t is an unsigned long.

    Since Johan has a 64-bit system and compiler as well, we may easily be able to handle this in a single large array. I quite like this BIG array capability!

    Edit: made it to 500 Million!
    Last edited by Adak; 11-18-2012 at 06:15 AM.

  9. #9
    Registered User
    Join Date
    Sep 2012
    Posts
    9
    Sorry, but I don't understand what you guys are talking about. (I'm almost a complete beginner in programming.) How exactly did you make it to 500 million?

  10. #10
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by johan.g1 View Post
    Sorry, but I don't understand what you guys are talking about. (I'm almost a complete beginner in programming.) How exactly did you make it to 500 million?
    We do like to prattle on (OK, I like to prattle on!).

    Code:
    #include <stdio.h>
    #include <limits.h>
    
    #define MAX 500000000
    
    int A[MAX];    /* global (static storage), so it doesn't use the stack */
    
    int main(void) {
       unsigned long j;
       int count=0,k=1;
       printf("ULONG_MAX on this system: %lu\n",ULONG_MAX);
       for(j=0;j<MAX;j++,k++) {
          A[j] = 1;   /* touch every element so the whole array is really used */
          if(k==1000000) {
             printf("k reached a million %d times\n",++count);
             k=1;
          }
       }
    
       return 0;
    }
    The ULONG_MAX value is larger, but with an array of 600 million ints, I get an error that the array is larger than ULONG_MAX, in bytes.

    See how your system behaves with the above program.

    If you can make a 200 Million array of int's, then I'd suggest using that data type.

  11. #11
    Registered User
    Join Date
    Nov 2012
    Posts
    17
    As a matter of fact, for large prime numbers it is customary to use a primality test. The Miller-Rabin primality test (see Wikipedia) is used by RSA. It is a probabilistic algorithm for determining whether a number is prime or composite.
    A standard implementation uses bit masks, as already suggested.
    If a number fails Miller's primality test for some base a, it is not a prime number. If the number passes, it may be a prime. A composite number passing the Miller-Rabin test is called a strong pseudoprime to base a. However, it should work for you, since the smallest number that is a strong pseudoprime to base 2 and would be hidden by the test is 3215031751.
    I do not know if we are allowed to suggest where you can find an implementation or not.

  12. #12
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by Andreea View Post
    However, it should work for you, since the smallest number that is a strong pseudoprime to base 2 and would be hidden by the test is 3215031751.
    I'm sorry, I don't follow.

    Consider n = 2047, so s = 1, d = 1023, 2^s·d = n - 1 = 2046. n = 2047 is a strong pseudoprime to base 2, since a^d mod n = 2^1023 mod 2047 = 1, but n is not a prime: 2047 = 23·89.

    Quote Originally Posted by Andreea View Post
    suggest where you can find an implementation or not.
    Perhaps pointing to the pseudocode shown in the Wikipedia article would be a good compromise between theory and implementation?

    The Wikipedia article mentions that for integers n < 4759123141 (covering all 32-bit unsigned integers) one needs only test a = 2, 7, 61; for n < 2152302898747, a = 2, 3, 5, 7, and 11; for n < 3474749660383, a = 2, 3, 5, 7, 11, and 13; and for n < 341550071728321, a = 2, 3, 5, 7, 11, 13, and 17. If any of the tests report the number to be composite, then it is composite; otherwise it is prime.

    For any possible 64-bit unsigned integer (n < 18446744073709551616), you need to test at most the first 12 primes (a = 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37).

    The article states that "write (n−1) as 2^s·d with d odd", which may sound complicated, but is actually trivial. s is the number of zero bits in the least significant positions in (n-1), and d = (n-1) >> s. Or, in C:
    Code:
    unsigned long  d = n - 1UL;
    unsigned long  s = 0UL;
    
    while (!(d & 1UL)) {
        s++;
        d >>= 1UL;
    }
    For an efficient implementation, consider how you compute x = a^d mod n. See the Wikipedia article on modular exponentiation, especially the right-to-left binary method.
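
    Putting the pieces together, here is a minimal sketch (function names made up for this example) that combines the decomposition above, right-to-left binary modular exponentiation, and the a = 2, 7, 61 witnesses quoted from the article. It assumes n fits in 32 bits, so the intermediate products cannot overflow an unsigned long long:
    Code:
    #include <stdio.h>
    
    /* (base^exp) mod n by the right-to-left binary method; assumes n < 2^32 */
    static unsigned long long modpow(unsigned long long base, unsigned long long exp,
                                     unsigned long long n)
    {
        unsigned long long result = 1;
        base %= n;
        while (exp > 0) {
            if (exp & 1ULL)                    /* this bit of the exponent is set */
                result = (result * base) % n;
            base = (base * base) % n;          /* square the base for the next bit */
            exp >>= 1;
        }
        return result;
    }
    
    /* Returns nonzero if n survives the Miller-Rabin round for witness a. */
    static int survives_round(unsigned long long a, unsigned long long n)
    {
        unsigned long long d = n - 1, x;
        unsigned s = 0, r;
    
        while (!(d & 1ULL)) {                  /* write n - 1 = 2^s * d, d odd */
            d >>= 1;
            s++;
        }
    
        x = modpow(a, d, n);
        if (x == 1 || x == n - 1)
            return 1;
        for (r = 1; r < s; r++) {
            x = (x * x) % n;
            if (x == n - 1)
                return 1;
        }
        return 0;                              /* a proves that n is composite */
    }
    
    /* Deterministic for 32-bit n, using the witnesses 2, 7, 61 mentioned above. */
    int is_prime(unsigned long n)
    {
        static const unsigned long long bases[3] = { 2, 7, 61 };
        int i;
    
        if (n < 2)       return 0;
        if (n < 4)       return 1;             /* 2 and 3 */
        if (!(n & 1UL))  return 0;             /* even numbers > 2 */
    
        for (i = 0; i < 3; i++) {
            if (bases[i] % n == 0)             /* tiny n equal to a base: skip it */
                continue;
            if (!survives_round(bases[i], n))
                return 0;
        }
        return 1;
    }
    
    int main(void) {
        printf("%d %d\n", is_prime(2047UL), is_prime(9999991UL));
        return 0;
    }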

  13. #13
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    He has a 64-bit system. He has a Sieve program already; now if he can create a single array large enough to hold 200 million ints, then he's got everything he needs.

    If he wants to test for primality in the medium and high ranges of numbers, then he'll definitely need primality testers instead of arrays.

  14. #14
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by Adak View Post
    He has a Sieve program already
    The OP didn't actually specify whether he needs to find all primes, or just to check whether some numbers (between one and some limit in the millions) are primes or not.

    If you have just a few hundred or thousand input values, with the values ranging from zero to hundreds of millions, then a deterministic primality check might be more efficient.

    If you have lots of input, then it is likely more efficient to precompute a bit array corresponding to typical input values. On 64-bit architectures, you can use memory-mapping techniques to map a read-only file, potentially much larger than available RAM, and let the kernel worry about memory pressure and file I/O. (It's a viable option when the amount of numbers to test is truly huge.)
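
    A rough sketch of that memory-mapping idea on a POSIX system; the file name "primes.bits" and its layout (bit n set meaning n is prime) are purely hypothetical here:
    Code:
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    
    /* Map a precomputed bit file read-only and test one value. */
    int main(void) {
        struct stat st;
        int fd = open("primes.bits", O_RDONLY);
        if (fd == -1 || fstat(fd, &st) == -1)
            return 1;
    
        const unsigned char *bits = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (bits == MAP_FAILED)
            return 1;
    
        unsigned long n = 199999991UL;     /* some value to test, just an example */
        if (n / 8 < (unsigned long)st.st_size)
            printf("%lu: %s\n", n, (bits[n / 8] & (1u << (n % 8))) ? "prime" : "composite");
    
        munmap((void *)bits, st.st_size);
        close(fd);
        return 0;
    }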

    A mixed approach (bitmask plus deterministic primality test) is probably the most robust. You can have one or more bit maps covering typical values (I'd use one map ranging from zero to some limit), and use a deterministic primality test for the rest.

    You might find it interesting how e.g. the Mathematica PrimeQ primality test actually works. (It uses Miller-Rabin with a = 2, 3, and a Lucas pseudoprime test, and is probabilistic, not deterministic.)

  15. #15
    Registered User
    Join Date
    Nov 2012
    Posts
    17
    Quote Originally Posted by Nominal Animal View Post
    I'm sorry, I don't follow.

    Consider n = 2047, so s = 1, d = 1023, 2^s·d = n - 1 = 2046. n = 2047 is a strong pseudoprime to base 2, since a^d mod n = 2^1023 mod 2047 = 1, but n is not a prime: 2047 = 23·89.


    Perhaps pointing to the pseudocode shown in the Wikipedia article would be a good compromise between theory and implementation?
    Correct. Point taken.
    However, there is no composite number that is a strong pseudoprime to every base.
    I messed it up by using only 2 as a base, but the idea was that for numbers below 1 million it is enough to test the first 2 prime bases, 2 and 3, not only base 2 as I wrongly suggested. The number 3215031751 is the smallest composite that passes Rabin-Miller for all bases up to the 4th prime (2, 3, 5, and 7).
    Strong Pseudoprime -- from Wolfram MathWorld
    For the implementation I thought of something like this: http://en.literateprograms.org/index...(C)&oldid=6349

    It is more complicated.
    Last edited by Andreea; 11-20-2012 at 12:16 PM. Reason: English...
