Thread: Efficient large prime checker

  1. #1
    Registered User
    Join Date
    Sep 2012
    Posts
    9

    Efficient large prime checker

    I need to check large integers (up to 10 million) for primality. My usual way is as follows:

    Code:
    int prime[MAX] = {0}, i, j;
    
    for(i=2;i<MAX;i++)
       prime[i] = 1;
    
    for(i=2;i<MAX;i++)
       for(j=i+i;j<MAX;j+=i)
          prime[j] = 0;
    But I get "Segmentation Fault: 11" when I declare MAX to be 10 million. I don't understand the theory, but I presume this error is related to array capacity in C. So is there any way to increase the capacity? If not, what other methods can I use?

  2. #2
    Salem
    and the hat of int overfl
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    This is stored on the stack.
    Code:
    int main ( ) {
        int prime[MAX] = {0}, i, j;
        //// more code
        return 0;
    }

    This isn't stored on the stack.
    Code:
        static int prime[MAX] = {0}, i, j;
        //// more code
        return 0;
    It's not a C issue, but most implementations restrict the amount of stack space given to each process.
    For most desktop systems, the default stack size is somewhere between 1 and 8MB.

    > So is there any way to increase the capacity? If not, what other methods can I use?
    Using an int to store only 0 or 1 is very space inefficient.
    You could try an array of unsigned chars instead.
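
    For instance, a minimal sketch combining both points (static storage instead of the stack, plus one unsigned char per number instead of an int); the array name here is just for illustration:
    Code:
    #include <stdio.h>
    
    #define MAX 10000000   /* 10 million flags at one byte each is about 10 MB */
    
    /* static/global, so it does not live on the stack */
    static unsigned char composite[MAX];
    
    int main(void) {
        long i, j;
    
        for (i = 2; i < MAX; i++)
            if (!composite[i])                    /* i has not been crossed out yet */
                for (j = i + i; j < MAX; j += i)
                    composite[j] = 1;             /* cross out the multiples of i */
    
        /* composite[n] == 0 now means n is prime, for n >= 2 */
        printf("9999991 is %s\n", composite[9999991] ? "composite" : "prime");
        return 0;
    }
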
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by Salem View Post
    Using an int to store only 0 or 1 is very space inefficient.
    You could try an array of unsigned chars instead.
    And, if you really want, you can use the individual bits in the unsigned chars - and represent eight values per char. That way, MAX doesn't need to exceed 1.25 million. That is easily workable for any 32-bit system (assuming you resolve quotas and other concerns).
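
    To make that concrete, here is a minimal sketch of the bit-packing idea; the helper macros are made up for this example, not from any library:
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    #define LIMIT  10000000UL            /* sieve up to 10 million */
    #define BYTES  ((LIMIT + 7) / 8)     /* 8 flags per unsigned char: ~1.25 million bytes */
    
    /* illustrative helper macros for one bit per number */
    #define SET_BIT(a, n)   ((a)[(n) >> 3] |= (unsigned char)(1u << ((n) & 7)))
    #define TEST_BIT(a, n)  ((a)[(n) >> 3] &  (1u << ((n) & 7)))
    
    int main(void) {
        unsigned char *composite = calloc(BYTES, 1);
        unsigned long i, j;
    
        if (composite == NULL)
            return 1;
    
        for (i = 2; i < LIMIT; i++)
            if (!TEST_BIT(composite, i))
                for (j = i + i; j < LIMIT; j += i)
                    SET_BIT(composite, j);
    
        /* TEST_BIT(composite, n) is zero exactly when n (>= 2) is prime */
        printf("9999991 is %s\n", TEST_BIT(composite, 9999991UL) ? "composite" : "prime");
    
        free(composite);
        return 0;
    }
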
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  4. #4
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    First, add the limits.h header file to a program:

    Code:
    #include <limits.h>
    #include <stdio.h>
    
    int main(void) {
    
       printf("Note: These are your highest values:\n\t\t         for an unsigned long int: %lu\n   \
       and for an unsigned long long int: %llu \n",ULONG_MAX,ULLONG_MAX); 
    
       return 0;
    }
    Because testing for primality with numbers < 10 million is considered working with the lower primes.

    For testing the lower primes, you MIGHT be able to use the Sieve of Eratosthenes - without even bit packing any numbers. It's very easy and very fast - nothing is faster for small prime checking, actually.

    There is also a technique called "Wheel Factorization", which can be used to eliminate the great majority of the non-prime candidates up front (there's a small sketch at the end of this post). That would really cut the problem down to size.

    Run the above and report back what it says. Then we'll know what we're working with on your system.
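
    As for the wheel idea: a minimal sketch of a 2-3-5 wheel (names made up for this example), which only generates candidates that are not divisible by 2, 3, or 5, skipping roughly 73% of all numbers before any real primality work is done:
    Code:
    #include <stdio.h>
    
    int main(void) {
        /* After 2, 3 and 5, every prime lies on one of these gaps modulo 30. */
        static const int wheel[8] = { 4, 2, 4, 2, 4, 6, 2, 6 };
        unsigned long candidate = 7;
        int idx = 0, shown;
    
        /* Print the first 20 candidates the wheel produces.
           Note they are only *candidates*: 49 and 77 still need a real test. */
        for (shown = 0; shown < 20; shown++) {
            printf("%lu ", candidate);
            candidate += wheel[idx];
            idx = (idx + 1) % 8;
        }
        printf("\n");
        return 0;
    }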

  5. #5
    Registered User
    Join Date
    Sep 2012
    Posts
    9
    I thought my method above was the Sieve of Eratosthenes? Here's what I got.

    Code:
    Note: These are your highest values:                 for an unsigned long int: 18446744073709551615
                  and for an unsigned long long int: 18446744073709551615
    Btw, I just implemented grumpy's idea. It turns out it can handle up to around 60 million. The bad news is, I just realized that I need much more than that. Probably I'll need to check up to 200 million or so.

  6. #6
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    No need for him to report back. The standard actually gives a pretty good idea of what is expected from that code, Adak.

    The standard guarantees that ULONG_MAX is not smaller than 4294967295, which is easily sufficient to represent values up to 10 million. A bigger value than that usually means a 64-bit system. ULLONG_MAX might not be supported on some older compilers, though (support for the long long types only became standard in 1999).

    SIZE_MAX (the upper limit of a size_t, and therefore of the length of an array) is only guaranteed to be at least 65535. On 32-bit systems, though, it is a fair bet that SIZE_MAX will be roughly equal to that lower limit on ULONG_MAX. On a 16-bit system it probably won't be.
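
    For example, a quick way to check it on a given system (SIZE_MAX comes from stdint.h, and %zu is the C99 conversion for size_t):
    Code:
    #include <stdio.h>
    #include <stdint.h>
    
    int main(void) {
        printf("SIZE_MAX on this system: %zu\n", (size_t)SIZE_MAX);
        return 0;
    }
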
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  7. #7
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by johan.g1 View Post
    I thought my method above was the Sieve of Eratosthenes? Here's what I got.

    Code:
    Note: These are your highest values:                 for an unsigned long int: 18446744073709551615
                  and for an unsigned long long int: 18446744073709551615
    Btw, I just implemented grumpy's idea. It turns out it can handle up to around 60 million. The bad news is, I just realized that I need much more than that. Probably I'll need to check up to 200 million or so.
    Your code is a Sieve of Eratosthenes; it's just different from mine, and I only glanced at it. Does it work OK?

    Check your SIZE_MAX, and let's get a max size for your arrays. That's a HUGE range for an unsigned long int, btw - I'm shocked it's the same as for the unsigned long long int!

  8. #8
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I didn't have SIZE_MAX defined, but I did find out that my 64-bit Pelles C handles int arrays of 200 million in global space! On this compiler and system (Windows 7, 64-bit), size_t is an unsigned long.

    Since Johan has a 64-bit system and compiler as well, we may easily be able to handle this in a single large array. I quite like this BIG array capability!

    Edit: made it to 500 Million!
    Last edited by Adak; 11-18-2012 at 06:15 AM.

  9. #9
    Registered User
    Join Date
    Sep 2012
    Posts
    9
    Sorry, but I don't understand what you guys are talking about. (I'm almost a complete beginner in programming.) How exactly did you make it to 500 million?

  10. #10
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by johan.g1 View Post
    Sorry, but I don't understand what you guys are talking about. (I'm almost a complete beginner in programming.) How exactly did you make it to 500 million?
    We do like to prattle on (OK, I like to prattle on!).

    Code:
    #include <stdio.h>
    #include <limits.h>
    
    #define MAX 500000000
    
    int A[MAX];    /* global (static storage), so it doesn't use the stack */
    
    int main(void) {
       unsigned long j;
       int count=0,k=1;
       printf("ULONG_MAX on this system: %lu\n",ULONG_MAX);
       for(j=0;j<MAX;j++,k++) {
          A[j] = 1;   /* touch every element so the whole array is really used */
          if(k==1000000) {
             printf("k reached a million %d times\n",++count);
             k=1;
          }
       }
    
       return 0;
    }
    The ULONG_MAX value is larger, but with an array of 600 million ints, I get an error that the array is larger than ULONG_MAX, in bytes.

    See how your system behaves with the above program.

    If you can make a 200 Million array of int's, then I'd suggest using that data type.

  11. #11
    Registered User
    Join Date
    Nov 2012
    Posts
    17
    As a matter of fact, for large prime numbers it is customary to use a primality test. The Miller-Rabin primality test (see Wikipedia) is used by RSA. It is a probabilistic algorithm for determining whether a number is prime or composite.
    A standard implementation uses bit masks, as already suggested.
    If a number fails Miller's primality test for some base a, it is not a prime number. If the number passes, it may be a prime. A composite number passing the Miller-Rabin test is called a strong pseudoprime to base a. However, it should work for you, since the smallest number that is a strong pseudoprime to base 2 and would be hidden by the test is 3215031751.
    I do not know if we are allowed to suggest where you can find an implementation or not.

  12. #12
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by Andreea View Post
    However, it should work for you, since the smallest number that is a strong pseudoprime to base 2 and would be hidden by the test is 3215031751.
    I'm sorry, I don't follow.

    Consider n = 2047, so s = 1, d = 1023, 2^s·d = n - 1 = 2046. n = 2047 is a strong pseudoprime to base 2, since a^d mod n = 2^1023 mod 2047 = 1, but n is not a prime: 2047 = 23·89.

    Quote Originally Posted by Andreea View Post
    suggest where you can find an implementation or not.
    Perhaps pointing to the pseudocode shown in the Wikipedia article would be a good compromise between theory and implementation?

    The Wikipedia article mentions that for integers n < 4759123141 (covering all 32-bit unsigned integers) one needs only test a = 2, 7, 61; for n < 2152302898747, a = 2, 3, 5, 7, and 11; for n < 3474749660383, a = 2, 3, 5, 7, 11, and 13; and for n < 341550071728321, a = 2, 3, 5, 7, 11, 13, and 17. If any of the tests report the number to be composite, then it is composite; otherwise it is prime.

    For any possible 64-bit unsigned integer (n < 18446744073709551616), you need to test at most the first 12 primes (a = 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37).

    The article states that "write (n−1) as 2^s·d with d odd", which may sound complicated, but is actually trivial. s is the number of zero bits in the least significant positions in (n-1), and d = (n-1) >> s. Or, in C:
    Code:
    unsigned long  d = n - 1UL;
    unsigned long  s = 0UL;
    
    while (!(d & 1UL)) {
        s++;
        d >>= 1UL;
    }
    For an efficient implementation, consider how you compute x = a^d mod n. See the Wikipedia article on modular exponentiation, especially the right-to-left binary method.
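
    Putting the pieces together, here is a minimal sketch (function names made up for this example) that combines the decomposition above, right-to-left binary modular exponentiation, and the a = 2, 7, 61 witnesses quoted from the article. It assumes n fits in 32 bits, so the intermediate products cannot overflow an unsigned long long:
    Code:
    #include <stdio.h>
    
    /* (base^exp) mod n by the right-to-left binary method; assumes n < 2^32 */
    static unsigned long long modpow(unsigned long long base, unsigned long long exp,
                                     unsigned long long n)
    {
        unsigned long long result = 1;
        base %= n;
        while (exp > 0) {
            if (exp & 1ULL)                    /* this bit of the exponent is set */
                result = (result * base) % n;
            base = (base * base) % n;          /* square the base for the next bit */
            exp >>= 1;
        }
        return result;
    }
    
    /* Returns nonzero if n survives the Miller-Rabin round for witness a. */
    static int survives_round(unsigned long long a, unsigned long long n)
    {
        unsigned long long d = n - 1, x;
        unsigned s = 0, r;
    
        while (!(d & 1ULL)) {                  /* write n - 1 = 2^s * d, d odd */
            d >>= 1;
            s++;
        }
    
        x = modpow(a, d, n);
        if (x == 1 || x == n - 1)
            return 1;
        for (r = 1; r < s; r++) {
            x = (x * x) % n;
            if (x == n - 1)
                return 1;
        }
        return 0;                              /* a proves that n is composite */
    }
    
    /* Deterministic for 32-bit n, using the witnesses 2, 7, 61 mentioned above. */
    int is_prime(unsigned long n)
    {
        static const unsigned long long bases[3] = { 2, 7, 61 };
        int i;
    
        if (n < 2)       return 0;
        if (n < 4)       return 1;             /* 2 and 3 */
        if (!(n & 1UL))  return 0;             /* even numbers > 2 */
    
        for (i = 0; i < 3; i++) {
            if (bases[i] % n == 0)             /* tiny n equal to a base: skip it */
                continue;
            if (!survives_round(bases[i], n))
                return 0;
        }
        return 1;
    }
    
    int main(void) {
        printf("%d %d\n", is_prime(2047UL), is_prime(9999991UL));
        return 0;
    }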

  13. #13
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    He has a 64-bit system. He has a Sieve program already; now if he can create a single array large enough to hold 200 million ints, then he's got everything he needs.

    If he wants to test for primality in the medium and high ranges of numbers, then he'll definitely need primality testers instead of arrays.

  14. #14
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by Adak View Post
    He has a Sieve program already
    The OP didn't actually specify whether he needs to find all primes, or just to check whether some numbers (between one and some limit in the millions) are primes or not.

    If you have just a few hundred or thousand input values, with the values ranging from zero to hundreds of millions, then a deterministic primality check might be more efficient.

    If you have lots of input, then it is likely more efficient to precompute a bit array corresponding to typical input values. On 64-bit architectures, you can use memory-mapping techniques to map a read-only file, potentially much larger than available RAM, and let the kernel worry about memory pressure and file I/O. (It's a viable option when the amount of numbers to test is truly huge.)
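
    A rough sketch of that memory-mapping idea on a POSIX system; the file name "primes.bits" and its layout (bit n set meaning n is prime) are purely hypothetical here:
    Code:
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    
    /* Map a precomputed bit file read-only and test one value. */
    int main(void) {
        struct stat st;
        int fd = open("primes.bits", O_RDONLY);
        if (fd == -1 || fstat(fd, &st) == -1)
            return 1;
    
        const unsigned char *bits = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (bits == MAP_FAILED)
            return 1;
    
        unsigned long n = 199999991UL;     /* some value to test, just an example */
        if (n / 8 < (unsigned long)st.st_size)
            printf("%lu: %s\n", n, (bits[n / 8] & (1u << (n % 8))) ? "prime" : "composite");
    
        munmap((void *)bits, st.st_size);
        close(fd);
        return 0;
    }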

    A mixed approach (bitmask plus deterministic primality test) is probably the most robust. You can have one or more bit maps covering typical values (I'd use one map ranging from zero to some limit), and use a deterministic primality test for the rest.

    You might find it interesting how e.g. the Mathematica PrimeQ primality test actually works. (It uses Miller-Rabin with a = 2, 3, and a Lucas pseudoprime test, and is probabilistic, not deterministic.)

  15. #15
    Registered User
    Join Date
    Nov 2012
    Posts
    17
    Quote Originally Posted by Nominal Animal View Post
    I'm sorry, I don't follow.

    Consider n = 2047, so s = 1, d = 1023, 2^s·d = n - 1 = 2046. n = 2047 is a strong pseudoprime to base 2, since a^d mod n = 2^1023 mod 2047 = 1, but n is not a prime: 2047 = 23·89.


    Perhaps pointing to the pseudocode shown in the Wikipedia article would be a good compromise between theory and implementation?
    Correct. Point taken.
    However, there is no composite number that is a strong pseudoprime to every base.
    I messed it up by using only 2 as a base, but the idea was that for numbers below 1 million it is enough to test the first 2 prime bases, 2 and 3, not only base 2 as I wrongly suggested. The number 3215031751 is the smallest composite that passes Rabin-Miller for all bases up to the 4th prime (2, 3, 5, and 7).
    Strong Pseudoprime -- from Wolfram MathWorld
    For the implementation I thought of something like this: http://en.literateprograms.org/index...(C)&oldid=6349

    It is more complicated.
    Last edited by Andreea; 11-20-2012 at 12:16 PM. Reason: English...
