Thread: Complexity Problem

  1. #1
    Registered User
    Join Date
    Sep 2001
    Posts
    752

    Complexity Problem

    I've been wanting to ask this for a while, but had trouble phrasing it as a non-math question. What follows admittedly isn't really a programming problem; it amounts to proving something very important about Turing machines and complexity.

    Presumably a lot of you are in college, and at some point you're expected to have studied complexity and "Big O" notation. The following is pretty basic, but it is a very fundamental truth which is needed for many proofs.

    In the following code, what is the average complexity of increment(char * binary_string) with respect to the string's length?

    Code:
    #include <stdio.h>
    #include <string.h>
    #include <assert.h>
    
    /** 
     * Advance binary_string to the 'next' value.
     *
     * binary_string An arbitrary string of '0' and '1'.
     * binary_string is N characters long.
     * binary_string is not all '1's (no overflow)
     *
     * What is the expected number of comparisons of this function?
     */
    void increment (char * binary_string) {
       char * curr_char;
       // Scan until the first '0', flipping off all the '1' bits.
       for (curr_char = binary_string; *curr_char == '1'; ++curr_char)
          *curr_char = '0';
    
       // Flip the 0 bit and return.
       assert (*curr_char == '0');
       *curr_char = '1';
    
       return;
    }
    
    int main (void) {
       char foo[5];
      
       for (strcpy (foo, "0000"); strcmp (foo, "1111"); increment(foo))
          printf ("%s\n", foo);
    
       printf ("%s\n", foo);
    
       return 0;
    }
    Last edited by QuestionC; 03-23-2007 at 02:38 PM. Reason: Clarified the problem, expected complexity, not worst-case.
    Callou collei we'll code the way
    Of prime numbers and pings!

  2. #2
    C++ Developer XSquared's Avatar
    Join Date
    Jun 2002
    Location
    Ontario, Canada
    Posts
    2,718
    I believe (it's been a while since I dealt with big O) that best case would be O(1) in the situation where the first character is '0', and worst case would be O(n).
    Naturally I didn't feel inspired enough to read all the links for you, since I already slaved away for long hours under a blistering sun pressing the search button after typing four whole words! - Quzah

    You. Fetch me my copy of the Wall Street Journal. You two, fight to the death - Stewie

  3. #3
    Registered User
    Join Date
    Sep 2001
    Posts
    752
    I should point out that this problem is analogous to:
    Prove that ++x is O(1)
    And a proof of that easily translates to the solution, and vice-versa.
    Callou collei we'll code the way
    Of prime numbers and pings!

  4. #4
    Crazy Fool Perspective's Avatar
    Join Date
    Jan 2003
    Location
    Canada
    Posts
    2,640
    The complexity of increment is O(n), where n is the string's length.

  5. #5
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,412
    hmm... the probability of ending the loop on the first iteration is 2^(N-1)/(2^N-1), which tends to one half as N tends to infinity. I am not too clear on how to determine average case complexity just yet, but it seems to me that this is good enough to conclude that the loop ends after a constant number of iterations on average, so the average case complexity of increment() is O(1).

    EDIT:
    Still just guessing here, but perhaps another way of looking at it is to consider that the probability of ending the loop by the Tth iteration is at least the sum of 1/2^t from t=1 to t=T. It is very likely that the loop will terminate within half a dozen iterations, regardless of the length of the string.
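    To spell that out a bit (assuming each character is independently '0' or '1', with the all-ones string excluded): the loop ends on exactly the Tth iteration when the first T-1 characters are '1' and the Tth is '0', which covers 2^(N-T) of the 2^N - 1 allowed strings, i.e. a probability of roughly 1/2^T. So the chance of still being inside the loop after T iterations is about 1/2^T, no matter how large N is.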
    Last edited by laserlight; 03-23-2007 at 03:23 PM.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  6. #6
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    It is a very interesting function indeed, doing all of its counting backwards. :s

  7. #7
    Registered User
    Join Date
    Sep 2001
    Posts
    752
    Quote Originally Posted by laserlight
    hmm... the probability of ending the loop on the first iteration is 2^(N-1)/(2^N-1), which tends to one half as N tends to infinity. I am not too clear on how to determine average case complexity just yet, but it seems to me that this is good enough to conclude that the loop ends after a constant number of iterations on average, so the average case complexity of increment() is O(1).

    EDIT:
    Still just guessing here, but perhaps another way of looking at it is to consider that the probability of ending the loop by the Tth iteration is at least the sum of 1/2^t from t=1 to t=T. It is very likely that the loop will terminate within half a dozen iterations, regardless of the length of the string.
    This is the right track, but it merits a bit more mathematical rigor. For example, in the following function, 75% of the values terminate in 1-3 comparisons, but the function is still not average-case O(1).

    Code:
    // Returns 0 if odd, 1 if even.
    int recurse_madness (unsigned long n) {
       if (n & 1)
          return 0;
       else
       {
          // Keep shifting n right and recursing until the shifted value is odd.
          for (; recurse_madness(n >>= 1); );
          return 1;
       }
    }
    Basically, what I'm trying to say is that the cost of the rare cases matters.
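    If I've counted right, a value with k trailing zero bits costs about 2^k calls here, so the rare expensive cases keep the average from being constant. A rough brute-force check (just a sketch; the global counter and the bit ranges are arbitrary choices of mine):
    Code:
    #include <stdio.h>

    static unsigned long calls;   /* total calls to recurse_madness */

    /* Same function as above, except each call bumps the counter. */
    int recurse_madness (unsigned long n) {
       ++calls;
       if (n & 1)
          return 0;
       for (; recurse_madness(n >>= 1); );
       return 1;
    }

    int main (void) {
       unsigned bits;
       for (bits = 4; bits <= 20; bits += 4) {
          /* Average the cost over every nonzero value below 2^bits
             (zero is skipped, since it would recurse forever). */
          unsigned long n, limit = 1UL << bits;
          calls = 0;
          for (n = 1; n < limit; ++n)
             recurse_madness (n);
          printf ("%2u bits: %.2f calls on average\n",
                  bits, (double) calls / (limit - 1));
       }
       return 0;
    }
    If the counting argument is right, the averages should come out near bits/2, i.e. they grow with the length of the input rather than settling at a constant.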



    Quote Originally Posted by citizen
    It is a very interesting function indeed, doing all of its counting backwards. :s
    That is just a matter of keeping the code short. Big-endian counting is hard.

    The reason this problem came up is that a lot of O(N) algorithms rely on counting; radix sort, for example. These algorithms all tend to, at some point, contain the following or its equivalent:
    Code:
    for (i = 0; i < N; ++i);
    If it turns out that ++i is a log(N) operation, then all of those sorts, including bucket sort, are really O(N log N).
    Last edited by QuestionC; 03-23-2007 at 04:49 PM.
    Callou collei we'll code the way
    Of prime numbers and pings!

  8. #8
    int x = *((int *) NULL); Cactus_Hugger's Avatar
    Join Date
    Jul 2003
    Location
    Banks of the River Styx
    Posts
    902
    Code:
    // Scan until the first '0', flipping off all the '1' bits.
    My first impression reading that was: "Give those '1' bits t3h finger!" Now I know why we have the terms "set" and "reset"...

    Quote Originally Posted by Perspective
    The complexity of increment is O(n), where n is the string's length.
    That's what I thought at first, until:
    Quote Originally Posted by citizen
    It is a very interesting function indeed, doing all of its counting backwards. :s
    I'm not in college yet, so I know only the basics of Big O/complexity. I did try feeding it random strings to see how long it took, but so far I've fed it strings up to 10MB, and the times are negligible. (Randomization takes up the bulk of the time...) I either need more accurate timers or a slower processor. (as if 450MHz weren't already...)
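    Maybe counting characters instead of timing would tell more. Something like this rough sketch (the length and trial count are arbitrary; it just repeats the scan that increment() does on random strings and averages how far it gets):
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define LENGTH 100000   /* arbitrary string length */
    #define TRIALS 1000     /* arbitrary number of random strings */

    int main (void) {
       static char str[LENGTH + 1];
       unsigned long scanned = 0;
       int t, i;

       srand ((unsigned) time (NULL));
       for (t = 0; t < TRIALS; ++t) {
          /* Random '0'/'1' string; the last character is forced to '0'
             so the no-overflow precondition always holds. */
          for (i = 0; i < LENGTH; ++i)
             str[i] = (rand () & 1) ? '1' : '0';
          str[LENGTH - 1] = '0';
          str[LENGTH] = '\0';

          /* Count the characters increment() would look at. */
          for (i = 0; str[i] == '1'; ++i)
             ;
          scanned += i + 1;
       }
       printf ("average characters scanned: %.3f\n", (double) scanned / TRIALS);
       return 0;
    }
    If the average-case argument holds, that number should hover right around 2 no matter how big LENGTH gets.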

    Edit: Why & has lower precedence than == is beyond me... took me a good twenty minutes to find that bug...
    Last edited by Cactus_Hugger; 03-23-2007 at 05:22 PM.
    long time; /* know C? */
    Unprecedented performance: Nothing ever ran this slow before.
    Any sufficiently advanced bug is indistinguishable from a feature.
    Real Programmers confuse Halloween and Christmas, because dec 25 == oct 31.
    The best way to accelerate an IBM is at 9.8 m/s/s.
    recursion (re - cur' - zhun) n. 1. (see recursion)

  9. #9
    aoeuhtns
    Join Date
    Jul 2005
    Posts
    581
    Let T(n) denote the number of digits we have to look at, on average, for a bitstring of length n. There's an O(1) cost for each digit we look at; show that yourself. Then you get the recurrence,

    T(1) = 1
    T(n) <= (1 + (1 + T(n - 1))) / 2.
    We get an inequality because the number of allowable strings whose first digit is one (the all-ones string is excluded) is one smaller than the number whose first digit is zero, and since it can be shown inductively that each T(n) >= 1, the bound is less-than-or-equal-to. Since the recurrence is monotonic, we have T(n) <= U(n), where
    U(1) = 1
    U(n) = (2 + U(n-1)) / 2.

    It follows that U(n) = 2 - (1/2)^(n-1).

    Thus T(n) <= 2 - (1/2)^(n-1) < 2, for all n. So the average number of digits we look at is O(20908098).
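    If anyone wants to check that bound numerically, here's a rough sketch that enumerates every allowed string of each length, using a counting copy of increment() (the 16-bit cap is an arbitrary choice):
    Code:
    #include <stdio.h>
    #include <assert.h>

    static unsigned long comparisons;   /* characters examined, summed over calls */

    /* The increment() from the first post, except that every character it
       examines bumps a counter. */
    void increment (char * binary_string) {
       char * curr_char;
       for (curr_char = binary_string; ++comparisons, *curr_char == '1'; ++curr_char)
          *curr_char = '0';
       assert (*curr_char == '0');
       *curr_char = '1';
    }

    int main (void) {
       int n, i;
       char buf[17];

       for (n = 1; n <= 16; ++n) {
          unsigned long v, total = (1UL << n) - 1;   /* every string except all-ones */
          comparisons = 0;
          for (v = 0; v < total; ++v) {
             /* Write v as an n-character string, least significant bit first,
                matching the layout increment() expects. */
             for (i = 0; i < n; ++i)
                buf[i] = ((v >> i) & 1) ? '1' : '0';
             buf[n] = '\0';
             increment (buf);
          }
          printf ("n = %2d: average %.4f, bound %.4f\n",
                  n, (double) comparisons / total, 2.0 - 1.0 / (1UL << (n - 1)));
       }
       return 0;
    }
    The averages should stay at or below 2 - (1/2)^(n-1) for every n.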
    There are 10 types of people in this world, those who cringed when reading the beginning of this sentence and those who salivated to how superior they are for understanding something as simple as binary.

  10. #10
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    What are the input constraints? No input constraints means that '1' and '0' are equally likely for every single character. Then the likelihood of the first character being 0 is 50%. Stopping at the second is 25%, and so on. Continued to infinity, this goes near 0, but that's only a limes. (What's the English word for that?)
    This makes the worst-case complexity O(N). Fine. The average-case complexity may be O(1). Also fine.
    But when calculating the worst-case complexity of an algorithm, you have to consider the worst-case complexities of all components, not their average complexities. (Unless you can prove that worst-case for one component requires an input sequence that makes worst-case for another component impossible.)

    The reason why sorting algorithms can still use their complexities, and don't have to do something stupid like assume O(N) for every increment, is that the N is a different one, and that counting isn't done with unlimited integers. For sorting N numbers, incrementing the count may be O(M) worst case, but that M is still just the number of bits in the count (log2(N)), not the number of elements. So do algorithms have to add log(N) to their complexity calculation? No, because you have to distinguish between theory and practice.
    In theory, N can be arbitrary. However, in theory, increment is an atomic operation and thus O(1).
    In practice, N is bounded by addressable memory, and thus M is bounded by the number of bits in a size_t, be that 32 or 64.
    So increment, even assuming that the CPU can't increment a number atomically (which it can, I think), is O(32) or O(64), which is equivalent to O(1).

    In other words, worst-case for incrementing is O(1), and thus obviously average-case is O(1) too.




    Another fun fact: the increment() function increments the pointer. Does that make it O(N²) worst-case?
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  11. #11
    Registered User
    Join Date
    Sep 2001
    Posts
    752
    Quote Originally Posted by Rashakil Fol
    Let T(n) denote the number of digits we have to look at, on average, for a bitstring of length n. There's an O(1) cost for each digit we look at; show that yourself. Then you get the recurrence,
    Your proof is correct. I would have appreciated it if you were a bit more step-by-step with it, since your way of solving it was different from mine.



    Quote Originally Posted by CornedBee
    But when calculating the worst-case complexity of an algorithm, you have to consider the worst-case complexities of all components, not their average complexities. (Unless you can prove that worst-case for one component requires an input sequence that makes worst-case for another component impossible.)

    The reason why sorting algorithms can still use their complexities, and don't have to do something stupid like assume O(N) for every increment, is that the N is a different one, and that counting isn't done with unlimited integers. For sorting N numbers, incrementing the count may be O(M) worst case, but that M is still just the number of bits in the count (log2(N)), not the number of elements. So do algorithms have to add log(N) to their complexity calculation? No, because you have to distinguish between theory and practice.
    In theory, N can be arbitrary. However, in theory, increment is an atomic operation and thus O(1).
    In practice, N is bounded by addressable memory, and thus M is bounded by the number of bits in a size_t, be that 32 or 64.
    So increment, even assuming that the CPU can't increment a number atomically (which it can, I think), is O(32) or O(64), which is equivalent to O(1).
    I take issue with this. Let me address your points:

    The input constraints are given... it's an arbitrarily long string of 0s and 1s. It's purposefully general, since the math behind it is equally general... it's just the increment operator.

    Worst case analysis is a bit more sophisticated than just assuming worst case for all components and multiplying loops. For example, many people here have shown that the following code is worst-case O(logX):
    Code:
    ++x;
    However, it can also be shown that the following code is worst-case O(N) overall. Doing so basically requires a proof of the average case, though:
    Code:
    for (i = 0; i < N; ++i)
       ++x;
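    Roughly why: over the course of the whole loop, bit 0 of x flips N times, bit 1 about N/2 times, bit 2 about N/4 times, and so on, so the total work is at most N*(1 + 1/2 + 1/4 + ...) = 2N. That's O(N) for the entire loop, i.e. O(1) per increment on average, even though a single increment can cost log(N) in the worst case.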
    As I mentioned above, worst case analysis is more complicated than multiplying whenever you see a for loop, and the pointer increment inside the loop is not a problem. I was a little worried about the pointer movement complicating the problem mathematically, but when you think of it as incrementing a bit string on a Turing machine (which is what this problem is supposed to represent), the pointer increment is just moving the head one cell, which is atomic.

    Finally, and this is the important point:
    You can't discard theory. The words 'in practice' are a trap.
    Yes, in practice, most (not all!) integers are of a predetermined finite maximum length, and therefore in practice, all increments are O(1).
    On the same note, in practice, all arrays are of a predetermined finite maximum length, and therefore in practice, all quicksorts on arrays are O(1). It's just a huge constant.
    Taking into account the fact that input is finite basically makes everything O(1). It's not a meaningful property.

    I don't mind the fact that people assume the increment operation is constant time. Everyone assumes QSort is average-case O(NlogN) time; it's hard to prove, so that seems fair. Everyone assumes statistics just 'works', because the math behind it is a beast. The problem is making that assumption without realizing that an assumption is being made. In any mathematical field (such as CS), that is the kiss of death. For example, the halting problem was born from the realization that our intuitive beliefs about computability were unfounded.

    I am quite certain that if, 'in theory', sorting algorithms had to take into account a different worst-case cost for increment, then they would. Furthermore, if we lived in a universe where the mathematical rules about incrementing worked that way, then computers would necessarily operate in a fundamentally different manner. The reason sorting algorithms don't add 'log(n)' to their results is that it is theoretically sound not to, not because they don't get to 'in practice'.
    Callou collei we'll code the way
    Of prime numbers and pings!

  12. #12
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,412
    Your proof is correct. I would have appreciated it if you were a bit more step-by-step with it, since your way of solving it was different from mine.
    What was your way of solving it? I would appreciate an explanation (of Rashakil Fol's proof as well, hopefully). I happened to get the lecturer who places the least emphasis on complexity analysis for my data structures course this year, so what he explained I already knew, and what he glossed over (including how to perform average case analysis) I still do not know!
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  13. #13
    Registered User
    Join Date
    Sep 2001
    Posts
    752
    It was late last night, so I didn't post a solution. Here is a solution to the increment problem. I had to choose between making the proof simple but technically incorrect, or verbose and correct (to the best of my knowledge). So it's overly verbose, and I'm sorry for that.

    The spirit of the proof is just 2 steps:
    1. Express the expected complexity as a sort-of-geometric series.
    2. Show that series is bounded by a constant.
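    Spelled out in plain text (under the same uniform-input assumption as above): the loop costs exactly t comparisons when the first t-1 characters are '1' and the t-th is '0', which happens with probability roughly 1/2^t, so

    expected comparisons is about 1*(1/2) + 2*(1/4) + 3*(1/8) + 4*(1/16) + ... = 2

    and that constant bound is step 2.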





    Let me also take a moment to mention that OpenOffice.org is a pretty amazing program for communicating this kind of stuff.
    Callou collei we'll code the way
    Of prime numbers and pings!

  14. #14
    Crazy Fool Perspective's Avatar
    Join Date
    Jan 2003
    Location
    Canada
    Posts
    2,640
    So the preconditions aren't quite stated right then:

    >* binary_string An arbitrary string of '0' and '1'.

    should be

    * binary_string A random string of '0's and '1's drawn from a uniform distribution.


    otherwise you can't assume that the probability of seeing a '1' or '0' is 0.5.

  15. #15
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by CornedBee
    What are the input constraints? No input constraints means that '1' and '0' are equally likely for every single character.
    I don't see why that assumption is justified.

    Then the likelihood of the first character being 0 is 50%. Stopping at the second is 25%, and so on. Continued to infinity, this goes near 0, but that's only a limes. (What's the English word for that?)
    The technical term is that it "almost surely does not occur." See this Wiki article, which includes a very similar example:

    http://en.wikipedia.org/wiki/Almost_surely
