# Complexity Problem

• 03-23-2007
QuestionC
Complexity Problem
I've been wanting to ask this for a while, but had trouble phrasing it as a non-math question. What follows admittedly isn't really a programming problem, but it proves something very important about Turing machines and complexity.

Presumably a lot of you are in college. It's expected that at some point, you have studied complexity and "Big O" notation. The following is pretty basic, but it is a very fundamental truth which is needed for many proofs.

In the following code, what is the average complexity of increment(char * binary_string) with respect to the string's length?

Code:

```
#include <stdio.h>
#include <string.h>
#include <assert.h>

/**
 * Advance binary_string to the 'next' value.
 *
 * binary_string An arbitrary string of '0' and '1'.
 * binary_string is N characters long.
 * binary_string is not all '1's (no overflow)
 *
 * What is the expected number of comparisons of this function?
 */
void increment (char * binary_string)
{
    char * curr_char;

    // Scan until the first '0', flipping off all the '1' bits.
    for (curr_char = binary_string; *curr_char == '1'; ++curr_char)
        *curr_char = '0';

    // Flip the 0 bit and return.
    assert (*curr_char == '0');
    *curr_char = '1';
    return;
}

int main (void)
{
    char foo[5];

    for (strcpy (foo, "0000"); strcmp (foo, "1111"); increment(foo))
        printf ("%s\n", foo);
    printf ("%s\n", foo);
    return 0;
}
```
• 03-23-2007
XSquared
I believe (it's been a while since I dealt with big O) that best case would be O(1) in the situation where the first character is '0', and worst case would be O(n).
• 03-23-2007
QuestionC
I should point out that this problem is analogous to:
Prove that ++x is O(1)
And a proof of that easily translates to the solution, and vice-versa.
• 03-23-2007
Perspective
The complexity of increment is O(n), where n is the string's length.
• 03-23-2007
laserlight
hmm... the probability of ending the loop on the first iteration is 2^(N-1)/(2^N-1), which tends to one half as N tends to infinity. I am not too clear on how to determine average case complexity just yet, but it seems to me that this is good enough to conclude that the loop ends on the first iteration on average, so the average case complexity of increment() is O(1).

EDIT:
Still just guessing here, but perhaps another way of looking at it is to consider that the probability of ending the loop by the Tth iteration is at least the sum of 1/2^t from t=1 to t=T. It is very likely that the loop will terminate after half a dozen iterations, regardless of the length of the string.
• 03-23-2007
whiteflags
It is a very interesting function indeed, doing all of its counting backwards. :s
• 03-23-2007
QuestionC
Quote:

Originally Posted by laserlight
hmm... the probability of ending the loop on the first iteration is 2^(N-1)/(2^N-1), which tends to one half as N tends to infinity. I am not too clear on how to determine average case complexity just yet, but it seems to me that this is good enough to conclude that the loop ends on the first iteration on average, so the average case complexity of increment() is O(1).

EDIT:
Still just guessing here, but perhaps another way of looking at it is to consider that the probability of ending the loop by the Tth iteration is at least the sum of 1/2^t from t=1 to t=T. It is very likely that the loop will terminate after half a dozen iterations, regardless of the length of the string.

This is the right track, but it does merit a bit more mathematical rigor. For example, in the following function, 75% of the inputs will terminate in 1-3 comparisons, but the function itself is not average-case O(1).

Code:

```
// Returns 0 if odd, 1 if even.
// Precondition: n > 0 (n == 0 would recurse forever).
int recurse_madness (unsigned long n)
{
    if (n & 1)
        return 0;
    else
    {
        for (; recurse_madness(n >>= 1); );
        return 1;
    }
}
```
Basically, what I'm trying to say is that the cost of the rare cases is important.

Quote:

Originally Posted by citizen
It is a very interesting function indeed, doing all of its counting backwards. :s

That is just a matter of keeping the code short. Big-endian counting is hard.

The reason this problem came up is because a lot of O(N) algorithms rely on counting. For example, Radix sort. These algorithms all tend to, at some point, have the following or its equivalent:
Code:

`for (i = 0; i < N; ++i);`
If it turns out that ++i is a log(N) operation, then that means all such sorts, including bucket sorts, are O(N log N).
• 03-23-2007
Cactus_Hugger
Code:

`// Scan until the first '0', flipping off all the '1' bits.`
My first impression reading that was: "Give those '1' bits t3h finger!" Now I know why we have the terms of "set" and "reset"...

Quote:

Originally Posted by Perspective
The complexity of increment is O(n), where n is the strings length.

That's what I thought at first, until:
Quote:

Originally Posted by citizen
It is a very interesting function indeed, doing all of its counting backwards. :s

I'm not in college yet, so I know only the basics of Big O/complexity. I did try feeding it random strings to see how long it took, but so far I've fed it up to 10MB strings, and the times are negligible. (Randomization takes up the bulk of the time...) I either need more accurate timers or a slower processor. (as if 450MHz weren't already...)

Edit: Why & has lower precedence than == is beyond me... took me a good twenty minutes to find that bug...
• 03-23-2007
Rashakil Fol
Let T(n) denote the number of digits we have to look at, on average, for a bitstring of length n. There's an O(1) cost for each digit we look at; show that yourself. Then you get the recurrence,

T(1) = 1
T(n) <= (1 + (1 + T(n - 1))) / 2. We have an inequality because the number of allowable cases where the first digit is one is one smaller than the number where it is zero (the all-'1's string is excluded), and since it can be shown inductively that each T(n) >= 1, the relation is less-than-or-equal-to. Since the recurrence relationship is monotonic, we have T(n) <= U(n) where
U(1) = 1
U(n) = (2 + U(n-1)) / 2.

It follows that U(n) = 2 - (1/2)^(n-1).

Thus T(n) <= 2 - (1/2)^(n-1) < 2, for all n. So the average number of digits we look at is O(20908098).
• 03-23-2007
CornedBee
What are the input constraints? No input constraints means that '1' and '0' is equally likely for every single character. Then the likelihood of the first character being 0 is 50%. Stopping at the second is 25%, and so on. Continued to infinity, this goes near 0, but that's only a limes. (What's the English word for that?)
This makes the worst-case complexity O(N). Fine. The average-case complexity may be O(1). Also fine.
But when calculating the worst-case complexity of an algorithm, you have to consider the worst-case complexities of all components, not their average complexities. (Unless you can prove that worst-case for one component requires an input sequence that makes worst-case for another component impossible.)

The reason why sorting algorithms still can use their complexities and not have to do something stupid like assume O(N) for every increment is that the N is a different one, and that counting isn't done with unlimited integers. For sorting N numbers, incrementing the count may be O(M) worst case, but that M is still just the number of bits in the count (log2(N)), not the number of elements. So do algorithms have to add log(N) to their complexity calculation? No, because you have to distinguish between theory and practice.
In theory, N can be arbitrary. However, in theory, increment is an atomic operation and thus O(1).
In practice, N is bounded by addressable memory, and thus M is bounded by the number of bits in a size_t, be that 32 or 64.
So increment, even assuming that the CPU can't increment a number atomically (which it can, I think), is O(32) or O(64), which is equivalent to O(1).

In other words, worst-case for incrementing is O(1), and thus obviously average-case is O(1) too.

Another fun fact: the increment() function increments the pointer. Does that make it O(N²) worst-case?
• 03-24-2007
QuestionC
Quote:

Originally Posted by Rashakil Fol
Let T(n) denote the number of digits we have to look at, on average, for a bitstring of length n. There's an O(1) cost for each digit we look at; show that yourself. Then you get the recurrence,

Your proof is correct. I would have appreciated it if you were a bit more step-by-step with it, since your way of solving it was different from mine.

Quote:

Originally Posted by CornedBee
But when calculating the worst-case complexity of an algorithm, you have to consider the worst-case complexities of all components, not their average complexities. (Unless you can prove that worst-case for one component requires an input sequence that makes worst-case for another component impossible.)

The reason why sorting algorithms still can use their complexities and not have to do something stupid like assume O(N) for every increment is that the N is a different one, and that counting isn't done with unlimited integers. For sorting N numbers, incrementing the count may be O(M) worst case, but that M is still just the number of bits in the count (log2(N)), not the number of elements. So do algorithms have to add log(N) to their complexity calculation? No, because you have to make a difference between theory and practice.
In theory, N can be arbitrary. However, in theory, increment is an atomic operation and thus O(1).
In practice, N is bounded by addressable memory, and thus M is bounded by the number of bits in a size_t, be that 32 or 64.
So increment, even assuming that the CPU can't increment a number atomically (which it can, I think), is O(32) or O(64), which is equivalent to O(1).

The input constraints are given... it's an arbitrarily long string of 0s and 1s. It's purposefully general, since the math behind it is equally so. It's just the increment operator.

Worst case analysis is a bit more sophisticated than just assuming worst case for all components and multiplying loops. For example, many people here have shown that the following code worst-case is O(logX):
Code:

`++x;`
However, it can also be shown that the following code is worst-case O(N). Doing so basically requires a proof of the average case though:
Code:

```for (i = 0; i < N; ++i)   ++x;```
As I mentioned above, worst case analysis is more complicated than multiplying when you see a for loop, so the loop around the pointer increment is not a problem. I was a little worried about the pointer movement complicating the problem mathematically, but when you think of it as just incrementing a bit string on a Turing machine (which is what this problem is supposed to represent), the pointer increment is just moving the head one cell, which is atomic.

Finally, and this is the important point:
You can't discard theory. The words 'in practice' are a trap.
Yes, in practice, most (not all!) integers are of a predetermined finite maximum length, and therefore in practice, all increments are O(1).
On the same note, in practice, all arrays are of a predetermined finite maximum length, and therefore in practice, all quicksorts on arrays are O(1). It's just a huge constant.
Taking into account the fact that input is finite basically makes everything O(1). It's not a meaningful property.

I don't mind the fact that people assume the increment operation is constant time. Everyone assumes QSort is average-case O(N log N); it's hard to prove, so that seems fair. Everyone assumes statistics just 'works', because the math behind it is a beast. What is a problem is making that assumption without realizing that an assumption is being made. In any mathematical field (such as CS), that is the kiss of death. For example, the halting problem was born from the understanding that our intuitive beliefs about computability were unfounded.

I am very certain that if 'in theory', sorting algorithms had to take into account a different worst-case possibility of increment, then they would. Furthermore, if we lived in a universe where the mathematical rules about incrementing worked that way, then computers would necessarily operate in a fundamentally different manner. The reason sorting algorithms don't add 'log(n)' to their results is because it is theoretically sound not to. Not because they don't get to 'in practice'.
• 03-25-2007
laserlight
Quote:

Your proof is correct. I would have appreciated it if you were a bit more step-by-step with it, since your way of solving it was different from mine.
What was your way of solving it? I would appreciate an explanation as well (of Rashakil Fol's proof too, hopefully). I happened to get the lecturer who places the least emphasis on complexity analysis for my data structures course this year, so what he has explained, I already knew, and what he glossed over (including how to perform average case analysis) I still do not know!
• 03-25-2007
QuestionC
It was late last night, so I didn't post a solution. Here is a solution to the increment problem. I had to choose between having the proof be simple but technically incorrect, or verbose and correct (to the best of my knowledge). So, it's overly verbose, and I'm sorry for that.

The spirit of the proof is just 2 steps:
1. Express the expected complexity as a sort-of-geometric series.
2. Show that series is bounded by a constant.
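Under the uniform-input assumption, those two steps can be written out compactly (my own rendering): the loop exits on comparison t exactly when the first t-1 characters are '1' and the t-th is '0', which holds for 2^(N-t) of the 2^N - 1 legal strings, so

```
E[\text{comparisons}] \;=\; \sum_{t=1}^{N} t \cdot \frac{2^{N-t}}{2^{N}-1}
  \;=\; 2 - \frac{N}{2^{N}-1} \;<\; 2
```

which is bounded by the constant 2, exactly as step 2 requires.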

Let me also take a moment to mention that OpenOffice.org is a pretty amazing program for communicating this kind of stuff.
• 03-25-2007
Perspective
So the preconditions aren't quite stated right then,

>* binary_string An arbitrary string of '0' and '1'.

should be

* binary_string A random string of '0's and '1's drawn from a uniform distribution.

otherwise you can't assume that the probability of seeing a '1' or '0' is 0.5.
• 03-26-2007
brewbuck
Quote:

Originally Posted by CornedBee
What are the input constraints? No input constraints means that '1' and '0' is equally likely for every single character.

I don't see why that assumption is justified.

Quote:

Then the likelihood of the first character being 0 is 50%. Stopping at the second is 25%, and so on. Continued to infinity, this goes near 0, but that's only a limes. (What's the English word for that?)
The technical term is that it "almost surely does not occur." See this Wikipedia article, which includes a very similar example:

http://en.wikipedia.org/wiki/Almost_surely