1. Originally Posted by pianorain
Of course I have no data, but I'd imagine that the human mind (whether consciously or not) will cling far more heavily to the ordered concept of 'be fair' than it would to the chaotic concept of 'be random'
Nah, nah, nah. There's no place here for conjectures. You are cheating with that argument. Please check post #132. I give you there the real challenge. And even in the face of what you say here, you will still not be able to predict or duplicate the conditions. You achieved randomness. Dare to contradict me?

2. @MK27
I think I understand what you mean now, and your method does seem to be more secure.
For example: if you had access to parts of the output stream from a PRNG like rand() and knew the exact implementation, you could theoretically deduce what the seed was, and therefore get the rest of the keystream. But with your algorithm it would be impossible to figure out the entire key based on having only a part of it.

3. Originally Posted by Mario F.
It still bothers me, though. How can this be in any way applicable? How does someone intend to prove and properly quantify someone else's tendency for a given distribution when they are typing on a keyboard?
The input is a sequence of bytes. Once you can form numbers, you can apply statistical tests of randomness.
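As a minimal sketch of one such statistical test, consider the "monobit" frequency test: count ones versus zeros across all bits and measure how many standard deviations the count sits from the 50/50 expectation of a truly random source. (This is an illustration of the idea, not the specific battery of tests anyone in this thread ran.)

```python
# Monobit frequency test: how far is the ones count from the 50/50
# expectation of a fair random bit source?
import math

def monobit_z(data: bytes) -> float:
    """Return the z-score of the ones count across all bits of `data`."""
    n = len(data) * 8
    ones = sum(bin(b).count("1") for b in data)
    # For fair coin flips, ones ~ Binomial(n, 0.5): mean n/2, stddev sqrt(n)/2.
    return (ones - n / 2) / (math.sqrt(n) / 2)

# English text is biased at the bit level, so its z-score is typically far from 0.
print(abs(monobit_z(b"the quick brown fox jumps over the lazy dog" * 100)))
```

A |z| much larger than about 3 is strong evidence against the 50/50 hypothesis.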

Originally Posted by Mario F.
It may be true if the subject is typing mindlessly at the keyboard without a purpose. I still question how anyone can take advantage of that in order to come up with a practical cryptanalysis algorithm, but whatever.
It would probably be feasible to use frequency analysis, if you know the keyboard in use and have information on the characteristics of sequences generated by people "typing mindlessly at the keyboard without a purpose".

Originally Posted by Mario F.
Now, what if the subject is told to type on the keyboard randomly and to try and keep a fair distribution? How can anyone sustain that the result has no practical randomness?
I think that the problem is that it is easier said than done.

Rather unfortunately, it seems that the books I have at hand mainly talk about PRNGs. Schneier does talk about RNGs, so I will quote him here along with two earlier points about PRNGs:
For our purposes, a sequence generator is pseudo-random if it has this property:
1. It looks random. This means that it passes all the statistical tests of randomness that we can find.

For a sequence to be cryptographically secure pseudo-random, it must also have this property:
2. It is unpredictable. It must be computationally infeasible to predict what the next random bit will be, given complete knowledge of the algorithm or hardware generating the sequence and all of the previous bits in the stream.

From our point of view a sequence generator is real random if it has this additional third property:
3. It cannot be reliably reproduced. If you run the sequence generator twice with the exact same input (at least as exact as humanly possible), you will get two completely unrelated random sequences.

The output of a generator satisfying these three properties will be good enough for a one time pad, key generation, and any other cryptographic applications that require a truly random sequence generator. The difficulty is in determining whether a sequence is really random. If I repeatedly encrypt a string with DES and a given key, I will get a nice, random-looking output; you won't be able to tell that it's non-random unless you rent time on the NSA's DES cracker.
This property is puzzling; I am not sure what input Schneier is referring to here, because I never considered the possibility of input to an RNG. I would have thought that the defining property is some notion of unpredictability that is stronger than being computationally infeasible to predict. (I concede that "cannot be reliably reproduced" is also a stronger notion than "computationally infeasible to predict", but it is stronger than I expected.)

Originally Posted by Mario F.
- (1) Radioactive decay: It is predictable. So, by the requirements being defended on this thread, it does not constitute a good source for a RNG.
I would not be so bold as to say that it is predictable when many of the experts in the field currently believe otherwise. But since you have a second example, you're still okay.

4. Originally Posted by rogster001
mk, at the time I was not referring to your example at all, I'm afraid; I was just saying that, in general, keyboard bashing ALONE is not considered random enough. It just so happened that you mentioned it in a PART of your idea. Bit lazy of me really not to point that out at the time.
Okay.

Originally Posted by pianorain
A random statistical distribution does not say that a single generated binary sequence will consist of exactly 50% zeros and ones. It says that over the long run, the generation of binary sequences will tend towards 50% zeros and ones.
If I had 1000 bytes, and it turned out there were exactly 4000 1's and 4000 0's, would you take this as evidence that the data is random? No -- but if it were 47% one and 53% the other, would that also mean it's not random? Your thinking is flawed here, because you fail to grasp the meaning of your own words (the bold part is correct).

I'm sure you could tweak Geiger input to yield more ones than zeros. Would that make it less random? If so, then the "truest" random setting would be when you tweak the machine to yield exactly the same number of each? No -- but it is a goal to aim for, because, obviously, ending up with all one or the other is undesirable.

The key here is "over the long run, the generation of binary sequences will tend towards 50% zeros and ones". Arriving at 47% based on Statistical Distributions of English Text is an example of that. It means that the larger your sample of text, the more likely you are to end up with 47%. You could have a small sample of text that is exactly 50% -- that does not mean it is a "more random sample". You would expect some subset of a text that is 47% to be 50% -- that reflects the tendency. If every subsample were exactly 50% (or 47%), you DEFINITELY would not have random data, you would have 1010101010.

If you want exactly 50% "in the long run", you could easily take the statistical distribution and alter the algorithm to aim at specific characters rather than just using modulus. It would be simple to achieve this. But why bother? 47% is far enough away from 100% and 0% -- it is evidence that, in general, using text as an input will tend toward 50% as the size of the input increases. It is the tendency toward 50% and away from 100% (or 0%) that is conceptually important. You do not have to arrive at 50%. Indeed, arriving there too easily might indicate a lack of randomness. The point is to avoid methods which do the opposite: for example, if you have a sample of data that is 50%, but the more data you add the more it tends away from 50%, your method is not a good one, because you are more likely to get occasional samples that are 100% one way or the other. That is very unlikely with English language text.
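The "tend toward 50% but not hit it" idea can be checked with a tiny sketch. The exact reduction discussed above is not specified, so reducing each character's code point mod 2 is an assumption used here purely for illustration:

```python
# Illustrative reduction of text to bits: one bit per character via
# code point mod 2 (an assumed stand-in for the "modulus" method).
def ones_fraction(text: str) -> float:
    """Fraction of ones when each character maps to ord(c) % 2."""
    bits = [ord(c) % 2 for c in text]
    return sum(bits) / len(bits)

sample = "it is the tendency toward fifty percent that matters, not hitting it"
print(ones_fraction(sample))  # close to, but usually not exactly, 0.5
```

Larger English samples should land near, but rarely exactly on, 0.5, which is the tendency being argued for.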

You cannot examine a 2-digit sequence for statistical distribution as a randomness criterion, nb.

This is what I meant by the mindless regurgitation of "textbook 101 principles". What I meant by "learning to think for yourself" is that you need to comprehend the proper use of those principles so that you can apply them intelligently, rather than like a piece of spam software.

5. Ah, infinite are the arguments of mages. Menezes, van Oorschot and Vanstone present a somewhat different definition of a RNG, in terms of a random bit generator:
A random bit generator is a device or algorithm which outputs a sequence of statistically independent and unbiased binary digits.
You may also want to read section 2 of that chapter, which is available as a PDF document: Handbook of Applied Cryptography, Chapter 5

Unpredictability is not mentioned in that definition itself, but it should be obvious that it is a requirement, though unfortunately this also means that it is not very well defined, at least in that chapter.

6. Originally Posted by Mario F.
Nah, nah, nah. There's no place here for conjectures. You are cheating with that argument.
So any hypothesis is cheating? That must make it difficult to get anywhere. My conjecture is based on the assumption that humans act within reason. That, of course, is debatable, but it is supported in this field by the fact that we can publish lists of the most popular passwords. If humans were better at being random, this tiny dictionary of 500 plain-text words wouldn't be able to successfully attack 11% of the population.
Originally Posted by Mario F.
Please check post #132. I give you there the real challenge. And even in the face of what you say here, you will still not be able to predict or duplicate the conditions. You achieved randomness. Dare to contradict me?
Not contradict, merely suggest an alternative view.

The ability or inability to duplicate a result is not a requirement of a PRNG. For an example in the computer science field, consider the seed of a PRNG. If you seed a PRNG with the same number every time, you will generate the same numbers every time. This does not mean that the PRNG is broken; it's just being utilized poorly. That an event is repeatable does not factor either way into whether or not a PRNG is suitably random.
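The seed point above is easy to demonstrate. A minimal sketch using Python's Mersenne Twister (the standard library's `random.Random`):

```python
# Seeding a PRNG with the same value reproduces the same sequence.
# That is a property of the design, not a defect.
import random

def sequence(seed: int, n: int = 5) -> list:
    rng = random.Random(seed)  # Mersenne Twister under the hood
    return [rng.randint(0, 99) for _ in range(n)]

# Same seed, same output every time:
assert sequence(42) == sequence(42)
# Different seeds (almost certainly) diverge:
print(sequence(1), sequence(2))
```

Seeding with a constant is the "utilized poorly" case; seeding from a high-entropy source is what makes the same generator useful.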

Unpredictability, on the other hand, is a major requirement in determining the suitability of a PRNG. This is where my entire argument lies.

First, an explanation of the "next-bit" test. Straight from Wikipedia:
Given the first k bits of a random sequence, there is no polynomial-time algorithm that can predict the (k+1)th bit with probability of success better than 50%.
Why do I think simple typing on the keyboard fails this test? Because I can tell you the algorithm I think will predict the next bit (with greater than 50% success) given only a single bit.

To illustrate, I took four samples of text from this thread:
01001100010011110010001010110001010010110110001010 10100110010010100111010001010010011001010100110000 10101111000100011100001001010101000101101001111001 00101101001010011101000101110010100010000110101100 10010110011111101100100010101001111001010011010110 10010100101100001000010110000010100000101011010100 01010101010100111100101100001101001001010100101010 11001000010010111000100011001001010101110011000101 11001001100010010111000101010010110010001010110111 01010010101111010011010000101000101010101111100100 010011001010010110001011010
Note that no generator can be proved random or non-random. It can only be tested to suggest randomness. In this case, I'm suggesting that this method of producing random bits may fail a 'next-bit' test because of plain-text statistical distribution problems. Obviously my sample size of flail-text is tiny, but it also suggests distribution problems. Seeing as passing the 'next-bit' test is one of the requirements for a cryptographically secure PRNG, this is important. The only way to know definitively is to run the tests and see what happens.
If you have a key that generates a meaningful message, it would be easy to take that bit pattern and generate a corresponding text for it (you have 13 choices each time -- actually, with punctuation and capitals, let's say you have 30 choices each time). Chances are you could do that with a message of any length. Going back to the five-letter word: using modulus, how many 30-character phrases could you come up with that would produce the "key" to produce a word from 5 scrambled bytes? The same number of phrases as there are possible keys, and most likely even more than that, since just one such bit sequence could be produced by a whole bunch of possible word combinations.
The problem is that the sets of text are not all equally likely to be chosen by a given user. Recall why brute force can be effective in practice against hashed passwords (where a salt is not used): users tend to choose weak passwords for convenience, and these passwords are thus susceptible to a dictionary attack. Now, in practice these texts are more like passphrases than passwords (though the distinction is largely just a matter of style, in my opinion), thus brute force is likely to fail anyway. Yet, just as users reuse passwords, they may reuse source text. Training them not to do so is easier said than done; even trained spies blunder.
I then converted the text to binary using mod 2 arithmetic and found the distributions of two-character sequences.
00: 0.233396584440228
01: 0.311195445920304
10: 0.311195445920304
11: 0.142314990512334

00: 0.226691042047532
01: 0.297989031078611
10: 0.297989031078611
11: 0.175502742230347

00: 0.223042836041359
01: 0.280649926144756
10: 0.282127031019202
11: 0.212703101920236

00: 0.242748091603053
01: 0.274809160305344
10: 0.274809160305344
11: 0.206106870229008
I noticed that 01 and 10 are always higher than 00 or 11. (They're also always very close to having the same probability, though I wasn't sure what to do with this.) This suggests a simple algorithm for guessing the next bit with greater than 50% accuracy would be to simply guess the other bit: if you are given a 1, guess a 0 (and vice versa).
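The counting step described above can be sketched as follows, again assuming the text was reduced to bits via code point mod 2 (the exact reduction used in the thread is not specified here):

```python
# Count the four two-bit sequences in a bit string and normalize.
from collections import Counter

def bigram_distribution(bits: str) -> dict:
    """Relative frequency of each overlapping two-bit sequence."""
    pairs = Counter(bits[i:i + 2] for i in range(len(bits) - 1))
    total = sum(pairs.values())
    return {p: pairs[p] / total for p in ("00", "01", "10", "11")}

bits = "010011000100111100100010101100010100101101100010"
dist = bigram_distribution(bits)
# "Guess the opposite bit" is correct whenever the pair is 01 or 10:
print(dist, dist["01"] + dist["10"])
```

If P(01) + P(10) exceeds 0.5, guessing the opposite of the current bit beats a coin flip on average.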

I then created four strings of flail-text (each 680 characters in length). Here were the distribution results:
00: 0.222058823529412
01: 0.260294117647059
10: 0.261764705882353
11: 0.254411764705882

00: 0.254411764705882
01: 0.239705882352941
10: 0.239705882352941
11: 0.264705882352941

00: 0.222058823529412
01: 0.282352941176471
10: 0.283823529411765
11: 0.210294117647059

00: 0.263235294117647
01: 0.245588235294118
10: 0.245588235294118
11: 0.244117647058824
This means that if on any given bit, I was given the bit and told to guess the next one, I would have been correct 51.7% of the time if given a 0, and I would have been correct 51.4% of the time if given a 1. In other words, it fails the "next-bit" test.

Obviously, my sample size is small. I understand that I'm not proving anything. I'm merely suggesting that this limited evidence suggests that statistical distribution is a problem for this method.
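The success rates quoted above follow directly from the bigram frequencies: given a 0, "guess the opposite" succeeds when the pair is 01, so the conditional rate is P(01) / (P(00) + P(01)), and similarly for 1. A sketch using one of the flail-text distributions (the quoted 51.7%/51.4% figures came from averaging all four samples, so a single sample will differ):

```python
# Conditional success rate of the "guess the opposite bit" strategy,
# derived from a two-bit sequence distribution.
def success_given(bit: str, dist: dict) -> float:
    same = bit + bit
    flip = bit + ("1" if bit == "0" else "0")
    return dist[flip] / (dist[same] + dist[flip])

dist = {"00": 0.222058823529412, "01": 0.260294117647059,
        "10": 0.261764705882353, "11": 0.254411764705882}
print(success_given("0", dist), success_given("1", dist))
```

For a perfectly uniform distribution both rates collapse to exactly 50%, which is the pass condition of the next-bit heuristic being applied here.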

7. Thanks for the references, laserlight.

I do think, however, that they support my claim. I understand there can be a debate around "statistically independent and unbiased binary digits", exactly because, as you say, this is open to interpretation... or suggestive of a certain unmeasurable property.

One note about radioactive decay, though. I was under the impression that, given a large number of similar atoms, radioactive decay can theoretically, at least partially, be predicted. This is not to say it is not random, but rather that it is exactly as random as our inability to predict or duplicate it. Which is why I find Bruce Schneier's description more complete.

Another note about RNG inputs. As I understand it, an RNG's input is the combination of forces that activate and condition the system: the strength, vector, temperature, wind, rotation of the earth, atmospheric pressure, etc. that activate and condition the throwing of the die, or the electrostatic and nuclear forces that activate and condition the radioactive decay.

Originally Posted by pianorain
First, an explanation of the "next-bit" test. Straight from Wikipedia:
Given the first k bits of a random sequence, there is no polynomial-time algorithm that can predict the (k+1)th bit with probability of success better than 50%.
Why do I think simple typing on the keyboard fails this test? Because I can tell you the algorithm I think will predict the next bit (with greater than 50% success) given only a single bit.
I took a look at your attempt, and I cannot see how you proved that you can predict the next character I'm going to type after the sequence "çlksjrpwqjrkasnf,ndb sd" with a probability of success better than 50%. (Note: I really typed that on the fly.)

Edit: Also note that by typing more characters (millions, billions, trillions of them) I'm not necessarily increasing your ability to predict the next character, any more than studying millions, billions, or trillions of similar atoms lets you predict a window in time for atomic decay, or tossing a die millions, billions, or trillions of times lets you predict the next number.

8. Originally Posted by Mario F.
The next character
Under MK's method, the next character doesn't matter, only the next bit. Are you talking independently of MK's method? If so, I have no idea how to do what you propose. It sounds like your argument would be better addressed to the authors of "The Code Book".

9. Originally Posted by Mario F.
One note about the Radioactive decay though. I was under the impression, that given a large number of similar atoms, radioactive decay can theoretically, at least partially, be predicted.
The prediction has to do with the proportion of atoms in the sample that will decay over time, but RNGs that use it are usually based on a measurement concerned with when the next atom decays.
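The timing-based approach can be sketched by simulation: inter-decay intervals are exponentially distributed, and one well-known extraction scheme (popularized by HotBits) emits a bit by comparing the lengths of two successive intervals. The simulation below stands in for a real radioactive source and is purely illustrative:

```python
# Simulated decay-timing RNG: exponentially distributed inter-decay
# intervals, one output bit per pair of intervals (compare lengths).
import random

def decay_bits(n_bits: int, rate: float = 1.0, seed: int = 0) -> list:
    rng = random.Random(seed)  # stand-in for a real radioactive source
    bits = []
    for _ in range(n_bits):
        t1 = rng.expovariate(rate)  # time until one decay
        t2 = rng.expovariate(rate)  # time until the next
        bits.append(1 if t1 > t2 else 0)
    return bits

print(decay_bits(16))
```

The comparison makes the output bit insensitive to the decay rate itself, which is why the predictable aggregate behaviour of the sample does not help predict individual bits.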

If you read Asimov's Foundation series, you might notice that he used this kind of concept in psychohistory: Seldon's science allows him to predict events of humanity for centuries to come, but tells him nothing about what the person next to him will do next.

10. 51.4% may be more than 50%, but it sounds like another version of the same bad joke: a random sample must yield 50/50 to be random.

You miss the forest for the trees. You don't need to do any sort of analysis at all to get a 50% success rate if the data tends toward 50%: just guess the same thing every time. The next bit is either 0 or 1. The criterion is completely meaningless.

You already know the baseline here is 47%, which is why you can come up with a way "to guess" that will be correct slightly more than 50% of the time -- just guess 1. If (because of some extremely silly misapplication of principles) you are determined to get 50%, then, like I said, adjust the algorithm to reflect English language distribution. Then all your samples will be 50/50, and you will be unable to guess better than exactly 50%.

That will get your analysis more in line with what you want, but it does not take away from the fact that this is just a grotesque distortion of rational principles. If the reason you can get 51.4% is that the ultimate baseline is 47% (which it is), then you will never get a better rate than 53%, no matter how big the sample. And 53% will never enable you to break the cipher.

11. Let me tell you what I seem to believe is the problem here:

There's this vague notion, written by someone, that "randomly typing at a keyboard does not constitute a random event". You guys took this at face value and are stating it as law, trying to prove something that the author himself couldn't.

I suggest you stop.

There's this vague notion, written by someone, that "randomly typing at a keyboard does not constitute a random event". You guys took this at face value and are stating it as law, trying to prove something that the author himself couldn't.
For my own part, I am merely saying that historically cryptographers have not used simple random typing as a secure form of randomness. Whether through empirical evidence or theoretical study, it has been found that it is not a usable method. The fact that its use has been shunned in real-world applications by people who genuinely need to guarantee security as far as they can is evidence enough for me.

13. Originally Posted by laserlight
The prediction has to do with the proportion of atoms in the sample that will decay over time, but RNGs that use it are usually based on a measurement concerned with when the next atom decays.

If you read Asimov's Foundation series, you might notice that he used this kind of concept in psychohistory: Seldon's science allows him to predict events of humanity for centuries to come, but tells him nothing about what the person next to him will do next.
Noted. Will study it later. Thanks.

14. Originally Posted by Mario F.
Let me tell you what I seem to believe is the problem here:

There's this vague notion written by someone that "randomly typing at a keyboard does not constitute a random event". You guys picked this at face value and are stating it as law. Trying to prove something that the author himself couldn't.

I suggest you stop.
Or just quote a source. This paper was extremely interesting. Even after de-skewing, the typing test failed.
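The "de-skewing" mentioned above usually refers to something like the von Neumann extractor: read the biased stream in pairs, emit 0 for "01" and 1 for "10", and discard "00" and "11". A minimal sketch:

```python
# Von Neumann extractor: removes bias from independent biased bits,
# but does NOT remove correlation, which is why a correlated source
# (like keyboard typing) can still fail tests after de-skewing.
def von_neumann(bits):
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)  # "01" -> 0, "10" -> 1
    return out

print(von_neumann([0, 1, 1, 0, 1, 1, 0, 0, 0, 1]))  # -> [0, 1, 0]
```

Because the extractor assumes the input bits are independent, correlated input defeats it, which is consistent with the typing test still failing after de-skewing.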

15. Originally Posted by rogster001
For my own part, I am merely saying that historically cryptographers have not used simple random typing as a secure form of randomness. Whether through empirical evidence or theoretical study, it has been found that it is not a usable method. The fact that its use has been shunned in real-world applications by people who genuinely need to guarantee security as far as they can is evidence enough for me.
It's a good argument. But then,

It probably hasn't been used because it really isn't practical to type hundreds of random characters on a keyboard, and it does tend to look silly and rather unprofessional for a cryptanalyst to do it.

But more seriously (despite the above paragraph being more serious than I'm making it look), it says nothing about the randomness approach to a one-time pad. Why? Because your only hope of finding the plaintext is to brute-force the cipher and hope for the best, not to study the properties of the pad.

Originally Posted by pianorain
Or just quote a source. This paper was extremely interesting. Even after de-skewing, the typing test failed.
It probably did. Just as this sequence I just generated from a Mersenne Twister may fail: 195433953634