1. Originally Posted by MK27
However, that does not mean that I need to give credence to every possible objection, or that I need to exhaustively research the color of the sky to be very confident it is blue.
That is true. Let me make one last attempt: you cited a scenario of "a box with 10 holes in the bottom". Consider the scenario of coin flipping. When you flip a coin, you generate 1 bit of random information. So, if you flip a coin 8 times, you should get 8 bits of random information. But suppose that the coin is not a fair coin, e.g., it is weighted on one side. Would you really still get 8 bits of random information after 8 flips?

The idea here is that you don't, e.g., you get 7 bits of random information. But now, if you take the 8 bits generated from the coin flips and use it in a one time pad to encrypt 8 bits of information, then you would have actually used a key of effectively 7 bits, not 8 bits, to encrypt 8 bits of information.

In theory, this means that the 8 bits of ciphertext are not entirely masked by the 8 bits of key, as the key actually carries only 7 bits of information: effectively, 1 bit's worth of the ciphertext leaks information about the plaintext. That is probably not enough to deduce the entire plaintext, and in practice may not even be enough to deduce a single bit, but it still does not amount to perfect secrecy.

Relating this back to "plain language text arbitrarily converted (with mod 2) to a bitstream", it means that just because the process is random does not mean that you necessarily get at least as much random information as you need. So, "the very essence of entropy" is a grand phrase, but not necessarily an accurate one.
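For what it's worth, the bias of the mod-2-of-ASCII scheme is easy to measure empirically. Here is a quick Python sketch (illustrative only; the function names are my own, not anyone's actual code) that maps each character to its ASCII value mod 2 and reports the 0/1 split:

```python
def parity_bits(text):
    # Map each character to its ASCII (code point) value mod 2.
    return [ord(c) % 2 for c in text]

def ones_fraction(bits):
    # Fraction of 1s in the stream; 0.5 would be a perfect split.
    return sum(bits) / len(bits)

sample = "the quick brown fox jumps over the lazy dog"
bits = parity_bits(sample)
print(f"ones: {ones_fraction(bits):.2f}, zeros: {1 - ones_fraction(bits):.2f}")
```

On real English text the split tends to sit near, but not exactly at, 50/50, which is the sort of 47/53 skew discussed later in the thread.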

2. Originally Posted by laserlight
It is not cheating if you can determine that certain sets of text are much more likely to be used than others. It is only a one time pad if the pad is used once.
Not necessarily...it just has to be hard to guess!
Originally Posted by MK27
Well I fulfilled both criteria:
1) the key generated will be at least as large as the data.
2) the key was generated using a truly random source (to understand why and how this is so, read the thread*).
*paraphrased: human language texts considered as a modulus 2 of ascii values does not contain any pattern; it is a random sequence of 0 and 1.
You don't actually need "true" randomness, IMO. Theoretically, a distribution close to 50% (of 0's and 1's) is most desirable, and that heuristic could then be used to verify the integrity of the chosen algorithm. So, for example:

- A key of arbitrary length is selected by the user. Both binary and text modes should be available (e.g., via command-line options, stdin, file, etc).
- If the key is shorter than the data then it is expanded, otherwise it's truncated. The expansion shouldn't produce any large blocks of repetitive patterns (such as zeros), obviously. A sufficiently complex fractal could be used, for example.
- The OTP is applied to the data.
- Lather, rinse, repeat.

The fact that the guessed password reveals the data is also a non-issue, or at least it has nothing to do with the security of the algorithm. That's just a matter of key management, really. As always, a "strong" password must be chosen. That kind of goes without saying, though, doesn't it?
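To make the steps above concrete, here is a minimal sketch (my own illustration, not Sebastiani's actual design; note that the key expansion here is naive repetition, which is precisely the sort of repetitive pattern the post says to avoid):

```python
def fit_key(key: bytes, n: int) -> bytes:
    # Truncate or expand the key to n bytes. Plain repetition is only a
    # placeholder; the post suggests something far less regular (e.g. a
    # fractal), since repetition creates predictable blocks.
    return (key * (n // len(key) + 1))[:n]

def xor_pad(data: bytes, key: bytes) -> bytes:
    # Applying the pad is byte-wise XOR; XOR with the same key decrypts.
    k = fit_key(key, len(data))
    return bytes(d ^ b for d, b in zip(data, k))

msg = b"attack at dawn"
ct = xor_pad(msg, b"secret")
assert xor_pad(ct, b"secret") == msg
```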

3. Originally Posted by laserlight
"the very essence of entropy" is a grand phrase, but not necessarily an accurate one.
It is essentially an entropic source. The fact that it ends up being 47:53 is symptomatic of that -- the distribution is not perfect because the source is entropic. Who would have guessed?

Originally Posted by laserlight
Consider the scenario of coin flipping. When you flip a coin, you generate 1 bit of random information. So, if you flip a coin 8 times, you should get 8 bits of random information. But suppose that the coin is not a fair coin, e.g., it is weighted on one side. Would you really still get 8 bits of random information after 8 flips?

The idea here is that you don't, e.g., you get 7 bits of random information.
Here's another version of the box: I have a weighted coin that lands heads 3/5 of the time. Let's make a bet on 20 flips, and you can call it, but to make it fair, whoever calls heads needs 13 heads to win, whereas tails needs only 9 tails.

Heads or tails? Just kidding. The real question: why do my odds make this fair? For the same reason that a tie with a (perfectly fair) coin on 20 flips is unlikely, even though that coin has a 50/50 distribution.
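The combinatorics behind that are easy to check (a quick sketch of my own, not from the thread): every specific 20-flip sequence of a fair coin is equally likely, but only C(20, 10) of the 2^20 sequences are exact ties.

```python
from math import comb

total = 2 ** 20                 # equally likely 20-flip sequences
p_tie = comb(20, 10) / total    # exact 10/10 tie: ~0.176, i.e. unlikely
p_any_specific = 1 / total      # any ONE named sequence, all-heads included
print(p_tie, p_any_specific)
```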

Let's go back to the idea that 1000 bits represent 2^1000 possibilities. If I use an RNG that guarantees a 50/50 distribution, or is based on an algorithm admired for such, then that is no longer true...

IF I, AS THE CRACKER, KNOW THAT YOU ARE OBSESSED WITH PERFECT DISTRIBUTION, GUESS WHAT JUST HAPPENED? HA HA HA!

So perfect distribution is not (or should not be) the primary goal. It's an organizing principle. Do not reify it. Methinks this is an example of where experts sometimes lose -- the real world being not an ivory tower and all.

This is a strong criticism of my plain-text algorithm, since it would appear that the longer the sample, the more likely you are to end up with a 47/53 split.

I would assume, however, that radioactive decay represents the same problem -- getting closer to a perfect 50/50 is not an improvement.

A perfect random bitstream generator would have the same odds of producing a stream of continuous zeros as a stream of continuous ones as a stream consisting of exactly 50% ones and exactly 50% zeros. There should be no ratio more likely than any other.

By aiming for perfect distribution, you reduce the entropic potential of your system.

I'm starting to believe the keyboard is not such a bad method at all.* Lemme set up the test differently: you play against the computer, and it uses perceptrons to guess your random sequence. Of course the first time it fails, but if you keep using the same technique, it may get better. So you vary your technique to include extreme cases that are unpredictable in context.

In light of all that, maybe doing mod 2 on plain text is not so great -- but it is probably about as good as radioactive decay (presuming radioactive decay represents a 50/50 distribution, about which in all honesty I have no idea).

I'm gonna assert the best key generator will have no predictable distribution at all.

Originally Posted by Sebastiani
You don't actually need "true" randomness, IMO. Theoretically, a distribution close to 50% (of 0's and 1's) is most desirable,
Only if you are pragmatic, in which case maybe 47/53 is pragmatically close enough to the (far from perfectly random) 50/50.

One thought I had here: if str[1] < str[2], reverse the skew (from 53/47 to 47/53). A generator that could be 47/53 or (just as likely) 53/47 is probably better than one that is consistently 50/50.
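That thought could be sketched like so (purely illustrative; I have shifted the post's str[1]/str[2] to Python's 0-based indexing):

```python
def maybe_reverse_skew(bits, source_text):
    # If the first character of the source text sorts below the second,
    # invert every bit: a generator that skews 53/47 then comes out
    # 47/53 half the time, so the skew direction itself is unpredictable.
    if source_text[0] < source_text[1]:
        return [1 - b for b in bits]
    return bits
```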

* That paper supplied by pianorain, btw, was really really bad. That was a grade five level paper.
http://www.daimi.au.dk/~ivan/Pseudorandom05.pdf
Although in grade five you would probably get an "A" for being earnest and serious. An "A"-level paper for grade-five students. I would be ashamed of myself if I prepared that paper as anything more than crude notes and vague ideas (that need some deep consideration) at a university.

4. Originally Posted by MK27
Only if you are pragmatic, in which case maybe 47/53 is pragmatically close enough to the (far from perfectly random) 50/50.

One thought I had here: if str[1] < str[2], reverse the skew (from 53/47 to 47/53). A generator that could be 47/53 or (just as likely) 53/47 is probably better than one that is consistently 50/50.
Well right, which is why I said "close to". Hell, 80/20 would still work, but staying close to the optimal distribution just makes things that much more secure. You could monitor that by analysing the entire resultant key or the individual bytes, but the former would likely be more effective (though probably slower). The most important aspect, though, is checking that large predictable blocks don't arise in the key. A CRC-type algorithm, or similar, might help prevent that from happening.
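The "no large predictable blocks" check could be approximated with a longest-run scan. This is a simplification of my own, not the CRC-style check mentioned above:

```python
def longest_run(bits):
    # Length of the longest block of identical consecutive bits.
    if not bits:
        return 0
    best = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def looks_blocky(bits, limit=32):
    # Flag keys containing a suspiciously long constant block.
    return longest_run(bits) > limit
```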

5. Laserlight's example of the coin flip is correct. An unfair (non-50/50) coin results in less than 1 bit of information per flip. Suppose the coin is 40/60:

H = -0.6*lg2(0.6) - 0.4*lg2(0.4)
  ≈ 0.971 bits per flip

The fact that it is still random is not the point. The INFORMATION is what matters. If a 1 bit is more probable than a 0 bit, then you can simply guess that all bits of the key are 1, and the majority of decrypted bits will be correct.
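That calculation is easy to verify (a short sketch; the function name is mine):

```python
from math import log2

def bernoulli_entropy(p):
    # Shannon entropy, in bits, of one flip of a coin that lands
    # heads with probability p. Maximized (1 bit) only at p = 0.5.
    if p in (0, 1):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(bernoulli_entropy(0.6))  # ~0.971 bits per flip, matching the above
print(bernoulli_entropy(0.5))  # exactly 1 bit per flip
```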

6. Originally Posted by MK27
IF I, AS THE CRACKER, KNOW THAT YOU ARE OBSESSED WITH PERFECT DISTRIBUTION, GUESS WHAT JUST HAPPENED? HA HA HA!
What do you think this means?

7. Originally Posted by MK27
I'm gonna assert the best key generator will have no predictable distribution at all.
This is essentially my argument, and it is also supported by Schneier. And I couldn't have put it better than you did.

Essentially, properties like distribution serve our purposes only downstream, after the sequence or the algorithm is known, as evidence for or against randomness. They are properties of randomness, but they are not defining. This is very important, because even in the face of a less than optimal RNG, we still have randomness.

It's essential at this point to remind everyone that there is no formal definition of randomness. Every expert agrees on this, and non-experts like ourselves will, with more or less effort, come to realize it. To anyone doubtful of how meaningful this argument is, I suggest you stop and think for a moment how relevant the words "no formal definition" are to the discussion. In layman's terms, it simply means we don't know what randomness is! Which also means we cannot measure randomness.

What we have been doing, with our limited knowledge of how things work, is finding systems that can generate unpredictable and non-duplicable (sp? help me here, folks) sequences. That's all. And that's what I believe Schneier stresses. Variables like distribution or periodicity are just intelligent, yet limited, methods to somehow measure these two characteristics.

As MK so well puts it, a good RNG is in fact a system without even a predictable distribution. And this is what we have been observing in those places in nature where we have been looking for RNGs (radioactive decay, dice, and, I insist, randomly typing at a keyboard or randomly moving the mouse on the screen).

PRNGs are more easily measured because they rely simply on smart mathematics; the nature of the system is within anyone's reach. Here I no longer hold that anything less than an optimal distribution is desirable, simply because PRNGs are purely deterministic, and as such a less than optimal distribution greatly facilitates the prediction and duplication of results. So this is where I depart from MK. An optimal distribution (say, that 50/50) actually makes it more difficult to predict the result, not easier just because I found the distribution.

8. Originally Posted by brewbuck
The fact that it is still random is not the point. The INFORMATION is what matters. If a 1 bit is more probable than a 0 bit, then you can simply guess that all bits of the key are 1, and the majority of decrypted bits will be correct.
If you promise me a 50/50 distribution, I can just guess one bit or the other and get half the bits correct. Getting a few percent more is not much of a difference.

So if a perfect algorithm results in me easily solving 50% of the key, and an imperfect one 53%, your idea of the right vs. wrong way to do things here is delusional. Recite all the liturgy you want. Delusion. The cheese smells in Denmark, and all that. Again:

A perfect random bitstream generator would have the same odds of producing a stream of continuous zeros as a stream of continuous ones as a stream consisting of exactly 50% ones and exactly 50% zeros. There should be no ratio more likely than any other.

By aiming for perfect distribution, you reduce the entropic potential of your system.
Around and around...

9. Originally Posted by Mario F.
It's essential at this point to remind everyone that there is no formal definition of randomness. Every expert agrees on this, and non-experts like ourselves will, with more or less effort, come to realize it. To anyone doubtful of how meaningful this argument is, I suggest you stop and think for a moment how relevant the words "no formal definition" are to the discussion. In layman's terms, it simply means we don't know what randomness is! Which also means we cannot measure randomness.

What we have been doing, with our limited knowledge of how things work, is finding systems that can generate unpredictable and non-duplicable (sp? help me here, folks) sequences. That's all. And that's what I believe Schneier stresses. Variables like distribution or periodicity are just intelligent, yet limited, methods to somehow measure these two characteristics.
It all boils down to entropy, but essentially yes, you are correct - it cannot be measured directly, or at least not so tangibly. Entropy is, after all, in the eye of the encoder! You really have to take a heuristic approach in cases like this, to see the forest for the trees...

10. Originally Posted by MK27
If you promise me a 50/50 distribution, I can just guess one bit or the other and get half the bits correct.
Hmm... this is a shocking statement. You are forgetting that you don't need to know the distribution. If you are brute-forcing, a more skewed distribution will essentially mean you will find more correct bits, even if you don't know what the actual distribution is.

Example:
If you have a 1/2 distribution, you will guess half of the bits correctly. If you have a more skewed 2/3 distribution, you will guess two thirds of the bits correctly, whether or not you know what the distribution is, simply by guessing.

Again, this does not remove the random nature of the sequence. I agree there. But especially for PRNGs where the remaining bits are weakened by a known, measurable and deterministic system, this is death.
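The point is simple to demonstrate with a simulation (my own sketch; the 2/3 bias and the sample size are arbitrary choices):

```python
import random

random.seed(1)  # fixed seed so the demonstration is repeatable

def biased_stream(p_one, n):
    # n bits where a 1 appears with probability p_one.
    return [1 if random.random() < p_one else 0 for _ in range(n)]

key = biased_stream(2 / 3, 100_000)
# Guess the constant bit 1 for every position. (If you don't know which
# way the bias runs, try both constant guesses and keep the better one.)
correct = sum(1 for b in key if b == 1)
print(correct / len(key))  # close to 2/3, well above the 1/2 of a fair key
```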

11. Originally Posted by Sebastiani
It all boils down to entropy, but essentially yes, you are correct - it cannot be measured directly, or at least not so tangibly. Entropy is, after all, in the eye of the encoder! You really have to take a heuristic approach in cases like this, to see the forest for the trees...
I was gonna cite that little gem of Mario's too. The emperor wears no clothes!

Reminded me of a Derek Jarman movie about Wittgenstein, where he (L.W.) suddenly recognizes something about his own myopic vision of the "ice world".

Randomness isn't, IMO. I'm a determinist like Einstein. Randomness is an illusion.

Entropy I see more as an effect of an "arbitrary" event or juxtaposition, as in thermodynamics or quantum physics (on which I am no expert).

The point being, there are no random sources, only entropic wells to tap.

12. Yah, but i still had no one correct my spelling there. Help?

13. I think where everyone has a cerebral disconnect is where distribution applies. Some of us seem to think that uniform distribution means that out of all generated data, the data will appear in equal quantities, such that you can guess 1/nth of it by brute force. Not at all the case. It just means that you've selected something that may "seem right".

In cryptography especially, seeming right is tantamount to disaster. The output of an RNG has nothing to do with its randomness; what matters is how random it is going to be. If a generator favors 1 as opposed to 0 by whatever amount... it may produce things that are very random... but without an equal probability, I just cannot define something as random. There's always some other factor.

The whole point of a period in a random system is that after x tries there is no choice: it is going to repeat, unless the PRNG is self-modifying (and that would be interesting).

I just start to wonder if I make sense to anyone at all.

14. Originally Posted by whiteflags
Some of us seem to think that uniform distribution means that out of all generated data, the data will appear in equal quantities, such that you can guess 1/nth of it by brute force. Not at all the case.
So you could get a stream of all ones with a uniform distribution, is what you are saying. I guess so!

If a generator favors 1 as opposed to 0 by whatever amount... it may produce things that are very random... but without an equal probability, I just cannot define something as random. There's always some other factor.
You are still hung up on this REIFIED idea of randomness. You need to read some critiques of Platonism, my friend!

If a generator does not favor 1 or 0 by any amount, then it must favor a uniform distribution. That is not random.

A perfect random bitstream generator would have the same odds of producing a stream of continuous zeros as a stream of continuous ones as a stream consisting of exactly 50% ones and exactly 50% zeros. There should be no ratio more likely than any other.
Given infinite time and computing power, what kind of distribution would you assign to such a generator at the end? Uniform?

Do you see some paradoxical problems starting to form? There are a few routes to a uniform distribution. And the one that makes it there first is not the winner! Therefore: you can never declare a distribution uniform until you have all the data the generator will ever produce! You are just guesstimating the truth!

It's all because of that reification fallacy. I'm starting to understand where this came from...

Originally Posted by whiteflags
I just start to wonder if I make sense to anyone at all.

15. In a word, yes!

I want you to fully understand my point, so I am going to use the same quote.

"A perfect random bitstream generator would have the same odds of producing a stream of continuous zeros as a stream of continuous ones as a stream consisting of exactly 50% ones and exactly 50% zeros. There should be no ratio more likely than any other."

Except, with dice, 1/6 is not greater than 1/6. With bits, a system that has an equal number of ones to choose from as outcomes, versus an equal number of zeros, is going to be perfectly random. If you choose a source that maps to 47 zeros and 53 ones, that's not going to be random, according to your own damn quote, because a one is more favorable than a zero.

This is why I kept telling you I don't care how you jury-rig the results so that there are 50 zeros and 50 ones, or whatever. You cannot determine randomness through output. A stream of ones is as likely as a stream of zeros if the outcomes are equally likely. This is distribution. Distribution has nothing to do with output. Bold, capitals, color. I have been trying to explain this to you the whole time. The keyboard is not a random source because you have not shown me a keyboard that maps to an equal number of ones and zeros.

If you don't want to believe that, you misunderstand your own quote.