Let's say I have a 6-sided dice and I want to accurately estimate the probability of throwing a six by using a model.
How do I determine how many runs I should do on my model to reach a 1% margin of error?
Let's say I have a 6-sided dice and I want to accurately estimate the probability of throwing a six by using a model.
How do I determine how many runs I should do on my model to reach a 1% margin of error?
Originally Posted by brewbuck:
Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.
It's a bit hard to explain on here since it involves a bit of math, but here is my best attempt:
First of all, you can never guarantee that you will be within 1%. The best you can do is say that it will be within 1% 50 or 95 percent of the time.
I am assuming that you are running a program that chooses a random number 1-6 and counts the number of times 6 is chosen. So I am guessing your code looks something like this:
Now we don't know N yet, but let's assume that it is large (at least a couple hundred). Then we can say "probability" is within 1% if n is between 0.99*N/6 and 1.01*N/6 (since N/6 is the "right" answer).Code:for (int i=0; i<N; ++i) { //choose a number 1-6 if (number==6) ++n; } double probability=(double)n/N;
Calculate the probability that it is within this range using the Gaussian distribution and the central limit theorem where mu=Np and sigma^2=Np(1-p) (p here is 1/6).
If you looked up the integral on wiki and did some algebra, you should have an expression involving the error function of N and a bunch of numbers. Set this equal to 0.5 or .95 or whatever number you need and solve for N. (You'll need a calculator that can do inverse error function. Wolfram is good for that.)
Hopefully, this is enough to give you a push in the right direction. If you still need help, let me know and I'll explain it in more detail. It's just that on this forum, it's such a pain to try to describe complex equations.
I copied it from the last program in which I passed a parameter, which would have been pre-1989 I guess. - esbo
s / sqrt(n) where s is the sample error (5/6) and n is the number of samples
soThat is just the probability that a given sample does not represent the population as a whole. Any given sample of that many may in fact be wildly inaccurate. This technique is often used by lobbyists to push their agenda using false surveys. Take a thousand surveys of 1000 people, and pick the one that represents your viewpoint best and claim it gives a standard error of 0.5 / sqrt(1000) or ~1.5%Code:#include <stdio.h> int main(){ double s = 5.0/6.0; unsigned long long n = 1; while(s / sqrt(n) > 0.01) n++; printf("%d samples needed.\n" , n); return 0; }
Last edited by abachler; 09-17-2009 at 06:16 PM.
Awesome material, you two!
Thanks a bunch. I'll try and take it from here.
Originally Posted by brewbuck:
Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.