Newsgroups: rec.games.frp.archives
From: barnett@agsm.unsw.oz.au (Glen Barnett)
Subject: PAPER: Testing Dice for bias
Message-ID:
Organization: The Australian Graduate School of Management
Date: Wed, 2 Dec 1992 04:41:08 GMT
Approved: goldm@rpi.edu
Lines: 749
The following information is intended for distribution over the Internet,
and outside of that may be copied for personal use only.
(c) Glen L. Barnett, 1992. All rights reserved.
I recently responded to a thread on rec.games.frp.dnd about testing dice,
in order to fix up a few misconceptions and make some suggestions. On the
suggestion of Coyt Watters I'm putting this on r.g.f.archives, which I think
is a good idea. What follows has some major additions, however.
In this post, I will begin by discussing various posts on the subject
of testing dice, then show how to do the test under discussion in the
thread (the chi-squared goodness-of-fit test) properly, and then talk
about more appropriate tests.

In article ,
alongley@cape.UWaterloo.ca (Allan Longley) writes:
[..stuff deleted..]
>testing dice for bias. Well, here is a test to use. I haven't actually
>tested this yet, but it should, in theory, work. And no, this is not
>a copy out of the old Dragon magazine, but it is the same test; it's a
>pretty standard test. I will use simple terms, so all you math/stat
>people out there, don't correct the fine points, I know.
[description of chi-squared goodness-of-fit test deleted]
Since Allan asks for no correction of fine points, I will attempt to
limit myself to major problems. This is not intended as a flame on
Allan, but this is fairly important stuff, and should be explained
correctly. If at any stage I get less than pleasant, please accept
my apology in advance.
While the calculations that Allan describes give the correct value
of a chi-square goodness-of-fit statistic (which he calls "Indicator"),
you should be *very* wary of interpreting the results in the way
he describes, as I will explain:
Let us assume you have 40 dice that (unknown to you) are all perfectly
fair, and you wish to test all of them, to see if any are "biased".
The way Allan has set his test up, you'd expect 2 of them to give
results below "Probably Fair", which he says indicates the die is
probably unbiased. That is, you have 40 fair dice, and you will
expect to regard only *two* of them as probably O.K.! Similarly,
you will expect to consider two of your "purely fair" dice as
probably unfair. Of the remaining 36, you will expect 18 scores
between "Probably Fair" and "Maybe" and 18 more between "Maybe"
and "Probably biased". For these 36, you have to do the test again,
under Allan's scheme. If you get both results below "Maybe" (you expect
9 of these) you say "Probably Fair". Similarly you expect 9 above
"Maybe" on both trials.
So we have (after repeating the test for 90% of the dice):
Number of Fair Dice: 40
Expected number "probably fair": 11
Expected number "probably biased": 11
Expected number which we don't know about: 18.
So over a quarter of perfectly fair dice will be called "probably biased".
If we continue testing those remaining 18 we are still undecided about,
the problem gets worse.
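The expected counts above can be checked with a short calculation. This is a minimal Python sketch, assuming (as the figures above imply) that Allan's "Probably Fair", "Maybe" and "Probably biased" columns sit at the 5%, 50% and 95% points of the statistic's distribution for a fair die, so that one test of a fair die lands in the four bands with probabilities 5%, 45%, 45% and 5%. The function names are made up for illustration, and the simulation treats each test result as a uniform percentile, which ignores the discreteness of real dice.

```python
import random
from fractions import Fraction

# ASSUMED band probabilities for one test of a FAIR die: below "Probably Fair",
# "Probably Fair" to "Maybe", "Maybe" to "Probably biased", above "Probably biased".
LOW, MID_LOW, MID_HIGH, HIGH = (Fraction(5, 100), Fraction(45, 100),
                                Fraction(45, 100), Fraction(5, 100))


def expected_verdicts(n_dice=40):
    """Exact expected numbers called (probably fair, probably biased, undecided)."""
    half = Fraction(1, 2)  # chance that the retest of a fair die lands below "Maybe"
    fair = n_dice * (LOW + MID_LOW * half)       # below "Maybe" both times
    biased = n_dice * (HIGH + MID_HIGH * half)   # above "Maybe" both times
    return fair, biased, n_dice - fair - biased


def simulate_verdicts(n_dice=40, trials=2000, seed=1):
    """Monte Carlo check: each test of a fair die is modelled as a uniform
    percentile u in [0, 1) (an assumption that ignores discreteness)."""
    rng = random.Random(seed)
    totals = [0, 0, 0]  # fair, biased, undecided
    for _ in range(trials):
        for _ in range(n_dice):
            u = rng.random()
            if u < 0.05:
                totals[0] += 1
            elif u > 0.95:
                totals[1] += 1
            else:
                v = rng.random()  # second test of the same die
                if u < 0.5 and v < 0.5:
                    totals[0] += 1
                elif u >= 0.5 and v >= 0.5:
                    totals[1] += 1
                else:
                    totals[2] += 1
    return [t / trials for t in totals]


print(expected_verdicts())
print(simulate_verdicts())
```

The exact version gives the 11 / 11 / 18 split; the simulated averages should land close to it.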
Other problems:
Allan says:
"The column titled "Maybe" are the Indicator values where there is
a 50% chance that the die is fair and a 50% chance that the die is
biased." (A)
This is just plain wrong.
The column he refers to is the value that a test on a *fair* die will
exceed 50% of the time. This is very different, and probably explains
why Allan misunderstands the whole interpretation of the results. (B)
If any of you can't see why what I said (B) and what Allan said (A) are
totally different, don't despair. This stuff is not always obvious from
the start. If you can follow the rules of an average RPG, you are smart
enough to understand a few non-trivial statistical ideas. I'm quite happy
to provide further clarification to the net if the demand is there.
> From Table 1, it appears that the d4 tested may be "Fair" but another test
> should be done.
I'd say not. A reasonable interpretation of the result is "There is
no reason to doubt that the die is O.K.".
Incidentally, the test statistic of 2.00 obtained in the example is
only 2/3 of what you'd expect with a fair die. The value of 2.00 will
be exceeded almost 60% of the time by a test on a fair die.
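Both figures can be checked directly for a d4, because the chi-squared distribution with 3 degrees of freedom has a closed-form upper tail. A small Python sketch using only the standard library; the function name is illustrative.

```python
import math


def chi2_sf_3df(x):
    """P(X > x) for a chi-squared variable X with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


mean_3df = 3  # the mean of a chi-squared distribution equals its degrees of freedom
print(2.00 / mean_3df)    # observed statistic as a fraction of its expected value
print(chi2_sf_3df(2.00))  # chance that a test on a fair d4 scores more than 2.00
```

The second figure comes out at about 0.572, i.e. a little under 60%.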

In article ,
alongley@cape.UWaterloo.ca (Allan Longley), in response
to Michael Wright, says:
In article dks@acpub.duke.edu writes:
>
>> This is called a chi-square test, and an article with the
>>procedure and numbers for it appeared way back in Dragon issue
>>#74... Thank you, Mr. Longley, for reposting it (or did you come
>>up with it in isolation? =) ) for the benefit of those who don't
>>have the issue (probably most readers).

Yes, I've seen the issue. The chi-square test is a standard statistical test
for determining if a data set matches a particular distribution, so, no, I
did not come up with the test in isolation; it's been around for a lot longer
than D&D. I don't actually have the issue, so I didn't copy it for the net.

I've been playing with the chi-square test and you know what I found out?
ALL DICE ARE BIASED!! Well, that's not true: all except d4's and d6's are
biased. Of course, this really shouldn't be a surprise. So, I've been
looking at modifying the chi-square test for "real world" dice; more on
this in a later post.
Allan is correct that the test has been around a lot longer than D&D.
The test, due to Karl Pearson, is nearly 100 years old.
It's no surprise that Allan finds that all real dice are biased:
i) It's impossible to make a truly fair die (obviously). It's just
that most are close enough that we don't care too much. The
chance of getting "close to fair" will decrease with the number
of sides.
ii) Allan's testing method will call more than a quarter of fair
dice (assuming they existed) biased. Even the fairest dice you
could buy have a good chance of being called biased.
>Actually, I use a program I made to roll dice for stats. Unfortunately, nobody
>in the party wants to use it, because dice rolls invariably end up better. I
>think this must be because of the pseudorandomness of the program.
>Anyone out there that knows better?

I wouldn't want to use a computer-generated die value while playing AD&D.
The thrill of the "rolling die" is part of the game. Also, with reference
to the above, most players will have a favourite die/dice due to the
inherent bias found in real dice. The trick is to find the dice that are
biased beyond reasonable playability.
In response to Michael G Wright:
Michael's discovery that hand-rolled dice often come out better
could occur for a couple of reasons:
i) Players tend to hang on to "favourite" dice that "roll well"
(i.e. come up with good results). So biased dice have some
chance of concentrating into the hands of players. As long as
this isn't too extreme, it probably doesn't matter too much.
(Allan quite correctly identifies this reason).
ii) Players don't really roll randomly. I had a fairly long email
discussion with Sea Wasp on this topic just recently. Even
unconsciously, you can pick up the dice in a "non-random" fashion,
so that a good roll will tend to be followed by another good
roll if you don't roll too vigorously. You may notice that
after a bad roll players tend to throw harder. This may be more
of a problem, as some players are *much* better at it than others.
If it becomes too noticeable, you may wish to invest in a dice
cup, or mock up a craps-table affair.
Allan's comment that "The trick is to find the dice that are biased
beyond reasonable playability" is spot on. Exactly correct.
Remember it, because I'll come back to it later.
I think the above discussion also answers Paul Kinsler's questions.
[in article , kinsler@physics.uq.oz.au
(Paul Kinsler) asked for clarification of Allan's comment that all
dice are biased].

In article <1992Nov12.194015.1602@Princeton.EDU>,
dagolden@phoenix.Princeton.EDU (David Alexandre Golden) writes:
[stuff deleted]
>I once did something along those lines to test whether my DM's die was
>fair. (It would roll 20's a lot. Often on command.)
>
>To test the die, I made a histogram (I believe is the term for it) like this:
>
>x x x x x
>x x x x x x x x x x x x x x x x x x x x
>x x x x x x x x x x x x x x x x x x x x
>1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
>
>With an "x" being each time that number came up. A totally fair die would
>have a straight line across, assuming it was rolled enough times. The
>question is, what is enough times? I rolled a d20 about 380 times (not bad
>if one person rolls and the other makes an x in the column) and while it
>gave a fairly good idea that the die was biased, the biased numbers had only
>occurred a couple times more than the average. (If I remember correctly).
>So I wrote a computer program to do the same thing until the deviation between
>the highest and lowest number of occurrences was less than about 15% of the
>average number of occurrences. (i.e a reasonably smooth profile). The
>computer required SEVERAL THOUSAND ROLLS to do this.
Your idea of making a histogram is a good one. In fact all the
chi-squared test does is the same as drawing a line across the
histogram where your expected "straight line across" would go,
looking at the deviations from that, squaring (to get all positives)
and adding the squared deviations up and dividing by that expected
number. This gives a single overall measure of deviation from uniformity.
The advantage of looking at the histogram is that you see where
the differences are, but you can't tell how big they "ought" to be
for a fair die. The actual number in the cell will be approximately normally
distributed with mean equal to the expected number in each cell and standard
deviation approximately the square root of the mean. In the above example,
we'd expect Dave to get 19 in each cell, so the standard deviation is about
4.35. That is, we'd expect to get about 2/3 of the cells with counts in
the range 15 to 23, and about a 2/3 chance of all but 1 or 2 of the values
inside the range 11 to 27.
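These "about 2/3" figures can be checked exactly, because for a fair die each face's count is binomial. A Python sketch for Dave's 380 rolls of a d20 (19 expected per face); the helper names are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """Chance that one face comes up exactly k times in n rolls."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_between(lo, hi, n, p):
    """P(lo <= count <= hi) for one face's count after n rolls."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))


n, faces = 380, 20
p = 1 / faces
print(n * p, math.sqrt(n * p))    # expected count and the rough SD used above
inner = prob_between(15, 23, n, p)
outer = prob_between(11, 27, n, p)
print(inner)                      # chance that one face lands in 15..23
print(faces * (1 - outer))        # expected number of faces outside 11..27
```

The exact standard deviation, sqrt(n p (1 - p)), is a little smaller than sqrt(19), which is why "approximately" is the right word above.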
[some of my own discussion deleted; see the section "Testing a die
based on the histogram of rolls" below]
> ... Still, the point is that I'm skeptical
>that the "fairness" of a die can be determined in only a hundred or so rolls.
>(d4 maybe... d20 no way!)
Well, in fact Allan's suggestion was to use 20 rolls per cell, so he'd use
400 rolls for a d20 and 80 rolls for a d4. But in any case you can never
decide that a die is actually fair. If you do a test and get a result
close to what you expected if the die was fair, you have a lack of evidence
against the hypothesis of fairness (which is the default assumption for
a statistical test of bias in a die, called the "null hypothesis").
What you get is either a higher degree of evidence against the hypothesis
of fairness (by getting a result that is very unlikely with a fair die),
or a low degree of evidence against fairness. It's like a criminal court
case, where the defendant is assumed innocent until proven
guilty (innocence is the null hypothesis), but evidence against the
defendant is presented by the prosecution. The jury then decides either
"guilty" if there is strong enough evidence, or "Not guilty" if there
is not. They don't declare innocence.
So we can't determine "fairness" anyway. The question we need to ask
is: If the die is biased, will a hundred rolls (or whatever number)
be enough for us to have a good chance to pick up that difference,
while at the same time, not "convicting the innocent" too often?
Whether it is enough depends on how big a difference you think it is
important to pick up.
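One way to answer that question for a particular kind of bias is by simulation: roll a hypothetical die many times on the computer and see how often the chi-squared test, at the 5% value of 30.14 for a d20 (19 df), calls it biased. This Python sketch uses the "second die" from the two-d20 example later in this article as the biased die; the function names and the 1000 repetitions are assumptions for illustration. For a fair die the rejection rate should come out near 5%.

```python
import random

CRIT_5PCT_D20 = 30.14  # 5% value of chi-squared with 19 degrees of freedom

FAIR = [5.0] * 20
# Percentage probabilities of the "second die" in the two-d20 example.
SECOND_DIE = [1.5, 1.5, 2.5, 2.5, 3.5, 3.5, 4.5, 4.5, 5, 5,
              5, 5, 6, 6, 7, 7, 7, 7, 8, 8]


def chi_squared(counts):
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 for c in counts) / expected


def rejection_rate(weights, n_rolls, reps=1000, seed=1):
    """Fraction of simulated experiments in which the test says 'biased'."""
    rng = random.Random(seed)
    faces = range(20)
    hits = 0
    for _ in range(reps):
        counts = [0] * 20
        for face in rng.choices(faces, weights=weights, k=n_rolls):
            counts[face] += 1
        if chi_squared(counts) > CRIT_5PCT_D20:
            hits += 1
    return hits / reps


for n_rolls in (100, 200, 400):
    print(n_rolls, rejection_rate(FAIR, n_rolls), rejection_rate(SECOND_DIE, n_rolls))
```

Swapping in other probability lists shows how quickly the chance of detection falls off for smaller biases.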

In article <1992Nov12.231019.1204@mcs.kent.edu>,
adray@mcs.kent.edu (Adam Dray) writes (in response to Dave Golden):
>In other words, a histogram shows very little. Random doesn't mean
>necessarily that you'll get an even distribution. It just means the
>probability that you won't get an even distribution is proportional to
>the number of sides on the die, and the number of times you roll it.
I disagree with the first sentence, and the final sentence above is wrong.
The more you roll a fair die, the more even the distribution will be (in
proportion to the number of rolls). It doesn't really depend on the number of sides.
(except as far as the negative dependence between cell counts is
reduced for more sides).
>Notes about the fairness of dice:
>
>Sharp-edged dice are better than smooth-edged dice. They're also more
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Not always, but this may be true more often than not.
>expensive, however. Rounded dice are often inked by coating the
>entire die with ink, then tossing the die in a "tumbler" (similar to
>tumblers for smoothing rocks) until all the dye on the outside is
>gone. Thus, the ink is left in the crevices where the numbers are.
>
>Theoretically, the grooves for the numbers can make one side a more
>likely outcome. Official casino dice don't have inset pips.
If you do it right any effect will be swamped by other manufacturing
defects anyway.
[some stuff deleted]
>
>GameScience did tests on other manufacturers' dice. They found
>certain numbers to be more likely. I've heard that the real 100-sided
>die tends to roll certain numbers more often.
It is impossible (both effectively and theoretically) to get a fair
100-sided die. The practical problems are more important than the
theoretical ones.
>Filing corners off your dice can make certain outcomes more probable.
>Natural wear can do the same thing.
>
>For most people, none of this matters one damn bit. =)
In general, no, it doesn't matter. It's encouraging to see that so
many people (just about all posters on the topic) realise this.

Interpreting the test statistic (Allan's "Indicator")
Carry out the calculations as described by Allan*, but use any number of
throws per cell (possible outcome) you like (I'd suggest 10 as a minimum,
because otherwise the tabled distribution is out a bit). The more rolls
you do, the better chance you have of picking up a difference of a given
size. The value of 20 that Allan suggested may well be a reasonable choice
in most circumstances. Allan gives the calculations for two different
numbers of throws (20 and 10 per cell, but in different posts), so you
ought to be able to generalise.
* The calculations given by Allan may no longer be available to you, so
an indication of how to do the calculations is given here:
Roll the die many times, say 20 times per face. Record each result
(I suggest you make up a tally sheet). Calculate the difference between
the number of times each face came up and the expected number (20 in this
case). Square these values and add them. Divide by the expected number
per face. This is your chi-squared statistic.
E.g. d4: Roll 20 times per face = 80 rolls
Face:           1    2    3    4
No. times:     23   18   15   24
Expected:      20   20   20   20
Difference:    +3   -2   -5   +4
Diff^2:         9    4   25   16    Sum = 54, chi-squared value = 54/20 = 2.7
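The same arithmetic in a few lines of Python (a sketch; the function name is made up, and it assumes every face has the same expected count, as when testing a die for fairness):

```python
def chi_squared(counts):
    """Pearson's chi-squared statistic for a die that should be fair.

    counts[i] is how often face i+1 came up; with a fair die every face has
    the same expected count, total rolls / number of faces.
    """
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 for c in counts) / expected


d4_counts = [23, 18, 15, 24]    # the tally from the d4 example above
print(chi_squared(d4_counts))   # compare with the 3 df row of the table
```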
If the result is less than the 5% value (the final column of Allan's table 2,
or the 5% column in the table below), you shouldn't worry too much: there is
not very strong evidence of bias. In fact, 1 in 20 tests on a fair die will
score worse than this. If the result is much bigger than that value, you have
some cause for concern. A result bigger than the 1% column below is quite
unusual if the die is fair (a result at least this big only occurs 1% of
the time), so it gives us good reason to suspect bias.
A small table of the chi-squared distribution:
        5%      1%   df
d4    7.81   11.34    3
d6   11.07   15.09    5
d8   14.07   18.48    7
d10  16.92   21.67    9
d12  19.68   24.72   11
d20  30.14   36.19   19
(These results came from a computer approximation to the chi-squared
distribution. They should be accurate to the figures given.)
If you want a more "cookbook" approach: if the result exceeds the 1%
value, it's probably biased. If it's between the 1% and 5% values, there
is a moderate degree of evidence that it's biased, but it still might be
OK. If it's less than the 5% value, you don't have any reason to think
it's biased on the basis of the test.
You will find more extensive tables in most elementary statistics books
(references for the chi-squared and Kolmogorov-Smirnov tests are
at the end of this article).
You look up the df (degrees of freedom) that are one less than the
number of faces on the die (e.g. d4 -> 3 df).
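If you have no tables to hand, the values above can be recomputed. The Python sketch below uses only the standard library: it evaluates the upper tail of the chi-squared distribution with the closed-form expressions that exist for whole-number degrees of freedom, finds the 5% and 1% points by bisection, and applies the "cookbook" rule. The function names and verdict wording are illustrative, not from any library.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-squared variable X with a whole number of df."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # exp(-x/2) * sum over i < df/2 of (x/2)**i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus sqrt(2x/pi) * exp(-x/2) * a finite series
    tail = math.erfc(math.sqrt(half))
    if df == 1:
        return tail
    term = total = 1.0
    for i in range(1, (df - 1) // 2):
        term *= x / (2 * i + 1)
        total += term
    return tail + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def chi2_critical(df, alpha):
    """Value a fair die exceeds with probability alpha (found by bisection)."""
    lo, hi = 0.0, 10.0 * df + 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def verdict(statistic, faces):
    """The 'cookbook' reading of a chi-squared result for a die."""
    df = faces - 1
    if statistic > chi2_critical(df, 0.01):
        return "probably biased"
    if statistic > chi2_critical(df, 0.05):
        return "moderate evidence of bias"
    return "no reason to think it biased"


for faces in (4, 6, 8, 10, 12, 20):
    df = faces - 1
    print(f"d{faces}: 5% {chi2_critical(df, 0.05):.2f}  "
          f"1% {chi2_critical(df, 0.01):.2f}  df {df}")
print(verdict(2.7, 4))  # the d4 example above
```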
A note on pronunciation: The Greek letter chi (the capital looks like
an X, and the lowercase has one of the two crossed lines a bit curly)
is pronounced with a hard "ch" like Charisma, and the word rhymes with
pie. Note that mathematical symbols come from *ancient* Greek, so no
arguments from any modern Greeks please.
This will provide a reasonable allround test for bias in a die.

Why you probably don't want to do the chi-squared test:
(at least for d8 and above)
The chi-squared test will pick up any kind of deviation from a perfectly even
distribution. However, we are much more worried about some kinds of deviation
than others. For example, I'd be more interested in knowing that "20" came
up too often on a d20 than knowing "10" came up too often. The first could
affect play substantially, the second probably only a little. We should use a
test with a better chance of picking up the kinds of deviation from fairness
that are most important to us (which will trade off with less chance of
picking up deviations we are less concerned with).
Let us consider a more complete example:
Imagine we have two d20's we'd like to test, and that in fact
(but unknown to us) they have the following (percentage) probabilities:
(rows: face number, % probability for the 1st die, % probability for the 2nd die)
Face:      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
1st die:   5 4.5   6 3.5   7 2.5   8 1.5   7   5   5   7 1.5   8 2.5   7 3.5   6 4.5   5
2nd die: 1.5 1.5 2.5 2.5 3.5 3.5 4.5 4.5   5   5   5   5   6   6   7   7   7   7   8   8
A fair die would have 5% right across, of course. These 2 dice can be obtained
from each other by relabelling the faces. The first die will be reasonable in
play, because, of course, we don't try to roll 'exactly 8' or 'exactly 9',
but 'less than 8' or 'greater than 11'. No 'less than' or 'greater than'
probability for the first die is ever out by more than one twentieth of its
correct value, in either direction (e.g. the probability of a 2 or less is
9.5% instead of 10%). It has the correct average, and almost the correct
standard deviation (the difference is tiny).
The second die would be very unbalancing in play: it has about a 2/3 chance
(66%) of rolling 11 or higher, and a 20 is more than 5 times as likely as
a 1. The mean is almost 13. The standard deviation is also out, but that's
relatively unimportant.
The chi-squared test will rate them as equally bad, since it takes no notice of which face is which!
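The figures quoted for the two dice are easy to check, and so is the reason the chi-squared test cannot tell them apart: its average value depends only on the set of face probabilities, and the two dice share exactly the same set. A Python sketch; the names are illustrative.

```python
import math

# Percentage probabilities for faces 1..20, from the example above.
DIE_1 = [5, 4.5, 6, 3.5, 7, 2.5, 8, 1.5, 7, 5, 5, 7, 1.5, 8, 2.5, 7, 3.5, 6, 4.5, 5]
DIE_2 = [1.5, 1.5, 2.5, 2.5, 3.5, 3.5, 4.5, 4.5, 5, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8]


def summary(percent):
    """Mean, SD, P(roll >= 11) and P(20)/P(1) for a d20."""
    p = [x / 100 for x in percent]
    faces = range(1, 21)
    mean = sum(f * q for f, q in zip(faces, p))
    sd = math.sqrt(sum((f - mean) ** 2 * q for f, q in zip(faces, p)))
    return mean, sd, sum(p[10:]), p[19] / p[0]


def expected_chi_squared(percent, n_rolls):
    """Average chi-squared statistic over many tests of n_rolls each.
    Each face contributes (variance + squared shift) / expected count,
    so the total ignores the face labels."""
    e = n_rolls / len(percent)
    return sum((n_rolls * q * (1 - q) + (n_rolls * q - e) ** 2) / e
               for q in (x / 100 for x in percent))


print("fair SD:", math.sqrt((20**2 - 1) / 12))
for name, die in (("die 1", DIE_1), ("die 2", DIE_2)):
    print(name, summary(die), expected_chi_squared(die, 400))
print(sorted(DIE_1) == sorted(DIE_2))  # same probabilities, relabelled
```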
So a good test should be likely to identify the second die, but we might
be prepared to sacrifice some of our ability to pick up the first, since
it will make little practical difference in play. (I said I'd come back to
this point!)
Note that almost any deviation on a d4 will be important (there are only 4
different values), and to a lesser extent a d6. I'd stick with the
chi-squared test on those.
There are many tests that will do what we want. I will present only one
such test*. (This is not to say that a properly applied chi-squared test is
not good, just that a test more closely tailored to our specific
question of interest will be even better.)
* Two tests if you count "Testing a die based on the histogram of rolls", below.
The Kolmogorov-Smirnov test:
Collect data as for the chi-squared test, up to the point where you
start doing calculations.
That is, lay out like this (you could run down rather than across):
Roll:           1    2    3    4    5  ......
Count:         17   19   22   27   24  ......
Expected:      20   20   20   20   20  ......
Now add up your counts and expected counts, writing the partial
totals as you go:
Roll:           1    2    3    4    5  ......
Count:         17   19   22   27   24  ......
Expected:      20   20   20   20   20  ......
Sum Count:     17   36   58   85  109  ......
Sum Exp:       20   40   60   80  100  ......
Now find the differences (without sign):
Sum Count:     17   36   58   85  109  ......
Sum Exp:       20   40   60   80  100  ......
Difference:     3    4    2    5    9  ......
The last difference will be zero, so you don't have to work out the
final column (I still would as a check).
Divide the largest difference (9 is the largest difference above, for
the calculations you can see) by the number of rolls you made altogether.
This is your test statistic. Let's call the value D.
You can look it up in most books on nonparametrics, which will
have tables. However, you would do better to use the table below, for
reasons I'll discuss in a second.
You multiply D by the square root of the number of rolls
(equivalently, divide the largest difference by the square root of
the number of rolls), and compare with:
    5%     1%   d#
  1.08   1.35    4
  1.10   1.37    6
  1.11   1.38    8
  1.12   1.39   10
  1.12   1.40   12
  1.14   1.42   20
and interpret as I suggested for the chi-square test.
These values apply pretty well irrespective of the total number of rolls,
but I would use at least 10 rolls per face. Note also that these values
come from simulation, and are hence not exact. This doesn't really matter.
You may find the following values in tables:
5% 1%
1.36 1.63 (irrespective of the number of sides on the die)
The reason these are larger is that they are based on the assumption that
the distribution the data come from is continuous (effectively, a *very*
large number of faces on the die would give these values). If you use the
textbook values, the test will be conservative: a fair die will be rejected
less often than the nominal 5% and 1%, because the distribution of values
is discrete (a d20 generates only integers, nothing in between).
So, for our above example, assume there are no larger differences
than 9, and that we made 400 rolls on a d20 (hence the expected number
in each cell is 20, as above). Then D is 9/400 = .0225, which you could
look up if you have tables. We made 400 rolls, so we could use the
table above: the square root of 400 is 20, so D x 20 (= 9/20) = .45.
This is much less than the 5% value, so there is little evidence that
the die is unfair.
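The running-total calculation takes only a few lines. This Python sketch (function name hypothetical) returns the largest difference, D, and D times the square root of the number of rolls. Fed the five counts shown above, with the 400-roll total assumed in this example, it reproduces 9, .0225 and .45; the critical values still come from the simulated table above.

```python
import math
from itertools import accumulate


def ks_statistic(counts, expected_per_face, total_rolls):
    """Largest gap between running totals of observed and expected counts,
    D = gap / total rolls, and D * sqrt(total rolls)."""
    observed = accumulate(counts)
    expected = accumulate([expected_per_face] * len(counts))
    largest = max(abs(o - e) for o, e in zip(observed, expected))
    d = largest / total_rolls
    return largest, d, d * math.sqrt(total_rolls)


# First five faces of the d20 example (400 rolls, 20 expected per face).
print(ks_statistic([17, 19, 22, 27, 24], 20, 400))
```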
There are tests which are probably even more appropriate, but these
two (chi-squared and KS) will be enough for you to get a good idea
of any suspect dice.
Note: If you suspect a die, and decide to test it, don't use the
rolls that made you suspect it in the test. Generate a new
set. e.g. if you are all recording your rolls as you play,
and one player's results look funny, don't then test those
recorded values; you have to generate a new set.

Testing a die based on the histogram of rolls:
The histogram approach can be turned into a test of sorts as follows:
After drawing the histogram, 2 lines can be drawn either side of the
expected (mean) result. If all histogram bars lie within the inner
lines, there is no strong evidence of bias. If any of the bars go outside
the outer lines, there is fairly clear evidence of bias. If one of the
bars lies between the inner and outer lines, then there is some (mild)
evidence of bias, but it is not really clear. You may wish to then
perform a further test on the probability for that individual side,
as described in the next section (Testing an individual face). If
several of the bars lie between the inner and outer lines, we have a
stronger indication of bias.
Where do we draw the lines?
I have worked out values for rolling 20 times for each face (as in the
other examples, 80 times for a d4, 400 times for a d20). The bars on the
histogram must actually go past these values. You could think of these
values as giving "Acceptable Ranges" (literally, 95% and 99% acceptance
regions) for the histogram. A fair die will give histograms with one or
more bars outside these ranges 5% and 1% of the time respectively
(actually just under).
Table for 20 throws per face:
d#      5%      1%
 4    8-34    6-37
 6    9-33    7-36
 8    9-33    8-35
10   10-32    8-35
20   11-30    9-32
Approximate formula:
Let N be the total number of rolls.
Let c be the number of faces on the die.
Let e be the expected number of times each face will come up (e = N/c).
Then the lines go at e +/- A x sqrt[e (c-1)/c],
and the value for 'A' comes from the table below.
All fractions should be rounded up.
e.g. N=160, c=8, e=20 give:
5%: 20 +/- 2.73 sqrt(20 x 7/8), or 9-32 (rounded up)
1%: 20 +/- 3.22 sqrt(20 x 7/8), or 7-34 (rounded up)
Table to go with the approximate formula:
d#     5%     1%
 4   2.49   3.02
 6   2.63   3.14
 8   2.73   3.22
10   2.80   3.29
12   2.86   3.34
20   3.02   3.48
Due to being in the extreme tails of the distribution, combined with
slight asymmetry, the ranges we get are sometimes out a bit.
This is not a big deal.
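The approximate formula is easy to apply by machine. A Python sketch, assuming the "all fractions rounded up" rule applies to both lines; the function name is hypothetical, and the 'A' values are copied from the table above. The tabulated ranges were worked out separately, so they need not agree exactly with what the formula gives.

```python
import math

# 'A' values (5%, 1%) from the table that goes with the approximate formula.
A_VALUES = {4: (2.49, 3.02), 6: (2.63, 3.14), 8: (2.73, 3.22),
            10: (2.80, 3.29), 12: (2.86, 3.34), 20: (3.02, 3.48)}


def histogram_lines(n_rolls, faces, a):
    """Lower and upper lines e -/+ A * sqrt(e (c-1)/c), each rounded up."""
    e = n_rolls / faces
    half_width = a * math.sqrt(e * (faces - 1) / faces)
    return math.ceil(e - half_width), math.ceil(e + half_width)


a5, a1 = A_VALUES[8]
print(histogram_lines(160, 8, a5), histogram_lines(160, 8, a1))  # the d8 example
```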
Example: We throw a d20 400 times and record the results. From the table
above, we draw the inner lines at 11 & 30 and the outer lines at 9 & 32,
as well as a reference line at 20.
(In the histogram below, each row covers two counts: ":" means the bar
reaches the upper of the two, "." means it reaches only the lower one.
The horizontal lines are drawn between rows, so a bar goes past a line
only if it continues on the far side of it.)
     |                          :
     |--------------------------:--------------------------------- (32)
     |                          :
     |--------------------------:--------------------------------- (30)
  30 +                          :
     |                          :
     |                          :                       .
     |           .              :                       :  :
     |  :  .     :     :        :                    :  :  :
     |--:--:-----:-----:--------:--------------------:--:--:------ (20)
  20 +  :  :     :  .  :  :     :     .              :  :  :
     |  :  :  :  :  :  :  :     :  .  :     :        :  :  :  :  :
     |  :  :  :  :  :  :  :  :  :  :  :  .  :  .     :  :  :  :  :
     |  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :
     |  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :
     |--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--: (11)
  10 +  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :
     |--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--:--: ( 9)
     |  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :
     |  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :
     |  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :
     |  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :
     +------------------------------------------------------------
face    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
count  22 21 18 23 19 22 20 16 34 17 19 15 18 15 14 22 25 24 18 18
                               ^^
One value (34) goes outside the 1% values (technically, you could say it
goes outside a 99% acceptance region), so it seems our die is biased.
More particularly, it rolls too many 9's. Whether this will affect a
game very much is another point. (If it had been "1" or "20", however,
perhaps this would result in a large effect on the game.)
Testing an individual face:
When you suspect a particular face is coming up with the wrong
frequency, or only wish to test a particular face (e.g. 20 on a
d20), throw as before, but you can compare with a narrower range,
as below. However, you can't use the data that made you suspect
the face in this test, you must generate a new set of rolls.
E.g. if you draw a histogram and find the results for 7 look odd,
you can't use those numbers in this test. In that case the ranges
given above (for testing an entire histogram) are appropriate.
For the tables below, as for those above, you need to exceed the
range given to call the die 'probably biased'.
When you don't know the direction already (two-tailed test):
(If you are in doubt, use this table rather than the next one.)
Table for 20 throws per face:
d#      5%      1%
 4   13-28   11-30
 6   12-28   10-31
 8   12-29   10-31
10   12-29   10-32
12   12-29   10-32
20   12-29   10-32
Approximate formula (reasonably accurate):
Let N be the total number of rolls.
Let c be the number of faces on the die.
Let e be the expected number of times each face will come up (e = N/c).
Then the lines go at e +/- 1.96 x sqrt[e (c-1)/c] (5%),
and for 1% at e +/- 2.58 x sqrt[e (c-1)/c].
All fractions should be rounded up.
e.g. N=160, c=8, e=20 give:
5%: 20 +/- 1.96 sqrt(20 x 7/8), or 12-29 (rounded up)
1%: 20 +/- 2.58 sqrt(20 x 7/8), or 10-31 (rounded up)
When you think a particular face is coming up too often, or if you
think a particular face isn't coming up enough (one-tailed test):
Table for 20 throws per face:
d#      5%      1%
 4   14/26   11/30
 6   13/27   11/30
 8   13/27   11/30
10   13/27   11/30
12   13/27   11/30
20   13/27   11/31
Approximate formula (reasonably accurate):
Let N be the total number of rolls.
Let c be the number of faces on the die.
Let e be the expected number of times each face will come up (e = N/c).
Then the lines go at e +/- 1.65 x sqrt[e (c-1)/c] (5%),
and for 1% at e +/- 2.33 x sqrt[e (c-1)/c].
All fractions should be rounded up.
e.g. N=160, c=8, e=20 give:
5%: 20 +/- 1.65 sqrt(20 x 7/8), or 14-27 (rounded up)
1%: 20 +/- 2.33 sqrt(20 x 7/8), or 11-30 (rounded up)
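The single-face formulas differ only in the multiplier (1.96 or 2.58 two-tailed, 1.65 or 2.33 one-tailed), so one function covers them. A Python sketch with hypothetical names, rounding up as instructed; for a one-tailed test you would use only the low value or only the high value, as explained below.

```python
import math

# Multipliers quoted above for a single face.
Z = {("two", 0.05): 1.96, ("two", 0.01): 2.58,
     ("one", 0.05): 1.65, ("one", 0.01): 2.33}


def face_range(n_rolls, faces, tails, level):
    """Approximate acceptance range for one face's count, rounded up."""
    e = n_rolls / faces
    half_width = Z[(tails, level)] * math.sqrt(e * (faces - 1) / faces)
    return math.ceil(e - half_width), math.ceil(e + half_width)


for tails in ("two", "one"):
    for level in (0.05, 0.01):
        print(tails, level, face_range(160, 8, tails, level))
```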
These values are given as either/or i.e. since you have already specified
a particular direction, you will compare with only the higher values or the
lower values, not both.
Example 1: You decide to test your new d20 to see if it rolls the
correct number of 20's, but you don't believe it to
be biased in a particular direction. You roll 400
times, and 35 times you get a "20". From the "two-tailed"
table above, you can see that's outside the outer (1%) range.
It seems your d20 rolls too many 20's.
Example 2: Another player seems to be rolling a lot of 1's on her d4.
You decide to test whether 1 comes up too often.
You roll it 80 times, and get the following:
1 2 3 4
13 26 25 16
Since you decided to test if there were too many 1's, you
can only see if the number of 1's exceeds 26, which it does
not. You can't say, after generating the data "Oh, actually,
perhaps it rolls too few ones", or "perhaps it rolls too many
2's" without generating a new set of data for the new hypothesis.
You must never base what you are testing for on what you spy
in the set of data you use in the test. Our only conclusion
from this test: there is no evidence that the d4 rolls too many 1's.
You may like to then generate a new set of rolls to see if it
rolls too few 1's.
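Since a single face's count is binomial, both examples can also be checked with exact tail probabilities instead of the table. A Python sketch with hypothetical helper names; for the two-tailed case it takes the p-value as twice the smaller tail, capped at 1, which is one common convention rather than the only one.

```python
import math


def upper_tail(k, n, p):
    """P(count >= k) for one face after n rolls."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def lower_tail(k, n, p):
    """P(count <= k) for one face after n rolls."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(0, k + 1))


# Example 1: 35 twenties in 400 rolls of a d20, direction not chosen in advance.
p_two = min(1.0, 2 * min(upper_tail(35, 400, 1 / 20), lower_tail(35, 400, 1 / 20)))
# Example 2: 13 ones in 80 rolls of a d4, testing only for TOO MANY ones.
p_one = upper_tail(13, 80, 1 / 4)
print(p_two, p_one)
```

Example 1 comes out well below 1%, and Example 2 gives no evidence of too many 1's, in line with the conclusions drawn from the tables.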

As further examples, here are the chi-squared and Kolmogorov-Smirnov (KS)
tests performed on the same data.
chi-squared test:
face:            1   2   3   4   5   6   7   8   9  10
counts:         22  21  18  23  19  22  20  16  34  17
diff from 20:   +2  +1  -2  +3  -1  +2   0  -4 +14  -3
diff^2:          4   1   4   9   1   4   0  16 196   9
face:           11  12  13  14  15  16  17  18  19  20
counts:         19  15  18  15  14  22  25  24  18  18
diff from 20:   -1  -5  -2  -5  -6  +2  +5  +4  -2  -2
diff^2:          1  25   4  25  36   4  25  16   4   4
sum diff^2: 244 + 144 = 388
chi-squared statistic: 388/20 = 19.4
(far less than the 5% value of 30.14)
Kolmogorov-Smirnov test:
(Calculations have been run down the page because I can't fit 20 3-digit
numbers, with spaces and labels, across an 80-column screen.)
counts   sum  expected  diff
    22    22        20     2
    21    43        40     3
    18    61        60     1
    23    84        80     4
    19   103       100     3
    22   125       120     5
    20   145       140     5
    16   161       160     1
    34   195       180    15   <- max diff
    17   212       200    12
    19   231       220    11
    15   246       240     6
    18   264       260     4
    15   279       280     1
    14   293       300     7
    22   315       320     5
    25   340       340     0
    24   364       360     4
    18   382       380     2
    18   400       400     0
The maximum difference is 15, so D = 15/400 = .0375, and D x sqrt(400) =
15/20 = .75. This is well short of significance at the 5% level, where
the value from the simulations is 1.14.
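Both calculations on these 400 rolls can be reproduced with a short Python sketch (function names hypothetical). The comparison values are the ones tabulated earlier in this article, not computed here.

```python
import math
from itertools import accumulate

COUNTS = [22, 21, 18, 23, 19, 22, 20, 16, 34, 17,
          19, 15, 18, 15, 14, 22, 25, 24, 18, 18]


def chi_squared(counts):
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 for c in counts) / expected


def ks_scaled(counts):
    """Largest running-total difference, and that difference / sqrt(rolls)."""
    n = sum(counts)
    expected = n / len(counts)
    running_expected = [expected * (i + 1) for i in range(len(counts))]
    largest = max(abs(o - e) for o, e in zip(accumulate(counts), running_expected))
    return largest, largest / math.sqrt(n)


print(chi_squared(COUNTS), "vs 30.14 (5%, 19 df)")
print(ks_scaled(COUNTS), "vs 1.14 (5%, d20, simulated)")
```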

References:
Conover, W.J. (1980): Practical Nonparametric Statistics,
2nd Ed., Wiley, New York.
Neave, H.R. and Worthington, P.L.B. (1988): Distribution-Free Tests,
Unwin Hyman, London.