χ^{2} Distribution
The χ^{2} distribution is one of the
distributions we will be using during the course. Unlike the normal and
t distributions, which are symmetric, the χ^{2} distribution is
skewed to the right. Like the t distribution, the χ^{2} distribution
consists of a whole family of distributions distinguished by a single
whole-number parameter, ν, called the number of
degrees of freedom. The value of ν determines the skewness
of the graph.
We will use the χ^{2} distribution in three applications:
(a) Estimating a Population Variance
(b) Performing a Goodness-of-fit Test
(c) Contingency Tables
In all three applications, we
will be looking for the value of the test statistic χ^{2}. What
is this χ^{2} statistic?
Think about this experiment: You toss a coin 100 times. Of course, we can
simulate this experiment using the TI calculator with the function randInt(1,2,100)→L1. Then, we can
sort the data and count the number of ones (tails) or twos (heads) we got. I did the experiment and
I got 54 ones (and 46 twos). If we perform the experiment
several times, we can get different values, and some of them could be repeated.
We will call these the observed (O)
values. Before performing the experiment, we expected to get 50-50, if the coin
is fair. We will call these the expected (E) values. We call
them the expected values because, if we perform this experiment many, many times, we expect to get equal numbers of tails and
heads. This conviction is based on the fact that the probability of getting a
tail or a head is 50% each (if the coin is fair!). So,
if we perform this experiment many, many times, we
will not be surprised to get an average of 50 tails (or
50 heads!). Now, look at the number defined as:
[(O_{heads} - E)^{2} + (O_{tails} - E)^{2}] / E
If we use the
values we got before (54, 46), then this number will be
equal to [(46 - 50)^{2} + (54 - 50)^{2}] / 50 = 0.64. This
number is what we call the χ^{2} statistic. Observe
that this number can never be negative. The amazing thing is that if we
perform this experiment many, many times, the
distribution of the χ^{2} statistic values is not
arbitrary, but (approximately) follows a distribution called the χ^{2}
distribution. The expression of the χ^{2} function
is:
f(x) = x^{ν/2 - 1} e^{-x/2} / [2^{ν/2} Γ(ν/2)], for x > 0
where Γ is the gamma function. As you can see, the function depends on an
additional parameter, ν, the degrees of freedom.
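The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function names chi_square_statistic and chi_square_pdf are made up for this illustration, and the density is the standard χ^{2} density with ν degrees of freedom.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_pdf(x, nu):
    """Standard chi-square density with nu degrees of freedom, for x > 0."""
    return x ** (nu / 2 - 1) * math.exp(-x / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))

# Coin example from the text: 46 heads and 54 tails observed, 50 of each expected.
coin_stat = chi_square_statistic([46, 54], [50, 50])
print(coin_stat)

# Height of the density at x = 1 for one degree of freedom.
print(chi_square_pdf(1.0, 1))
```

The first value printed is the 0.64 found by hand above.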
To check how accurate the predictions of this formula are, I
performed three experiments.
In the
first one, I simulated tossing a coin 500 times, and then I repeated the
experiment 500 times. I compared the values obtained in the experiments (O) with the values calculated (E) using the
above formula for the intervals 0-1, 1-2, 2-3, ..., 12-13. In this case we have
1 degree of freedom (ν=1). (At the end, you can find the program I
wrote to make the simulation.)
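For readers without a TI calculator, the first experiment can be sketched in Python as follows. This is not the program mentioned above, only an illustration under stated assumptions: the helper names are hypothetical, the seed is arbitrary, and the expected counts use the fact that for ν=1 the cumulative χ^{2} probability up to x equals erf(sqrt(x/2)).

```python
import math
import random

def coin_chi_square(n_tosses, rng):
    """Toss a fair coin n_tosses times and return the chi-square statistic."""
    heads = sum(rng.randint(0, 1) for _ in range(n_tosses))
    tails = n_tosses - heads
    expected = n_tosses / 2
    return ((heads - expected) ** 2 + (tails - expected) ** 2) / expected

def chi2_cdf_1dof(x):
    """Chi-square cumulative probability for nu = 1: erf(sqrt(x / 2))."""
    return math.erf(math.sqrt(x / 2))

rng = random.Random(0)  # arbitrary seed, only to make the run repeatable
n_repeats = 500
stats = [coin_chi_square(500, rng) for _ in range(n_repeats)]

rows = []
for lo in range(13):  # intervals 0-1, 1-2, ..., 12-13
    observed = sum(1 for s in stats if lo <= s < lo + 1)
    expected = n_repeats * (chi2_cdf_1dof(lo + 1) - chi2_cdf_1dof(lo))
    rows.append((lo, lo + 1, observed, expected))
    print(f"{lo:2d}-{lo + 1:2d}  O={observed:3d}  E={expected:6.1f}")
```

Each printed row gives an interval, the number of simulated χ^{2} values that fell in it (O), and the number predicted by the distribution (E).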


In the second experiment, I simulated tossing a die 500
times, and then I repeated the experiment 500 times. In this case the
number of degrees of freedom is ν=5,
and the value of χ^{2} is
given by the expression:
χ^{2} = [(n_{1} - m)^{2} + (n_{2} - m)^{2}
+ (n_{3} - m)^{2} + (n_{4} - m)^{2} + (n_{5} - m)^{2}
+ (n_{6} - m)^{2}] / m
where n_{i}
is the number of times we observed the number i,
and m is the expected value, which is the total number of trials divided by
6. The results are in the next table:


In the third experiment, I
simulated tossing a soccer-ball-like die (12 faces!)
500 times, and then I repeated the experiment 500 times. In this case the
number of degrees of freedom is ν=11,
and the value of χ^{2} is
given by the expression:
χ^{2} = [(n_{1} - m)^{2} + (n_{2} - m)^{2} + ... + (n_{12} - m)^{2}] / m
where n_{i} is the number of
times we observed the number i, and m is the
expected value, which is the total number of trials divided by 12. The
results are in the next table:


I find all of this really amazing! You see that there is some order, some logic, behind all
of these statistical fluctuations! Why? I don't
know. If you get a result like this in Physics, you say: there
is some conservation law behind these numbers! But what do we have
here? We are talking about coins and dice! But wait a minute, there
is more!
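As a rough check on the two die experiments above, here is a minimal Python sketch for a fair die with any number of faces. The function name die_chi_square and the seed are assumptions made for this illustration. It relies on the fact that a χ^{2} distribution with ν degrees of freedom has mean ν, so the average statistic should come out near 5 for the ordinary die and near 11 for the 12-faced one.

```python
import random

def die_chi_square(n_faces, n_rolls, rng):
    """Roll a fair die n_rolls times and return the chi-square statistic."""
    counts = [0] * n_faces
    for _ in range(n_rolls):
        counts[rng.randrange(n_faces)] += 1
    m = n_rolls / n_faces  # expected count for every face
    return sum((n_i - m) ** 2 for n_i in counts) / m

rng = random.Random(1)  # arbitrary seed, only to make the run repeatable
die_means = {}
for n_faces in (6, 12):
    stats = [die_chi_square(n_faces, 500, rng) for _ in range(500)]
    die_means[n_faces] = sum(stats) / len(stats)
    print(n_faces, "faces: mean chi-square =", round(die_means[n_faces], 2))
```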
Next I decided to make a different kind of
simulation. What if, instead of using
dice and coins, we use some process that follows a normal (continuous)
distribution?
Using the program randnorm(), I simulated
selecting random samples of 100
individuals and asking them about their IQ. We know that IQ
follows a normal distribution with mean value µ=100 and standard deviation
σ=15.
I made two simulations: in one, I divided the
data into ten classes where each class had the same probability (10%); in the other, I divided the data into five classes where
each class had the same probability (20%). To find the limits for each class I
used the function invNorm(). So, in the case of 10 classes I found the 10^{th}, 20^{th}, 30^{th} percentiles, and so
on. In the case of the five classes I used the 20^{th}, 40^{th} percentiles, and so on. Each simulation was
repeated 500 times.
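The same procedure can be sketched in Python, with NormalDist.inv_cdf from the standard library playing the role of invNorm() and random.gauss standing in for randnorm(). The function name iq_class_chi_square and the seed are assumptions made for this illustration, not part of the original simulation.

```python
import random
from statistics import NormalDist

def iq_class_chi_square(n_classes, sample_size, rng, mu=100.0, sigma=15.0):
    """Draw one sample of IQs, sort it into equal-probability classes,
    and return the chi-square statistic of the class counts."""
    iq = NormalDist(mu, sigma)
    # Class limits at the 1/k, 2/k, ... quantiles of the normal distribution.
    limits = [iq.inv_cdf(i / n_classes) for i in range(1, n_classes)]
    counts = [0] * n_classes
    for _ in range(sample_size):
        x = rng.gauss(mu, sigma)
        # The class index is the number of limits lying below x.
        counts[sum(1 for limit in limits if x > limit)] += 1
    expected = sample_size / n_classes
    return sum((c - expected) ** 2 for c in counts) / expected

rng = random.Random(2)  # arbitrary seed, only to make the run repeatable
class_means = {}
for k in (5, 10):
    stats = [iq_class_chi_square(k, 100, rng) for _ in range(500)]
    class_means[k] = sum(stats) / len(stats)
    print(k, "classes: mean chi-square =", round(class_means[k], 2))
```

Because a χ^{2} distribution with ν degrees of freedom has mean ν, the two printed averages should land near 4 and 9.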
The results obtained for the simulation in
the case of 5 groups are shown in the following table. In the first column you
find the intervals used for the χ^{2} values. In the second
column, you find the value used for the calculation. In the last columns you
find the observed values and the values calculated using 4 degrees of freedom. The
mean value of χ^{2} calculated for the 500 simulations was 4.27,
which is consistent with the value we would expect
from a χ^{2} distribution with 4 degrees of freedom (the mean of a χ^{2} distribution equals ν). So, the idea
that the number of degrees of freedom could be less than four does not seem
reasonable.


The results obtained for the simulation in
the case of 10 groups are shown in the following table. In the first column you
find the intervals used for the χ^{2} values. In the second
column, you find the value used for the calculation. In the last columns you
find the observed values and the values calculated using 9 degrees of freedom. The
mean value of χ^{2} calculated for the 500 simulations was 9.64,
which is consistent with the value we would expect
from a χ^{2} distribution with 9 degrees of freedom (the mean of a χ^{2} distribution equals ν). So, the idea
that the number of degrees of freedom could be less than nine does not seem
reasonable.


But wait a minute, there
is more!
Because we were making simulations with a
normal distribution, it makes sense to continue in this direction. If we select
random samples of size N from a population with a characteristic that
follows a normal distribution, then the sampling distribution of the sample
variance (the square of Sx!) follows a chi-square distribution with (N - 1) degrees
of freedom if we standardize the variable this way:
χ^{2} = [(N - 1) * S_{x}^{2}] / σ^{2}
where Sx is the standard
deviation of the sample, and σ
is the standard deviation of the population.
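A minimal Python sketch of this standardization, under stated assumptions: the function name variance_chi_square and the seed are hypothetical, random.gauss stands in for the calculator's normal random generator, and statistics.variance is used because it divides by N - 1.

```python
import random
import statistics

def variance_chi_square(n, rng, mu=100.0, sigma=15.0):
    """Draw n normal IQ values and return (n - 1) * Sx^2 / sigma^2."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)  # sample variance, divisor n - 1
    return (n - 1) * s2 / sigma ** 2

rng = random.Random(3)  # arbitrary seed, only to make the run repeatable
variance_means = {}
for n in (5, 10):
    stats = [variance_chi_square(n, rng) for _ in range(500)]
    variance_means[n] = sum(stats) / len(stats)
    print("N =", n, ": mean chi-square =", round(variance_means[n], 2))
```

If the statistic really follows a χ^{2} distribution with N - 1 degrees of freedom, the printed averages should be close to 4 and 9.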
I made simulations for samples of size
N=5 and N=10 individuals, asking them about their IQ. We know that IQ
follows a normal distribution with mean µ=100 and standard deviation σ=15.
Here are the results:
For N=5
The results obtained for the simulation in
the case of samples of size 5 are shown in the following table. In the
first column you find the intervals used for the χ^{2} values. In
the second column, you find the value used for the calculation. In the last
columns you find the observed values and the values calculated using 4 degrees of
freedom. The mean value of χ^{2} calculated for the 500
simulations was 4.08, which is consistent with the
value we would expect from a χ^{2} distribution with 4 degrees of
freedom.


For N=10
The results obtained for the simulation in
the case of samples of size 10 are shown in the following table. In the
first column you find the intervals used for the χ^{2} values. In
the second column, you find the value used for the calculation. In the last
columns you find the observed values and the values calculated using 9 degrees of
freedom. The mean value of χ^{2} calculated for the 500
simulations was 8.96, which is consistent with the
value we would expect from a χ^{2} distribution with 9 degrees of
freedom.


Just amazing!