__EXPERIMENTS ON THE CENTRAL LIMIT THEOREM (CLT)__

The CLT plays an important role in statistics and probability theory. Essentially,
the CLT states that if you take the mean value ($\bar{X}$) of many samples of
dimension (size) n from a distribution with finite variance, symmetric or not,
and if n is large enough, then the distribution of these mean values (called the
sampling distribution) will be approximately normal, with mean ($\mu_{sd}$) equal
to the mean of the original distribution ($\mu$) and standard deviation
($\sigma_{sd}$) equal to the standard deviation of the original distribution
($\sigma$) divided by the square root of n.

$$\mu_{sd} = \mu$$

$$\sigma_{sd} = \frac{\sigma}{\sqrt{n}}$$
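As a quick check of the two formulas, the sketch below computes $\mu_{sd}$ and $\sigma_{sd}$ for the sample sizes used in these experiments. It assumes the binomial distribution with N = 10 and p = 0.1 introduced in the next paragraph, so $\mu = Np = 1.0$ and $\sigma = \sqrt{Npq} \approx 0.95$; the helper name `sampling_params` is hypothetical.

```python
import math


def sampling_params(mu, sigma, n):
    """Return (mu_sd, sigma_sd) predicted by the CLT for samples of size n."""
    return mu, sigma / math.sqrt(n)


# Original distribution: binomial with N = 10 trials and p = 0.1.
N, p = 10, 0.1
mu = N * p                          # 1.0
sigma = math.sqrt(N * p * (1 - p))  # about 0.9487

for n in (10, 20, 30, 40, 50, 100):
    mu_sd, sigma_sd = sampling_params(mu, sigma, n)
    print(f"n={n:3d}  mu_sd={mu_sd:.2f}  sigma_sd={sigma_sd:.2f}")
```

The printed $\sigma_{sd}$ values (0.30, 0.21, 0.17, 0.15, 0.13 and 0.09) are the calculated values quoted for each experiment below.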

In these experiments we study how the dimension (size) n of the samples affects how well the conclusions of the CLT hold.

We drew samples from a binomial distribution with probability p = 0.1, so,
particularly for small values of N (the number of trials), the distribution is
skewed to the right. Observe that this N IS NOT the same as the dimension n of
the samples. Each experiment was repeated 500 times (these 500 repetitions have
nothing to do with N or n!). As an example of the original distribution, Fig. 1
shows 500 values with N = 10. In this case the calculated mean ($Np$) is 1.0,
the calculated standard deviation ($\sqrt{Npq}$, with $q = 1 - p$) is 0.95, and
the calculated skewness ($(1-2p)/\sqrt{Npq}$) is 0.84, which means that the
distribution is skewed to the right. For these 500 repetitions, the experimental
mean was 0.98, the standard deviation 0.94, and the skewness 0.80.

Fig. 1
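The theoretical moments quoted above, and a simulated set of 500 values, can be reproduced with a short standard-library sketch. Drawing each binomial value as the number of successes in N Bernoulli trials, the moment-based skewness estimator and the seed are assumptions; the original does not say how its values were generated, so the simulated figures will not match 0.98, 0.94 and 0.80 exactly.

```python
import math
import random


def binomial_moments(N, p):
    """Theoretical mean, standard deviation and skewness of Binomial(N, p)."""
    q = 1 - p
    mean = N * p
    sd = math.sqrt(N * p * q)
    skew = (1 - 2 * p) / sd
    return mean, sd, skew


def draw_binomial(N, p, rng):
    """One Binomial(N, p) value: the number of successes in N trials."""
    return sum(rng.random() < p for _ in range(N))


def sample_stats(values):
    """Sample mean, standard deviation (n - 1 divisor) and moment skewness."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return mean, sd, m3 / m2 ** 1.5


rng = random.Random(1)  # fixed seed so the run is reproducible
values = [draw_binomial(10, 0.1, rng) for _ in range(500)]
print("theory:    mean=%.2f sd=%.2f skew=%.2f" % binomial_moments(10, 0.1))
print("simulated: mean=%.2f sd=%.2f skew=%.2f" % sample_stats(values))
```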

The sampling distribution for n = 10 is shown in Fig. 2. The calculated values
are $\mu_{sd} = 1.00$ and $\sigma_{sd} = 0.30$; the experimental values are
$\mu_{sd} = 1.00$ and $\sigma_{sd} = 0.28$, with a skewness of 0.09, so the
distribution is essentially symmetric. For comparison, we have included the
points calculated from a normal distribution with $\mu = 1.00$ and
$\sigma = 0.30$. To compare the experimental distribution with the calculated
one, we computed the chi-squared statistic, $\chi^2 = 20.76$ with 8 degrees of
freedom, which corresponds to a P-value of 0.008. Since this is well below 0.05,
the two distributions appear to be different: the hypothesis that the sampling
distribution is normal is rejected at the 5% level.

Fig. 2
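A minimal sketch of how a sampling distribution like the one in Fig. 2 could be generated: each of the 500 repetitions draws n values from the binomial distribution and keeps their mean. It assumes N = 10 and p = 0.1 for the original distribution (consistent with the calculated $\mu_{sd} = 1.00$ and $\sigma_{sd} = 0.30$); the function name `sampling_distribution` and the seed are hypothetical.

```python
import math
import random
import statistics


def sampling_distribution(n, reps=500, N=10, p=0.1, seed=0):
    """Means of `reps` samples, each made of n values drawn from Binomial(N, p)."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [sum(rng.random() < p for _ in range(N)) for _ in range(n)]
        means.append(sum(sample) / n)
    return means


means = sampling_distribution(10)
print("experimental mu_sd    =", round(statistics.mean(means), 2))
print("experimental sigma_sd =", round(statistics.stdev(means), 2))
print("calculated sigma_sd   =", round(math.sqrt(10 * 0.1 * 0.9) / math.sqrt(10), 2))
```

Changing the argument of `sampling_distribution` gives the distributions for the other values of n.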

The sampling distribution for n = 20 is shown in Fig. 3. The calculated values
are $\mu_{sd} = 1.00$ and $\sigma_{sd} = 0.21$; the experimental values are
$\mu_{sd} = 0.99$ and $\sigma_{sd} = 0.20$, with a skewness of 0.42, so the
distribution is only mildly skewed. For comparison, we have included the points
calculated from a normal distribution with $\mu = 1.00$ and $\sigma = 0.21$. To
compare the experimental distribution with the calculated one, we computed the
chi-squared statistic, $\chi^2 = 26.46$ with 11 degrees of freedom, which
corresponds to a P-value of 0.0055. Again P < 0.05, so these distributions
appear to be different.

Fig. 3
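The comparison with the normal curve uses Pearson's chi-squared statistic, $\chi^2 = \sum (O - E)^2 / E$ over the histogram bins, where $O$ and $E$ are the observed and expected counts. The original does not give its bin edges, so the sketch below assumes the caller supplies interior cut points and treats the two outer bins as open-ended. It also assumes the degrees of freedom are bins minus one, since $\mu$ and $\sigma$ are calculated rather than estimated from the data. The helper names and the example counts are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of Normal(mu, sigma)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def expected_counts(cuts, mu, sigma, total):
    """Expected counts in the bins (-inf, c1], (c1, c2], ..., (ck, +inf)."""
    cdf = [0.0] + [normal_cdf(c, mu, sigma) for c in cuts] + [1.0]
    return [total * (b - a) for a, b in zip(cdf, cdf[1:])]


def chi_squared(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over all bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 500 sample means compared with Normal(1.00, 0.21),
# using cut points at mu + k*sigma for k = -2..2 (six bins).
mu, sigma = 1.00, 0.21
cuts = [mu + k * sigma for k in (-2, -1, 0, 1, 2)]
expected = expected_counts(cuts, mu, sigma, 500)
observed = [15, 70, 160, 175, 65, 15]   # made-up counts, for illustration only
chi2 = chi_squared(observed, expected)
print(f"chi2 = {chi2:.2f} with {len(observed) - 1} degrees of freedom")
```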

The sampling distribution for n = 30 is shown in Fig. 4. The calculated values
are $\mu_{sd} = 1.00$ and $\sigma_{sd} = 0.17$; the experimental values are
$\mu_{sd} = 1.00$ and $\sigma_{sd} = 0.17$, with a skewness of 0.08, so the
distribution is essentially not skewed. For comparison, we have included the
points calculated from a normal distribution with $\mu = 1.00$ and
$\sigma = 0.17$. To compare the experimental distribution with the calculated
one, we computed the chi-squared statistic, $\chi^2 = 18.96$ with 11 degrees of
freedom, which corresponds to a P-value of 0.06. This is just above 0.05, so
normality cannot be rejected at the 5% level, but the result is borderline: it
is not clear whether these distributions are the same or not!

Fig. 4
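The P-values quoted in these experiments are upper-tail probabilities of the chi-squared distribution. SciPy provides this as `scipy.stats.chi2.sf(x, df)`; the stdlib-only sketch below instead uses the closed-form series that holds for integer degrees of freedom, and checks it against the ($\chi^2$, degrees of freedom) pairs reported in the text. The 0.05 threshold in the hypothetical helper `decide` is the conventional significance level, assumed here rather than stated for each experiment.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail at sqrt(x) plus a finite series
    chi = math.sqrt(x)
    term = series = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        term = chi if r == 1 else term * x / (2 * r - 1)
        series += term
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at chi
    return math.erfc(chi / math.sqrt(2)) + 2 * pdf * series


def decide(p_value, alpha=0.05):
    """Conventional decision rule for the goodness-of-fit test."""
    return "reject normality" if p_value < alpha else "cannot reject normality"


reported = [  # (n, chi2, df, P-value) as quoted in the experiments
    (10, 20.76, 8, 0.008), (20, 26.46, 11, 0.0055), (30, 18.96, 11, 0.06),
    (40, 12.66, 8, 0.12), (50, 10.49, 7, 0.16), (100, 7.56, 5, 0.18),
]
for n, chi2, df, p_text in reported:
    p = chi2_sf(chi2, df)
    print(f"n={n:3d}  P={p:.4f} (text: {p_text})  -> {decide(p)}")
```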

The sampling distribution for n = 40 is shown in Fig. 5. The calculated values
are $\mu_{sd} = 1.00$ and $\sigma_{sd} = 0.15$; the experimental values are
$\mu_{sd} = 1.01$ and $\sigma_{sd} = 0.14$, with a skewness of 0.17, so the
distribution is only slightly skewed. For comparison, we have included the
points calculated from a normal distribution with $\mu = 1.00$ and
$\sigma = 0.15$. To compare the experimental distribution with the calculated
one, we computed the chi-squared statistic, $\chi^2 = 12.66$ with 8 degrees of
freedom, which corresponds to a P-value of 0.12. Since P > 0.05, there is no
significant difference: these distributions look similar.

Fig. 5
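For reference, the skewness of a mean of n independent values equals the skewness of the original distribution divided by $\sqrt{n}$, so the CLT predicts a small residual skew that shrinks as n grows. The sketch below prints that prediction for each n, assuming the binomial distribution with N = 10 and p = 0.1. It also prints $\sqrt{6/500} \approx 0.11$, a common large-sample approximation of the standard error of a skewness estimate from 500 normal values, as a rough scale for how much the measured skewness values can scatter.

```python
import math

N, p = 10, 0.1
skew_original = (1 - 2 * p) / math.sqrt(N * p * (1 - p))  # about 0.84
reps = 500
se_skew = math.sqrt(6 / reps)  # approximate standard error, about 0.11

for n in (10, 20, 30, 40, 50, 100):
    predicted = skew_original / math.sqrt(n)
    print(f"n={n:3d}  predicted skewness={predicted:.2f}  (scatter about {se_skew:.2f})")
```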

The sampling distribution for n = 50 is shown in Fig. 6. The calculated values
are $\mu_{sd} = 1.00$ and $\sigma_{sd} = 0.13$; the experimental values are
$\mu_{sd} = 1.00$ and $\sigma_{sd} = 0.12$, with a skewness of -0.08, so the
distribution is essentially symmetric. For comparison, we have included the
points calculated from a normal distribution with $\mu = 1.00$ and
$\sigma = 0.13$. To compare the experimental distribution with the calculated
one, we computed the chi-squared statistic, $\chi^2 = 10.49$ with 7 degrees of
freedom, which corresponds to a P-value of 0.16. Since P > 0.05, there is no
significant difference between the two distributions.

Fig. 6

The sampling distribution for n = 100 is shown in Fig. 7. The calculated values
are $\mu_{sd} = 1.00$ and $\sigma_{sd} = 0.09$; the experimental values are
$\mu_{sd} = 1.00$ and $\sigma_{sd} = 0.09$, with a skewness of -0.09, so the
distribution is essentially symmetric. For comparison, we have included the
points calculated from a normal distribution with $\mu = 1.00$ and
$\sigma = 0.09$. To compare the experimental distribution with the calculated
one, we computed the chi-squared statistic, $\chi^2 = 7.56$ with 5 degrees of
freedom, which corresponds to a P-value of 0.18. Since P > 0.05, there is no
significant difference between the two distributions.

Fig. 7

If we plot the P-value against the sample dimension n (see Fig. 8), we can see why many statistics textbooks use n = 30 as the criterion for deciding whether inference methods based on the normal distribution can be applied. For n smaller than 30 (n = 10 and n = 20 here), the P-value is below 0.05, so when the original distribution is not symmetric, the hypothesis that the sampling distribution is normal is rejected at the 5% level.

Fig. 8
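The whole experiment behind Fig. 8 can be sketched end to end: simulate the sampling distribution for each n, bin it, compute $\chi^2$ against the normal curve with the calculated $\mu_{sd}$ and $\sigma_{sd}$, and convert $\chi^2$ to a P-value. This is a sketch under stated assumptions, not the author's procedure. The original bin edges are not given, so it assumes six bins with cut points near $\mu_{sd} + k\sigma_{sd}$ (k = -2, ..., 2), shifted half-way between attainable sample means because means of binomial values are multiples of 1/n. The degrees of freedom are taken as bins minus one. Its P-values depend on the random seed and the binning, and will not reproduce the figure exactly.

```python
import math
import random


def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def chi2_sf(x, df):
    """Upper-tail chi-squared probability for integer df (closed-form series)."""
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    chi = math.sqrt(x)
    term = series = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        term = chi if r == 1 else term * x / (2 * r - 1)
        series += term
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * pdf * series


def p_value_for(n, reps=500, N=10, p=0.1, seed=0):
    """Simulate the sampling distribution for sample size n and test it for normality."""
    rng = random.Random(seed)
    means = [sum(sum(rng.random() < p for _ in range(N)) for _ in range(n)) / n
             for _ in range(reps)]
    mu = N * p
    sigma = math.sqrt(N * p * (1 - p)) / math.sqrt(n)  # calculated sigma_sd
    # Assumed binning: cut points near mu + k*sigma, moved half-way between
    # attainable means (multiples of 1/n) so no mean lands on a cut point.
    cuts = [(round(n * (mu + k * sigma)) + 0.5) / n for k in (-2, -1, 0, 1, 2)]
    cdf = [0.0] + [normal_cdf(c, mu, sigma) for c in cuts] + [1.0]
    expected = [reps * (b - a) for a, b in zip(cdf, cdf[1:])]
    observed = [0] * len(expected)
    for m in means:
        observed[sum(m > c for c in cuts)] += 1  # index of the bin containing m
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2_sf(chi2, len(observed) - 1)


for n in (10, 20, 30, 40, 50, 100):
    print(f"n={n:3d}  P-value={p_value_for(n):.3f}")
```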