EXPERIMENTS ABOUT THE CENTRAL LIMIT THEOREM (CLT)

The CLT plays an important role in statistics and probability theory. Essentially, the CLT states that if you take the mean value (X) of many samples of dimension (size) n from a distribution, which may or may not be symmetric, and if n is large enough, then the distribution of these mean values (called the sampling distribution) will be approximately normal, with mean value (msd) equal to the mean value of the original distribution (m) and standard deviation (ssd) equal to the standard deviation of the original distribution (s) divided by the square root of n.

msd = m

ssd = s/√n
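
These two relations can be checked with a small simulation. The following is only a minimal sketch, not the program used for these experiments: it assumes the binomial population described later in this text (N = 10, p = 0.1), builds each binomial value as a sum of Bernoulli trials, and uses an arbitrary seed. The function names `binomial_draw` and `sample_means` are our own.

```python
import math
import random
import statistics

def binomial_draw(rng, trials, p):
    """One Binomial(trials, p) value, built as a sum of Bernoulli trials."""
    return sum(1 for _ in range(trials) if rng.random() < p)

def sample_means(rng, n, repetitions, trials=10, p=0.1):
    """Mean of each of `repetitions` independent samples of dimension n."""
    return [
        statistics.fmean([binomial_draw(rng, trials, p) for _ in range(n)])
        for _ in range(repetitions)
    ]

rng = random.Random(0)          # fixed seed, an arbitrary choice
n = 10                          # dimension of each sample
means = sample_means(rng, n, repetitions=500)

m = 10 * 0.1                    # mean of the original distribution, Np
s = math.sqrt(10 * 0.1 * 0.9)   # its standard deviation, sqrt(Npq)

msd = statistics.fmean(means)   # mean of the sampling distribution
ssd = statistics.stdev(means)   # standard deviation of the sampling distribution
print(f"msd: simulated {msd:.2f}, predicted {m:.2f}")
print(f"ssd: simulated {ssd:.2f}, predicted {s / math.sqrt(n):.2f}")
```

The simulated values change with the seed, but they should stay close to the predicted ones.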

In these experiments we study how changing the dimension (n) of the samples affects the conclusions of the CLT.

We selected samples from a binomial distribution with probability p = 0.1 (and q = 1 - p), so, particularly for small values of N (the number of trials), the distribution should be skewed to the right. Note that this N IS NOT the same as n, the dimension of the sample. Each experiment was repeated 500 times (these 500 repetitions have nothing to do with N or n). As an example of the original distribution, fig.1 shows 500 values with N = 10. In this case, the calculated mean value, Np, is 1.0; the calculated standard deviation, √(Npq), is 0.95; and the calculated skewness, (1-2p)/√(Npq), is 0.84, which means that the distribution is skewed to the right. For these 500 repetitions, the experimental mean value was 0.98, the standard deviation was 0.94, and the experimental skewness was 0.80.
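
The calculated values above follow directly from the binomial formulas. The sketch below reproduces them and also estimates the skewness of 500 simulated values. The moment-based estimator in `sample_skewness` is an assumption on our part (the text does not say which estimator produced the experimental skewness), and the seed is arbitrary.

```python
import math
import random
import statistics

def binomial_moments(trials, p):
    """Theoretical mean, standard deviation and skewness of Binomial(trials, p)."""
    q = 1.0 - p
    mean = trials * p
    sd = math.sqrt(trials * p * q)
    skew = (1.0 - 2.0 * p) / sd
    return mean, sd, skew

def sample_skewness(values):
    """Moment-based skewness: third central moment over (second central moment)^1.5."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

mean, sd, skew = binomial_moments(trials=10, p=0.1)
print(f"calculated: mean = {mean:.2f}, sd = {sd:.2f}, skewness = {skew:.2f}")

rng = random.Random(1)  # arbitrary seed
values = [sum(1 for _ in range(10) if rng.random() < 0.1) for _ in range(500)]
print(f"simulated:  mean = {statistics.fmean(values):.2f}, "
      f"sd = {statistics.stdev(values):.2f}, "
      f"skewness = {sample_skewness(values):.2f}")
```

The simulated line will differ from the experimental values quoted above, since it comes from a different set of random draws.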

Fig.1

The sampling distribution for n=10 is shown in fig.2. The calculated values are msd = 1.00 and ssd = 0.30; the experimental values are msd = 1.00, ssd = 0.28, and skewness = 0.09, which means it is not skewed. For comparison, we have included the points calculated using a normal distribution with m = 1.00 and s = 0.30. To compare the experimental distribution with the calculated one, we computed the Chi-Squared value, χ² = 20.76 with 8 degrees of freedom, which corresponds to a P-value of 0.008. So it looks like these two distributions are different.
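
The P-value is the upper-tail probability of the Chi-Squared distribution at the observed statistic. The text does not say how it was obtained, so the following is only a sketch of one way to compute it with the Python standard library. It uses the recurrence sf(x; k+2) = sf(x; k) + (x/2)^(k/2) exp(-x/2) / Γ(k/2 + 1), starting from the closed forms for 1 and 2 degrees of freedom. The function name `chi2_sf` is our own.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable
    with an integer number of degrees of freedom df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting points: df = 1 (odd case) or df = 2 (even case).
    if df % 2 == 1:
        prob, k = math.erfc(math.sqrt(half)), 1
    else:
        prob, k = math.exp(-half), 2
    # Each step adds (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1) and raises k by 2.
    while k < df:
        prob += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return prob

# Chi-squared values and degrees of freedom reported in this text.
reported = [(10, 20.76, 8), (20, 26.46, 11), (30, 18.96, 11),
            (40, 12.66, 8), (50, 10.49, 7), (100, 7.56, 5)]
for n, chi2, df in reported:
    print(f"n = {n:3d}  chi2 = {chi2:5.2f}  df = {df:2d}  P-value = {chi2_sf(chi2, df):.4f}")
```

Up to rounding, the results agree with the P-values quoted in this text for each sample dimension.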

Fig.2

The sampling distribution for n=20 is shown in fig.3. The calculated values are msd = 1.00 and ssd = 0.21; the experimental values are msd = 0.99, ssd = 0.20, and skewness = 0.42, which means it is not very skewed. For comparison, we have included the points calculated using a normal distribution with m = 1.00 and s = 0.21. To compare the experimental distribution with the calculated one, we computed the Chi-Squared value, χ² = 26.46 with 11 degrees of freedom, which corresponds to a P-value of 0.0055. So it looks like these distributions are different.
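
The Chi-Squared value itself is the sum of (observed - expected)²/expected over the bins of the histogram. The sketch below shows the calculation on simulated data. The bin edges are hypothetical (the text does not state the binning or how the degrees of freedom were counted), the seed is arbitrary, and the helper names are our own, so the resulting number is not expected to match the value reported above.

```python
import math
import random
import statistics

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of Normal(mean, sd)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def expected_counts(edges, mean, sd, total):
    """Expected counts under Normal(mean, sd) for the bins
    (-inf, e0], (e0, e1], ..., (e_last, +inf); edges must be ascending."""
    cdf = [0.0] + [normal_cdf(e, mean, sd) for e in edges] + [1.0]
    return [total * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

def observed_counts(values, edges):
    """Number of values falling in each of the bins defined above."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[sum(1 for e in edges if v > e)] += 1
    return counts

def chi2_statistic(observed, expected):
    """Pearson statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Simulated sample means for n = 20 from Binomial(N = 10, p = 0.1).
rng = random.Random(2)  # arbitrary seed
n, repetitions = 20, 500
means = [
    statistics.fmean([sum(1 for _ in range(10) if rng.random() < 0.1) for _ in range(n)])
    for _ in range(repetitions)
]

edges = [0.775, 0.875, 0.975, 1.075, 1.175, 1.275]  # hypothetical bin edges
observed = observed_counts(means, edges)
expected = expected_counts(edges, mean=1.00, sd=0.21, total=repetitions)
chi2 = chi2_statistic(observed, expected)
print(f"chi-squared = {chi2:.2f} with {len(observed) - 1} degrees of freedom")
```

Here the degrees of freedom are taken as the number of bins minus one, which assumes that no parameters of the normal curve were estimated from the data.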

Fig 3

The sampling distribution for n=30 is shown in fig.4. The calculated values are msd = 1.00 and ssd = 0.17; the experimental values are msd = 1.00, ssd = 0.17, and skewness = 0.08, which means it is not skewed at all. For comparison, we have included the points calculated using a normal distribution with m = 1.00 and s = 0.17. To compare the experimental distribution with the calculated one, we computed the Chi-Squared value, χ² = 18.96 with 11 degrees of freedom, which corresponds to a P-value of 0.06. So it is not quite clear whether these distributions are the same or not.

Fig 4

The sampling distribution for n=40 is shown in fig.5. The calculated values are msd = 1.00 and ssd = 0.15; the experimental values are msd = 1.01, ssd = 0.14, and skewness = 0.17, which means it is essentially not skewed. For comparison, we have included the points calculated using a normal distribution with m = 1.00 and s = 0.15. To compare the experimental distribution with the calculated one, we computed the Chi-Squared value, χ² = 12.66 with 8 degrees of freedom, which corresponds to a P-value of 0.12. So it looks like these distributions are similar.

Fig 5

The sampling distribution for n=50 is shown in fig.6. The calculated values are msd = 1.00 and ssd = 0.13; the experimental values are msd = 1.00, ssd = 0.12, and skewness = -0.08, which means it is not skewed at all. For comparison, we have included the points calculated using a normal distribution with m = 1.00 and s = 0.13. To compare the experimental distribution with the calculated one, we computed the Chi-Squared value, χ² = 10.49 with 7 degrees of freedom, which corresponds to a P-value of 0.16. So it looks like the two distributions are not different.

Fig 6

The sampling distribution for n=100 is shown in fig.7. The calculated values are msd = 1.00 and ssd = 0.09; the experimental values are msd = 1.00, ssd = 0.09, and skewness = -0.09, which means it is not skewed at all. For comparison, we have included the points calculated using a normal distribution with m = 1.00 and s = 0.09. To compare the experimental distribution with the calculated one, we computed the Chi-Squared value, χ² = 7.56 with 5 degrees of freedom, which corresponds to a P-value of 0.18. So it looks like the two distributions are not different.
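
The calculated ssd values quoted for each sample dimension all come from the same formula, the standard deviation of the original distribution divided by the square root of n. As a quick check, the short sketch below recomputes them, assuming the original distribution is the binomial with N = 10 and p = 0.1 described earlier.

```python
import math

N, p = 10, 0.1
s = math.sqrt(N * p * (1 - p))   # standard deviation of the original distribution

# Predicted standard deviation of the sampling distribution for each n used here.
predicted = {n: s / math.sqrt(n) for n in (10, 20, 30, 40, 50, 100)}
for n, ssd in predicted.items():
    print(f"n = {n:3d}   ssd = {ssd:.2f}")
```

Rounded to two decimals, these are the calculated ssd values 0.30, 0.21, 0.17, 0.15, 0.13 and 0.09 used in the comparisons above.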

Fig 7

If we plot the P-value against the sample dimension (n), we can see why most statistics books use n = 30 as the criterion for deciding whether the inference methods can be applied (see Fig.8). For the n values smaller than 30 tested here (10 and 20), the P-value is smaller than 0.05, so the sampling distribution differs significantly from a normal distribution when the original distribution is not symmetric.
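
The data behind this comparison can also be listed as a table. The sketch below uses only the P-values reported in this text and the 0.05 threshold mentioned above; the verdict labels are our own wording.

```python
# P-values reported in this text for each sample dimension n.
p_values = {10: 0.008, 20: 0.0055, 30: 0.06, 40: 0.12, 50: 0.16, 100: 0.18}
alpha = 0.05  # threshold used in the text

for n, p in p_values.items():
    verdict = "differs from normal" if p < alpha else "consistent with normal"
    print(f"n = {n:3d}   P-value = {p:<6}   {verdict}")

# Sample dimensions for which the normal distribution is rejected.
rejected = [n for n, p in p_values.items() if p < alpha]
print("rejected at the 0.05 level:", rejected)
```

Only n = 10 and n = 20 fall below the threshold, while n = 30 sits just above it with a P-value of 0.06.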

Fig.8
