What effect does increasing the sample size have on a distribution of sample means?

Distribution of Normal Means with Different Sample Sizes

Samples of a given size were taken from a normal distribution with mean 52 and standard deviation 14. The distribution of sample means for samples of size 16 (in blue) does not change but acts as a reference to show how the other curve (in red) changes as you move the slider to change the sample size. Distributions of sample means from a normal distribution change with the sample size. This Demonstration lets you see how the distribution of the means changes as the sample size increases or decreases.


Details

The population mean of the distribution of sample means is the same as the population mean of the distribution being sampled from. Thus the mean of the distribution of the means never changes. The standard deviation of the sample means, however, is the population standard deviation from the original distribution divided by the square root of the sample size. Thus as the sample size increases, the standard deviation of the means decreases; and as the sample size decreases, the standard deviation of the sample means increases.
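As a minimal sketch of the relationship described above, the snippet below evaluates the population standard deviation divided by the square root of the sample size for the Demonstration's σ = 14. The helper name `standard_error` and the particular sample sizes (other than the reference size 16) are illustrative assumptions, not part of the Demonstration.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sigma / math.sqrt(n)


SIGMA = 14.0  # population standard deviation quoted in the text

# Sample sizes are illustrative; 16 is the reference size from the text.
se_by_n = {n: standard_error(SIGMA, n) for n in (4, 16, 49, 196)}

for n, se in se_by_n.items():
    print(f"n = {n:>3}: standard deviation of sample means = {se:.2f}")
```

For the reference sample size of 16 this gives 14 / 4 = 3.5, and each fourfold increase in the sample size halves the spread.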

Reference:

Michael Sullivan, Fundamentals of Statistics, Upper Saddle River, NJ: Pearson Education, Inc., 2008, pp. 382–383.



The central limit theorem states that the sampling distribution of the mean approaches a normal distribution as the sample size increases. In practice, the approximation is usually considered adequate for sample sizes over 30.

Therefore, as a sample size increases, the sample mean and standard deviation will be closer in value to the population mean μ and standard deviation σ .

The central limit theorem tells us that no matter what the distribution of the population is, the shape of the sampling distribution will approach normality as the sample size (N) increases.

This is useful because the researcher never knows which mean in the sampling distribution is the same as the population mean, but when many random samples are selected from a population the sample means cluster together, allowing the researcher to make a very good estimate of the population mean.

Thus, as the sample size (N) increases the sampling error will decrease.

• As the sample size increases, the distribution of sample means approximates a bell-shaped curve (i.e. a normal distribution curve).

• A sample size of 30 or more is a common rule of thumb for the normal approximation to be adequate; strongly skewed populations may need larger samples.

• A sufficiently large sample can predict the parameters of a population such as the mean and standard deviation.

How to reference this article:

McLeod, S. A. (2019, Nov 25). What is central limit theorem in statistics? Simply psychology: https://www.simplypsychology.org/central-limit-theorem.html

One of the key considerations when designing an experiment is the sample size. In general, decreasing the sample size will decrease the accuracy of the results. But what effect does decreasing the sample size have on the distribution of sample means?

Introduction

As the sample size decreases, the spread of the distribution of sample means increases. As the sample size increases, the distribution becomes more concentrated around the population mean.

Theoretical Results

As the sample size increases, the distribution of sample means approaches a normal distribution. This is due to the Central Limit Theorem, which states that the distribution of the sample means will be approximately normal for large samples, regardless of the distribution of the population (provided the population has a finite variance). Let’s take a look at how this works.

The Central Limit Theorem

The Central Limit Theorem is one of the most important results in statistics. It tells us that, under certain conditions, the distribution of the sum (or average) of a large number of independent random variables will be approximately normally distributed, regardless of the underlying distribution of the individual random variables.

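This result is sometimes confused with the weak law of large numbers, which says only that the sample mean converges to the population mean as the sample size grows. The Central Limit Theorem goes further by describing the shape of the distribution, and it is the foundation for many powerful statistical methods, including the analysis of means, hypothesis testing, and inference for proportions.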

To understand the Central Limit Theorem, it helps to first consider what happens to the distribution of the sample mean as the sample size changes. If we are sampling from a population with a known mean and variance, then the distribution of the sample means has a mean equal to the population mean and a variance equal to the population variance divided by the sample size. If the population itself is Normally distributed, the sample means are exactly Normally distributed for every sample size.

As the sample size grows, however, something strange happens: even if we are sampling from a population with a non-normal distribution, the distribution of the sample means begins to look more and more like a Normal distribution! This phenomenon is illustrated in the figure below:

![alt text](https://i.imgur.com/ CentralLimitTheorem.png)

As you can see in this figure, as the sample size (n) increases, the distribution of the sample means approaches a Normal distribution with a mean equal to the population mean (μ) and a variance equal to the population variance (σ²) divided by n. In other words, regardless of whether or not the underlying population is Normally distributed, as n gets larger and larger, our sampling distributions will look increasingly like Normal distributions!
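The convergence described above can be checked with a small simulation. The sketch below is an illustration under stated assumptions: it draws samples from an exponential population (a skewed distribution chosen here only as an example, not taken from the text) and measures the skewness of the resulting sample means. The function name `sample_mean_skewness`, the seed, and the repetition count are all hypothetical choices. A skewness near zero indicates a more symmetric, more Normal-looking distribution.

```python
import random
import statistics


def sample_mean_skewness(n: int, reps: int = 10_000, seed: int = 0) -> float:
    """Skewness of the means of `reps` samples of size `n`.

    Samples come from an exponential population with rate 1, which is
    strongly right-skewed (an assumed example population).
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]
    center = statistics.fmean(means)
    spread = statistics.pstdev(means)
    # Average cubed z-score: zero for a perfectly symmetric distribution.
    return statistics.fmean(((m - center) / spread) ** 3 for m in means)


skew = {n: sample_mean_skewness(n) for n in (1, 5, 30)}

for n, value in skew.items():
    print(f"n = {n:>2}: skewness of sample means = {value:.2f}")
```

For an exponential population the theoretical skewness of the sample mean is 2/√n, so the printed values should shrink toward zero as n grows, although individual runs will vary with the seed.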

The Standard Error of the Mean

If you take a random sample of size 100 from a population, the mean of that sample will not be exactly equal to the mean of the population. If you take another random sample of size 100 from the same population, the second sample may have a different mean. In fact, if you took many such samples and calculated the means of each, they would all be slightly different. The distribution of these means is called the sampling distribution of the mean. It can be shown that this sampling distribution has certain properties. One important property is that its standard deviation is equal to:

standard error = standard deviation / square root(sample size)

This formula is very important because it tells us how much variation we can expect in the means of our samples if we keep everything else constant. For example, suppose we want to know how much difference there is between the average heights of men and women in our population. We could take a random sample of men and women and calculate their respective means. Neither sample mean is likely to equal its population mean exactly, so the difference we observe in a single sample will not exactly match the true difference. If we took many large samples (each consisting of a random selection of men and women) and calculated the means for each sample, we would expect most of those means to fall fairly close to each other. In other words, there would be little variation in the means from one sample to another. On the other hand, if each sample consisted of just 10 men and 10 women, the mean heights could differ noticeably from one sample to the next, and the observed difference between men and women could be far from the true difference in the population. This is because the mean of a small sample varies more than the mean of a large sample. The standard error formula tells us how much variation to expect in the means of our samples; the smaller the standard error, the less variation we should expect.
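For readers who want to see where the standard error formula comes from, here is a short derivation. It assumes the n observations X_1, ..., X_n are independent and share the same population variance σ²; these symbols are introduced here for illustration.

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{n\sigma^{2}}{n^{2}}
  = \frac{\sigma^{2}}{n},
\qquad\text{so}\qquad
\operatorname{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}}.
```

The second equality uses independence: the variance of a sum of independent variables is the sum of their variances.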

Simulation Results

A smaller sample size will cause the distribution of sample means to be more dispersed. This can be seen in the simulation results below. As the sample size decreases, the spread of the distribution increases.

Varying the Sample Size

As the sample size decreases, the distribution of the sample means becomes more spread out. This is because there is more variability in the estimates when there are fewer data points. In general, as the sample size increases, the variability of the estimates decreases.
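A simulation along the following lines illustrates the point. This is a sketch, not a reproduction of any particular study: it reuses the normal population with mean 52 and standard deviation 14 mentioned earlier, while the function name `sd_of_sample_means`, the sample sizes, the seed, and the number of repetitions are assumptions made for the example.

```python
import math
import random
import statistics

MU, SIGMA = 52.0, 14.0  # population parameters quoted earlier in the text


def sd_of_sample_means(n: int, reps: int = 5_000, seed: int = 1) -> float:
    """Empirical standard deviation of the means of `reps` samples of size `n`."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.stdev(means)


# Map each sample size to (simulated spread, theoretical sigma / sqrt(n)).
results = {n: (sd_of_sample_means(n), SIGMA / math.sqrt(n)) for n in (4, 16, 64)}

for n, (simulated, theoretical) in results.items():
    print(f"n = {n:>2}: simulated = {simulated:.2f}, sigma/sqrt(n) = {theoretical:.2f}")
```

The simulated spread should track σ/√n closely and shrink as n grows, with small differences from run to run depending on the seed.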

Varying the Population Standard Deviation

The population standard deviation (σ) is a measure of how spread out the values are in a population. The larger the standard deviation, the more the values are spread out. For example, suppose we have a population with μ = 100 and σ = 10. If the population is roughly normal, the distribution of values would be fairly tight, with about 68% of the values falling between 90 and 110.

Now, let’s say we decrease the standard deviation to σ = 5. The distribution of values would be even tighter, with about 68% of the values falling between 95 and 105.

When the population standard deviation is smaller, the distribution of sample means is also tighter (less spread out) for any given sample size, because the standard error σ/√n is proportional to σ.
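The effect of σ can be made concrete with a short calculation. The sketch below assumes a normal population with μ = 100 and a hypothetical sample size of 25 (the text does not specify one); it reports the share of values within one standard deviation of the mean and the standard error for σ = 10 and σ = 5.

```python
import math
from statistics import NormalDist

MU = 100.0
N = 25  # hypothetical sample size, chosen only for illustration

summary = {}
for sigma in (10.0, 5.0):
    population = NormalDist(MU, sigma)
    # Share of the population between mu - sigma and mu + sigma.
    within_one_sd = population.cdf(MU + sigma) - population.cdf(MU - sigma)
    # Spread of the sample means for samples of size N.
    standard_error = sigma / math.sqrt(N)
    summary[sigma] = (within_one_sd, standard_error)
    print(
        f"sigma = {sigma:>4}: share within one SD = {within_one_sd:.3f}, "
        f"standard error (n = {N}) = {standard_error:.1f}"
    )
```

Halving σ from 10 to 5 halves the standard error (from 2.0 to 1.0 at n = 25), while the share of values within one standard deviation stays the same because the interval shrinks along with σ.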

Conclusion

A smaller sample size will produce a wider distribution of sample means. This is because the mean of a small number of data points varies more from sample to sample. When the sample size is increased, the distribution of sample means becomes narrower, because the standard error σ/√n shrinks as the sample size grows.

How does increasing sample size affect sample mean?

As sample sizes increase, the sampling distributions approach a normal distribution. With "infinite" numbers of successive random samples, the mean of the sampling distribution is equal to the population mean (µ).

What effect does increasing the sample sizes have on the center of the distribution?

The center does not change: the mean of the sampling distribution stays equal to the population mean for every sample size. We have already seen that as the sample size increases the sampling distribution becomes closer and closer to the normal distribution. What does change is the spread: the standard deviation of the sampling distribution decreases as n increases.

What is the effect of increasing sample size on the sampling distribution and what does this mean in terms of the central limit theorem?

As the sample size increases, the sampling distribution converges on a normal distribution where the mean equals the population mean, and the standard deviation equals σ/√n.