
The probability of not committing a Type II error is called the power of a hypothesis test.

Effect Size

To compute the power of the test, one offers an alternative view about the "true" value of the population parameter, assuming that the null hypothesis is false. The effect size is the difference between the true value and the value specified in the null hypothesis.

Effect size = True value - Hypothesized value

For example, suppose the null hypothesis states that a population mean is equal to 100. A researcher might ask: What is the probability of rejecting the null hypothesis if the true population mean is equal to 90? In this example, the effect size would be 90 - 100, which equals -10.

Factors That Affect Power

The power of a hypothesis test is affected by three factors, each illustrated in the code sketch after this list.

  • Sample size (n). Other things being equal, the greater the sample size, the greater the power of the test.
  • Significance level (α). The lower the significance level, the lower the power of the test. If you reduce the significance level (e.g., from 0.05 to 0.01), the region of acceptance gets bigger, so you are less likely to reject the null hypothesis. In particular, you are less likely to reject the null hypothesis when it is false, so you are more likely to make a Type II error. In short, reducing the significance level reduces the power of the test, and vice versa.
  • The "true" value of the parameter being tested. The greater the difference between the "true" value of a parameter and the value specified in the null hypothesis, the greater the power of the test. That is, the greater the effect size, the greater the power of the test.
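Here is the sketch: a minimal Python illustration (not part of the original text) of a lower-tailed one-sample z-test, using the hypothesized mean of 100 and the alternative of 90 from the effect-size example above. The population standard deviation of 30, the baseline sample size of 25, and the alternative settings are hypothetical values chosen to show each factor at work.

    # A minimal sketch of power for a lower-tailed one-sample z-test.
    # mu0 = 100 and mu_true = 90 come from the effect-size example above;
    # sigma = 30 and the sample sizes are hypothetical illustration values.
    from scipy.stats import norm

    def power_lower_tail_z(mu0, mu_true, sigma, n, alpha=0.05):
        se = sigma / n ** 0.5                   # standard error of the mean
        crit = mu0 - norm.ppf(1 - alpha) * se   # reject H0 when xbar < crit
        return norm.cdf((crit - mu_true) / se)  # P(reject H0 | mu = mu_true)

    print(power_lower_tail_z(100, 90, 30, 25))              # baseline: ~0.51
    print(power_lower_tail_z(100, 90, 30, 100))             # larger n: ~0.95
    print(power_lower_tail_z(100, 90, 30, 25, alpha=0.10))  # larger alpha: ~0.65
    print(power_lower_tail_z(100, 80, 30, 25))              # larger effect: ~0.95

Each of the last three calls changes one factor from the baseline, and each change raises the power.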

Test Your Understanding

Problem 1

Other things being equal, which of the following actions will reduce the power of a hypothesis test?

I. Increasing sample size.
II. Changing the significance level from 0.01 to 0.05.
III. Increasing beta, the probability of a Type II error.

(A) I only
(B) II only
(C) III only
(D) All of the above
(E) None of the above

Solution

The correct answer is (C). Increasing sample size makes the hypothesis test more sensitive - more likely to reject the null hypothesis when it is, in fact, false. Changing the significance level from 0.01 to 0.05 makes the region of acceptance smaller, which makes the hypothesis test more likely to reject the null hypothesis, thus increasing the power of the test. Since, by definition, power is equal to one minus beta, the power of a test will get smaller as beta gets bigger.

Problem 2

Suppose a researcher conducts an experiment to test a hypothesis. If she doubles her sample size, which of the following will increase?

I. The power of the hypothesis test.
II. The effect size of the hypothesis test.
III. The probability of making a Type II error.

(A) I only
(B) II only
(C) III only
(D) All of the above
(E) None of the above

Solution

The correct answer is (A). Increasing sample size makes the hypothesis test more sensitive - more likely to reject the null hypothesis when it is, in fact, false. Thus, it increases the power of the test. The effect size is not affected by sample size. And the probability of making a Type II error gets smaller, not bigger, as sample size increases.




Applied Statistics - Lesson 11

Lesson Overview

  • Sample Size Importance
  • Power of a Statistical Test
  • Sample Size Calculations
  • Homework

The role of sample size in the power of a statistical test must be considered before we go on to advanced statistical procedures such as analysis of variance/covariance and regression analysis. One can select a power and determine an appropriate sample size beforehand, or conduct a power analysis afterwards. However, post hoc power analysis is beyond the scope of this course, and predetermining sample size is the better practice.

Sample Size Importance

An appropriate sample size is crucial to any well-planned research investigation.

Although crucial, the simple question of sample size has no definite answer due to the many factors involved. We expect large samples to give more reliable results and small samples to often leave the null hypothesis unchallenged. Large samples may be justified and appropriate when the difference sought is small and the population variance large. Established statistical procedures help ensure appropriate sample sizes so that we reject the null hypothesis not only because of statistical significance, but also because of practical importance. These procedures must consider the size of the type I and type II errors as well as the population variance and the size of the effect.

The probability of committing a type I error is the same as our level of significance, commonly 0.05 or 0.01, called alpha, and represents our willingness to reject a true null hypothesis. Taking the null hypothesis to be that a woman is not pregnant, this might also be termed a false positive: a positive pregnancy test when she is not in fact pregnant. The probability of committing a type II error, or beta (ß), represents not rejecting a false null hypothesis, a false negative: a negative pregnancy test when she is in fact pregnant. Ideally both types of error are minimized. The power of any test is 1 - ß, since rejecting the false null hypothesis is our goal.

Power of a Statistical Test

The power of any statistical test is 1 - ß.

Unfortunately, the process for determining 1 - ß, or power, is not as straightforward as that for calculating alpha. Specifically, we need a specific value for the alternative hypothesis as well as for the null hypothesis, since there is a different value of ß for each different value of the alternative hypothesis. Fortunately, if we minimize ß (type II errors), we maximize 1 - ß (power). However, if alpha is increased, ß decreases. Alpha is generally established beforehand: 0.05 or 0.01, perhaps 0.001 for medical studies, or even 0.10 for behavioral science research. Larger alpha values result in a smaller probability of committing a type II error, which thus increases the power.

Example: Suppose we have a sample of 100 freshman IQ scores and wish to test the null hypothesis that the population mean is 110, using a one-tailed z-test with alpha = 0.05 and a population standard deviation of 15. We will find the power = 1 - ß against the specific alternative IQ = 115.
Solution: Power is the area under the sampling distribution of the mean centered on 115 that lies beyond the critical value for the sampling distribution centered on 110. More specifically, our critical z = 1.645 corresponds to an IQ found from 1.645 = (IQ - 110)/(15/sqrt(100)), namely 112.47. On the sampling distribution centered on 115, the region above 112.47 lies to the right of z = -1.69 and has area 0.954. Note that we have more power against an IQ of 118 (z = -3.69 or 0.9999) and less power against an IQ of 112 (z = 0.31 or 0.378).
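As a check on this arithmetic, here is a minimal Python sketch (an illustration, not part of the original lesson) that reproduces all three power values:

    # Reproduces the one-tailed power calculations above (sigma = 15, n = 100).
    from scipy.stats import norm

    n, sigma, alpha = 100, 15, 0.05
    se = sigma / n ** 0.5                         # standard error = 1.5
    crit = 110 + norm.ppf(1 - alpha) * se         # critical mean, ~112.47
    for mu_a in (112, 115, 118):
        print(mu_a, norm.sf((crit - mu_a) / se))  # ~0.378, ~0.954, ~0.9999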

The basic factors which affect power are the directional nature of the alternative hypothesis (number of tails); the level of significance (alpha); n (sample size); and the effect size (ES). We will consider each in turn.

Example: Suppose we change the example above from a one-tailed to a two-tailed test.
Solution: We first note that our critical z = 1.96 instead of 1.645. There are now two rejection regions to consider: one above the IQ satisfying 1.96 = (IQ - 110)/(15/sqrt(100)), namely 112.94, and one below an IQ of 107.06, corresponding to z = -1.96. Most of the power from the sampling distribution centered on 115 comes from above 112.94 (z = -1.37 or 0.915), with little coming from below 107.06 (z = -5.29 or 0.000), for a power of 0.915. For comparison, the power against an IQ of 118 (below z = -7.29 and above z = -3.37) is 0.9996, and against an IQ of 112 (below z = -3.29 and above z = 0.63) it is 0.265.

One-tailed tests generally have more power.

Example: Suppose we instead change the first example from alpha = 0.05 to alpha = 0.01.
Solution: Our critical z = 2.326, which corresponds to an IQ of 113.49. The power region is now bounded by z = -1.01 and has an area of 0.844. For comparison, the power against an IQ of 118 (above z = -3.01) is 0.999 and against an IQ of 112 (above z = 0.99) is 0.161.

"Increasing" alpha generally increases power.

Since a larger value of alpha corresponds to a smaller confidence level, we need to be clear that we are referring strictly to the magnitude of alpha and not to the increased confidence we might associate with a smaller value!

Example: Suppose we instead change the first example from n = 100 to n = 196.
Solution: Our critical z = 1.645 stays the same, but the corresponding IQ = 111.76 is lower due to the smaller standard error (now 15/14 instead of 15/10). Our z = -3.02 gives a power of 0.999. For comparison, the power against an IQ of 118 (above z = -5.82) is 1.000 and against an IQ of 112 (above z = -0.22) is 0.589.

Increasing sample size increases power.

For comparison we will summarize our results:

factors \ Ha =                 112     115     118
1-tail, alpha = 0.05, n = 100  0.378   0.954   0.9999
2-tail, alpha = 0.05, n = 100  0.265   0.915   0.9996
1-tail, alpha = 0.01, n = 100  0.161   0.844   0.9987
1-tail, alpha = 0.05, n = 196  0.589   0.999   1.0000

Reading down the columns of the table above shows the effect of the number of tails, the value of alpha, and the sample size on power. Reading across the table shows how effect size affects power.
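For readers who want to verify these rows, the following sketch (ours, not part of the original lesson) generalizes the calculation to one or two tails and regenerates the table to within rounding:

    # Regenerates the power table above for a z-test of H0: mu = 110,
    # sigma = 15, against the specific alternatives 112, 115, 118.
    from scipy.stats import norm

    def z_power(mu0, mu_a, sigma, n, alpha, tails):
        se = sigma / n ** 0.5
        if tails == 1:
            crit = mu0 + norm.ppf(1 - alpha) * se
            return norm.sf((crit - mu_a) / se)
        hi = mu0 + norm.ppf(1 - alpha / 2) * se   # upper rejection bound
        lo = mu0 - norm.ppf(1 - alpha / 2) * se   # lower rejection bound
        return norm.sf((hi - mu_a) / se) + norm.cdf((lo - mu_a) / se)

    for tails, alpha, n in [(1, 0.05, 100), (2, 0.05, 100),
                            (1, 0.01, 100), (1, 0.05, 196)]:
        print(tails, alpha, n,
              [round(z_power(110, mu, 15, n, alpha, tails), 3)
               for mu in (112, 115, 118)])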

A statistical test generally has more power against larger effect size.

We should note, however, that effect size appears in the table above as a specific difference (2, 5, 8 for 112, 115, 118, respectively) and not as a standardized difference. These correspond to standardized effect sizes of 2/15 = 0.13, 5/15 = 0.33, and 8/15 = 0.53 (in units of the population standard deviation, σ = 15).

The process of determining the power of the statistical test for a two-sample case
is identical to that of a one-sample case. Exactly the same factors apply.

Hinkle, page 312, in a footnote, notes that for small sample sizes (n < 50) in situations where the sampling distribution is the t distribution, the noncentral t distribution should be associated with Ha and the power calculation. Formulas and tables are available, and any good statistical package should use them.
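To illustrate Hinkle's point, here is a minimal sketch using scipy's noncentral t distribution; ES = 5 and σ = 15 follow the earlier examples, while the small sample size n = 25 is a hypothetical choice:

    # Power of a one-tailed, one-sample t-test via the noncentral t.
    # ES = 5 and sigma = 15 follow the examples; n = 25 is hypothetical.
    from scipy.stats import t as t_dist, nct

    n, sigma, es, alpha = 25, 15, 5, 0.05
    df = n - 1
    ncp = es / (sigma / n ** 0.5)       # noncentrality parameter
    t_crit = t_dist.ppf(1 - alpha, df)  # critical t under H0
    print(nct.sf(t_crit, df, ncp))      # ~0.50, a bit below the z-test value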

Sample Size Calculations

It is considered best to determine the desired power before establishing sample size rather than after. Some behavioral science researchers have suggested that Type I errors are more serious than Type II errors and that a 4:1 ratio of ß to alpha can be used to establish a desired power of 0.80: with alpha = 0.05, this gives ß = 0.20 and hence power = 1 - 0.20 = 0.80.

Using this criterion, we can see how in the examples above our sample size was insufficient to supply adequate power in all cases for IQ = 112, where the effect size was only 1.33 standard errors (for n = 100) or 1.87 standard errors (for n = 196).

We now have the tools to calculate sample size. We start with the formula z = ES/(σ/sqrt(n)) and solve for n, which gives n = σ² • z²/(ES)². The z used is the sum of the critical values from the two sampling distributions and will depend on alpha and beta.

Example: Find z for alpha = 0.05 and a one-tailed test.
Solution: We would use 1.645 for alpha and might use -0.842 for beta (ß = 0.20, i.e., a power of 0.80); the magnitudes sum to z = 1.645 + 0.842 = 2.487.

Example: For the effect size (ES) of 5 and the alpha, beta, and tails given in the examples above, calculate the necessary sample size.
Solution: Solving the equation above results in n = σ² • z²/(ES)² = 15² • 2.487² / 5² = 55.7 or 56. Thus in the first example, a sample size of only 56 would give us a power of 0.80.

Note: it is usual and customary to round the sample size up to the next whole number. Thus pi=3.14... would round up to 4.

Example: Find the minimum sample size needed for alpha = 0.05, ES = 5, and two tails for the examples above.
Solution: The necessary z values are 1.96 and -0.842 (again); we can generally ignore the minuscule region associated with one of the tails, in this case the left. The same formula applies and we obtain: n = 225 • 2.802² / 25 = 70.66 or 71.
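Both worked examples can be reproduced with a short Python sketch (an illustration under the same assumptions: known σ = 15, normal theory, and a target power of 0.80; the function name is ours):

    # Minimal sample-size calculation for a one-sample z-test,
    # following n = (sigma * z / ES)^2 with z = z_alpha + z_beta.
    from math import ceil
    from scipy.stats import norm

    def z_sample_size(es, sigma, alpha=0.05, power=0.80, tails=1):
        z = norm.ppf(1 - alpha / tails) + norm.ppf(power)
        return ceil((sigma * z / es) ** 2)   # round up to a whole subject

    print(z_sample_size(5, 15, tails=1))   # 56, matching the first example
    print(z_sample_size(5, 15, tails=2))   # 71, matching the second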

For a given effect size, alpha, and power, a larger sample size is required
for a two-tailed test than for a one-tailed test.

Recalling the pervasive joke about knowing the population variance, it should be obvious that we still haven't fulfilled our goal of establishing an appropriate sample size. There are two common ways around this problem. First, it is acceptable to use a variance found in the appropriate research literature to determine an appropriate sample size. Second, it is also common to express the effect size in terms of the standard deviation instead of as a specific difference. Since effect size and standard deviation both appear in the sample size formula, the formula simplifies: if ES = d • σ for a standardized effect size d, then n = z²/d².

Tables to help determine appropriate sample size are commonly available. Such tables not only address the one- and two-sample cases, but also cases where there are more than two samples. Since more than one treatment (i.e. sample) is common and additional treatments may reduce the effect size needed to qualify as "large," the question of appropriate effect size can be more important than that of power or sample size. That question is answered through the informed judgment of the researcher, the research literature, the research design, and the research results.

We have thus shown the complexity of the question and how sample size relates to alpha, power, and effect size.

Effect size, power, alpha, and number of tails all influence sample size.


Does sample size affect whether we reject the null hypothesis?

When we increase the sample size, decrease the standard error, or increase the difference between the sample statistic and hypothesized parameter, the p value decreases, thus making it more likely that we reject the null hypothesis.
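A quick numerical sketch makes this concrete; the observed difference of 3 points and the standard deviation of 15 are hypothetical values:

    # Hypothetical illustration: fixed observed difference, growing n.
    from scipy.stats import norm

    diff, sigma = 3, 15                 # assumed difference and population sd
    for n in (25, 100, 400):
        z = diff / (sigma / n ** 0.5)   # z statistic grows with sqrt(n)
        print(n, norm.sf(z))            # p: ~0.159, ~0.023, ~0.00003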

What happens if we reject a null hypothesis that is true?

If we reject the null hypothesis when it is true, then we have made a Type I error. If the null hypothesis is false and we fail to reject it, we have made another error, called a Type II error.

Does rejecting the null hypothesis mean the alternative hypothesis is true?

If our statistical analysis shows that the p-value is below the significance level we have set (e.g., 0.05 or 0.01), we reject the null hypothesis and accept the alternative hypothesis.

How do you know when to reject the null hypothesis?

After you perform a hypothesis test, there are only two possible outcomes:

  • When your p-value is less than or equal to your significance level, you reject the null hypothesis; the data favor the alternative hypothesis.
  • When your p-value is greater than your significance level, you fail to reject the null hypothesis.