The probability of not committing a Type II error is called the power of a hypothesis test.
Effect Size

To compute the power of the test, one offers an alternative view about the "true" value of the population parameter, assuming that the null hypothesis is false. The effect size is the difference between the true value and the value specified in the null hypothesis:

Effect size = True value - Hypothesized value

For example, suppose the null hypothesis states that a population mean is equal to 100. A researcher might ask: What is the probability of rejecting the null hypothesis if the true population mean is equal to 90? In this example, the effect size would be 90 - 100, which equals -10.

Factors That Affect Power

The power of a hypothesis test is affected by three factors: the sample size, the significance level, and the "true" value of the tested parameter (the greater the difference between the true value and the hypothesized value, i.e. the larger the effect size, the greater the power).
Test Your Understanding

Problem 1

Other things being equal, which of the following actions will reduce the power of a hypothesis test?

I. Increasing sample size.
II. Changing the significance level from 0.01 to 0.05.
III. Increasing beta, the probability of a Type II error.

(A) I only
(B) II only
(C) III only
(D) All of the above
(E) None of the above

Solution

The correct answer is (C). Increasing sample size makes the hypothesis test more sensitive: more likely to reject the null hypothesis when it is, in fact, false. Changing the significance level from 0.01 to 0.05 makes the region of acceptance smaller, which makes the hypothesis test more likely to reject the null hypothesis, thus increasing the power of the test. Since, by definition, power is equal to one minus beta, the power of a test will get smaller as beta gets bigger.

Problem 2

Suppose a researcher conducts an experiment to test a hypothesis. If she doubles her sample size, which of the following will increase?

I. The power of the hypothesis test.
II. The effect size of the hypothesis test.
III. The probability of making a Type II error.

(A) I only
(B) II only
(C) III only
(D) All of the above
(E) None of the above

Solution

The correct answer is (A). Increasing sample size makes the hypothesis test more sensitive: more likely to reject the null hypothesis when it is, in fact, false. Thus, it increases the power of the test. The effect size is not affected by sample size. And the probability of making a Type II error gets smaller, not bigger, as sample size increases.
4. This is consistent with the system of justice in the USA, in which a defendant is assumed innocent until proven guilty beyond a reasonable doubt; proving the defendant guilty beyond a reasonable doubt is analogous to providing evidence that would be very unusual if the null hypothesis were true.

5. There are (at least) two reasons why this is important. First, the desired significance level is one criterion in deciding on an appropriate sample size. (See Power for more information.) Second, if more than one hypothesis test is planned, additional considerations need to be taken into account. (See Multiple Inference for more information.)

6. The answer to this may well depend on the seriousness of the punishment and the seriousness of the crime. For example, if the punishment is death, a Type I error is extremely serious. Also, if a Type I error results in a criminal going free as well as an innocent person being punished, then it is more serious than a Type II error.

Applied Statistics - Lesson 11

Lesson Overview
The role of sample size in the power of a statistical test must be considered before we go on to advanced statistical procedures such as analysis of variance/covariance and regression analysis. One can select a power and determine an appropriate sample size beforehand, or do a power analysis afterwards. However, after-the-fact power analysis is beyond the scope of this course, and predetermining sample size is best.

Sample Size Importance
Although crucial, the simple question of sample size has no definite answer due to the many factors involved. We expect large samples to give more reliable results, while small samples often leave the null hypothesis unchallenged. Large samples may be justified and appropriate when the difference sought is small and the population variance large. Established statistical procedures help ensure appropriate sample sizes so that we reject the null hypothesis not only because of statistical significance, but also because of practical importance. These procedures must consider the size of the type I and type II errors as well as the population variance and the size of the effect. The probability of committing a type I error is the same as our level of significance, commonly 0.05 or 0.01, called alpha, and represents our willingness to reject a true null hypothesis. This might also be termed a false positive: a positive pregnancy test when the woman is not in fact pregnant. The probability of committing a type II error, or beta (β), represents failing to reject a false null hypothesis, a false negative: a negative pregnancy test when the woman is in fact pregnant. Ideally both types of error are minimized. The power of any test is 1 - β, since rejecting the false null hypothesis is our goal.

Power of a Statistical Test
Unfortunately, the process for determining 1 - β, or power, is not as straightforward as that for calculating alpha. Specifically, we need a specific value for the alternative hypothesis as well as for the null hypothesis, since there is a different value of β for each different value of the alternative hypothesis. Fortunately, if we minimize β (type II errors), we maximize 1 - β (power). However, if alpha is increased, β decreases. Alpha is generally established beforehand: 0.05 or 0.01, perhaps 0.001 for medical studies, or even 0.10 for behavioral science research. Larger alpha values result in a smaller probability of committing a type II error, which thus increases the power.

Example: Suppose we have 100 freshman IQ scores and want to test the null hypothesis that the population mean is 110, using a one-tailed z-test with alpha = 0.05. We will find the power = 1 - β for the specific alternative value IQ = 115 (a sketch of this base calculation appears after this paragraph).

The basic factors which affect power are the directional nature of the alternative hypothesis (number of tails); the level of significance (alpha); n (sample size); and the effect size (ES). We will consider each in turn.

Example: Suppose we change the example above from a one-tailed to a two-tailed test.
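Here is a minimal sketch of the base one-tailed calculation in Python, assuming a population standard deviation of 15 (the conventional IQ scaling, consistent with the standardized effect sizes quoted later):

```python
from scipy.stats import norm

# Power sketch for the base example: H0: mu = 110, one-tailed, alpha = 0.05,
# n = 100, and an assumed population sigma = 15 (conventional IQ scaling).
mu0, mu_alt = 110, 115          # hypothesized mean and specific alternative
sigma, n, alpha = 15, 100, 0.05

se = sigma / n ** 0.5                     # standard error = 1.5
cutoff = mu0 + norm.ppf(1 - alpha) * se   # reject H0 if sample mean exceeds ~112.47

# Power = P(sample mean falls in the rejection region | true mean = mu_alt)
power = 1 - norm.cdf((cutoff - mu_alt) / se)
print(f"cutoff = {cutoff:.2f}, power = {power:.3f}, beta = {1 - power:.3f}")
# -> cutoff = 112.47, power = 0.954, beta = 0.046
```

The two-tailed variant splits alpha across both tails, which pushes the cutoff further from 110 and lowers the power somewhat.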
Example: Suppose we instead change the first example from alpha=0.05 to alpha=0.01.
Since a larger value for alpha corresponds to a smaller confidence level, we need to be clear that we are referring strictly to the magnitude of alpha and not to the increased confidence we might associate with a smaller value! Example: Suppose we instead change the first example from n = 100 to n = 196.
For comparison we will summarize our results:
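Because each power value depends only on the number of tails, alpha, n, and the specific alternative mean, a short script (again assuming σ = 15) can generate the comparison table that the next paragraphs discuss:

```python
from scipy.stats import norm

sigma, mu0 = 15, 110
alts = [112, 115, 118]   # specific alternatives: effect sizes 2, 5, 8

def power(mu_alt, n, alpha, tails):
    se = sigma / n ** 0.5
    z = norm.ppf(1 - alpha / tails)       # critical z for the test
    shift = (mu_alt - mu0) / se           # alternative mean, in SE units
    upper = 1 - norm.cdf(z - shift)       # rejection area in the upper tail
    lower = norm.cdf(-z - shift) if tails == 2 else 0.0
    return upper + lower

scenarios = [("one-tail, alpha=.05, n=100", 100, 0.05, 1),
             ("two-tail, alpha=.05, n=100", 100, 0.05, 2),
             ("one-tail, alpha=.01, n=100", 100, 0.01, 1),
             ("one-tail, alpha=.05, n=196", 196, 0.05, 1)]

print(f"{'scenario':<28}" + "".join(f"{a:>6}" for a in alts))
for name, n, alpha, tails in scenarios:
    print(f"{name:<28}" + "".join(f"{power(a, n, alpha, tails):6.2f}" for a in alts))

# Output (power, rounded to two places):
# scenario                       112   115   118
# one-tail, alpha=.05, n=100    0.38  0.95  1.00
# two-tail, alpha=.05, n=100    0.27  0.92  1.00
# one-tail, alpha=.01, n=100    0.16  0.84  1.00
# one-tail, alpha=.05, n=196    0.59  1.00  1.00
```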
When one reads down the columns in the table above, we see the effect of the number of tails, the value of alpha, and the size of our sample on power. When one reads across the table we see how effect size affects power.
We should note, however, that effect size appears in the table above as a specific difference (2, 5, 8 for 112, 115, 118, respectively) and not as a standardized difference. These correspond to standardized effect sizes of 2/15 = 0.13, 5/15 = 0.33, and 8/15 = 0.53, respectively.
Hinkle, page 312, in a footnote, notes that for small sample sizes (n < 50) and situations where the sampling distribution is the t distribution, the noncentral t distribution should be associated with Ha and the power calculation. Formulas and tables are available, or any good statistical package should use them (see the sketch below).
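A minimal sketch of such a small-sample power calculation via the noncentral t distribution, using hypothetical numbers (n = 25, one-tailed alpha = 0.05, standardized effect size d = 0.5) rather than values from the text:

```python
from scipy.stats import nct, t

n, alpha, d = 25, 0.05, 0.5
df = n - 1
t_crit = t.ppf(1 - alpha, df)        # critical t under H0
ncp = d * n ** 0.5                   # noncentrality parameter under Ha
power = 1 - nct.cdf(t_crit, df, ncp) # area of Ha's distribution beyond t_crit
print(f"power = {power:.3f}")        # ~0.78
```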
Sample Size Calculations

It is considered best to determine the desired power before establishing sample size rather than after. Using this criterion (a power of at least 0.80 is a common convention), we can see how in the examples above our sample size was insufficient to supply adequate power in all cases for IQ = 112, where the difference of 2 was only 1.33 standard errors (2/1.5). We now have the tools to calculate sample size. We start with the formula z = ES/(σ/√n), where z is the sum of the critical z values for alpha and beta, and solve for n, giving n = (zσ/ES)². The three examples below are worked in the sketch that follows them.

Example: Find z for alpha = 0.05 and a one-tailed test.

Example: For the effect size (ES) of 5 from above, and the alpha, beta, and tails given in the previous example, calculate the necessary sample size. Note: it is usual and customary to round the sample size up to the next whole number. Thus pi = 3.14... would round up to 4.

Example: Find the minimum sample size needed for alpha = 0.05, ES = 5, and two tails for the examples above.
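A minimal sketch working these three examples, assuming σ = 15 as before and the conventional target power of 0.80 (β = 0.20); neither value is fixed by the formula itself:

```python
import math
from scipy.stats import norm

# Sample-size sketch: n = ((z_alpha + z_beta) * sigma / ES)**2, rounded up.
# Assumptions: sigma = 15 (IQ scaling) and power = 0.80, i.e. beta = 0.20.
sigma, ES, alpha, beta = 15, 5, 0.05, 0.20

def sample_size(alpha, beta, tails):
    z = norm.ppf(1 - alpha / tails) + norm.ppf(1 - beta)  # combined critical z
    return z, math.ceil((z * sigma / ES) ** 2)            # round up, never down

z1, n1 = sample_size(alpha, beta, tails=1)
z2, n2 = sample_size(alpha, beta, tails=2)
print(f"one-tailed: z = {z1:.3f}, n = {n1}")   # z ~ 2.49, n = 56
print(f"two-tailed: z = {z2:.3f}, n = {n2}")   # z ~ 2.80, n = 71
```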
Recalling the pervasive joke about knowing the population variance, it should be obvious that we still haven't fulfilled our goal of establishing an appropriate sample size. There are two common ways around this problem. First, it is acceptable to use a variance found in the appropriate research literature to determine an appropriate sample size. Second, it is also common to express the effect size in terms of the standard deviation instead of as a specific difference. Since the effect size and the standard deviation then both appear in the sample size formula, the standard deviation cancels and the formula simplifies: with ES = d·σ, n = (z/d)² (see the sketch below). Tables to help determine appropriate sample size are commonly available. Such tables not only address the one- and two-sample cases, but also cases where there are more than two samples. Since more than one treatment (i.e. sample) is common and additional treatments may reduce the effect size needed to qualify as "large," the question of appropriate effect size can be more important than that of power or sample size. That question is answered through the informed judgment of the researcher, the research literature, the research design, and the research results. We have thus shown the complexity of the question and how sample size relates to alpha, power, and effect size.
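As a quick check that σ indeed cancels when the effect size is standardized, the same n results from d = ES/σ = 5/15 ≈ 0.33 under the same assumed alpha and power as above:

```python
import math
from scipy.stats import norm

# Sample size from a standardized effect size d = ES/sigma; sigma cancels.
# Same assumptions as above: one-tailed, alpha = 0.05, power = 0.80.
def n_from_d(d, alpha=0.05, beta=0.20, tails=1):
    z = norm.ppf(1 - alpha / tails) + norm.ppf(1 - beta)
    return math.ceil((z / d) ** 2)

print(n_from_d(5 / 15))   # d = 0.33 -> n = 56, matching the unstandardized result
```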
Does sample size affect the null hypothesis?

When we increase the sample size, decrease the standard error, or increase the difference between the sample statistic and the hypothesized parameter, the p-value decreases, thus making it more likely that we reject the null hypothesis.
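To illustrate with hypothetical numbers (a sample mean of 112 against a hypothesized mean of 110, with an assumed σ = 15), the one-tailed p-value shrinks as n grows:

```python
from scipy.stats import norm

# Demo: the same observed difference becomes more significant as n grows.
xbar, mu0, sigma = 112, 110, 15

for n in (25, 100, 400):
    z = (xbar - mu0) / (sigma / n ** 0.5)   # larger n -> smaller SE -> larger z
    p = 1 - norm.cdf(z)                     # one-tailed p-value
    print(f"n={n:4d}  z={z:5.2f}  p={p:.4f}")
# n=  25  z= 0.67  p=0.2525
# n= 100  z= 1.33  p=0.0912
# n= 400  z= 2.67  p=0.0038
```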
What happens if we reject the null hypothesis when it is true?

If we reject the null hypothesis when it is true, then we have made a Type I error. If the null hypothesis is false and we fail to reject it, we have made another error, called a Type II error.
Does rejecting the null hypothesis mean the alternative hypothesis is true?

Rejecting or failing to reject the null hypothesis: if our statistical analysis shows that the p-value is below the cut-off value we have set (e.g., 0.05 or 0.01), we reject the null hypothesis and accept the alternative hypothesis.
How do you know when to reject the null hypothesis?

After you perform a hypothesis test, there are only two possible outcomes. When your p-value is less than or equal to your significance level, you reject the null hypothesis; the data favor the alternative hypothesis. When your p-value is greater than your significance level, you fail to reject the null hypothesis.