What values are necessary to calculate a confidence interval for a proportion?

We have focussed so far on 95% confidence intervals, which is the confidence level that is used most commonly. The general form of an approximate \(C\%\) confidence interval for a population proportion is

\[ \hat{p} \pm z \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}, \]

where the value of \(z\) is appropriate for the confidence level. For a 95% confidence interval, we use \(z=1.96\), while for a 90% confidence interval, for example, we use \(z=1.64\).

In general, for a \(C\%\) confidence interval, we need to find the value of \(z\) that satisfies

\[ \Pr(-z < Z < z) = \dfrac{C}{100}, \quad\text{where } Z \stackrel{\mathrm{d}}{=} \mathrm{N}(0,1). \]

figure 21 shows the required value of \(z\) as a function of the confidence level.

What values are necessary to calculate a confidence interval for a proportion?

Figure 21: The relationship between the confidence level and the value of \(z\) in the formula for an approximate confidence interval.

The following figure is a repeat of figure 13. It shows confidence intervals based on the same estimated proportion, but with different confidence levels. The larger confidence levels lead to wider confidence intervals.

What values are necessary to calculate a confidence interval for a proportion?

Figure 22: Confidence intervals from the same data, but with different confidence levels.

The distance from the sample estimate \(\hat{p}\) to the endpoints of the confidence interval is

\[ E = z \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}. \]

The quantity \(E\) is referred to as the margin of error. The margin of error is half the width of the confidence interval. Sometimes confidence intervals are reported as \(\hat{p} \pm E\); this means the bounds of the interval are not directly stated, but must be calculated.

We have seen in figure 22 that the margin of error is larger when the confidence level is larger. This is because the value of \(z\) from the standard Normal distribution will be larger when the confidence level is larger.

Example: Mobile-phone use among children

Continuing with the mobile-phone example, consider the 12–14 age group. Calculate an approximate 90% confidence interval for the true proportion of mobile-phone owners in this group.

Solution

From the table in the initial mobile-phone example, we have \(\hat{p} = \dfrac{918}{1250} = 0.734\). For a 90% confidence interval, we use \(z=1.64\), and so the margin of error is

\[ 1.64 \sqrt{\dfrac{0.734(1-0.734)}{1250}} = 0.0205. \]

Hence, the 90% confidence interval is \(0.734 \pm 0.0205\), or \((0.714, 0.755)\). In percentage terms, the confidence interval is \(71.4\%\) to \(75.5\%\).

Exercise 6

Consider Casey's sample of Venus bars from exercise 5. He obtained a random sample of 180 wrappers, and found that 20 were winners. Rather than a 95% confidence interval for the true proportion of winning wrappers, consider a 99% confidence interval.

  1. Without calculating the 99% confidence interval, guess the lower and upper bounds.
  2. Find the value of the factor \(z\) from the standard Normal distribution for a 99% confidence interval (if necessary, by reading it off the graph in figure 21). Consider the ratio of the values of \(z\) for the 99% and 95% confidence intervals, and estimate the lower and upper bounds of the 99% confidence interval.
  3. Calculate the approximate 99% confidence interval for the true proportion of winning wrappers, based on Casey's sample of Venus bars.
  4. Consider Casey's suspicion that the true proportion of winners is not one in six. Comment on this, based on the 99% confidence interval.

Maximum margin of error

In the module Binomial distribution, we noted that if \(X \stackrel{\mathrm{d}}{=} \mathrm{Bi}(n,p)\), then the variance is largest (for a given value of \(n\)) when \(p = \dfrac{1}{2}\), in which case \(\mathrm{var}(X) = n \times \dfrac{1}{2} \times \dfrac{1}{2} = \dfrac{n}{4}\).

This has the direct consequence that, when estimating \(p\), the variance of \(\hat{P}\) is largest when \(p=0.5\), and is equal to \(\dfrac{0.25}{n}\). This is perhaps slightly unfortunate for political polling in particular, since such surveys are quite often estimating a characteristic (such as political preference) which is present in about half of the population.

However, there is some good news for the pollsters: While they may be in the realm of least precise inferences, they know how bad it can get. For a random sample of size \(n\), the standard deviation of \(\hat{P}\) cannot be bigger than \(\dfrac{0.5}{\sqrt{n}}\), and hence the margin of error for a 95% confidence interval is at most \(1.96\,\dfrac{0.5}{\sqrt{n}} = \dfrac{0.98}{\sqrt{n}}\).

To make the reporting of such polls succinct, this fact is sometimes exploited. The report simply uses the maximum margin of error for the given sample size \(n\), knowing that this is conservative: the precision will be as claimed if the estimated proportion \(\hat{p}\) is 0.5 (the percentage is 50%), and better than claimed otherwise.

Exercise 7

A Nielsen Poll published on 17 February 2013 reported that, in a two-party vote, 56% of voters prefer the Coalition (44% prefer ALP). The report indicates that the approximate margin of error is at most \(2.6\%\).

  1. Based on this margin of error, find the 95% confidence interval for the true proportion of voters preferring the Coalition. (Assume the margin of error provided is for a 95% level of confidence.)
  2. Based on this margin of error, approximately how many voters were surveyed?
  3. Why might Nielsen provide a single (maximum) margin of error in reporting on a variety of different outcomes (two-party-preferred vote, approval of the Prime Minister, approval of the Opposition Leader)?
  4. The approval of the Prime Minister was reported to be 40%. Find a 95% confidence interval for the true approval of the Prime Minister, based on the margin of error provided. Will this confidence interval be conservative (wider than it should be) or not (narrower than it should be)? Explain why.

When to use the Normal approximation

A guideline for when to use the Normal approximation for a confidence interval for \(p\) was given in the previous section: both \(x\) and \(n-x\) should be greater than 10. These conditions are generally met for the illustrative data in figure 23, based on \(n=100\).

In contrast, the confidence intervals shown in figure 24 are based on data when \(n=10\); in no case is both \(x\) and \(n-x\) greater than 10. In four of the ten cases presented, the confidence intervals are nonsense: either the lower or the upper bound is outside the range 0 to 1. Of course, proportions must be in the range 0 to 1. This shows that these intervals are wrong, and it indicates that we should not trust the approximation when \(n\) is this small.

What values are necessary to calculate a confidence interval for a proportion?

Figure 23: Approximate 95% confidence intervals for various estimates \(\hat{p}\), with \(n=100\).

What values are necessary to calculate a confidence interval for a proportion?

Detailed description

Figure 24: Approximate 95% confidence intervals for various estimates \(\hat{p}\), with \(n=10\).

You may wonder whether it is possible to find a confidence interval for \(p\) when \(n\) is small, by avoiding the Normal approximation. The answer is that there is a method that uses only the binomial distribution, and does not approximate. It is beyond the scope of the curriculum.

Exercise 8

This exercise asks you to estimate \(\pi\), using a statistical approach. Suppose you were aware that the area of a circle is \(A = kr^2\), where \(r\) is the radius of the circle and \(k\) is a constant. Suppose you also knew that the equation for a circle centred at the origin is \(x^2 + y^2 = r^2\). So, you knew a lot about circles… but not the value of \(\pi\).

What are the conditions for a confidence interval for proportions?

There are three conditions we need to satisfy before we make a one-sample z-interval to estimate a population proportion. We need to satisfy the random, normal, and independence conditions for these confidence intervals to be valid.

Which three elements are necessary for calculating a confidence interval?

A confidence interval has three elements. First there is the interval itself, something like (123, 456). Second is the confidence level, something like 95%. Third there is the parameter being estimated, something like the population mean, μ or the population proportion, p.

What are the conditions to construct a 95% confidence interval for the proportion?

The 95% confidence interval for a proportion is: This formula is appropriate whenever there are at least 5 subjects with the outcome and at least 5 without the outcome. You should always use Z scores (not t-scores) to compute the confidence interval for a proportion.

What assumptions are required for a confidence interval for a single proportion?

Here are the six assumptions you should check when constructing a confidence interval:.
Assumption #1: Random Sampling. ... .
Assumption #2: Independence. ... .
Assumption #3: Large Sample. ... .
Assumption #4: The 10% Condition. ... .
Assumption #5: The Success / Failure Condition. ... .
Assumption #6: Homogeneity of Variances..