If n = 15 and the population standard deviation is not known, what is the appropriate distribution?

Hypothesis Tests for One Population Mean when Sigma is Unknown


You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License. Please cite as follows: Hartmann, K., Krois, J., Waske, B. (2018): E-Learning Project SOGA: Statistics and Geospatial Data Analysis. Department of Earth Sciences, Freie Universitaet Berlin.


Fred E. Szabo PhD, in The Linear Algebra Survival Guide, 2015

Standard Deviation of a Numerical Vector

The population standard deviation σ measures the spread of a vector in ℝⁿ. It is defined to be the square root of the population variance of the vector. The sample standard deviation s is defined to be the square root of the sample variance of the vector.

Illustration

The population standard deviation of a vector in ℝ⁶

x = {2, 6, 3, 1, 8, 9}; mx = Mean[x];

σ = Sqrt[Total[(1/6) Table[(x[[i]] - mx)^2, {i, 1, 6}]]]

Sqrt[329]/6

The sample standard deviation of a vector in ℝ⁶

x = {2, 6, 3, 1, 8, 9}; mx = Mean[x];

s = Sqrt[Total[(1/5) Table[(x[[i]] - mx)^2, {i, 1, 6}]]]

Sqrt[329/30]

The Mathematica StandardDeviation function computes the sample standard deviation of a vector.

s == StandardDeviation[x]

True
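
For comparison with the results above, here is a minimal sketch on the same data. Mathematica's built-in StandardDeviation uses the sample (n − 1) convention, while the population version is formed directly from the definition:

x = {2, 6, 3, 1, 8, 9};
popSD = Sqrt[Mean[(x - Mean[x])^2]]   (* population standard deviation: Sqrt[329]/6 *)
sampleSD = StandardDeviation[x]       (* sample standard deviation: Sqrt[329/30] *)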

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9780124095205500266

Hypothesis Testing

Gary Smith, in Essential Statistics, Regression, and Econometrics (Second Edition), 2015

Using an Estimated Standard Deviation

Our initial calculations assumed that the population standard deviation is known to be 8.53. However, this is an estimate based on sample data. The preceding chapter explains how the t distribution can be used in place of the normal distribution when the sample standard deviation S is used in place of the population standard deviation σ. Here, instead of the Z statistic in Eqn (7.2), we use the t statistic:

(7.3) t = (X̄ − μ)/(S/√n)

The calculated t value is identical to the Z value we calculated previously:

t = (25.53 − 35)/(8.53/√203) = −15.82

The difference is that, instead of using the normal distribution to calculate the P value, we use the t distribution with n − 1 = 203 − 1 = 202 degrees of freedom. The probability works out to be:

P[t ≤ −15.82] = 3.23 × 10⁻³⁷

so that the two-sided P value is 2(3.23 × 10⁻³⁷) = 6.46 × 10⁻³⁷.

A P value calculated from the t distribution is exactly correct if the data come from a normal distribution and, because of the power of the central limit theorem, is an excellent approximation if we have at least 15 observations from a generally symmetrical distribution or at least 30 observations from a very asymmetrical distribution. In our poker example, the P value is minuscule and is strong evidence against the null hypothesis that the population mean is 35.
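
The calculation above can be reproduced in a few lines. A minimal sketch in the Mathematica notation used earlier on this page, with the chapter's values (X̄ = 25.53, μ = 35, S = 8.53, n = 203, 202 degrees of freedom):

tstat = (25.53 - 35)/(8.53/Sqrt[203])              (* ≈ -15.82 *)
pOneSided = CDF[StudentTDistribution[202], tstat]  (* one-sided P value, ≈ 3.23*10^-37 *)
pTwoSided = 2 pOneSided                            (* two-sided P value *)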

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9780128034590000078

Tests on Means of Continuous Data

R.H. Riffenburgh, in Statistics in Medicine (Third Edition), 2012

The Normal Test and the t Test Are Two Forms of the Two-Sample Means Test

Two forms are addressed in this section: the case of known population standard deviations, or samples large enough that the sample standard deviations are not practically different from known, and the case of small-sample estimated standard deviations. The means test uses a standard normal distribution (z distribution) in the first case and a t distribution in the second.

(Review: The means are assumed normal. Standardizing places a standard deviation in the denominator. If the standard deviation is known, it behaves as a constant and the normal distribution remains. If the standard deviation is estimated from the sample, it follows a probability distribution and the ratio of the numerator’s normal distribution to this distribution turns out to be t.)
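
To make the two forms concrete, here is a minimal Mathematica sketch using the standard formulas for the two-sample z test and the pooled-variance two-sample t test; all numbers are hypothetical and are not taken from the chapter:

(* hypothetical summary statistics *)
x1bar = 102.3; x2bar = 98.7;

(* z form: population standard deviations treated as known *)
σ1 = 9.0; σ2 = 8.5; n1 = 40; n2 = 36;
z = (x1bar - x2bar)/Sqrt[σ1^2/n1 + σ2^2/n2];
pz = 2 (1 - CDF[NormalDistribution[0, 1], Abs[z]])   (* two-sided P value *)

(* t form: small samples, pooled estimate of the common standard deviation *)
s1 = 9.0; s2 = 8.5; m1 = 12; m2 = 10;
sp2 = ((m1 - 1) s1^2 + (m2 - 1) s2^2)/(m1 + m2 - 2);
t = (x1bar - x2bar)/Sqrt[sp2 (1/m1 + 1/m2)];
pt = 2 (1 - CDF[StudentTDistribution[m1 + m2 - 2], Abs[t]])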

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9780123848642000123

Computational Statistics with R

Marepalli B. Rao, Subramanyam Kasala, in Handbook of Statistics, 2014

5.1 Specifications

For sample size calculations, specify level, power, alternative population means, and within-population standard deviation σ. The between-population variance is defined by

Between-populations variance σ_pop² = (1/k) ∑_{i=1}^{k} (μ_i − μ̄)², where μ̄ = (1/k) ∑_{i=1}^{k} μ_i is the average of the population means.

In the context of multiple populations, the effect size is defined by

Effect size = Δ = σ_pop/σ = between-population SD / within-population SD

If the number of populations is k = 2, the effect size works out to be |μ_1 − μ_2|/(2σ), which is different from |μ_1 − μ_2|/σ.

In engineering terminology, σ is the noise present in the populations and σ_pop is a measure of the signal. With several populations there are as many signals, and we need to measure how close they are; the standard deviation σ_pop of these signals is such a measure. If the signals are very close, one needs a large sample to detect the differences. If the means are very close, the alternative and null hypotheses are very close. Let us find out how the effect size shapes up in some examples.
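
A minimal sketch of the effect-size calculation, in the Mathematica notation used earlier on this page; the population means and within-population standard deviation are hypothetical, illustrative values only:

means = {10, 12, 15}; σwithin = 4;            (* hypothetical μ_1, ..., μ_k and σ *)
σpop = Sqrt[Mean[(means - Mean[means])^2]];   (* between-population SD *)
Δ = σpop/σwithin                              (* effect size *)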

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B978044463431300005X

Confidence Intervals and Probability

ROBERT H. RIFFENBURGH, in Statistics in Medicine (Second Edition), 2006

EXAMPLE

Using the 10 prostate volumes from Table DB1.1, what is a 95% confidence interval on the population standard deviation, σ? We have calculated s = 15.92 ml, which has n − 1 = 9 df. We know the probability distribution of s² but not of s; therefore, we shall find the interval on the population variance, σ², and take square roots: s² = 15.92² = 253.4464. Looking in Table III or Table 4.3 under right tail area = 0.025 for 9 df, we find 19.02. Similarly in Table IV, under left tail area = 0.025 for 9 df, we find 2.70. Substituting these values in the formula for confidence limits on a variance [see Eq. (4.7)], we find:

P[s² × df/χ²_R < σ² < s² × df/χ²_L] = P[253.45 × 9/19.02 < σ² < 253.45 × 9/2.70] = P[119.93 < σ² < 844.83] = 0.95.

Taking square roots within the brackets, we obtain

P[10.95 < σ < 29.07] = 0.95.

We note from the first table of DB1 (showing means and standard deviations) that the population standard deviation (excluding known BPH to make the distribution more symmetric) is 16.35 ml, falling well within the confidence interval.
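
The table lookups and arithmetic above can be reproduced in a few lines; a minimal sketch in the Mathematica notation used earlier on this page, with the chapter's s = 15.92 and 9 df:

s = 15.92; df = 9;
chiR = Quantile[ChiSquareDistribution[df], 0.975]   (* right-tail 0.025 critical value, ≈ 19.02 *)
chiL = Quantile[ChiSquareDistribution[df], 0.025]   (* left-tail 0.025 critical value, ≈ 2.70 *)
{Sqrt[s^2 df/chiR], Sqrt[s^2 df/chiL]}              (* 95% CI on σ, ≈ {10.95, 29.07} *)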

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9780120887705500435

Confidence Intervals

R.H. Riffenburgh, in Statistics in Medicine (Third Edition), 2012

Example Posed: CI on Standard Deviation of Prostate Volumes

Using the 10 prostate volumes from Table DB1.1, what is a 95% confidence interval on the population standard deviation σ? We have calculated s = 15.92 ml, which has n − 1 = 9 df. We know the probability distribution of s² but not of s, so we shall find the interval on the population variance, σ², and take square roots: s² = 15.92² = 253.4464. How do we express a confidence interval?

Method for Confidence on σ2 or σ

We know (from Section 4.7) that a sample variance s² drawn randomly from a normal population is distributed as χ² × σ²/df. A confidence-type statement on σ² from this relationship, with the additional twist that the asymmetric distribution requires the chi-square values cutting off α/2 in each tail to be found separately, is given by

(7.12) P[s² × df/χ²_R < σ² < s² × df/χ²_L] = 1 − α,

where χ²_R is the critical value for the right tail found from Table III (see Tables of Probability Distributions) and χ²_L is that for the left found from Table IV (see Tables of Probability Distributions). To find the confidence on σ rather than σ², we just take square roots of the components within the brackets.

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B978012384864200007X

The Basis of Monte Carlo

William L. Dunn, J. Kenneth Shultis, in Exploring Monte Carlo Methods, 2012

2.8 Summary

A probability density function of a single continuous variable is a non-negative function defined on an interval and normalized so that its integral over that interval is unity. The associated cumulative distribution function of x is interpreted as the probability that a random sample has a value less than or equal to x. These concepts extend naturally to discrete and multidimensional random variables.

Generally, probability distributions have well-defined population means and variances. These are explicit values that are characteristic of the distribution. Monte Carlo is a method that allows one to estimate the population mean and population variance by the sample mean and sample variance. These concepts apply to functions of a discrete variable, to functions of a continuous variable, and to functions of many variables, whether discrete or continuous.

The Monte Carlo estimates of the sample mean and sample variance almost surely approach the population mean and population variance as the number of samples (usually called “histories”) gets large. This important feature is a consequence of the law of large numbers, which can be stated as follows. If

(2.69) ⟨z⟩ = ∫_a^b z(x) f(x) dx

and

(2.70) z̄ = (1/N) ∑_{i=1}^{N} z(x_i),

with the xi suitably sampled from f(x), then, almost surely,

(2.71) lim_{N→∞} z̄ = ⟨z⟩.

The CLT then provides a prescription for estimating the uncertainty in the sample mean and sample variance.

It should be kept in mind that the term standard deviation is often used, rather loosely, for any of the following:

the population standard deviation σ(z);

the sample standard deviation s(z) = [N(z²̄ − z̄²)/(N − 1)]^{1/2};

the standard deviation of the sample mean σ(z̄) = σ(z)/√N; and

the estimate of the standard deviation of the sample mean s(z̄) = s(z)/√N = [(z²̄ − z̄²)/(N − 1)]^{1/2}.

The context must be employed to indicate which interpretation is meant.
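
A minimal Monte Carlo sketch tying these quantities together, in the Mathematica notation used earlier on this page, for a hypothetical integrand z(x) = x² with f(x) uniform on [0, 1] (so ⟨z⟩ = 1/3); the example is illustrative and is not taken from the chapter:

SeedRandom[1];
n = 100000;
xs = RandomVariate[UniformDistribution[{0, 1}], n];   (* N histories sampled from f(x) *)
zs = xs^2;
zbar = Mean[zs]                 (* sample mean, estimates ⟨z⟩ = 1/3 *)
sz = StandardDeviation[zs]      (* sample standard deviation s(z) *)
szbar = sz/Sqrt[n]              (* estimated standard deviation of the sample mean *)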

Finally, Monte Carlo can mean a wide variety of different calculational methods, but all are based on the statistical power of the central limit theorem that guarantees expected values can be approximated by averages and that allows confidence intervals to be constructed for the averages. Although many different applications of Monte Carlo appear to be quite different, they all depend on this critical theorem.

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9780444515759000026

HYPOTHESIS TESTING

Rand R. Wilcox, in Applying Contemporary Statistical Techniques, 2003

5.1.2 A Two-Sided Test: Testing for Exact Equality

One other variation of hypothesis testing needs to be described: testing the hypothesis that the mean is exactly equal to some specified value. Returning to the example regarding open-mindedness, suppose it is claimed that the average score of all adult men is exactly 50, as opposed to being greater than or equal to 50. Then the null hypothesis is

H0: μ = 50.

If the sample mean is exactly equal to 50, you would not reject, because this is consistent with H0. If X̄ > 50, then the larger the sample mean happens to be, the more doubt there is that μ = 50. Similarly, if X̄ < 50, then the smaller the sample mean, the more doubt there is that μ = 50. That is, now it is reasonable to reject H0 if X̄ is either too large or too small. An equivalent way of saying this is that you should reject if

Z = (X̄ − 50)/(σ/√n)

is too large or too small.

Suppose you reject H0 if either Z ≤ −1.96 or Z ≥ 1.96. A more succinct way of describing this decision rule is that you reject if the absolute value of Z is greater than or equal to 1.96. In symbols, reject if |Z| ≥ 1.96. If the null hypothesis is true and sampling is from a normal distribution, then Z has a standard normal distribution, so the probability of rejecting is

P(Z ≤ −1.96) + P(Z ≥ 1.96) = .025 + .025 = .05,

which is the total area of the two shaded regions in Figure 5.2.

FIGURE 5.2. Critical region for a two-sided test such that P(Type I error) = .05.

EXAMPLE

Imagine a list of 55 minor malformations babies might have at birth. For illustrative purposes, it is assumed that the average number of malformations is 15 and the population standard deviation is σ = 6. For babies born to diabetic women, is the average number different from 15? That is, can you reject the hypothesis H0: μ = 15? To find out, you sample n = 16 babies having diabetic mothers, count the number of malformations for each, and find that the average number of malformations is X¯=19. Then

Z = (19 − 15)/(6/√16) = 2.67.

If the goal is to have the probability of a Type I error equal to .05, then the critical values are −1.96 and 1.96, for the reasons just given. Because 2.67 is greater than 1.96, you reject the null hypothesis and conclude that the average number of malformations is greater than 15.

The significance level, or p-value, can be determined when testing for exact equality, but you must take into account that the critical region consists of both tails of the standard normal distribution.

EXAMPLE

Continuing the last example, where Z = 2.67, if you had decided to reject the null hypothesis if Z ≤ −2.67 or if Z ≥ 2.67, then the probability of a Type I error is

P(Z ≤ −2.67) + P(Z ≥ 2.67) = .0038 + .0038 = .0076.

This means that the significance level is 0.0076.
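
Both examples can be checked numerically; a minimal sketch in the Mathematica notation used earlier on this page, with the chapter's numbers:

z = (19 - 15)/(6/Sqrt[16])                          (* ≈ 2.67 *)
pValue = 2 (1 - CDF[NormalDistribution[0, 1], z])   (* two-sided significance level, ≈ .0076 *)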

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9780127515410500262

Statistical Testing, Risks, and Odds in Medical Decisions

ROBERT H. RIFFENBURGH, in Statistics in Medicine (Second Edition), 2006

THE NORMAL TEST AND THE t TEST ARE TWO FORMS OF THE TWO-SAMPLE MEANS TEST

This section concentrates on the type of means test most frequently seen in the medical literature: the test for a difference between two means. We will look at two subclasses: the case of known population standard deviations, or samples large enough that the sample standard deviations are not practically different from known, and the case of small-sample estimated standard deviations. The means test uses a standard normal distribution (z distribution) in the first case and a t distribution in the second, for reasons discussed in Section 2.8. (Review: The means are assumed normal. Standardizing places a standard deviation in the denominator. If the standard deviation is known, it behaves as a constant and the normal distribution remains. If the standard deviation is estimated from the sample, it follows a probability distribution and the ratio of the numerator's normal distribution to this distribution turns out to be t.)

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9780120887705500459

Estimation

Sheldon M. Ross, in Introductory Statistics (Third Edition), 2010

SUMMARY

The sample mean X̄ is an unbiased estimator of the population mean μ. Its standard deviation, sometimes referred to as the standard error of X̄ as an estimator of μ, is given by

SD(X̄) = σ/√n

where σ is the population standard deviation.

The statistic p̂, equal to the proportion of a random sample having a given characteristic, is the estimate of p, the proportion of the entire population with the characteristic. The standard error of the estimate is

SD(p̂) = √(p(1 − p)/n)

where n is the sample size. The standard error can be estimated by

√(p̂(1 − p̂)/n)

The sample variance S2 is the estimator of the population variance σ2. Correspondingly, the sample standard deviation S is used to estimate the population standard deviation σ.

If X1, …, Xn are a sample from a normal population having a known standard deviation σ,

X̄ ± z_{α/2} σ/√n

is a 100(1 – α) percent confidence interval estimator of the population mean μ. The length of this interval, namely,

2 z_{α/2} σ/√n

will be less than or equal to b when the sample size n is such that

n ≥ (2 z_{α/2} σ/b)²

A 100(1 – α) lower confidence bound for μ is given by

X̄ − z_α σ/√n

That is, we can assert with 100(1 – α) percent confidence that

μ > X̄ − z_α σ/√n

A 100(1 – α) upper confidence bound for μ is

X̄ + z_α σ/√n

That is, we can assert with 100(1 – α) percent confidence that

μ < X̄ + z_α σ/√n

If X1, …, Xn are a sample from a normal population whose standard deviation is unknown, a 100(1 – α) percent confidence interval estimator of μ is

X̄ ± t_{n−1, α/2} S/√n

In the preceding, t_{n−1, α/2} is such that

P{T_{n−1} > t_{n−1, α/2}} = α/2

when T_{n−1} is a t random variable with n − 1 degrees of freedom.

The 100(1 – α) percent lower and upper confidence bounds for μ are, respectively, given by

X̄ − t_{n−1, α} S/√n and X̄ + t_{n−1, α} S/√n

To obtain a confidence interval estimate of p, the proportion of a large population with a specific characteristic, take a random sample of size n. If p̂ is the proportion of the random sample that has the characteristic, then an approximate 100(1 – α) percent confidence interval estimator of p is

p̂ ± z_{α/2} √(p̂(1 − p̂)/n)

The length of this interval always satisfies

Length of confidence interval ≤ z_{α/2}/√n

The distance from the center to the endpoints of the 95 percent confidence interval estimator, that is, 1.96 √(p̂(1 − p̂)/n), is commonly referred to as the margin of error. For instance, suppose a newspaper states that a new poll indicates that 64 percent of the population consider themselves to be conservationists, with a margin of error of ±3 percent. By this, the newspaper means that the results of the poll yield that the 95 percent confidence interval estimate of the proportion of the population who consider themselves to be conservationists is 0.64 ± 0.03.
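
A minimal sketch of two of the interval estimates above, in the Mathematica notation used earlier on this page. The data for the t interval are hypothetical; for the proportion, p̂ = 0.64 matches the newspaper example, while the sample size of 1,100 is an assumption for illustration:

(* t interval for μ when σ is unknown (hypothetical data) *)
data = {12.1, 9.8, 11.4, 10.6, 12.9, 10.2, 11.7, 9.5};
n = Length[data]; α = 0.05;
xbar = Mean[data]; s = StandardDeviation[data];
tcrit = Quantile[StudentTDistribution[n - 1], 1 - α/2];
{xbar - tcrit s/Sqrt[n], xbar + tcrit s/Sqrt[n]}   (* 95% confidence interval for μ *)

(* margin of error for a sample proportion (hypothetical poll size) *)
phat = 0.64; m = 1100;
zcrit = Quantile[NormalDistribution[0, 1], 1 - α/2];
zcrit Sqrt[phat (1 - phat)/m]                      (* ≈ 0.03 *)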

Read full chapter

URL: //www.sciencedirect.com/science/article/pii/B9780123743886000089

What is the appropriate distribution when the population standard deviation is not known?

You must use the t-distribution table when working problems in which the population standard deviation (σ) is not known and the sample size is small (n < 30). General correct rule: if σ is not known, then using the t-distribution is correct.

When n ≥ 30 and the population standard deviation is known what is the appropriate distribution?

For samples of size 30 or larger, the z-distribution is generally used, even if the population standard deviation is not known.

What test is used when n ≥ 30 or when the population is normally distributed and the standard deviation is known?

A one-sample mean test is used when the population is known to be normally distributed and the population standard deviation (σ) is known.

What is the standard deviation of the sampling distribution of the mean if σ = 15 and N = 100?

The standard deviation of the sampling distribution of the mean is given by σ_X̄ = σ/√N = 15/√100 = 15/10 = 1.5.
