
Significance Testing

Michael Dickson, Davis Baird, in Philosophy of Statistics, 2011

2.7 The Institutionalization of Significance Tests

Statistical significance has become the gold standard in many academic disciplines. (Gigerenzer [1993] tells the story in the case of psychology.) Many major journals in social science, for example, require — either officially or in practice — that publishable studies demonstrate a statistically significant effect (i.e., the data must depart ‘significantly’ from some specified null hypothesis, for some statistic). Often, the level of significance (commonly, 5% or 1%) is dictated as well. This transformation of social science (and it has spread to other data-intensive disciplines as well) occurred rapidly. By one account, it was complete by 1940 or so: “[statisticians] have already overrun every branch of science with a rapidity of conquest rivaled only by Attila, Mohammed, and the Colorado beetle” [Kendall, 1942, 69].

In light of vocal criticism (see Section 4 below), the issue of whether, and to what extent, to continue these requirements has been taken up by some major academic societies, such as the American Psychological Association (see [Azar, 1996]), although significance tests continue to play much the same role as they had throughout at least the latter half of the 20th century.


URL: https://www.sciencedirect.com/science/article/pii/B978044451862050006X

Research and Methods

Michael Borenstein, in Comprehensive Clinical Psychology, 1998

3.14.4.4.1 Misinterpreting the significant p-value

Statistical significance is often assumed to reflect substantive significance. Almost invariably, the first question asked by the reader, and the first point made by the researcher, is whether the results were “significant.” This is the point highlighted at meetings, in abstracts, and in the results sections of publications. Often, the discussion of effect does not proceed beyond the question of significance at all. Even when it does, statistical significance is emphasized over clinical significance or effect size. In fact, though, the only information imparted by a statistically significant p-value is that the true effect is (probably) not nil. A significant p-value could reflect a clinically meaningful effect. It could equally reflect a clinically trivial effect found in a large sample (the large sample allows the effect to be estimated precisely, so even a small effect is known to be non-nil).

Cohen (1965) writes “Again and again, the results section of an article describing an effect as significant or highly significant is followed by a discussion section which (usually implicitly) proceeds to treat the effect as if it had been found to be large or very large” (p. 102). The same point has been documented repeatedly in the field of psychology and medicine (Feinstein, 1975, 1976, 1977; Friedman & Phillips, 1981; Mainland, 1982; Nelson, Rosenthal, & Rosnow, 1986; Rothman, 1986b; Tversky & Kahneman, 1971; Wonnacott, 1985).

The author recalls reviewing a manuscript that purported to show a substantial advantage for a novel antipsychotic drug. The basis for this claim was a difference between treatment groups on a critical variable, with p < 0.001. As it happens, the sample size was on the order of 2000 patients, and a p-value < 0.001 could reflect a difference so small as to be trivial. In fact, some of the baseline differences between the groups (which had, appropriately, been dismissed as negligible in size) were larger (and more significant!) than the post-treatment differences being submitted as evidence of the drug effect.

It gets worse. If confusing statistical significance with clinical significance is a problem in the interpretation of single studies, the situation deteriorates further when researchers use p-values to compare the results of different studies. This type of comparison is common when we want to know whether a treatment is more effective in men than in women, or whether one treatment is more effective than another.

Since the p-value incorporates information about both the sample size and the effect size, a p-value of 0.05 could represent response rates in two groups of 50% vs. 70% (a 20-point effect) with a sample size of 50 cases per group. It could also represent response rates of 50% vs. 90% (a 40-point effect) with a sample size of 10 cases per group. In the second case the effect size is substantially larger than in the first, but this fact is lost in the p-values, which are identical. Similarly, a p-value of 0.01 could represent response rates of 40% vs. 65% (25 points) with 50 cases per group, or 40% vs. 95% (55 points) with 10 cases per group. Again, the difference in effect sizes is not evident from the p-value.
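To make the first pair of examples concrete, here is a minimal Python sketch using a standard pooled two-proportion z-test; the choice of test is an assumption (the chapter does not say which test produced these p-values), but it reproduces the pattern described above:

```python
# A small sketch verifying the point above: very different effect sizes can
# produce nearly identical p-values once sample size is folded in.
# Uses a pooled two-proportion z-test; the rates and n's are from the text.
from math import sqrt
from scipy.stats import norm

def two_prop_pvalue(p1, p2, n):
    """Two-sided pooled z-test for equal response rates, n cases per group."""
    pooled = (p1 + p2) / 2                      # equal group sizes
    se = sqrt(pooled * (1 - pooled) * (2 / n))  # pooled standard error
    z = (p2 - p1) / se
    return 2 * norm.sf(abs(z))

print(two_prop_pvalue(0.50, 0.70, 50))  # 20-point effect, n = 50 -> p ~ 0.04
print(two_prop_pvalue(0.50, 0.90, 10))  # 40-point effect, n = 10 -> p ~ 0.05
```

Both calls return p-values near 0.05 even though the second effect is twice as large, which is exactly the information the p-value conceals.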

In fact, Tversky and Kahneman (1971) found that students presented with information about p-values and sample sizes tend to draw exactly the wrong conclusion about effect size. Students were presented with two studies in which the p-value was 0.05, and were told that the sample size was 10 per group in one and 50 per group in the other. Invariably, students assumed that the effect size in the second case was more impressive, while exactly the reverse is true (see also Berkson, 1938, 1942; Friedman & Phillips, 1981; Rosenthal & Gaito, 1963, 1964; Rozeboom, 1960).

The possibilities for mistakes expand when we consider the possibility of comparing results between studies when both the p-value and the sample size differ. If one study yielded a p-value of 0.05 and another yielded a p-value of 0.01, then in the absence of any additional information a reader might assume that the effect size was stronger in the latter case. In fact, though, if the first study (p = 0.05) used a sample of 10 per group and the second (p = 0.01) used 50 per group, then the effect size would have been substantially larger in the study with the modest p-value (a 40-point effect as compared with a 25-point effect).


URL: https://www.sciencedirect.com/science/article/pii/B0080427073002091

Evaluating Treatment Efficacy

Sarah B. Hunter, ... Elizabeth J. D’Amico, in Interventions for Addiction, 2013

Effect Size and Clinical Significance

Statistical significance indicates the probability associated with the null hypothesis, but it does not determine whether the result is important, meaningful, substantive, large enough to care about, or clinically significant. To interpret the effect of an intervention in terms of whether it is sufficiently large to be termed clinically significant, we must describe the magnitude of the effect of the intervention.

Some outcome measures can be summarized by a mean, which allows us to simply calculate an average value for individuals in two groups. Because the same effect can be expressed in different ways, however, we must understand the baseline from which we start when determining a treatment’s effect. For example, if we expected that treatment would increase service use, we might find that the mean number of days of service use was 25 for a control group and 45 for the treatment group. We can represent the size of this effect as the difference between the two groups in days, stating that treatment increased service use by 20 days. Alternatively, we can express the effect of treatment as a ratio, stating that the effect of the treatment was to increase service use by 80% (45/25 = 1.80, or an 80% increase). There is not a one-to-one correspondence between the two measures. In another example, a reduction of 1 drink per drinking day represents a 17% reduction relative to a control group that consumes, on average, 6 drinks per drinking day, but a 50% reduction relative to a control group that consumes, on average, 2 drinks per drinking day. Thus, the baseline from which we start can affect how we evaluate the effectiveness of our intervention.
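The arithmetic behind the two ways of expressing an effect, using the numbers from the text, as a short sketch:

```python
# Absolute difference versus ratio, using the service-use example above.
control_days, treatment_days = 25, 45

absolute_effect = treatment_days - control_days       # 20 more days
relative_effect = treatment_days / control_days - 1   # 0.80, an 80% increase

print(f"absolute: +{absolute_effect} days")
print(f"relative: {relative_effect:.0%} increase")

# The same absolute reduction looks very different against different baselines:
for baseline in (6, 2):  # drinks per drinking day in the control group
    print(f"1 fewer drink vs baseline {baseline}: {1/baseline:.0%} reduction")
```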

Effects are often reported using standardized measures. For example, when a difference between two groups with respect to a continuous outcome measure is examined, one might standardize the difference by dividing by the standard deviation (this might be a pooled standard deviation or the standard deviation of the control group). This standardized effect size measure is known as Cohen’s d. An effect size measure such as Cohen’s d allows one to make comparisons across different continuous measures. Specifically, if one study has evaluated a treatment in terms of days in treatment, but a second study has evaluated a treatment in terms of an Alcohol Use Disorders Identification Test (AUDIT) score, the effects of these studies could be compared through the use of Cohen’s d. A second use of Cohen’s d is that it allows one to quantify the size of an effect using a unit-free measure. Cohen applied labels to different effect sizes for Cohen’s d, calling 0.2 a small effect, 0.5 a medium effect, and 0.8 a large effect. If a study evaluates a treatment effect and finds an AUDIT score change of 4 units, this is difficult to interpret for anyone unfamiliar with the AUDIT scoring system, but if we are also told that the Cohen’s d effect size is 0.5, we can interpret the size of the effect and have a somewhat better understanding of the clinical significance of the study. However, Cohen’s d and similar effect size measures have some disadvantages. The first is that the units of the measure are lost when any effect size measure is standardized. Learning that an intervention doubled the number of days in treatment might be more interpretable than d = 0.5. A standardized effect size measure alone cannot therefore convey clinical meaningfulness. Another problem with standardized effect size measures is that they are determined not only by the size of the effect, but also by the amount of variance in the data. For example, a difference in the number of days in treatment of 1 would result in an effect size of 0.5 (medium) if the standard deviation were equal to 2, but only 0.2 (small) if the standard deviation were equal to 5. These concerns also hold for other types of standardized effect size measures for noncontinuous outcomes.
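As a sketch of the calculation just described, the following computes Cohen’s d with a pooled standard deviation; the means and group sizes are illustrative values chosen to reproduce the days-in-treatment example above:

```python
# Cohen's d with a pooled standard deviation, illustrating the point above:
# the same raw difference maps to different d's depending on the variance.
from math import sqrt

def cohens_d(m1, m2, s1, s2, n1, n2):
    """Standardized mean difference using the pooled standard deviation."""
    s_pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / s_pooled

# A 1-day difference in days in treatment (means and n's are illustrative):
print(cohens_d(11, 10, 2, 2, 50, 50))  # SD = 2 -> d = 0.5, a "medium" effect
print(cohens_d(11, 10, 5, 5, 50, 50))  # SD = 5 -> d = 0.2, a "small" effect
```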

Another measure of effect size that can be used for either categorical or continuous outcomes (we will limit this discussion to categorical outcomes) is the number needed to treat, or NNT. The NNT is the number of people who need to be treated with the intervention, rather than the control condition, for one additional person to benefit. The NNT is considered a useful indicator for policy decisions, as it describes the impact at the level of individuals affected. The NNT is calculated as 1/(risk difference). If the risk of relapse is 60% (or 0.6) in the control group and 40% (or 0.4) in the treatment group, the risk difference is 0.6 − 0.4 = 0.2, and the NNT is 1/0.2 = 5. Thus, if five people were treated with the intervention rather than the control condition, we would expect one additional individual to cease use who would not otherwise have done so.
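The NNT arithmetic from this example, in a few lines:

```python
# NNT = 1 / (risk difference), using the relapse-risk numbers from the text.
risk_control, risk_treatment = 0.6, 0.4

risk_difference = risk_control - risk_treatment  # 0.2
nnt = 1 / risk_difference                        # 5
print(f"NNT = {nnt:.0f}: treat {nnt:.0f} people for 1 additional benefit")
```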


URL: https://www.sciencedirect.com/science/article/pii/B9780123983381000610

Statistical Significance Versus Effect Size

X. Fan, T.R. Konold, in International Encyclopedia of Education (Third Edition), 2010

Use of Effect Size

Statistical significance indicates, in probabilistic terms, how likely one is to obtain sample statistic values equal to or more extreme than the observed sample statistic if the null hypothesis (NH) is true; it does not inform about the magnitude of the effect or whether the findings are practically meaningful. For this reason, the use of effect size as an indicator of practical meaningfulness has been widely advocated in recent years (Kirk, 2001; Kline, 2004). The APA Task Force on Statistical Inference recommended that an effect-size estimate always be provided when reporting a p value (Wilkinson and The APA Task Force on Statistical Inference, 1999). The most recent Publication Manual of the American Psychological Association (5th edition) states:

For the readers to fully understand the importance of your findings, it is almost always necessary to include some index of effect size or strength of relationship in your Results section…. to provide the reader not only with information about statistical significance, but also with enough information to assess the magnitude of the observed effect or relationship. (American Psychological Association, 2001: 25, 26)

There is a general consensus that the role of effect size should be enhanced and the role of statistical significance reduced. There is disagreement, however, about how much the role of statistical significance should be reduced. On the one hand, significance testing has been criticized as an obstacle to scientific inquiry (e.g., Falk and Greenbaum, 1995; Hunter, 1997; Meehl, 1978). On the other hand, researchers have argued that the correct use of significance testing has a legitimate role (e.g., Cortina and Dunlap, 1997; Hagen, 1997).

Interpreting Effect Size

Based on the observed typical results in social science literature, Cohen (1969, 1988) proposed some tentative benchmarks for what may be considered as small, medium, and large effects:

small effect: |d| ≈ 0.2; η² ≈ 1%;

medium effect: |d| ≈ 0.5; η² ≈ 10%; and

large effect: |d| ≥ 0.8; η² ≥ 25%

This tentative interpretive framework has been useful for research practitioners, and it probably helped drive the increasing popularity of effect size in research practice (Kirk, 1996). For a better understanding of these tentative benchmarks (d = 0.20, 0.50, and 0.80), readers may consult Henson (2006) for easy-to-understand visual graphs of these effect sizes. These interpretive benchmarks, however, should not be treated as a one-size-fits-all rule of thumb: what counts as a small or a large effect is very likely to vary across substantive disciplines, and the same effect size (e.g., a small effect) may have quite different practical importance in different disciplines. In research practice, the tentative benchmarks for effect size should be considered in light of the typical results in the discipline (Thompson, 2001).

Because there are so many different types of effect size measures (Cohen, 1988; Grissom and Kim, 2005; Kline, 2004), and different effect size measures can differ greatly numerically (e.g., d and η²), it is critical to report which specific effect size measure is used. It would not be helpful to simply say that an effect size of 0.30 was obtained: without knowing which measure it refers to, the number is not interpretable.
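To illustrate how different the scales are, note that for the special case of two equal-sized groups d converts to η² via η² = d²/(d² + 4); the sketch below applies this standard conversion to Cohen’s benchmark values of d:

```python
# Why a bare "effect size of 0.30" is uninterpretable: d and eta-squared
# live on very different numeric scales. For two equal-sized groups,
# eta_squared = d**2 / (d**2 + 4).
for d in (0.2, 0.5, 0.8):
    eta_sq = d**2 / (d**2 + 4)
    print(f"d = {d:.1f}  ->  eta^2 = {eta_sq:.3f}")
# A value of 0.30 would be a smallish d but a very large eta-squared, so the
# measure must always be named alongside the number.
```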

Effect Size and Its Confidence Interval

Sample effect size itself is a random variable. This means that sample effect size is subject to sampling variability, and the degree of its sampling variation is inversely related to sample size. In other words, sample effect size from small samples may deviate farther from the population effect size than that from larger samples. For small sample size conditions, a medium, or even a large, effect size could occur due to sampling error (i.e., by chance) when the true population effect size is zero (Fan, 2001).
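The point can be checked with a quick simulation in the spirit of Fan (2001); the group size of 10 and the “medium” cutoff of |d| ≥ 0.5 are illustrative choices:

```python
# Simulate sample Cohen's d when the true population effect is zero.
# With only 10 cases per group, "medium" or larger effects arise by
# chance alone fairly often.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 10, 10_000
count_medium = 0
for _ in range(reps):
    a = rng.normal(size=n)  # both groups drawn from the same population,
    b = rng.normal(size=n)  # so the true d is exactly 0
    s_pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    d = (a.mean() - b.mean()) / s_pooled
    count_medium += abs(d) >= 0.5
print(f"|d| >= 0.5 in {count_medium / reps:.1%} of samples (n = 10 per group)")
```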

The fact that sample effect size is a random variable has been widely known (e.g., Hedges and Olkin, 1985), although some education researchers may not have paid sufficient attention to it. It is not uncommon to hear arguments to the effect that the outcome of a statistical significance test is influenced by sample size (true!), so attention should be paid to effect size instead, as if effect size were not influenced by sample size. Undoubtedly, the use of effect size measures makes good quantitative and common sense. However, effect size serves a different purpose from statistical significance: while a statistical significance test evaluates the probability of obtaining the sample outcome by chance, effect size provides some indication of the magnitude of an effect and its practical meaningfulness. On the one hand, a statistical significance test may detect a trivial effect (sufficient statistical power due to large sample size), or it may fail to detect a meaningful or obvious effect (lack of statistical power due to small sample size). On the other hand, under small sample size conditions, a seemingly meaningful (e.g., medium or even large) effect can occur by chance and is thus not trustworthy (Fan, 2001). Because of these considerations, statistical significance and effect size complement each other, but they do not substitute for one another.

Thompson (2002, 2007) proposed the use of confidence intervals for sample effect size. This approach combines the sample size information with the effect size measure and provides a variability estimate for a sample effect size. It represents a balance between the descriptive use of sample effect size and statistical significance. More importantly, “Confidence intervals for effect sizes are especially valuable because they facilitate meta-analytic thinking and the interpretation of intervals via comparison with the effect intervals from related prior studies” (Thompson, 2002: 25, emphasis original). Ultimately, for translating research findings into education practice, a researcher should ask not only whether a sample result is likely, but also whether an effect is practically noteworthy and replicable.
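A sketch of such an interval, using a common large-sample approximation to the standard error of d; the d value and group sizes here are illustrative, not from the text:

```python
# Approximate 95% confidence interval for Cohen's d. The interval width
# carries the sample-size information that a bare effect size discards.
from math import sqrt

def d_confint(d, n1, n2, z=1.96):
    """Large-sample approximation to the standard error of d."""
    se = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

print(d_confint(0.5, 10, 10))    # small study: wide interval, includes 0
print(d_confint(0.5, 100, 100))  # larger study: much tighter interval
```

The same d = 0.5 is compatible with anything from a negative effect to a very large one in the small study, but is pinned down fairly tightly in the larger one, which is precisely why the interval “facilitates meta-analytic thinking.”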


URL: https://www.sciencedirect.com/science/article/pii/B9780080448947013683

Research and Methods

Sharon H. Kramer, Robert Rosenthal, in Comprehensive Clinical Psychology, 1998

3.15.4.3.2 Effect size estimation

The statistical significance of the heterogeneity of the effect sizes of three or more studies is also obtained from a chi-square using the following equation

(7) $\chi^2 = \sum (n_j - 3)\left(z_{r_j} - \bar{z}_r\right)^2$

where $n_j$ is the number of sampling units on which each $r$ is based, $z_{r_j}$ is the Fisher $z_r$ corresponding to each $r$, and $\bar{z}_r$ is the weighted mean $z_r$, that is,

(8) $\bar{z}_r = \dfrac{\sum (n_j - 3)\, z_{r_j}}{\sum (n_j - 3)}$

Example 9. Studies A, B, C, and D yield effect sizes of r = 0.70 (n = 30), r = 0.45 (n = 45), r = 0.10 (n = 20), and r = −0.15 (n = 25), respectively.

The Fisher $z_r$’s corresponding to these $r$’s are 0.87, 0.48, 0.10, and −0.15. First, using Equation (8), the weighted mean $\bar{z}_r$ is

$\bar{z}_r = \dfrac{27(0.87) + 42(0.48) + 17(0.10) + 22(-0.15)}{27 + 42 + 17 + 22} = 0.39$

Then, applying Equation (7), we find that

$27(0.87 - 0.39)^2 + 42(0.48 - 0.39)^2 + 17(0.10 - 0.39)^2 + 22(-0.15 - 0.39)^2 = 14.4$

which for $K - 1 = 3$ df is significant at p = 0.0024. These four effect sizes, then, are significantly heterogeneous.
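For readers who want to check the arithmetic, a short Python sketch reproducing Example 9 from Equations (7) and (8):

```python
# Chi-square test for heterogeneity of effect sizes (Equations 7 and 8),
# using the four studies from Example 9. Matches the worked example to
# rounding: chi-square ~ 14.4, p ~ 0.0024.
import numpy as np
from scipy.stats import chi2

r = np.array([0.70, 0.45, 0.10, -0.15])
n = np.array([30, 45, 20, 25])

z_r = np.arctanh(r)                  # Fisher z transform of each r
w = n - 3                            # weights n_j - 3
z_bar = np.sum(w * z_r) / np.sum(w)  # Equation (8): weighted mean z_r

chisq = np.sum(w * (z_r - z_bar) ** 2)  # Equation (7)
df = len(r) - 1
print(f"chi-square = {chisq:.1f}, df = {df}, p = {chi2.sf(chisq, df):.4f}")
```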


URL: https://www.sciencedirect.com/science/article/pii/B0080427073002613

Statistical/Substantive, Interpretations and Data Limitations

Ronda Priest, in Encyclopedia of Social Measurement, 2005

Statistical Significance Testing

Overview of the Process

Tests of statistical significance provide measures of the likelihood that differences among outcomes are real, and not just due to chance. All significance tests share the same basic elements: assumptions, a null hypothesis (H0), a theoretical or alternative hypothesis (HA), a test statistic (e.g., t), a P-value, and a conclusion. First, regardless of the type of statistical significance test, certain assumptions must be met. These assumptions vary somewhat, but include factors such as: (1) the type of data, (2) the theoretical form of the population distribution, (3) the method of sampling (usually random sampling), and (4) the sample size.

Second, one develops a null hypothesis about some phenomenon or parameter, for example, in its simplest form,

H0: μ = #

The null hypothesis is the opposite of the research (alternative) hypothesis,

HA: μ ≠ #

Next, data are collected that bear on the issue, and an appropriate statistic (e.g., a proportion, mean, or regression coefficient) is calculated that measures the association between the independent and dependent variables. A test of the null hypothesis that there is no relationship between the predictor (independent) variable and the dependent variable is then conducted via a test statistic. The test statistic typically involves a point estimate of the parameter to which the hypothesis refers, for example Student’s t test:

$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n - 1}}$,

where $\bar{x}$ is the sample mean, $\mu_0$ is the value of the parameter assumed under the null hypothesis, $s$ is the standard deviation, and $n$ is the sample size.

The test statistic is used to generate a P-value. The P-value is the probability, when H0 is true, of obtaining a test statistic value at least as contradictory to H0 as the value actually observed.

Finally, the P-value is compared with a predetermined cut-off value (α), usually set at 0.05 or 0.01. When a P-value below α is attained, the results are reported as statistically significant. From this point, several interpretations of P are often made.
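The whole procedure, from statistic to P-value to conclusion, fits in a few lines; the data and hypothesized mean below are illustrative. (Note that scipy computes $s$ with the $n-1$ divisor and divides by $\sqrt{n}$, which is algebraically identical to the $s/\sqrt{n-1}$ form above when $s$ uses the $n$ divisor.)

```python
# One-sample t test: compute the statistic, convert to a P-value, and
# compare against a preset alpha. Data and mu0 are illustrative.
import numpy as np
from scipy import stats

data = np.array([5.1, 4.8, 5.6, 5.0, 5.4, 4.9, 5.3, 5.2])
mu0, alpha = 5.0, 0.05  # H0: mu = 5.0

t_stat, p_value = stats.ttest_1samp(data, popmean=mu0)
print(f"t = {t_stat:.2f}, P = {p_value:.3f}")
print("statistically significant" if p_value < alpha
      else "not statistically significant")
```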

Interpretation of the P-value

Sometimes the P-value is interpreted as the probability that the results obtained were due to chance. For example, small P-values are taken to indicate that the results were not just due to random variation. A large value of P, say for a test that μ = #, would suggest that the sample mean x̄ actually recorded was due to chance, and that μ could be assumed to be the value specified under the null hypothesis.

The P-value may also be said to measure the reliability of the result, that is, the probability of getting the same result if the study were repeated. Significant differences are often termed “reliable” under this interpretation. Ironically, while tests of statistical significance measure the reliability of a test statistic, measures of statistical significance sometimes prove to be unreliable themselves.

Finally, P can be treated as the probability that the null hypothesis is true. This interpretation is the most direct, as it addresses the question of interest. These three common interpretations are all incorrect. Small values of P are taken to represent evidence that the null hypothesis is false. However, several studies have demonstrated this is not necessarily so. In reality, a P-value is the probability of the observed data or data more extreme, given that: (1) the null hypothesis is true, (2) the sample size was adequate according to the type of test, and (3) the sampling was done randomly.


URL: https://www.sciencedirect.com/science/article/pii/B0123693985001808

Evaluation of Network Estimation

Paul D. McNelis, in Neural Networks in Finance, 2005

4.3.5 Bootstrapping for Assessing Significance

Assessing the statistical significance of an input variable in a neural network is not straightforward. Suppose we have a model with several input variables, and we are interested, for example, in whether or not government spending growth affects inflation. In a linear model, we can simply examine the t statistic. With nonlinear neural network estimation, however, the number of network parameters is much larger, and, as was mentioned, likelihood ratio statistics are often unreliable.

A more reliable but time-consuming method is bootstrapping, originally due to Efron (1979, 1983) and Efron and Tibshirani (1993). This bootstrapping method is different from the .632 bootstrap method for in-sample bias. In this method, we work with the original data, with the full sample, [y, x], obtain the best predicted value from a neural network, ŷ, and obtain the set of residuals, ê = y − ŷ. We then randomly sample this vector, ê, with replacement to obtain the set of shocks for the first bootstrap experiment, ê_b1. With this first set of randomly sampled shocks from the base of residuals, we generate a new dependent variable for the first bootstrap experiment, y_b1 = ŷ + ê_b1, and use the new data set [y_b1, x] to re-estimate the neural network and obtain the partial derivatives and other statistics of interest from the nonlinear estimation. We then repeat this procedure 500 or 1000 times, obtaining ê_bi and y_bi for each experiment, and redo the estimation. We then order the set of estimated partial derivatives (as well as other statistics) from lowest to highest values and obtain a probability distribution of these derivatives. From this we can calculate bootstrap p-values for the null hypothesis that each of these derivatives is equal to zero.
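A minimal sketch of this residual-bootstrap procedure follows. The data-generating process, network settings, finite-difference derivative, and 200 replications are all illustrative assumptions (the text uses 500 or 1000 replications), not the author’s own implementation:

```python
# Residual-bootstrap significance test for a network input, as described
# above: resample residuals, rebuild y, refit, and collect the partial
# derivative of interest across replications.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))                       # two input variables
y = np.tanh(X[:, 0]) + 0.1 * rng.normal(size=n)   # only x0 truly matters

def partial_derivative(model, X, j, eps=1e-3):
    """Average finite-difference derivative of the fit wrt input j."""
    Xp, Xm = X.copy(), X.copy()
    Xp[:, j] += eps
    Xm[:, j] -= eps
    return np.mean((model.predict(Xp) - model.predict(Xm)) / (2 * eps))

base = MLPRegressor(hidden_layer_sizes=(5,), max_iter=2000,
                    random_state=0).fit(X, y)
yhat = base.predict(X)
resid = y - yhat

boot_derivs = []
for b in range(200):                                  # 500-1000 in practice
    e_b = rng.choice(resid, size=n, replace=True)     # resample residuals
    y_b = yhat + e_b                                  # new dependent variable
    m_b = MLPRegressor(hidden_layer_sizes=(5,), max_iter=2000,
                       random_state=b).fit(X, y_b)    # re-estimate network
    boot_derivs.append(partial_derivative(m_b, X, j=1))  # test input x1

# Two-sided bootstrap p-value for H0: derivative wrt x1 equals zero
boot_derivs = np.array(boot_derivs)
p = min(1.0, 2 * min(np.mean(boot_derivs <= 0), np.mean(boot_derivs >= 0)))
print(f"bootstrap p-value for x1: {p:.3f}")
```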

The disadvantage of the bootstrap method, as should be readily apparent, is that it is more time-consuming than likelihood ratio statistics, since we have to resample from the original set of residuals and re-estimate the network 500 or 1000 times. However, it is generally more reliable. If we can reject the null hypothesis that a partial derivative is equal to zero, based on resampling the original residuals and re-estimating the model 500 or 1000 times, we can be reasonably sure that we have found a significant result.


URL: https://www.sciencedirect.com/science/article/pii/B978012485967850004X

Significance, Tests of

Nancy Reid, in International Encyclopedia of the Social & Behavioral Sciences (Second Edition), 2015

Conclusion

A test of statistical significance is a mathematical calculation based on a test statistic, a null hypothesis, and the distribution of the test statistic under the null hypothesis. The result of the test is to indicate whether the data are consistent with the null hypothesis: if they are not, then either we have observed an event of low probability, or the null hypothesis is not correct.

The choice of test statistic is in principle arbitrary, but in practice might be determined by convention in the field of application, by intuition in a relatively new setting, or by one or more considerations developed in statistical theory. It is convenient to use test statistics whose distributions can be easily calculated exactly or to a good approximation. It is useful to use a test statistic that is sensitive to the particular departures from the null hypothesis that are of particular interest in the application.

A test of statistical significance is just one component of the analysis of a set of data, and should be supplemented by estimates of effects of interest, considerations related to sample size, and a discussion of the validity of any assumptions of independence or underlying models that have been made in the analysis. A statistically significant result is not necessarily an important result in any particular analysis, but needs to be considered in the context of research in that field.

An eloquent introduction to tests of significance is given in Fisher (1935: Chapter II). Kalbfleisch (1979: Chapter 12) is a good textbook reference at an undergraduate level. The discussion here draws considerably from Cox and Hinkley (1974: Chapter 3), which is a good reference at a more advanced level. An excellent overview is given in Cox (1977).


URL: https://www.sciencedirect.com/science/article/pii/B9780080970868421707

Significance, Tests of

N. Reid, in International Encyclopedia of the Social & Behavioral Sciences, 2001

3 Conclusion

A test of statistical significance is a mathematical calculation based on a test statistic, a null hypothesis, and the distribution of the test statistic under the null hypothesis. The result of the test is to indicate whether the data are consistent with the null hypothesis: if they are not, then either we have observed an event of low probability, or the null hypothesis is not correct.

The choice of test statistic is in principle arbitrary, but in practice might be determined by convention in the field of application, by intuition in a relatively new setting, or by one or more considerations developed in statistical theory. It is convenient to use test statistics whose distributions can easily be calculated exactly or to a good approximation. It is useful to use a test statistic that is sensitive to the particular departures from the null hypothesis that are of particular interest in the application.

A test of statistical significance is just one component of the analysis of a set of data, and should be supplemented by estimates of effects of interest, considerations related to sample size, and a discussion of the validity of any assumptions of independence or underlying models that have been made in the analysis. A statistically significant result is not necessarily an important result in any particular analysis, but needs to be considered in the context of research in that field.

An eloquent introduction to tests of significance is given in Fisher (1935, Chap. 2). Kalbfleisch (1979, Chap. 12) is a good textbook reference at an undergraduate level. The discussion here draws considerably from Cox and Hinkley (1974, Chap. 3), which is a good reference at a more advanced level. An excellent overview is given in Cox (1977). For a criticism of p-values see Schervish (1996) as well as Matthews (1998).


URL: https://www.sciencedirect.com/science/article/pii/B0080430767005064

Evidence-Based Interventions for Adolescent Substance Users

Josephine M. Hawke, Yifrah Kaminer, in Evidence-Based Addiction Treatment, 2009

Systematic Reviews and Meta-analyses

Systematic reviews and meta-analyses compare findings across studies. Compared to reviews that rely heavily on narrative description of studies or on the authors’ expertise in interpreting the literature, the methods used for systematic reviews and meta-analyses are more rigorous. They start with fairly extensive literature searches to identify published articles, guided by clearly articulated inclusion criteria. They often supplement the published studies with unpublished ones obtained from investigators to reduce “publication bias”: the tendency for the published literature to report predominantly positive findings. They calculate effect sizes and provide additional quantitative analyses of pooled data in order to draw conclusions about the therapeutic effectiveness of treatment interventions. Finally, they use criteria to evaluate the robustness of each study’s methods (e.g., Nathan & Gorman, 2002) and categorize the interventions according to the empirical evidence demonstrating their effectiveness (e.g., Chambless & Hollon, 1998). Thus, evidence from systematic reviews and meta-analyses is the best source of information about which interventions for adolescent substance use have the greatest empirical support.

There have been several published systematic reviews of evidence-based interventions to treat adolescent substance abuse that summarize the evidence base for treatments for adolescent substance use in general (e.g., Deas, 2008; Deas & Thomas, 2001; Jenson, Howard, & Vaughn, 2004; Slesnick, Kaminer, & Kelly, 2008; Williams & Chang, 2000), in outpatient modalities (Waldron & Turner, 2008), for family therapies (Austin, Macgowan, & Wagner, 2005; Ozechowski & Liddle, 2000), and psychopharmacology (Waxmonsky & Wilens, 2005), as well as reviews that address relevant subpopulations, such as ethnic minorities (e.g., Huey & Polo, 2008) and patients with co-occurring disorders (Bender, Springer, & Kim, 2006).

Treatment effectiveness

Studies typically report the statistical significance of their findings. However, statistical significance does not tell you how large an effect is; statistical measures of effect size do. An effect size refers to the strength of the relationship between two variables. The best-known measure of effect size is Cohen’s (1988) d, which reflects the degree of overlap between two distributions. Cohen’s thresholds for small, moderate, and large effects are 0.20, 0.50, and 0.80, respectively. Recent meta-analyses report effect sizes that range from small to moderate for interventions to treat adolescent substance abuse. Huey and Polo’s (2008) meta-analysis of 22 controlled trials of treatments for children and adolescents seeking treatment for mental health disorders reported that psychotherapy was associated with small to medium average effect sizes across all the studies (Cohen’s d = 0.44 for the entire sample and Cohen’s d = 0.57 for studies comparing active treatments to no treatment or placebo conditions). These findings suggest that slightly more than two-thirds of youths who receive treatment for mental health problems (including but not limited to substance use treatments) are better off than the average child receiving no treatment. In a meta-analysis of clinical trials of outpatient substance abuse treatments for adolescents, Waldron and Turner (2008) also found small-to-moderate average effect sizes for active treatments (Cohen’s d = 0.45). They concluded that evidence across clinical trials suggests that a number of distinct treatment approaches are effective.
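The “slightly more than two-thirds” figure follows from a standard normal-overlap reading of d (Cohen’s U3): under normality, Φ(d) is the proportion of the treated group scoring above the average untreated person. A two-line check, using the effect sizes quoted above:

```python
# Cohen's U3 = Phi(d): proportion of the treated group above the untreated
# mean, assuming normal distributions with equal variance.
from scipy.stats import norm

for d in (0.44, 0.57):  # effect sizes quoted in the text
    print(f"d = {d:.2f} -> {norm.cdf(d):.0%} of treated above control mean")
```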

In general, reviews of treatment effectiveness have demonstrated that treatment is effective in treating adolescent substance abuse. However, it is still premature to draw definitive conclusions about the superiority of one intervention over another. A handful of interventions are considered evidence based. Family therapies and cognitive behavioral therapies (CBTs) were most consistently recognized as well supported compared to other approaches. There are also treatments, such as the Assertive Community Reinforcement Approach and pharmacotherapies, for which the empirical literature with adolescents is beginning to emerge. The following contains short descriptions of the interventions evaluated as meeting the most stringent criteria for being evidence based.


URL: https://www.sciencedirect.com/science/article/pii/B9780123743480000185

How do you tell if there is an association between two variables in statistics?

Scatter plots. A scatter plot can be used to visually inspect whether there is an association between two quantitative variables. If there is a pattern in the plot, the variables are associated; if there is no pattern, the variables are not associated.

What does it mean when a variable is not statistically significant?

This means that the results are considered to be “statistically non-significant” if the analysis shows that differences as large as (or larger than) the observed difference would be expected to occur by chance more than one out of twenty times (p > 0.05).

How do you know if a relationship is statistically significant?

If the chi-square statistic is at least 3.84 (the critical value for 1 degree of freedom at the 0.05 level), the p-value is 0.05 or less, and we conclude that the relationship in the population is real. That is, we reject the null hypothesis and conclude that the relationship is statistically significant.

How do you find association in statistics?

The appropriate measure of association for this situation is Pearson’s correlation coefficient, r, which measures the strength of the linear relationship between two variables on a continuous scale (the corresponding population parameter is denoted ρ, rho). The coefficient r takes values from −1 through +1.