Calculating Analysis of Variance (ANOVA) and Post Hoc Analyses Following ANOVA
Analysis of variance (ANOVA) is a statistical procedure that compares data between two or more groups or conditions to investigate the presence of differences between those groups on some continuous dependent variable (see Exercise 18). In this exercise, we will focus on the one-way ANOVA, which involves testing one independent variable and one dependent variable (as opposed to other types of ANOVAs, such as factorial ANOVAs that incorporate multiple independent variables).
Why ANOVA and not a t-test? Remember that a t-test is formulated to compare two sets of data or two groups at one time (see Exercise 23 for guidance on selecting appropriate statistics). Thus, data generated from a clinical trial that involves four experimental groups, Treatment 1, Treatment 2, Treatments 1 and 2 combined, and a Control, would require six pairwise t-tests. Consequently, the chance of making a Type I error (alpha error) increases substantially (or is inflated) because so many computations are being performed. Specifically, the chance of making at least one Type I error is approximately the number of comparisons multiplied by the alpha level. Thus, ANOVA is the recommended statistical technique for examining differences between more than two groups (Zar, 2010).
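The inflation can be checked numerically. The following is a minimal sketch in Python (a language chosen here for illustration; the exercise itself uses hand computation and SPSS). It counts the pairwise comparisons among four groups and compares the simple product of comparisons and alpha with the familywise rate 1 − (1 − α)^k, which assumes the k comparisons are independent.

```python
from math import comb

alpha = 0.05          # per-comparison Type I error rate
n_groups = 4          # Treatment 1, Treatment 2, combined, Control

# Number of distinct pairs of groups, i.e. the number of t-tests needed
n_comparisons = comb(n_groups, 2)

# Simple product: number of comparisons times alpha (an upper bound)
bound = n_comparisons * alpha

# Familywise error rate if the comparisons were independent (an assumption)
familywise = 1 - (1 - alpha) ** n_comparisons

print(n_comparisons, round(bound, 2), round(familywise, 3))
```

With six comparisons the product is 0.30 and the independent-comparisons rate is about 0.26, so either way the risk is far above the nominal 0.05.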
ANOVA is a procedure that culminates in a statistic called the F statistic. It is this value that is compared against an F distribution (see Appendix C) in order to determine whether the groups significantly differ from one another on the dependent variable. The formulas for ANOVA actually compute two estimates of variance: One estimate represents differences between the groups/conditions, and the other estimate represents differences among (within) the data.
Research Designs Appropriate for the One-Way ANOVA
Research designs that may utilize the one-way ANOVA include the randomized experimental, quasi-experimental, and comparative designs (Gliner, Morgan, & Leech, 2009). The independent variable (the “grouping” variable for the ANOVA) may be active or attributional. An active independent variable refers to an intervention, treatment, or program. An attributional independent variable refers to a characteristic of the participant, such as gender, diagnosis, or ethnicity. The ANOVA can compare two groups or more. In the case of a two-group design, the researcher can either select an independent samples t-test or a one-way ANOVA to answer the research question. The results will always yield the same conclusion, regardless of which test is computed; however, when examining differences between more than two groups, the one-way ANOVA is the preferred statistical test.
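The agreement between the two tests in a two-group design can be illustrated with a short Python sketch. The scores below are hypothetical toy numbers, not data from this exercise, and the t-test shown is the pooled-variance (equal variances assumed) version. Under those assumptions the ANOVA F equals the square of t.

```python
from math import sqrt

# Hypothetical toy scores for two groups (not data from the exercise)
group_a = [1, 2, 3]
group_b = [3, 4, 5]

def mean(values):
    return sum(values) / len(values)

def sum_sq_dev(values):
    """Sum of squared deviations from the group's own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

n_a, n_b = len(group_a), len(group_b)
df_within = n_a + n_b - 2

# Pooled-variance independent samples t statistic
pooled_var = (sum_sq_dev(group_a) + sum_sq_dev(group_b)) / df_within
t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))

# One-way ANOVA F for the same two groups
grand_mean = mean(group_a + group_b)
ss_between = (n_a * (mean(group_a) - grand_mean) ** 2
              + n_b * (mean(group_b) - grand_mean) ** 2)
ms_between = ss_between / (2 - 1)      # df = number of groups - 1
ms_within = pooled_var                 # same quantity as MS within
f = ms_between / ms_within

print(round(t ** 2, 6), round(f, 6))
```

Because F and the squared t coincide, the two tests lead to the same decision about the null hypothesis in the two-group case.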
Example 1: A researcher conducts a randomized experimental study wherein she randomizes participants to receive a high-dosage weight loss pill, a low-dosage weight loss pill, or a placebo. She assesses the number of pounds lost from baseline to post-treatment for the three groups. Her research question is: “Is there a difference between the three groups in weight lost?” The independent variable is the treatment condition (high-dose weight loss pill, low-dose weight loss pill, or placebo) and the dependent variable is number of pounds lost over the treatment span.
Null hypothesis: There is no difference in weight lost among the high-dose weight loss pill, low-dose weight loss pill, and placebo groups in a population of overweight adults.
Example 2: A nurse researcher working in dermatology conducts a retrospective comparative study wherein she conducts a chart review of patients and divides them into three groups: psoriasis, psoriatic symptoms, or control. The dependent variable is health status and the independent variable is disease group (psoriasis, psoriatic symptoms, and control). Her research question is: “Is there a difference between the three groups in levels of health status?”
Null hypothesis: There is no difference between the three groups in health status.
Statistical Formula and Assumptions
Use of the ANOVA involves the following assumptions (Zar, 2010):
1. Sample means from the population are normally distributed.
2. The groups are mutually exclusive.
3. The dependent variable is measured at the interval/ratio level.
4. The groups should have equal variance, termed “homogeneity of variance.”
5. All observations within each sample are independent.
The dependent variable in an ANOVA must be scaled as interval or ratio. If the dependent variable is measured with a Likert scale and the frequency distribution is approximately normally distributed, these data are usually considered interval-level measurements and are appropriate for an ANOVA (de Winter & Dodou, 2010; Rasmussen, 1989).
The basic formula for the F without numerical symbols is:
$$F = \frac{\text{Mean Square Between Groups}}{\text{Mean Square Within Groups}}$$
The term “mean square” (MS) is used interchangeably with the word “variance.” The formulas for ANOVA compute two estimates of variance: the between groups variance and the within groups variance. The between groups variance represents differences between the groups/conditions being compared, and the within groups variance represents differences among (within) each group’s data. Therefore, the formula is F = MS between/MS within.
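As a rough illustration of the two variance estimates, the sketch below computes F directly from raw scores using the definitional formulas (squared deviations from the grand mean and from each group mean). The function name `one_way_anova` and the toy scores are hypothetical and are not taken from the exercise.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of lists of scores."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        # Between groups: how far each group mean sits from the grand mean
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        # Within groups: spread of scores around their own group mean
        ss_within += sum((v - group_mean) ** 2 for v in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical toy data: three groups of three scores each
toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_b, df_w = one_way_anova(toy)
print(round(f_value, 2), df_b, df_w)
```

For the toy data the between groups mean square is 13 and the within groups mean square is 1, so F is 13 with 2 and 6 df.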
Using an example from a study of students enrolled in an RN to BSN program, a subset of graduates from the program were examined (Mancini, Ashwill, & Cipher, 2014). The data are presented in Table 33-1. A simulated subset was selected for this example so that the computations would be small and manageable. In actuality, studies involving one-way ANOVAs need to be adequately powered (Aberson, 2010; Cohen, 1988). See Exercises 24 and 25 for more information regarding statistical power.
TABLE 33-1 MONTHS FOR COMPLETION OF RN TO BSN PROGRAM BY HIGHEST DEGREE STATUS
The independent variable in this example is highest degree obtained prior to enrollment (Associate’s, Bachelor’s, or Master’s degree), and the dependent variable is the number of months it took for the student to complete the RN to BSN program. The null hypothesis is “There is no difference between the groups (highest degree of Associate’s, Bachelor’s, or Master’s) in the months these nursing students require to complete an RN to BSN program.”
The computations for the ANOVA are as follows:
Step 1: Compute correction term, C.
Square the grand sum (G), and divide by total N:
$$C = \frac{G^2}{N} = \frac{460^2}{27} = 7{,}837.04$$
Step 2: Compute Total Sum of Squares.
Square every value in dataset, sum, and subtract C:
$$(17^2 + 19^2 + 24^2 + 18^2 + 24^2 + 16^2 + 16^2 + \dots + 12^2) - 7{,}837.04 = 8{,}234 - 7{,}837.04 = 396.96$$
Step 3: Compute Between Groups Sum of Squares.
Square the sum of each column and divide by the number of observations in that column (n = 9). Add the results, and then subtract C:
$$\frac{178^2}{9} + \frac{125^2}{9} + \frac{157^2}{9} - 7{,}837.04 = (3{,}520.44 + 1{,}736.11 + 2{,}738.78) - 7{,}837.04 = 158.29$$
Step 4: Compute Within Groups Sum of Squares.
Subtract the Between Groups Sum of Squares (Step 3) from Total Sum of Squares (Step 2):
$$396.96 - 158.29 = 238.67$$
Step 5: Create ANOVA Summary Table (see Table 33-2).
a. Insert the sum of squares values in the first column.
b. The degrees of freedom are in the second column. Because the F is a ratio of two separate statistics (mean square between groups and mean square within groups), each has its own df formula: one for the numerator and one for the denominator:
Mean square between groups df = number of groups − 1
Mean square within groups df = N − number of groups
For this example, the df for the numerator is 3 − 1 = 2.
The df for the denominator is 27 − 3 = 24.
c. The mean square between groups and mean square within groups are in the third column. These values are computed by dividing the SS by the df. Therefore, the MS between = 158.29 ÷ 2 = 79.15. The MS within = 238.67 ÷ 24 = 9.94.
d. The F is the final column and is computed by dividing the MS between by the MS within. Therefore, F = 79.15 ÷ 9.94 = 7.96.
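TABLE 33-2 ANOVA SUMMARY TABLE

| Source of Variation | SS | df | MS | F |
|---|---|---|---|---|
| Between Groups | 158.29 | 2 | 79.15 | 7.96 |
| Within Groups | 238.67 | 24 | 9.94 | |
| Total | 396.96 | 26 | | |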
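Steps 1 through 5 can be retraced with a short Python sketch. It uses only the summary values quoted in the steps above (the grand sum, the sum of squared scores, and the three column sums), because the individual scores from Table 33-1 are not reproduced here. The variable names are illustrative.

```python
# Summary values quoted in Steps 1-3 of the exercise
grand_sum = 460                  # G, sum of all 27 scores
n_total = 27                     # total N
sum_of_squared_scores = 8234     # sum of every squared value
column_sums = [178, 125, 157]    # column sums as listed in Step 3
n_per_group = 9                  # observations in each column
n_groups = len(column_sums)

correction = grand_sum ** 2 / n_total                        # Step 1
ss_total = sum_of_squared_scores - correction                # Step 2
ss_between = (sum(s ** 2 / n_per_group for s in column_sums)
              - correction)                                  # Step 3
ss_within = ss_total - ss_between                            # Step 4

df_between = n_groups - 1                                    # Step 5b
df_within = n_total - n_groups
ms_between = ss_between / df_between                         # Step 5c
ms_within = ss_within / df_within
f_value = ms_between / ms_within                             # Step 5d

print(round(correction, 2), round(ss_total, 2), round(ss_between, 2),
      round(ss_within, 2), round(f_value, 2))
```

Carrying full precision gives a between groups sum of squares of 158.30 rather than 158.29; the small gap comes from rounding the intermediate terms by hand and does not change F = 7.96.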
Step 6: Locate the critical F value on the F distribution table (see Appendix C) and compare it to our obtained F value of 7.96. The critical F value for 2 and 24 df at α = 0.05 is 3.40. Because the obtained F exceeds this critical value, the F value in this example is statistically significant. Researchers report ANOVA results in a study report using the following format: F(2,24) = 7.96, p < 0.05. Researchers often report the exact p value instead of “p < 0.05,” but this usually requires the use of computer software due to the tedious nature of p value computations.
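For readers without the table or SPSS at hand, the comparison in Step 6 can be sketched in Python. The sketch relies on a closed form for the F distribution that holds only when the numerator df is 2 (as it is for any three-group one-way ANOVA): P(F ≥ f) = (1 + 2f/df_within)^(−df_within/2). This shortcut is not the method used in the exercise, and the function names are hypothetical; for other numerator df a statistics package (for example, the `sf` and `ppf` methods of `scipy.stats.f`) would be needed.

```python
def f_sf_df1_2(f_value, df_within):
    """P(F >= f_value) for an F distribution with numerator df = 2.

    Closed form valid ONLY when the numerator df is 2.
    """
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)

def f_critical_df1_2(alpha, df_within):
    """Critical F at level alpha, again only for numerator df = 2."""
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)

f_obtained = 7.96     # from the ANOVA summary table
df_within = 24        # denominator df
alpha = 0.05

critical = f_critical_df1_2(alpha, df_within)
p_value = f_sf_df1_2(f_obtained, df_within)
significant = f_obtained > critical

print(round(critical, 2), round(p_value, 3), significant)
```

The computed critical value rounds to 3.40 and the p value to 0.002, in line with the tabled value above and the exact p value that SPSS reports later in this exercise.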
Our obtained F = 7.96 exceeds the critical value in the table, which indicates that the F is statistically significant and that the population means are not all equal. Therefore, we can reject our null hypothesis that the three groups spent the same amount of time completing the RN to BSN program. However, the F value does not identify which groups are significantly different from one another. Further testing, termed multiple comparison tests or post hoc tests, is required to complete the ANOVA process and determine all the significant differences among the study groups.
Post Hoc Tests
Post hoc tests have been developed specifically to determine the location of group differences after ANOVA is performed on data from more than two groups. These tests were developed to reduce the incidence of a Type I error. Frequently used post hoc tests are the Newman-Keuls test, the Tukey Honestly Significant Difference (HSD) test, the Scheffé test, and the Dunnett test (Zar, 2010; see Exercise 18 for examples). When these tests are calculated, the alpha level is reduced in proportion to the number of additional tests required to locate statistically significant differences. For example, for several of the aforementioned post hoc tests, if many groups’ mean values are being compared, the magnitude of the difference is set higher than if only two groups are being compared. Thus, post hoc tests are tedious to perform by hand and are best handled with statistical computer software programs. Accordingly, the rest of this example will be presented with the assistance of SPSS.
The following screenshot is a replica of what your SPSS window will look like. The data for ID numbers 24 through 27 are viewable by scrolling down in the SPSS screen.
Step 1: From the “Analyze” menu, choose “Compare Means” and “One-Way ANOVA.” Move the dependent variable, Number of Months to Complete Program, over to the right, as in the window below.
Step 2: Move the independent variable, Highest Degree at Enrollment, to the right in the space labeled “Factor.”
Step 3: Click “Options.” Check the boxes next to “Descriptive” and “Homogeneity of variance test.” Click “Continue” and “OK.”
Interpretation of SPSS Output
The following tables are generated from SPSS. The first table contains descriptive statistics for months to completion, separated by the three groups. The second table contains the Levene’s test of homogeneity of variances. The third table contains the ANOVA summary table, along with the F and p values.
The first table displays descriptive statistics that allow us to observe the means for the three groups. This table is important because it indicates that the students with an Associate’s degree took an average of 19.78 months to complete the program, compared to 13.89 months for students with a Bachelor’s and 17.44 months for students with a Master’s degree.
The second table contains the Levene’s test for equality of variances. The Levene’s test is a statistical test of the equal variances assumption. The p value is 0.488, indicating there was no significant difference among the three groups’ variances; thus, the data have met the equal variances assumption for ANOVA.
The last table contains the contents of the ANOVA summary table, which looks much like Table 33-2. This table contains an additional value that we did not compute by hand—the exact p value, which is 0.002. Because the SPSS output indicates that we have a significant ANOVA, post hoc testing must be performed.
Return to the ANOVA window and click “Post Hoc.” You will see a window similar to the one below. Select the “LSD” and “Tukey” options. Click “Continue” and “OK.”
The following output is added to the original output. This table contains post hoc test results for two different tests: the LSD (Least Significant Difference) test and the Tukey HSD (Honestly Significant Difference) test. The LSD test, the original post hoc test, explores all possible pairwise comparisons of means using the equivalent of multiple t-tests. However, the LSD test, in performing a set of multiple t-tests, reports inaccurate p values that have not been adjusted for multiple computations (Zar, 2010). Consequently, researchers should exercise caution when choosing the LSD post hoc test following an ANOVA.
The Tukey HSD comparison test, on the other hand, is a more “conservative” test, meaning that it requires a larger difference between two groups to indicate a significant difference than some of the other post hoc tests available. By requiring a larger difference between the groups, the Tukey HSD procedure yields p values that more accurately reflect the multiple comparisons, such as the p of 0.062 for the Bachelor’s versus Master’s comparison (Zar, 2010).
Observe the “Mean Difference” column. Any difference noted with an asterisk (*) is significant at p < 0.05. The p values of each comparison are listed in the “Sig.” column, and values below 0.05 indicate a significant difference between the pair of groups. Observe the p values for the comparison of the Bachelor’s degree group versus the Master’s degree group. The Tukey HSD test indicates no significant difference between the groups, with a p of 0.062; however, the LSD test indicates that the groups significantly differed, with a p of 0.025. This example enables you to see the difference in results obtained when calculating a conservative versus a lenient post hoc test. However, it should be noted that because an a priori power analysis was not conducted, there is a possibility that these analyses are underpowered. See Exercises 24 and 25 for more information regarding the consequences of low statistical power.
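The lenient versus conservative contrast can be sketched by computing the smallest mean difference each test treats as significant. The Python sketch below uses the group means, the group size of 9, and the MS within of 9.94 reported in this exercise. The two critical values (t = 2.064 for 24 df, two-tailed, and studentized range q = 3.53 for 3 groups and 24 df, both at α = 0.05) are assumptions taken from standard statistical tables rather than from the exercise, and the formulas shown assume equal group sizes.

```python
from math import sqrt
from itertools import combinations

# Values reported in the exercise
means = {"Associate": 19.78, "Bachelor": 13.89, "Master": 17.44}
n_per_group = 9
ms_within = 9.94

# Tabled critical values (assumptions from standard tables, alpha = 0.05)
t_crit = 2.064   # two-tailed t, df = 24
q_crit = 3.53    # studentized range, 3 groups, df = 24

# Smallest mean difference each test treats as significant
lsd = t_crit * sqrt(ms_within * 2 / n_per_group)
hsd = q_crit * sqrt(ms_within / n_per_group)

results = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    results[(a, b)] = (diff > lsd, diff > hsd)
    print(f"{a} vs {b}: diff={diff:.2f}  "
          f"LSD sig={diff > lsd}  HSD sig={diff > hsd}")
```

The LSD threshold is about 3.07 months and the Tukey HSD threshold is about 3.71 months. The Bachelor’s versus Master’s difference of 3.55 months falls between the two, which is why only the LSD test flags it as significant.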
Final Interpretation in American Psychological Association (APA) Format
The following interpretation is written as it might appear in a research article, formatted according to APA guidelines (APA, 2010). A one-way ANOVA performed on months to program completion revealed significant differences among the three groups, F(2,24) = 7.96, p = 0.002. Post hoc comparisons using the Tukey HSD comparison test indicated that the students in the Associate’s degree group took significantly longer to complete the program than the students in the Bachelor’s degree group (19.8 versus 13.9 months, respectively) (APA, 2010). However, there were no significant differences in program completion time between the Associate’s degree group and the Master’s degree group or between the Bachelor’s degree group and the Master’s degree group.
Study Questions
1. Is the dependent variable in the Mancini et al. (2014) example normally distributed? Provide a rationale for your answer.
2. What are the two instances that must occur to warrant post hoc testing following an ANOVA?
3. Do the data in this example meet criteria for homogeneity of variance? Provide a rationale for your answer.
4. What is the null hypothesis in the example?
5. What was the exact likelihood of obtaining an F value at least as extreme as or as close to the one that was actually observed, assuming that the null hypothesis is true?
6. Do the data meet criteria for “mutual exclusivity”? Provide a rationale for your answer.
7. What does the numerator of the F ratio represent?
8. What does the denominator of the F ratio represent?
9. How would our final interpretation of the results have changed if we had chosen to report the LSD post hoc test instead of the Tukey HSD test?
10. Was the sample size adequate to detect differences among the three groups in this example? Provide a rationale for your answer.
Answers to Study Questions
1. Yes, the data are approximately normally distributed as noted by the frequency distribution generated from SPSS, below. The Shapiro-Wilk (covered in Exercise 26) p value for months to completion was 0.151, indicating that the frequency distribution did not significantly deviate from normality.
2. The two instances that must occur to warrant post hoc testing following an ANOVA are (1) the ANOVA was performed on data comparing more than two groups, and (2) the F value is statistically significant.
3. Yes, the data met criteria for homogeneity of variance because the Levene’s test for equality of variances yielded a p of 0.488, indicating no significant differences in variance between the groups.
4. The null hypothesis is: “There is no difference between groups (Associate’s, Bachelor’s, and Master’s degree groups) in months until completion of an RN to BSN program.”
5. The exact likelihood of obtaining an F value at least as extreme as or as close to the one that was actually observed, assuming that the null hypothesis is true, was 0.2%.
6. Yes, the data met criteria for mutual exclusivity because a student could only belong to one of the three groups of the highest degree obtained prior to enrollment (Associate’s, Bachelor’s, and Master’s degree).
7. The numerator represents the between groups variance or the differences between the groups/conditions being compared.
8. The denominator represents the within groups variance, or the extent to which the dependent variable is dispersed within each group.
9. The final interpretation of the results would have changed if we had chosen to report the LSD post hoc test instead of the Tukey HSD test. The results of the LSD test indicated that the students in the Master’s degree group took significantly longer to complete the program than the students in the Bachelor’s degree group (p = 0.025).
10. The sample size was most likely adequate to detect differences among the three groups overall because a significant difference was found, p = 0.002. However, there was a discrepancy between the results of the LSD post hoc test and the Tukey HSD test. The difference between the Master’s degree group and the Bachelor’s degree group was significant according to the results of the LSD test but not the Tukey HSD test. Therefore, it is possible that with only 27 total students in this example, the data were underpowered for the multiple comparisons following the ANOVA.
Data for Additional Computational Practice for Questions to be Graded
In the study by Ottomanelli and colleagues (2012), participants were randomized to receive Supported Employment or treatment as usual. A third group, also a treatment as usual group, consisted of a nonrandomized observational group of participants. A simulated subset was selected for this example so that the computations would be small and manageable. The independent variable in this example is treatment group (Supported Employment, Treatment as Usual–Randomized, and Treatment as Usual–Observational/Not Randomized), and the dependent variable is the number of hours worked post-treatment. Supported employment refers to a type of specialized interdisciplinary vocational rehabilitation designed to help people with disabilities obtain and maintain community-based competitive employment in their chosen occupation (Bond, 2004).
The null hypothesis is: “There is no difference between the treatment groups in post-treatment number of hours worked among veterans with spinal cord injuries.”
Compute the ANOVA on the data in Table 33-3 below.
TABLE 33-3 POST-TREATMENT HOURS WORKED BY TREATMENT GROUP
“TAU” = Treatment as Usual.
EXERCISE 33 Questions to Be Graded
Name: _______________________________________________________ Class: _____________________
Follow your instructor’s directions to submit your answers to the following questions for grading. Your instructor may ask you to write your answers below and submit them as a hard copy for grading. Alternatively, your instructor may ask you to use the space below for notes and submit your answers online at http://evolve.elsevier.com/Grove/statistics/ under “Questions to Be Graded.”
1. Do the data meet criteria for homogeneity of variance? Provide a rationale for your answer.
2. If calculating by hand, draw the frequency distribution of the dependent variable, hours worked at a job. What is the shape of the distribution? If using SPSS, what is the result of the Shapiro-Wilk test of normality for the dependent variable?
3. What are the means for the three groups’ hours worked on a job?
4. What are the F value and the group and error df for this set of data?
5. Is the F significant at α = 0.05? Specify how you arrived at your answer.
6. If using SPSS, what is the exact likelihood of obtaining an F value at least as extreme as or as close to the one that was actually observed, assuming that the null hypothesis is true?
7. Which group worked the most weekly job hours post-treatment? Provide a rationale for your answer.
8. Write your interpretation of the results as you would in an APA-formatted journal.
9. Is there a difference in your final interpretation when comparing the results of the LSD post hoc test versus Tukey HSD test? Provide a rationale for your answer.
10. If the researcher decided to combine the two Treatment as Usual groups to represent an overall “Control” group, then there would be two groups to compare: Supported Employment versus Control. What would be the appropriate statistic to address the difference in hours worked between the two groups? Provide a rationale for your answer.
I just need help with questions 5 and 6 of the Questions to Be Graded in Exercise 33.