Introduction
The aim of this analysis is to determine the strength of the association between the variables agecat and degree, as well as the cells that contribute most to that association, through a chi-square analysis. Standardized residuals are used to identify the cell contributions.
Hypothesis
- Null: There is no association between agecat and degree; the two variables are independent
- Alternative: There is an association between agecat and degree
Methodology
For this project, the gss.sav file is loaded into SPSS (GSS, n.d.). The goal is to look at the relationships between the following variables: agecat (Age category) and degree (Respondent’s highest degree).
To conduct a chi-square analysis, navigate through Analyze > Descriptive Statistics > Crosstabs.
The variable degree was placed in the “Row(s)” box and agecat in the “Column(s)” box. Under the “Statistics” button, “Chi-square” was selected, along with “Lambda” in the “Nominal” section. Under the “Cells” button, “Standardized” was selected in the “Residuals” section. The procedures for this analysis are provided in video tutorial form by Miller (n.d.). The resulting output is shown in the next four tables.
Results
Table 1: Case processing summary.
Cases | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
Degree * Age category | 1411 | 99.4% | 8 | 0.6% | 1419 | 100.0% |
Of the 1419 participants in the sample, 8 cases are missing a value on at least one of the two variables, leaving 1411 valid cases (99.4%) for the analysis (Table 1). In the crosstabulation (Table 2), a standardized residual below -1.96 or above +1.96 marks a cell whose observed frequency differs significantly from its expected frequency (Miller, n.d.). In the “Less than high school” row, the residuals for the 30-39, 40-49, and 50-59 age groups (-2.8, -2.3, and -2.7) are below -1.96, while the residual for the 60-89 age group (7.1) is far above +1.96. In the “Junior college or more” row, the residual for the 60-89 age group (-3.7) is below -1.96. A significant negative residual means that the expected count under the null hypothesis of independence over-predicts the number of people in that age group with that degree level. A significant positive residual means that the expected count under-predicts that number.
Table 2: Degree by Age category crosstabulation.
Degree | Statistic | 18-29 | 30-39 | 40-49 | 50-59 | 60-89 | Total |
Less than high school | Count | 42 | 33 | 36 | 20 | 112 | 243 |
Less than high school | Standardized Residual | -.1 | -2.8 | -2.3 | -2.7 | 7.1 | |
High school | Count | 138 | 162 | 154 | 113 | 158 | 725 |
High school | Standardized Residual | .9 | .2 | -.2 | .4 | -1.2 | |
Junior college or more | Count | 68 | 115 | 114 | 78 | 68 | 443 |
Junior college or more | Standardized Residual | -1.1 | 1.8 | 1.9 | 1.4 | -3.7 | |
Total | Count | 248 | 310 | 304 | 211 | 338 | 1411 |
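The standardized residuals in Table 2 can be reproduced outside SPSS as a check. The following is a minimal sketch, assuming the counts are typed in by hand from Table 2 and that each residual is (observed - expected) / sqrt(expected), with the expected count taken as row total times column total divided by N. The names `OBSERVED`, `AGE_GROUPS`, and `standardized_residuals` are illustrative and are not part of the SPSS output.

```python
import math

# Observed counts copied by hand from Table 2
# (rows: degree, columns: age category).
AGE_GROUPS = ["18-29", "30-39", "40-49", "50-59", "60-89"]
OBSERVED = {
    "Less than high school": [42, 33, 36, 20, 112],
    "High school": [138, 162, 154, 113, 158],
    "Junior college or more": [68, 115, 114, 78, 68],
}


def standardized_residuals(observed):
    """Return {row label: [(observed - expected) / sqrt(expected), ...]}."""
    row_totals = {label: sum(counts) for label, counts in observed.items()}
    col_totals = [sum(col) for col in zip(*observed.values())]
    n = sum(row_totals.values())
    residuals = {}
    for label, counts in observed.items():
        residuals[label] = []
        for count, col_total in zip(counts, col_totals):
            # Expected count if degree and age category were independent.
            expected = row_totals[label] * col_total / n
            residuals[label].append((count - expected) / math.sqrt(expected))
    return residuals


RESIDUALS = standardized_residuals(OBSERVED)
for label, row in RESIDUALS.items():
    # Flag the cells whose residual lies beyond the +/-1.96 cutoff.
    flagged = [age for age, r in zip(AGE_GROUPS, row) if abs(r) > 1.96]
    print(label, [round(r, 1) for r in row], "beyond +/-1.96:", flagged)
```

Cells flagged by the sketch should match the cells in Table 2 whose residuals lie outside the -1.96 to +1.96 range.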
From Table 2, the degrees of freedom are df = (5-1)*(3-1) = 8. None of the expected counts is less than five, since the minimum expected count is 36.34 (Table 3), so the expected-count requirement of the chi-square test is met. The Pearson chi-square value is 96.364 with p < .001, which is significant at the 0.05 level. Thus, the null hypothesis is rejected, and there is a statistically significant association between a person’s age category and diploma level. This test does not tell us anything about the strength or directionality of the relationship.
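The chi-square statistic, the degrees of freedom, and the minimum expected count can be recomputed from the counts in Table 2. This is a hedged sketch, not the SPSS procedure itself: the function names are illustrative, the counts are entered by hand, and the p-value uses a closed-form upper-tail formula that is valid only when the degrees of freedom are even (as they are here, with df = 8).

```python
import math

# Observed counts copied by hand from Table 2.
OBSERVED = [
    [42, 33, 36, 20, 112],      # Less than high school
    [138, 162, 154, 113, 158],  # High school
    [68, 115, 114, 78, 68],     # Junior college or more
]


def pearson_chi_square(table):
    """Return (chi-square, degrees of freedom, minimum expected count)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected counts under independence: row total * column total / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    min_expected = min(min(row) for row in expected)
    return chi2, df, min_expected


def chi_square_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


CHI2, DF, MIN_EXPECTED = pearson_chi_square(OBSERVED)
P_VALUE = chi_square_sf_even_df(CHI2, DF)
print(f"chi-square = {CHI2:.3f}, df = {DF}, "
      f"min expected = {MIN_EXPECTED:.2f}, p = {P_VALUE:.3g}")
```

The statistic is the sum of the squared standardized residuals, which is why the few cells with large residuals account for most of its size.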
Table 3: Chi-Square Tests
Test | Value | df | Asymptotic Significance (2-sided) |
Pearson Chi-Square | 96.364 (a) | 8 | .000 |
Likelihood Ratio | 90.580 | 8 | .000 |
Linear-by-Linear Association | 23.082 | 1 | .000 |
N of Valid Cases | 1411 | | |
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 36.34.
Table 4: Directional Measures
Type | Measure | Direction | Value | Asymptotic Standard Error (a) | Approximate T (b) | Approximate Significance |
Nominal by Nominal | Lambda | Symmetric | .029 | .013 | 2.278 | .023 |
Nominal by Nominal | Lambda | Degree Dependent | .000 | .000 | (c) | (c) |
Nominal by Nominal | Lambda | Age category Dependent | .048 | .020 | 2.278 | .023 |
Nominal by Nominal | Goodman and Kruskal tau | Degree Dependent | .024 | .005 | | .000 (d) |
Nominal by Nominal | Goodman and Kruskal tau | Age category Dependent | .019 | .004 | | .000 (d) |
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Cannot be computed because the asymptotic standard error equals zero.
d. Based on chi-square approximation.
Although there is a statistically significant association between a person’s age category and diploma level, the chi-square test does not show how strongly these variables are related to each other. The symmetric lambda value is 0.029 (Table 4), meaning that knowing one variable reduces the error in predicting the other by only 2.9%. Lambda is a proportional-reduction-in-error measure rather than a share of variance accounted for, so the relationship, while statistically significant, is very weak and of little practical importance.
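The lambda values in Table 4 can be checked by hand from the counts in Table 2. The sketch below assumes the standard Goodman and Kruskal definition of lambda as a proportional reduction in prediction error: the number of prediction errors avoided by knowing the other variable, divided by the number of errors made without it. The function name and variable names are illustrative only.

```python
# Observed counts copied by hand from Table 2
# (rows: degree, columns: age category).
OBSERVED = [
    [42, 33, 36, 20, 112],      # Less than high school
    [138, 162, 154, 113, 158],  # High school
    [68, 115, 114, 78, 68],     # Junior college or more
]


def goodman_kruskal_lambda(table):
    """Return (lambda rows dependent, lambda columns dependent, symmetric)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Correct guesses when the modal category is predicted within each
    # column (to predict the row) or within each row (to predict the column).
    sum_col_max = sum(max(col) for col in zip(*table))
    sum_row_max = sum(max(row) for row in table)
    # Correct guesses when only the overall modal category is predicted.
    max_row_total = max(row_totals)
    max_col_total = max(col_totals)
    lambda_rows = (sum_col_max - max_row_total) / (n - max_row_total)
    lambda_cols = (sum_row_max - max_col_total) / (n - max_col_total)
    symmetric = (
        (sum_col_max + sum_row_max - max_row_total - max_col_total)
        / (2 * n - max_row_total - max_col_total)
    )
    return lambda_rows, lambda_cols, symmetric


LAMBDA_DEGREE, LAMBDA_AGE, LAMBDA_SYMMETRIC = goodman_kruskal_lambda(OBSERVED)
print(f"degree dependent = {LAMBDA_DEGREE:.3f}, "
      f"age category dependent = {LAMBDA_AGE:.3f}, "
      f"symmetric = {LAMBDA_SYMMETRIC:.3f}")
```

Lambda with degree as the dependent variable is zero because “High school” is the most frequent degree in every age group, so knowing a person’s age category never changes the best guess of their degree.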
Conclusions
There is a statistically significant association between a person’s age category and diploma level. According to the crosstabulation, the expected counts under independence significantly over-predict the number of people with less than a high school diploma in the 30-39, 40-49, and 50-59 age groups, as well as the number with a junior college degree or more in the 60-89 age group. They significantly under-predict the number of people with less than a high school diploma in the 60-89 age group. These large standardized residuals helped drive a large and statistically significant chi-square value. With a lambda of 0.029, however, knowing one variable reduces the error in predicting the other by only 2.9%, so the association is very weak.
SPSS Code
CROSSTABS
/TABLES=ndegree BY agecat
/FORMAT=AVALUE TABLES
/STATISTICS=CHISQ CC LAMBDA
/CELLS=COUNT SRESID
/COUNT ROUND CELL.
References:
- GSS. (n.d.). SPSS data file [Data set]. Retrieved from https://classroom.coloradotech.edu/app/classResourceRedirect.html?id=2931693&url=/lms/class/95707/document/2931693/open
- Miller, R. (n.d.). Week 8: Chi-Square Test. [Video file]. Retrieved from http://breeze.careeredonline.com/p47dupnqy1q/?launcher=false&fcsContent=true&pbMode=normal