SPSS Code – Dr. Skylar (Sky) Hernandez

Quant: In-depth Analysis in SPSS

Abstract

This short analysis examines how marital happiness relates to a couple’s combined income. Marital happiness was found to depend on combined income, although the happiest couples were happy regardless of how much money they had. This quantitative analysis of the sample data shows that lower levels of marital happiness are more likely to coincide with lower levels of combined income.

Introduction

Mulligan (1973) was one of the first to state that arguments about money are one of the top reasons for divorce. Financial arguments can stem from goals and savings, record keeping, delaying tactics, apparel cost-cutting strategies, controlling expenditures, financial statements, do-it-yourself techniques, and cost-cutting techniques (Lawrence, Thomasson, Wozniak, & Prawitz, 1993). Lawrence et al. (1993) assert that financial arguments are common within families. However, when does money stop being an issue? Does an increase in combined family income affect marital happiness? This analysis attempts to answer these questions.

Methods

Crosstabulation was conducted to get a descriptive exploration of the data. Box plots helped show the spread and distribution of combined income per marital happiness level. In this analysis, two alternative hypotheses will be tested:

There is a difference between the mean values of combined income per marital happiness level.
There is a dependence between combined income and marital happiness level.

To test these hypotheses, a one-way analysis of variance and a chi-square test of independence were conducted, respectively.

Results

Table 1: Case processing summary for analyzing happiness level versus family income.

Table 2: Crosstabulation for analyzing happiness level versus family income (<$21,250).

Table 3: Crosstabulation for analyzing happiness level versus family income (>$21,250).

Table 4: Chi-square test for analyzing happiness level versus family income.

Table 5: Analysis of Variance for analyzing happiness level versus family income.

Figure 1: Boxplot diagram per happiness level of a marriage versus the family incomes.

Figure 2: Line diagram per happiness level of a marriage versus the mean of the family incomes.

Discussions and Conclusions

There are 1419 participants, and only 38.5% responded to both the marital happiness and family income questions (Table 1). The high nonresponse rate may be due to participants who were not married, for whom the marital happiness question was not applicable. Thus, future versions of this survey instrument should include an N/A category to see whether the response rate improves. Given that there are still 547 responses, there is information to be gained from analyzing this data.

As a family unit gains more income, its marital happiness level increases (Tables 2-3). As the dollar value increases, the percentage within each family income range (recoded to midpoints) for the very happy category increases from about 50% to about 75%. The unhappiest couples appear most often at combined incomes of $7,500-9,000 and $27,500-45,000. For marriages that are pretty happy, the share is fairly stable at 30-40% of respondents at incomes of $13,750 or more.

The mean values of family income by happiness level (Figure 2) show that, on average, happier couples make more money together. A closer examination using box plots (Figure 1) shows that the happiest couples seem to be happy regardless of how much money they make, as the tails of the box plot extend far from the median. One interesting feature is that the spread of combined family income shrinks as happiness decreases (Figure 1). This could suggest that although money is not a major factor for couples that are happy, unhappiness in a couple could be driven by lower combined income.

The chi-square test shows a statistically significant association between combined family income and marital happiness, allowing us to reject null hypothesis #2, which states that these two variables are independent of each other (Table 4). The analysis of variance, however, does not allow a rejection of null hypothesis #1, which states that mean combined income is equal across the marital happiness groups (Table 5).

There could be many reasons for these results, so future work could include analyzing other variables that may help explain marital happiness. A multivariate analysis may be necessary, with marital happiness as the dependent variable and combined income as one of many independent variables.

SPSS Code

GET
  FILE='C:\Users\mkher\Desktop\SAV files\gss.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.

CROSSTABS
  /TABLES=hapmar BY incomdol
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ CORR
  /CELLS=COUNT ROW COLUMN
  /COUNT ROUND CELL.

ONEWAY rincome BY hapmar
  /MISSING ANALYSIS.

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=hapmar incomdol MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: hapmar=col(source(s), name("hapmar"), unit.category())
  DATA: incomdol=col(source(s), name("incomdol"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  GUIDE: axis(dim(1), label("HAPPINESS OF MARRIAGE"))
  GUIDE: axis(dim(2), label("Family income; ranges recoded to midpoints"))
  SCALE: cat(dim(1), include("1", "2", "3"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: schema(position(bin.quantile.letter(hapmar*incomdol)), label(id))
END GPL.

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=hapmar MEAN(incomdol)[name="MEAN_incomdol"]
    MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: hapmar=col(source(s), name("hapmar"), unit.category())
  DATA: MEAN_incomdol=col(source(s), name("MEAN_incomdol"))
  GUIDE: axis(dim(1), label("HAPPINESS OF MARRIAGE"))
  GUIDE: axis(dim(2), label("Mean Family income; ranges recoded to midpoints"))
  SCALE: cat(dim(1), include("1", "2", "3"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: line(position(hapmar*MEAN_incomdol), missing.wings())
END GPL.

References

GSS (n.d.) SPSS data file [DataSet]. Retrieved from https://classroom.coloradotech.edu/app/classResourceRedirect.html?id=2931693&url=/lms/class/95707/document/2931693/open
Lawrence, F. C., Thomasson, R. H., Wozniak, P. J., & Prawitz, A. D. (1993). Factors relating to spousal financial arguments. Retrieved from https://www.afcpe.org/assets/pdf/vol-46.pdf
Miller, R. (n.d.). Week 3: Crosstabs. [Video file]. Retrieved from http://breeze.careeredonline.com/p1xi2oe0rfo/?launcher=false&fcsContent=true&pbMode=normal
Miller, R. (n.d.). Week 4: Exploring. [Video file]. Retrieved from http://breeze.careeredonline.com/p2nqdtzebk5/?launcher=false&fcsContent=true&pbMode=normal
Miller, R. (n.d.). Week 6: Parametric Tests. [Video file]. Retrieved from http://breeze.careeredonline.com/p7xq8uo99cm/?launcher=false&fcsContent=true&pbMode=normal
Miller, R. (n.d.). Week 8: Chi-Square Test. [Video file]. Retrieved from http://breeze.careeredonline.com/p47dupnqy1q/?launcher=false&fcsContent=true&pbMode=normal
Mulligan, W. (1973). Family Law Quarterly, 7(1), 123-128. Retrieved from http://www.jstor.org/stable/25739046

Quant: Chi-Square Test in SPSS

Introduction

The aim of this analysis is to determine the strength of the association between the variables agecat and degree, as well as the major contributing cells, through a chi-square analysis. Standardized residuals should aid in determining the cell contributions.

Hypothesis

Null: There is no association between agecat and degree.
Alternative: There is an association between agecat and degree.

Methodology

For this project, the gss.sav file is loaded into SPSS (GSS, n.d.). The goal is to look at the relationships between the following variables: agecat (Age category) and degree (Respondent’s highest degree).

To conduct a chi-square analysis, navigate through Analyze > Descriptive Statistics > Crosstabs.

The variable degree was placed in the “Row(s)” box and agecat was placed in the “Column(s)” box. Select the “Statistics” button, select “Chi-square”, and under the “Nominal” section select “Lambda”. Select the “Cells” button and select “Standardized” under the “Residuals” section. The procedures for this analysis are provided in video tutorial form by Miller (n.d.). The resulting output is shown in the next four tables.

Results

Table 1: Case processing summary.


	Cases
	Valid		Missing		Total
	N	Percent	N	Percent	N	Percent
Degree * Age category	1411	99.4%	8	0.6%	1419	100.0%

From the total sample of 1419 participants, 8 cases are reported missing, yielding a 99.4% response rate (Table 1). Examining the crosstabulation (Table 2), for respondents with less than a high school education the standardized residuals for the 30-39, 40-49, and 50-59 age groups are less than -1.96, and the residual for the 60-89 age group is far greater than +1.96. For respondents with junior college or more, the standardized residual for the 60-89 age group is less than -1.96. For all of these cells, the observed frequencies differ significantly from the expected frequencies (Miller, n.d.). A significant negative standardized residual indicates that the expected count under independence over-predicts the number of people in that age group with that degree level, while a significant positive standardized residual indicates that it under-predicts that number.

Table 2: Degree by Age category crosstabulation.


			Age category					Total
			18-29	30-39	40-49	50-59	60-89
Degree	Less than high school	Count	42	33	36	20	112	243
	Less than high school	Standardized Residual	-.1	-2.8	-2.3	-2.7	7.1
	High school	Count	138	162	154	113	158	725
	High school	Standardized Residual	.9	.2	-.2	.4	-1.2
	Junior college or more	Count	68	115	114	78	68	443
	Junior college or more	Standardized Residual	-1.1	1.8	1.9	1.4	-3.7
Total		Count	248	310	304	211	338	1411

Deriving the degrees of freedom from Table 2, df = (5-1)*(3-1) = 8. None of the expected counts were less than five, because the minimum expected count is 36.34 (Table 3), which is desirable. The chi-square value is 96.364 and is significant at the 0.05 level. Thus, the null hypothesis is rejected, and there is a statistically significant association between a person’s age category and diploma level. This test does not tell us anything about the direction of the relationship.

Table 3: Chi-Square Tests


	Value	df	Asymptotic Significance (2-sided)
Pearson Chi-Square	96.364^a	8	.000
Likelihood Ratio	90.580	8	.000
Linear-by-Linear Association	23.082	1	.000
N of Valid Cases	1411
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 36.34.
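
The figures in Tables 2 and 3 can be reproduced by hand from the observed counts. The following is a minimal Python sketch, not part of the original SPSS workflow; the variable names are illustrative assumptions. It computes the expected counts under independence, the standardized residuals, the Pearson chi-square statistic, and the degrees of freedom.

```python
# Observed counts from Table 2 (rows: degree, columns: age category).
observed = [
    [42, 33, 36, 20, 112],      # Less than high school
    [138, 162, 154, 113, 158],  # High school
    [68, 115, 114, 78, 68],     # Junior college or more
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Standardized residual: (observed - expected) / sqrt(expected).
std_resid = [
    [(o - e) / e ** 0.5 for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# The Pearson chi-square is the sum of the squared standardized residuals.
chi_square = sum(z ** 2 for row in std_resid for z in row)
df = (len(observed) - 1) * (len(col_totals) - 1)
min_expected = min(e for row in expected for e in row)

print(f"chi-square = {chi_square:.3f}, df = {df}, min expected = {min_expected:.2f}")
for row in std_resid:
    print([round(z, 1) for z in row])
```

Because each cell contributes its squared standardized residual to the chi-square value, the cells with the largest absolute residuals are the major contributors to the test statistic.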

Table 4: Directional Measures


			Value	Asymptotic Standard Error^a	Approximate T^b	Approximate Significance
Nominal by Nominal	Lambda	Symmetric	.029	.013	2.278	.023
		Degree Dependent	.000	.000	.^c	.^c
		Age category Dependent	.048	.020	2.278	.023
	Goodman and Kruskal tau	Degree Dependent	.024	.005		.000^d
	Goodman and Kruskal tau	Age category Dependent	.019	.004		.000^d
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Cannot be computed because the asymptotic standard error equals zero.
d. Based on chi-square approximation

Although there is a statistically significant association between a person’s age category and diploma level, the chi-square test does not show how strongly these variables are related. The symmetric lambda value is 0.029 (Table 4), meaning that knowing one variable reduces the error in predicting the other by only 2.9%. Thus, the relationship is very weak and of little practical importance.
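
To make the meaning of lambda concrete, the sketch below recomputes the values in Table 4 from the counts in Table 2. It is a hand calculation in Python under the standard Goodman and Kruskal definition (proportional reduction in prediction error), not SPSS output; the variable names are assumptions made for illustration.

```python
# Observed counts from Table 2 (rows: degree, columns: age category).
observed = [
    [42, 33, 36, 20, 112],      # Less than high school
    [138, 162, 154, 113, 158],  # High school
    [68, 115, 114, 78, 68],     # Junior college or more
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Predicting age category: always guess the modal category.
errors_age_blind = n - max(col_totals)                        # degree unknown
errors_age_given = n - sum(max(row) for row in observed)      # degree known
lambda_age = (errors_age_blind - errors_age_given) / errors_age_blind

# Predicting degree: same logic with rows and columns swapped.
errors_deg_blind = n - max(row_totals)                        # age unknown
errors_deg_given = n - sum(max(col) for col in zip(*observed))  # age known
lambda_degree = (errors_deg_blind - errors_deg_given) / errors_deg_blind

# Symmetric lambda pools the error reductions in both directions.
lambda_symmetric = (
    (errors_age_blind - errors_age_given) + (errors_deg_blind - errors_deg_given)
) / (errors_age_blind + errors_deg_blind)

print(round(lambda_symmetric, 3), round(lambda_degree, 3), round(lambda_age, 3))
```

Lambda with degree as the dependent variable is zero because “High school” is the modal degree in every age category, so knowing the age category never changes the best guess.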

Conclusions

There is a statistically significant association between a person’s age category and diploma level. According to the crosstabulation, the expected counts under independence significantly over-predict the number of people with less than a high school diploma in the 30-59 age groups, as well as the number with junior college or more in the 60-89 age group. These differences in the standardized residuals helped drive a large and statistically significant chi-square value. With a lambda of 0.029, knowing one variable reduces the error in predicting the other by only 2.9%, so the association is very weak.

SPSS Code

CROSSTABS
  /TABLES=ndegree BY agecat
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ CC LAMBDA
  /CELLS=COUNT SRESID
  /COUNT ROUND CELL.

References:

GSS (n.d.) SPSS data file [DataSet]. Retrieved from https://classroom.coloradotech.edu/app/classResourceRedirect.html?id=2931693&url=/lms/class/95707/document/2931693/open
Miller, R. (n.d.). Week 8: Chi-Square Test. [Video file]. Retrieved from http://breeze.careeredonline.com/p47dupnqy1q/?launcher=false&fcsContent=true&pbMode=normal

Quant: Linear Regression in SPSS

Introduction

The aim of this analysis is to look at the relationship between a father’s education level (dependent variable) and the mother’s education level (independent variable). The variable names are “paeduc” and “maeduc.” The goal is to determine the linear regression equation for predicting the father’s education level from the mother’s education level.

From the SPSS outputs the following questions will be addressed:

How much of the total variance have you accounted for with the equation?

Based upon your equation, what level of education would you predict for the father when the mother has 16 years of education?

Methodology

For this project, the gss.sav file is loaded into SPSS (GSS, n.d.). The goal is to look at the relationship between the following variables: paeduc (HIGHEST YEAR SCHOOL COMPLETED, FATHER) and maeduc (HIGHEST YEAR SCHOOL COMPLETED, MOTHER). To conduct a linear regression analysis, navigate through Analyze > Regression > Linear. The variable paeduc was placed in the “Dependent” box, and maeduc was placed in the “Independent(s)” box. The procedures for this analysis are provided in video tutorial form by Miller (n.d.). The resulting output is shown in the next four tables.

The relationship between paeduc and maeduc is plotted in a scatterplot using the chart builder. The syntax for the chart is shown in the code section, and the resulting image is shown in the results section.

Results

Table 1: Variables Entered/Removed


Model	Variables Entered	Variables Removed	Method
1	HIGHEST YEAR SCHOOL COMPLETED, MOTHER^b	.	Enter
a. Dependent Variable: HIGHEST YEAR SCHOOL COMPLETED, FATHER
b. All requested variables entered.

Table 1 reports that for the linear regression analysis the dependent variable is the highest year of school completed by the father and the independent variable is the highest year of school completed by the mother. No variables were removed.

Table 2: Model Summary


Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.639^a	.408	.407	3.162
a. Predictors: (Constant), HIGHEST YEAR SCHOOL COMPLETED, MOTHER
b. Dependent Variable: HIGHEST YEAR SCHOOL COMPLETED, FATHER

For a linear regression predicting the father’s highest year of school completed from the mother’s highest year of school completed, the correlation is positive with a value of 0.639, so 0.408 of the variance is explained (Table 2) and 0.592 of the variance is unexplained. The linear regression formula or line of best fit (Table 4) is: y = 0.76 x + (2.572 years) + e. The line of best fit expresses in equation form the mathematical relationship between two variables, in this case the father’s and mother’s highest education level. Thus, if the mother has completed her bachelor’s degree (16th year), this equation yields y = 2.572 years + 0.76 (16 years) + e = 14.732 years + e. The e is the error in this prediction formula, and it exists because the correlation is not exactly -1.0 or +1.0. The ANOVA table (Table 3) shows that the relationship between these two variables is statistically significant at the 0.05 level.

Table 3: ANOVA Table


Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	6231.521	1	6231.521	623.457	.000^b
	Residual	9045.579	905	9.995
	Total	15277.100	906
a. Dependent Variable: HIGHEST YEAR SCHOOL COMPLETED, FATHER
b. Predictors: (Constant), HIGHEST YEAR SCHOOL COMPLETED, MOTHER

Table 4: Coefficients


Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.
		B	Std. Error	Beta
1	(Constant)	2.572	.367		7.009	.000
	HIGHEST YEAR SCHOOL COMPLETED, MOTHER	.760	.030	.639	24.969	.000
a. Dependent Variable: HIGHEST YEAR SCHOOL COMPLETED, FATHER
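
The prediction and the variance shares can be checked directly from the reported output. The sketch below is a plain Python calculation using the coefficients in Table 4 and the sums of squares in Table 3; the function name predict_paeduc is a hypothetical helper introduced only for illustration.

```python
# Unstandardized coefficients from Table 4.
intercept = 2.572  # (Constant)
slope = 0.760      # maeduc

def predict_paeduc(maeduc_years):
    """Predicted father's years of schooling for a given mother's years."""
    return intercept + slope * maeduc_years

# Sums of squares and residual degrees of freedom from Table 3.
ss_regression = 6231.521
ss_residual = 9045.579
ss_total = 15277.100
df_residual = 905

r_squared = ss_regression / ss_total   # share of variance explained
unexplained = ss_residual / ss_total   # share of variance left unexplained
f_stat = (ss_regression / 1) / (ss_residual / df_residual)

print(f"predicted paeduc at maeduc = 16: {predict_paeduc(16):.3f}")
print(f"R squared = {r_squared:.3f}, unexplained = {unexplained:.3f}, F = {f_stat:.3f}")
```

The explained and unexplained shares sum to one because the total sum of squares is the regression sum of squares plus the residual sum of squares.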

The image below (Figure 1) is a scatter plot of the highest year of school completed by the mother versus the father, along with the linear regression line (Table 4) and box plots of each respective distribution. There are more outliers in the husband’s education level than in the wife’s education level, and the spread of the education level is more concentrated about the median for the husband’s education level.

Figure 1: Highest year of school completed by the mother vs the father scatter plot with regression line and box plot images of each respective distribution.

Conclusion

There is a statistically significant relationship between the husband’s and wife’s highest year of education completed. The line of best fit shows a moderately positive correlation and is defined as y = 0.76 x + (2.572 years) + e, which explains only 40.8% of the variance, while 59.2% of the variance is unexplained.

SPSS Code

DATASET NAME DataSet1 WINDOW=FRONT.

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT paeduc
  /METHOD=ENTER maeduc
  /CASEWISE PLOT(ZRESID) OUTLIERS(3).

STATS REGRESS PLOT YVARS=paeduc XVARS=maeduc
  /OPTIONS CATEGORICAL=BARS GROUP=1 BOXPLOTS INDENT=15 YSCALE=75
  /FITLINES LINEAR APPLYTO=TOTAL.

References:

GSS (n.d.) SPSS data file [DataSet]. Retrieved from https://classroom.coloradotech.edu/app/classResourceRedirect.html?id=2931693&url=/lms/class/95707/document/2931693/open
Miller, R. (n.d.). Week 7: Regression. [Video file]. Retrieved from http://breeze.careeredonline.com/p6ioo7i8x1k/?launcher=false&fcsContent=true&pbMode=normal

Quant: ANOVA and Multiple Comparisons in SPSS

Introduction

The aim of this analysis is to look at the relationship between the dependent variable of the income level of respondents (rincdol) and the independent variable of their reported level of happiness (happy). This independent variable has three or more levels.

From the SPSS outputs the goal is to:

Use the ANOVA procedure to determine the overall conclusion, and use the Bonferroni correction as a post-hoc analysis to determine the relationship of specific levels of happiness to income.

Hypothesis

Null: There is no difference in mean rincdol across the levels of happy.
Alternative: There is a difference in mean rincdol across the levels of happy.
Null2: There is no difference in mean rincdol between a given pair of happy levels.
Alternative2: There is a difference in mean rincdol between a given pair of happy levels.

Methodology

For this project, the gss.sav file is loaded into SPSS (GSS, n.d.). The goal is to look at the relationships between the following variables: rincdol (Respondent’s income; ranges recoded to midpoints) and happy (General Happiness). To conduct a parametric analysis, navigate to Analyze > Compare Means > One-Way ANOVA. The variable rincdol was placed in the “Dependent List” box, and happy was placed under “Factor” box. Select “Post Hoc” and under the “Equal Variances Assumed” select “Bonferroni”. The procedures for this analysis are provided in video tutorial form by Miller (n.d.). The following output was observed in the next two tables.

The relationship between rincdol and happy is plotted using the chart builder. The syntax for the chart is shown in the code section, and the resulting image is shown in the results section.

Results

Table 1: ANOVA


Respondent’s income; ranges recoded to midpoints
	Sum of Squares	df	Mean Square	F	Sig.
Between Groups	11009722680.000	2	5504861341.000	9.889	.000
Within Groups	499905585000.000	898	556687733.900
Total	510915307700.000	900

The overall ANOVA (Table 1) is statistically significant, so the first null hypothesis is rejected at the 0.05 level. Thus, mean rincdol differs significantly across the levels of the happy variable. However, the overall test does not show which levels differ from each other, so a post-hoc comparison of the means at the various levels is needed.
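
The F statistic in Table 1 follows from the sums of squares and degrees of freedom in the same table. A minimal Python sketch of that arithmetic, with illustrative variable names, is:

```python
# Sums of squares and degrees of freedom from Table 1.
ss_between, df_between = 11009722680.0, 2
ss_within, df_within = 499905585000.0, 898

# Mean square = sum of squares / degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of between-group to within-group mean squares.
f_stat = ms_between / ms_within

print(f"MS between = {ms_between:.1f}")
print(f"MS within  = {ms_within:.1f}")
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```

The between-groups degrees of freedom of 2 reflect the three happiness levels, and the total of 900 reflects 901 valid cases.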

Table 2: Multiple Comparisons


Dependent Variable: Respondent’s income; ranges recoded to midpoints
Bonferroni
(I) GENERAL HAPPINESS	(J) GENERAL HAPPINESS	Mean Difference (I-J)	Std. Error	Sig.	95% Confidence Interval
					Lower Bound	Upper Bound
VERY HAPPY	PRETTY HAPPY	4093.678	1744.832	.058	-91.26	8278.61
VERY HAPPY	NOT TOO HAPPY	12808.643^*	2912.527	.000	5823.02	19794.26
PRETTY HAPPY	VERY HAPPY	-4093.678	1744.832	.058	-8278.61	91.26
PRETTY HAPPY	NOT TOO HAPPY	8714.965^*	2740.045	.005	2143.04	15286.89
NOT TOO HAPPY	VERY HAPPY	-12808.643^*	2912.527	.000	-19794.26	-5823.02
NOT TOO HAPPY	PRETTY HAPPY	-8714.965^*	2740.045	.005	-15286.89	-2143.04
*. The mean difference is significant at the 0.05 level.

According to Table 2, the pairing of “Very Happy” and “Pretty Happy” does not reject Null2 at the 0.05 level (Sig. = .058). The other two pairings, “Very Happy” with “Not Too Happy” and “Pretty Happy” with “Not Too Happy”, do reject Null2 at the 0.05 level. Thus, mean income differs significantly for two of the three pairs.
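
The Bonferroni correction multiplies each unadjusted p-value by the number of comparisons (three here) and caps the result at 1. The sketch below approximates this from the mean differences and standard errors in Table 2. As a stated assumption, it uses the standard normal distribution in place of the t distribution with 898 degrees of freedom that SPSS uses, so the values are close to but not identical with the “Sig.” column.

```python
from statistics import NormalDist

# (mean difference, standard error) for each unique pair in Table 2.
comparisons = {
    ("VERY HAPPY", "PRETTY HAPPY"): (4093.678, 1744.832),
    ("VERY HAPPY", "NOT TOO HAPPY"): (12808.643, 2912.527),
    ("PRETTY HAPPY", "NOT TOO HAPPY"): (8714.965, 2740.045),
}
n_comparisons = len(comparisons)

def bonferroni_p(mean_diff, std_error, m):
    """Two-sided p-value (normal approximation) times m comparisons, capped at 1."""
    z = abs(mean_diff) / std_error
    raw_p = 2 * (1 - NormalDist().cdf(z))
    return min(1.0, raw_p * m)

adjusted = {
    pair: bonferroni_p(diff, se, n_comparisons)
    for pair, (diff, se) in comparisons.items()
}

for pair, p in adjusted.items():
    verdict = "reject Null2" if p < 0.05 else "fail to reject Null2"
    print(f"{pair[0]} vs {pair[1]}: adjusted p = {p:.4f} ({verdict})")
```

Without the correction, the “Very Happy” versus “Pretty Happy” comparison would fall below 0.05; the adjustment is what moves it just above the threshold.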

Figure 1: Graphed means of General Happiness versus incomes.

General happiness and income are positively related (Figure 1). That means that people with a low level of general happiness usually have lower recorded mean incomes, and vice versa. No direction or causality can be inferred from this analysis. It cannot be concluded that high income causes general happiness, or that happy people make more money because of their positive attitude towards life.

SPSS Code

DATASET NAME DataSet1 WINDOW=FRONT.

ONEWAY rincdol BY happy
  /MISSING ANALYSIS
  /POSTHOC=BONFERRONI ALPHA(0.05).

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=happy MEAN(rincdol)[name="MEAN_rincdol"]
    MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: happy=col(source(s), name("happy"), unit.category())
  DATA: MEAN_rincdol=col(source(s), name("MEAN_rincdol"))
  GUIDE: axis(dim(1), label("GENERAL HAPPINESS"))
  GUIDE: axis(dim(2), label("Mean Respondent's income; ranges recoded to midpoints"))
  SCALE: cat(dim(1), include("1", "2", "3"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: line(position(happy*MEAN_rincdol), missing.wings())
END GPL.

References:

GSS (n.d.) SPSS data file [DataSet]. Retrieved from https://classroom.coloradotech.edu/app/classResourceRedirect.html?id=2931693&url=/lms/class/95707/document/2931693/open
Miller, R. (n.d.). Week 6: Parametric Tests. [Video file]. Retrieved from http://breeze.careeredonline.com/p7xq8uo99cm/?launcher=false&fcsContent=true&pbMode=normal

Quant: Group Statistics in SPSS

Introduction

The aim of this analysis is to determine whether a person’s status (alive or dead) ten years after a coronary event is reflected in a significant difference in the diastolic blood pressure taken when that event occurred. The variable “DBP58” will be used as the dependent variable and “Vital10” as the independent variable.

From the SPSS outputs the goal is to:

Analyze these conditions to determine if there is a significant difference between the DBP levels of those (vital10) who are alive 10 years later compared to those who died within 10 years.

Hypothesis

Null: There is no difference in mean DBP58 between the Vital10 groups.
Alternative: There is a difference in mean DBP58 between the Vital10 groups.

Methodology

For this project, the electric.sav file is loaded into SPSS (Electric, n.d.). The goal is to look at the relationship between the following variables: DBP58 (Average Diastolic Blood Pressure) and Vital10 (Status at Ten Years). To conduct a parametric analysis, navigate to Analyze > Compare Means > Independent-Samples T Test. The variable DBP58 was placed in the “Test Variable(s)” box, and Vital10 was placed in the “Grouping Variable” box. Then select the “Define Groups” button and enter 0 for “Group 1” and 1 for “Group 2”. The procedures for this analysis are provided in video tutorial form by Miller (n.d.). The resulting output is shown in the next two tables.

Results

Table 1: Group Statistics


	Status at Ten Years	N	Mean	Std. Deviation	Std. Error Mean
Average Diast Blood Pressure 58	Alive	178	87.56	11.446	.858
Average Diast Blood Pressure 58	Dead	61	92.38	16.477	2.110

According to the results in Table 1, the mean diastolic blood pressure of those who had passed away ten years later was about 4.8 points higher (92.38 versus 87.56), with a larger standard deviation (16.477 versus 11.446). Thus, those who are alive ten years later show less variation in their diastolic blood pressure.

Table 2: Independent Samples Test


		Levene’s Test for Equality of Variances		t-test for Equality of Means
		F	Sig.	t	df	Sig. (2-tailed)	Mean Difference	Std. Error Difference	95% Confidence Interval of the Difference
									Lower	Upper
Average Diast Blood Pressure 58	Equal variances assumed	8.815	.003	-2.515	237	.013	-4.815	1.915	-8.587	-1.043
	Equal variances not assumed			-2.114	80.735	.038	-4.815	2.277	-9.347	-.284

Levene’s test for equality of variances is significant at the 0.05 level (Sig. = .003), so equal variances cannot be assumed (Table 2). Using the row where equal variances are not assumed, the null hypothesis can be rejected at the 0.05 level because the significance value is 0.038. Thus, there is a statistically significant difference between the mean diastolic blood pressure of those who are alive and those who have passed away.
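
The two rows of Table 2 differ only in how the standard error and degrees of freedom are computed. The following Python sketch recomputes both versions from the group statistics in Table 1. It is a hand calculation rather than SPSS output, the variable names are illustrative, and because it starts from the rounded means in Table 1 the t values differ slightly in the third decimal from those SPSS reports.

```python
from math import sqrt

# Group statistics from Table 1.
n_alive, mean_alive, sd_alive = 178, 87.56, 11.446
n_dead, mean_dead, sd_dead = 61, 92.38, 16.477

mean_diff = mean_alive - mean_dead

# Equal variances not assumed (Welch): each group keeps its own variance.
var_alive = sd_alive ** 2 / n_alive
var_dead = sd_dead ** 2 / n_dead
se_welch = sqrt(var_alive + var_dead)
t_welch = mean_diff / se_welch
# Welch-Satterthwaite approximation for the degrees of freedom.
df_welch = (var_alive + var_dead) ** 2 / (
    var_alive ** 2 / (n_alive - 1) + var_dead ** 2 / (n_dead - 1)
)

# Equal variances assumed: pool the two variances first.
df_pooled = n_alive + n_dead - 2
pooled_var = ((n_alive - 1) * sd_alive ** 2 + (n_dead - 1) * sd_dead ** 2) / df_pooled
se_pooled = sqrt(pooled_var * (1 / n_alive + 1 / n_dead))
t_pooled = mean_diff / se_pooled

print(f"Welch:  SE = {se_welch:.3f}, t = {t_welch:.3f}, df = {df_welch:.3f}")
print(f"Pooled: SE = {se_pooled:.3f}, t = {t_pooled:.3f}, df = {df_pooled}")
```

The Welch version has a larger standard error and far fewer degrees of freedom because the smaller group (those who died) has the larger variance.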

SPSS Code

DATASET NAME DataSet1 WINDOW=FRONT.

T-TEST GROUPS=vital10(0 1)
  /MISSING=ANALYSIS
  /VARIABLES=dbp58
  /CRITERIA=CI(.95).

References:

Electric (n.d.) SPSS data file [DataSet]. Retrieved from https://classroom.coloradotech.edu/app/classResourceRedirect.html?id=2931690&url=/lms/class/95707/document/2931690/open
Miller, R. (n.d.). Week 6: Parametric Tests. [Video file]. Retrieved from http://breeze.careeredonline.com/p7xq8uo99cm/?launcher=false&fcsContent=true&pbMode=normal

Quant: Paired Sample Statistics in SPSS

Introduction

The aim of this analysis is to compare productivity under two organizational structures. The data are artificial estimates of productivity, with column 1 representing traditional vertical management and column 2 representing autonomous work teams (ATW). The background is that a company of 100 factory workers had been operating under traditional vertical management and decided to move to ATW. The same employees were involved in both systems, having first worked under vertical management and then being converted to ATW.

From the SPSS outputs the goal is to:

Analyze the productivity levels of the 2 management approaches, and decide which is superior.

Hypothesis

Null: There is no difference between mean prodpre and mean prodpost.
Alternative: There is a difference between mean prodpre and mean prodpost.

Methodology

For this project, the atw.sav file is loaded into SPSS (ATW, n.d.). The goal is to look at the relationships between the following variables: prodpre (productivity level preceding the new process) and prodpost (productivity level following the new process). To conduct a parametric analysis, navigate to Analyze > Compare Means > Paired-Samples T Test. The variable prodpre was placed in the “Paired Variables” box under “Pair” 1 and “Variable 1”, and prodpost was placed under “Pair” 1 and “Variable 2”. The procedures for this analysis are provided in video tutorial form by Miller (n.d.). The following output was observed in the next three tables.

Results

Table 1: Paired Sample Statistics

		Mean	N	Std. Deviation	Std. Error Mean
Pair 1	productivity level preceding the new process	76.43	100	16.820	1.682
Pair 1	productivity level following the new process	84.24	100	9.797	.980

Descriptively, productivity on average increased by about 8 points, and the standard deviation about the mean decreased by about 7 points. This means that the estimates of productivity under traditional vertical management are lower and show a wider spread than the estimates of productivity under autonomous work teams. Essentially, these distributions tell the story that the workers are getting better productivity estimates with less deviation under autonomous work teams.

Table 2: Paired Samples Correlation


		N	Correlation	Sig.
Pair 1	productivity level preceding the new process & productivity level following the new process	100	.040	.695

Based on Table 2, there is a very weak correlation (r = 0.040) between the estimates of productivity under traditional vertical management and under autonomous work teams, and it is not statistically significant (Sig. = .695). In any case, correlation does not imply causation.

Table 3: Paired Samples Test


		Paired Differences					t	df	Sig. (2-tailed)
		Mean	Std. Deviation	Std. Error Mean	95% Confidence Interval of the Difference
					Lower	Upper
Pair 1	productivity level preceding the new process – productivity level following the new process	-7.817	19.126	1.913	-11.612	-4.022	-4.087	99	.000

Based on the results of the two-tailed paired-samples t-test (Table 3), the null hypothesis can be rejected. There is a significant difference between the two variables prodpre and prodpost at the 0.05 level. The data based on 100 workers (with 99 degrees of freedom) show a significant difference between the estimates of productivity under traditional vertical management and under autonomous work teams, with higher mean productivity under autonomous work teams.
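
The t statistic and confidence interval in Table 3 can be recovered from the mean and standard deviation of the paired differences. The Python sketch below shows the arithmetic; the critical value t_critical is entered by hand as an assumption (the two-sided 5% critical value of the t distribution with 99 degrees of freedom, approximately 1.9842) rather than computed.

```python
from math import sqrt

# Paired differences (prodpre - prodpost) from Table 3.
n = 100
mean_diff = -7.817
sd_diff = 19.126

# Standard error of the mean difference and the paired t statistic.
se_diff = sd_diff / sqrt(n)
t_stat = mean_diff / se_diff
df = n - 1

# Assumed critical value for a 95% interval with df = 99 (entered by hand).
t_critical = 1.9842
ci_lower = mean_diff - t_critical * se_diff
ci_upper = mean_diff + t_critical * se_diff

print(f"SE = {se_diff:.3f}, t({df}) = {t_stat:.3f}")
print(f"95% CI of the difference: [{ci_lower:.3f}, {ci_upper:.3f}]")
```

The interval lies entirely below zero, which is consistent with rejecting the null hypothesis of no difference at the 0.05 level.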

SPSS Code

DATASET NAME DataSet1 WINDOW=FRONT.

T-TEST PAIRS=prodpre WITH prodpost (PAIRED)
  /CRITERIA=CI(.9500)
  /MISSING=ANALYSIS.

References:

ATW (n.d.) SPSS data file [DataSet]. Retrieved from https://classroom.coloradotech.edu/app/classResourceRedirect.html?id=2931684&url=/lms/class/95707/document/2931684/open
Miller, R. (n.d.). Week 6: Parametric Tests. [Video file]. Retrieved from http://breeze.careeredonline.com/p7xq8uo99cm/?launcher=false&fcsContent=true&pbMode=normal

Quant: Exploring Data with SPSS

Introduction

The aim of this analysis is to run a distribution analysis on diastolic blood pressure (DBP58), comparing individuals with no history of coronary heart disease (CHD) and individuals with a history of CHD. The variable that records individual history is CHD.

From the SPSS outputs the following questions will be addressed:

What can be determined from the measures of skewness and kurtosis about a normal curve? What are the mean and median?
Does one seem better than the other to represent the scores?
What differences can be seen in the pattern of responses of those with history versus those with no history?
What information can be determined from the box plots?

Methodology

For this project, the electric.sav file was loaded into SPSS (Electric, n.d.). The goal is to look at the relationship between the following variables: DBP58 (Average Diastolic Blood Pressure) and CHD (Incidence of Coronary Heart Disease). To conduct a descriptive analysis, navigate through Analyze > Descriptive Statistics > Explore. The variable DBP58 was placed in the “Dependent List” box, and CHD was placed in the “Factor List” box. Then, in the Explore dialog box, the “Statistics” button was clicked, and in this dialog box “Descriptives” with a 95% “Confidence interval for the mean” was selected along with outliers and percentiles. Back in the Explore dialog box, the “Plots” button was clicked. In this dialog box, only “Factor levels together” was selected under the “Boxplot” section, both options were selected under the “Descriptive” section, and “None” was selected under the “Spread vs. Level with Levene Test” section. The procedures for this analysis are provided in video tutorial form by Miller (n.d.). The resulting output is presented in the next four tables and five figures.

Results

Table 1: Case Processing Summary.

	Incidence of Coronary Heart Disease	Cases
		Valid		Missing		Total
		N	Percent	N	Percent	N	Percent
Average Diast Blood Pressure 58	none	119	99.2%	1	0.8%	120	100.0%
Average Diast Blood Pressure 58	chd	120	100.0%	0	0.0%	120	100.0%

According to Table 1, at least 99.2% of the data is valid (not missing) both for participants with a history of Coronary Heart Disease (CHD) and for those without. There is one missing data point in the group with no history of CHD. This data set contains 120 participants in each group.

Table 2: Descriptive Statistics on the Incidence of Coronary Heart Disease and the Average Diastolic Blood Pressure.

	Incidence of Coronary Heart Disease			Statistic	Std. Error
Average Diast Blood Pressure 58	none	Mean		87.66	1.005
		95% Confidence Interval for Mean	Lower Bound	85.66
			Upper Bound	89.65
		5% Trimmed Mean		87.31
		Median		87.00
		Variance		120.312
		Std. Deviation		10.969
		Minimum		65
		Maximum		125
		Range		60
		Interquartile Range		15
		Skewness		.566	.222
		Kurtosis		.671	.440
	chd	Mean		89.92	1.350
		95% Confidence Interval for Mean	Lower Bound	87.24
			Upper Bound	92.59
		5% Trimmed Mean		88.89
		Median		87.00
		Variance		218.732
		Std. Deviation		14.790
		Minimum		65
		Maximum		160
		Range		95
		Interquartile Range		18
		Skewness		1.406	.221
		Kurtosis		3.620	.438

According to Table 2, the mean Diastolic Blood Pressure is about 2.3 points higher (89.92 versus 87.66) and the standard error is 0.345 higher with CHD than without. The median is 87 in both groups. For patients with CHD, the mean of 89.92 sits above the median, which indicates a positively skewed distribution, as can be seen from a skewness of 1.406 and a kurtosis of 3.620. For the cases without CHD, the mean blood pressure of 87.66 is close to the median (showing only mild skewness in the data), as evident from the skewness of 0.566 and kurtosis of 0.671. Because the median is less affected by extreme values, it seems to be the better representation of the scores, particularly for the CHD group. Upon further inspection of Figures 1 & 2, the skewness appears to be the result of some outliers. The box plot in Figure 3 confirms these outliers. The kurtosis values of 0.671 and 3.620 are both positive, which indicates leptokurtic distributions: they have higher peaks than a normal distribution, and the CHD group deviates from normal far more.
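One common rule of thumb, which is an assumption here and not part of the original SPSS output, is to divide each skewness or kurtosis statistic by its standard error and to treat an absolute value above 1.96 as a sign of departure from a normal curve. A minimal Python sketch using the values from Table 2:

```python
# Statistic and standard error pairs copied from Table 2.
table2 = {
    "none": {"skewness": (0.566, 0.222), "kurtosis": (0.671, 0.440)},
    "chd": {"skewness": (1.406, 0.221), "kurtosis": (3.620, 0.438)},
}

# Assumption: |z| > 1.96 is used as a rough flag for departure from normality.
Z_CUTOFF = 1.96


def z_scores(group):
    """Divide each statistic by its standard error."""
    return {name: stat / se for name, (stat, se) in group.items()}


results = {group: z_scores(stats) for group, stats in table2.items()}

for group, zs in results.items():
    for name, z in zs.items():
        flag = "departs from normal" if abs(z) > Z_CUTOFF else "consistent with normal"
        print(f"{group:>4} {name:<8} z = {z:5.2f} ({flag})")
```

Under this rule of thumb, the CHD group departs from normality on both measures, while the group without CHD is flagged only on skewness.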

Figure 1: Histogram on the Incidence of Coronary Heart Disease = none and the Average Diastolic Blood Pressure.

Figure 2: Histogram on the Incidence of Coronary Heart Disease = chd and the Average Diastolic Blood Pressure.

Figure 3: Box plots on the Incidence of Coronary Heart Disease and the Average Diastolic Blood Pressure.

Comparing the two histograms in Figures 1 & 2, the data are more positively skewed when there is CHD than when there isn’t. The standard deviation increases by about 3.8 points (from 10.969 to 14.790) when there is CHD. This shows that blood pressure in the sample varies more when there is CHD, whereas it is a bit more stable in the sample that doesn’t have CHD. The range of the average diastolic blood pressure also increases with CHD (95 versus 60), which is consistent with the greater standard deviation and can be seen in Figure 3. In the group with no CHD, the interquartile range (which represents the middle 50% of the participants) is smaller than in the group with CHD (15 versus 18). Participant 120 (a value of 160) falls far outside the interquartile range and is treated as an extreme value.

Table 3: Percentiles on the Incidence of Coronary Heart Disease and the Average Diastolic Blood Pressure.

		Incidence of Coronary Heart Disease	Percentiles
		Incidence of Coronary Heart Disease	5	10	25	50	75	90	95
Weighted Average (Definition 1)	Average Diast Blood Pressure 58	none	71.00	75.00	80.00	87.00	95.00	102.00	105.00
Weighted Average (Definition 1)	Average Diast Blood Pressure 58	chd	70.05	75.00	80.00	87.00	98.00	109.90	117.95
Tukey’s Hinges	Average Diast Blood Pressure 58	none			80.00	87.00	94.50
Tukey’s Hinges	Average Diast Blood Pressure 58	chd			80.00	87.00	98.00

Table 3 lists the percentiles of the average diastolic blood pressure by incidence of CHD. 95% of all cases fall at or below 105 for those with no history of CHD, and at or below 117.95 for those with a history of CHD. These percentiles show that, where there is no CHD, the diastolic blood pressure values are clustered more closely around the median value of 87, which is supported by the above-mentioned tables and figures.

Table 4: Extreme Values on the Incidence of Coronary Heart Disease and the Average Diastolic Blood Pressure.

	Incidence of Coronary Heart Disease			Case Number	Value
Average Diast Blood Pressure 58	none	Highest	1	163	125
			2	232	119
			3	144	115
			4	126	110
			5	131	109
		Lowest	1	157	65
			2	156	65
			3	175	68
			4	153	68
			5	237	69
	chd	Highest	1	120	160
			2	56	133
			3	42	125
			4	26	121
			5	111	120
		Lowest	1	73	65
			2	34	68
			3	101	70
			4	33	70
			5	7	70^a
a. Only a partial list of cases with the value 70 are shown in the table of lower extremes.

Table 4 lists the five highest and five lowest cases in each group. Where there is no CHD, the lowest diastolic blood pressure value is 65, which is the same as for those with CHD. However, the highest value with CHD (160) is 35 points greater than the highest value without CHD (125).

Frequency Stem & Leaf
.00 6 .
5.00 6 . 55889
4.00 7 . 1144
18.00 7 . 555677777777888899
21.00 8 . 000000000001122223344
21.00 8 . 555556666777777888999
20.00 9 . 00000111111222233334
14.00 9 . 55666777888899
8.00 10 . 00012233
4.00 10 . 5559
1.00 11 . 0
1.00 11 . 5
2.00 Extremes (>=119)
Stem width: 10
Each leaf: 1 case(s)

Figure 4: Stem and leaf plot on the Incidence of Coronary Heart Disease = none and the Average Diastolic Blood Pressure.

Frequency Stem & Leaf
.00 6 .
2.00 6 . 58
9.00 7 . 000012233
14.00 7 . 55555677788899
23.00 8 . 00000000000111233333344
24.00 8 . 555556667777777788999999
11.00 9 . 00001122223
13.00 9 . 6677788888999
5.00 10 . 02333
7.00 10 . 5557789
4.00 11 . 0003
3.00 11 . 578
2.00 12 . 01
1.00 12 . 5
2.00 Extremes (>=133)
Stem width: 10
Each leaf: 1 case(s)

Figure 5: Stem and leaf plot on the Incidence of Coronary Heart Disease = chd and the Average Diastolic Blood Pressure.

Figures 4 and 5 show more detail than the histograms by stating the actual frequency to the left of the stem values as well as stating what are considered to be extreme values. With CHD, a diastolic blood pressure of 133 or more is considered an extreme value, and without CHD, the extreme values are a diastolic blood pressure of 119 or more.
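The cutoffs for extreme values reported in the stem and leaf plots are consistent with the usual box plot fences. The Python sketch below assumes the common convention of 1.5 box lengths beyond the hinges for outliers and 3 box lengths for far outliers, applied to the Tukey's hinges in Table 3. That convention is an assumption on my part and is not stated in the output above.

```python
# Tukey's hinges (lower, upper) copied from Table 3.
hinges = {"none": (80.0, 94.5), "chd": (80.0, 98.0)}

# Five highest values per group copied from Table 4.
highest = {"none": [125, 119, 115, 110, 109], "chd": [160, 133, 125, 121, 120]}


def fences(lower_hinge, upper_hinge):
    """Return outlier and far-outlier cutoffs based on the hinge spread."""
    spread = upper_hinge - lower_hinge
    return {
        "outlier_low": lower_hinge - 1.5 * spread,
        "outlier_high": upper_hinge + 1.5 * spread,
        "extreme_high": upper_hinge + 3.0 * spread,
    }


flagged = {}
for group, (lo, hi) in hinges.items():
    f = fences(lo, hi)
    # Keep only the listed high values that lie beyond the upper fence.
    flagged[group] = [v for v in highest[group] if v > f["outlier_high"]]
    print(group, f, "flagged:", flagged[group])
```

With these assumptions, two values are flagged in each group, which matches the two "Extremes" rows in Figures 4 and 5, and only the value of 160 in the CHD group lies beyond the 3 box length cutoff.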

Conclusions

There is a difference in the distribution of average diastolic blood pressure between participants that have a history of Coronary Heart Disease (CHD) and those that don’t. This is reflected in the range, skewness, and spread of both groups. Both groups have the same median and lowest value, but differ in the mean and, more markedly, in the standard deviation and highest values of diastolic blood pressure.

SPSS Code

DATASET NAME DataSet1 WINDOW=FRONT.

* Explore dbp58 by chd: descriptives, extreme values, percentiles, and plots.
EXAMINE VARIABLES=dbp58 BY chd
  /PLOT BOXPLOT STEMLEAF HISTOGRAM
  /COMPARE GROUPS
  /PERCENTILES(5,10,25,50,75,90,95) HAVERAGE
  /STATISTICS DESCRIPTIVES EXTREME
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.

References:

Electric. (n.d.). SPSS data file [Data set]. Retrieved from https://classroom.coloradotech.edu/app/classResourceRedirect.html?id=2931690&url=/lms/class/95707/document/2931690/open
Miller, R. (n.d.). Week 4: Exploring. [Video file]. Retrieved from http://breeze.careeredonline.com/p2nqdtzebk5/?launcher=false&fcsContent=true&pbMode=normal

Quant: Crosstabs in SPSS

Introduction

The aim of this analysis is to examine whether respondents would continue or stop working if they were rich, and how that answer varies by their highest degree earned, gender, and job satisfaction.

Methodology

For this project, the gss.sav file was loaded into SPSS (GSS, n.d.). The goal is to look at the relationships between the following variables: richwork (whether the respondent would continue or stop working if rich), sex (gender of the respondent), satjob (satisfaction level with the job), and degree (highest education degree earned). The variable richwork is the dependent variable and the other three variables are considered independent variables for this analysis. To conduct a crosstabs analysis, navigate through Analyze > Descriptive Statistics > Crosstabs. The variable richwork was placed in the “Row(s)” box, and the other three variables were placed in the “Column(s)” box. Then, in the Crosstabs dialog box, the “Cells” button was clicked, “Observed” was selected under the “Counts” section, and all three boxes were selected under the “Percentages” section. The procedures for this analysis are provided in video tutorial form by Miller (n.d.). The resulting output is presented in the next four tables.

Results

Table 1: Cases Processing Summary.

	Cases
	Valid		Missing		Total
	N	Percent	N	Percent	N	Percent
IF RICH, CONTINUE OR STOP WORKING * Respondent’s highest degree	625	44.0%	794	56.0%	1419	100.0%
IF RICH, CONTINUE OR STOP WORKING * Respondent’s sex	628	44.3%	791	55.7%	1419	100.0%
IF RICH, CONTINUE OR STOP WORKING * JOB OR HOUSEWORK	624	44.0%	795	56.0%	1419	100.0%

According to Table 1, about 44% (624 to 628) of all cases are valid in each of the three crosstabulations and about 56% (791 to 795) have missing data, out of a total of 1419 respondents.

Table 2: If rich do people continue or stop working with respondent’s highest degree cross tabulation.

			Respondent’s highest degree									Total
			Less than HS	High school		Junior college		Bachelor		Graduate
IF RICH, CONTINUE OR STOP WORKING	CONTINUE WORKING	Count	52		210		39		84		36		421
		% within IF RICH, CONTINUE OR STOP WORKING	12.4%		49.9%		9.3%		20.0%		8.6%		100.0%
		% within Respondent’s highest degree	69.3%		64.6%		81.3%		67.2%		69.2%		67.4%
		% of Total	8.3%		33.6%		6.2%		13.4%		5.8%		67.4%
	STOP WORKING	Count	23		115		9		41		16		204
		% within IF RICH, CONTINUE OR STOP WORKING	11.3%		56.4%		4.4%		20.1%		7.8%		100.0%
		% within Respondent’s highest degree	30.7%		35.4%		18.8%		32.8%		30.8%		32.6%
		% of Total	3.7%		18.4%		1.4%		6.6%		2.6%		32.6%
Total		Count	75		325		48		125		52		625
		% within IF RICH, CONTINUE OR STOP WORKING	12.0%		52.0%		7.7%		20.0%		8.3%		100.0%
		% within Respondent’s highest degree	100.0%		100.0%		100.0%		100.0%		100.0%		100.0%
		% of Total	12.0%		52.0%		7.7%		20.0%		8.3%		100.0%

According to Table 2, 67.4% of respondents would continue working and 32.6% would stop working if they were rich. In our data, about 12% have less than a high school diploma, 52% have a high school diploma, 7.7% have gone to junior college, 20% have a bachelor's degree, and 8.3% have a graduate degree. Looking at the decision by highest degree earned, respondents with only a high school diploma make up 56.4% of those who would stop working, and 35.4% of them would choose to leave work if they were rich, making them the biggest demographic to leave in this “what if” scenario. Finally, 81.3% of those with a junior college degree would stay at their job if they were rich, making them the biggest demographic to stay in this “what if” scenario. In each of the other degree groups (less than high school, high school diploma, bachelor's degree, and graduate degree), approximately 65-69% would continue working if they were rich.
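Because each cell in Table 2 carries three different percentages, it is easy to mix them up. The following Python sketch, which is only illustrative and uses hypothetical names such as `percentages`, recomputes the row, column, and total percentages from the observed counts for one cell.

```python
# Observed counts copied from Table 2 (rows: richwork, columns: degree).
degrees = ["Less than HS", "High school", "Junior college", "Bachelor", "Graduate"]
counts = {
    "CONTINUE WORKING": [52, 210, 39, 84, 36],
    "STOP WORKING": [23, 115, 9, 41, 16],
}

grand_total = sum(sum(row) for row in counts.values())
col_totals = [sum(col) for col in zip(*counts.values())]


def percentages(row_label, degree):
    """Return the three SPSS-style percentages for a single cell."""
    j = degrees.index(degree)
    cell = counts[row_label][j]
    return {
        "row_pct": 100 * cell / sum(counts[row_label]),  # % within richwork
        "col_pct": 100 * cell / col_totals[j],           # % within degree
        "total_pct": 100 * cell / grand_total,           # % of Total
    }


hs_stop = percentages("STOP WORKING", "High school")
print({k: round(v, 1) for k, v in hs_stop.items()})
```

The column percentage answers "what share of high school graduates would stop working", while the row percentage answers "what share of those who would stop working are high school graduates".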

Table 3: If rich do people continue or stop working with respondent’s gender cross tabulation.

			Respondent’s sex		Total
			Male	Female	Total
IF RICH, CONTINUE OR STOP WORKING	CONTINUE WORKING	Count	214	209	423
		% within IF RICH, CONTINUE OR STOP WORKING	50.6%	49.4%	100.0%
		% within Respondent’s sex	69.3%	65.5%	67.4%
		% of Total	34.1%	33.3%	67.4%
	STOP WORKING	Count	95	110	205
		% within IF RICH, CONTINUE OR STOP WORKING	46.3%	53.7%	100.0%
		% within Respondent’s sex	30.7%	34.5%	32.6%
		% of Total	15.1%	17.5%	32.6%
Total		Count	309	319	628
		% within IF RICH, CONTINUE OR STOP WORKING	49.2%	50.8%	100.0%
		% within Respondent’s sex	100.0%	100.0%	100.0%
		% of Total	49.2%	50.8%	100.0%

In our sample data set, about 49.2% were male and 50.8% were female, according to Table 3. Looking at whether people would continue or stop working by the respondent’s gender, 34.5% of women and 30.7% of men would choose to leave work if they were rich. Gender doesn’t seem to be a strong indicator of whether a respondent is more likely to continue or stop working if they were rich in this “what if” scenario.

Table 4: If rich would people continue or stop working with respondent’s job satisfaction cross tabulation.

			JOB OR HOUSEWORK				Total
			VERY SATISFIED	MOD. SATISFIED	A LITTLE DISSAT	VERY DISSATISFIED
IF RICH, CONTINUE OR STOP WORKING	CONTINUE WORKING	Count	199	172	36	14	421
		% within IF RICH, CONTINUE OR STOP WORKING	47.3%	40.9%	8.6%	3.3%	100.0%
		% within JOB OR HOUSEWORK	71.8%	64.9%	60.0%	63.6%	67.5%
		% of Total	31.9%	27.6%	5.8%	2.2%	67.5%
	STOP WORKING	Count	78	93	24	8	203
		% within IF RICH, CONTINUE OR STOP WORKING	38.4%	45.8%	11.8%	3.9%	100.0%
		% within JOB OR HOUSEWORK	28.2%	35.1%	40.0%	36.4%	32.5%
		% of Total	12.5%	14.9%	3.8%	1.3%	32.5%
Total		Count	277	265	60	22	624
		% within IF RICH, CONTINUE OR STOP WORKING	44.4%	42.5%	9.6%	3.5%	100.0%
		% within JOB OR HOUSEWORK	100.0%	100.0%	100.0%	100.0%	100.0%
		% of Total	44.4%	42.5%	9.6%	3.5%	100.0%

Finally, in Table 4, about 44.4% of respondents are very satisfied at work, 42.5% are moderately satisfied, 9.6% are a little dissatisfied, and 3.5% are very dissatisfied. Looking at whether people would continue or stop working by job satisfaction level, 40% of respondents who are a little dissatisfied would choose to leave work if they were rich, making them the biggest demographic to leave in this “what if” scenario. In fact, respondents who were anything but very satisfied with their job were approximately 7 to 12 percentage points more likely to want to leave their jobs if they were rich. By contrast, 71.8% of those who are very satisfied with their jobs would stay at their job if they were rich, making them the biggest demographic to stay in this “what if” scenario.
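The gap between very satisfied respondents and everyone else can be checked directly from the counts in Table 4. This is a small Python sketch for illustration only, and the variable names are hypothetical.

```python
# Observed counts copied from Table 4 (columns: satjob level).
levels = ["VERY SATISFIED", "MOD. SATISFIED", "A LITTLE DISSAT", "VERY DISSATISFIED"]
continue_counts = [199, 172, 36, 14]
stop_counts = [78, 93, 24, 8]

# Share of each satisfaction level that would stop working (column percent).
stop_pct = {
    level: 100 * stop / (stop + cont)
    for level, cont, stop in zip(levels, continue_counts, stop_counts)
}

# Difference in percentage points relative to the very satisfied group.
baseline = stop_pct["VERY SATISFIED"]
diff_vs_very_satisfied = {
    level: pct - baseline
    for level, pct in stop_pct.items()
    if level != "VERY SATISFIED"
}

for level, diff in diff_vs_very_satisfied.items():
    print(f"{level}: {stop_pct[level]:.1f}% would stop (+{diff:.1f} points)")
```

These are differences in percentage points between column percentages, not relative increases in probability.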

Conclusions

Overall, this analysis suggests that highest degree earned and job satisfaction may be contributing factors in whether a respondent would continue or stop working if they were rich in this “what if” scenario. However, gender may not play an important role in answering this question.

SPSS Code

DATASET NAME DataSet1 WINDOW=FRONT.

* Crosstabulate richwork against degree, sex, and satjob with counts and percentages.
CROSSTABS
  /TABLES=richwork BY degree sex satjob
  /FORMAT=AVALUE TABLES
  /CELLS=COUNT ROW COLUMN TOTAL
  /COUNT ROUND CELL.

References:

GSS. (n.d.). SPSS data file [Data set]. Retrieved from https://classroom.coloradotech.edu/app/classResourceRedirect.html?id=2931693&url=/lms/class/95707/document/2931693/open
Miller, R. (n.d.). Week 3: Crosstabs. [Video file]. Retrieved from http://breeze.careeredonline.com/p1xi2oe0rfo/?launcher=false&fcsContent=true&pbMode=normal