## Quant: Parametric and Non-Parametric Stats

Parametric statistics are inferential and based on random sampling from a well-defined population; the sample data are used to make strict inferences about the population’s parameters. Thus, tests such as t-tests, chi-square, and F-tests (ANOVA) can be used (Huck, 2011; Schumacker, 2014). Nonparametric statistics, sometimes called “assumption-free tests” because they make fewer assumptions about the data, are used for tests on ranked data, such as the Mann-Whitney U-test, the Wilcoxon signed-rank test, the Kruskal-Wallis H-test, and chi-square (Field, 2013; Huck, 2011).

First, the types of data need to be defined. Continuous data are interval/ratio data, and categorical data are nominal/ordinal data. The following table is modified from Schumacker (2014), with data added from Huck (2011):

| Statistic | Dependent Variable | Independent Variable |
|---|---|---|
| Analysis of Variance (ANOVA), one-way | Continuous | Categorical |
| t-test, single sample | Continuous | |
| t-test, independent groups | Continuous | Categorical |
| t-test, dependent (paired groups) | Continuous | Categorical |
| Chi-square | Categorical | Categorical |
| Mann-Whitney U-test | Ordinal | Ordinal |
| Wilcoxon | Ordinal | Ordinal |
| Kruskal-Wallis H-test | Ordinal | Ordinal |

ANOVAs (or F-tests) are used to analyze differences among three or more group means by comparing the variation between the groups with the variation within them; they test the null hypothesis that the group means are equal (Huck, 2011). Student’s t-tests, or simply t-tests, test the null hypothesis that a population mean equals some specified value, and they are used when the population standard deviation is unknown and must be estimated from a relatively small sample (Field, 2013; Huck, 2011; Schumacker, 2014). The test assumes a normal distribution (Huck, 2011). With large sample sizes, t-values approach z-values, and the same happens with chi-square, because the t and chi-square distributions both have sample size (through their degrees of freedom) in their functions (Schumacker, 2014). In other words, at large sample sizes the t-distribution and the chi-square distribution begin to look like a normal curve. Chi-square is related to the variance of a sample, and a chi-square test can be used to test a null hypothesis about that variance, assuming the sample comes from a normal distribution (Schumacker, 2014). Chi-square tests are versatile enough to be used as both parametric and nonparametric tests (Field, 2013; Huck, 2011; Schumacker, 2014).
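As a minimal sketch of the one-sample t statistic described above (hypothetical data; `one_sample_t` is an illustrative name, not something from the cited texts), the statistic divides the distance between the sample mean and the hypothesized mean by the estimated standard error, using only Python’s standard library:

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for H0: the population mean equals mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)        # sample SD with an n - 1 denominator
    standard_error = sd / math.sqrt(n)   # estimated SD of the sample mean
    t = (mean - mu0) / standard_error
    return t, n - 1                      # df = n - 1 sets the t-distribution's shape


# Hypothetical scores; H0: the population mean is 4.
t, df = one_sample_t([2, 4, 4, 4, 5, 5, 7, 9], mu0=4)
print(f"t = {t:.3f}, df = {df}")         # t = 1.323, df = 7
```

The p-value would come from a t-distribution with `df` degrees of freedom (SciPy’s `scipy.stats.ttest_1samp` computes both). As `df` grows, that distribution approaches the standard normal, which is the large-sample convergence to z described above.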

The Mann-Whitney U-test and the Wilcoxon rank-sum test are equivalent: both are nonparametric counterparts of the independent t-test, and the two samples do not even have to be the same size (Field, 2013).

The nonparametric Mann-Whitney U-test can be substituted for a t-test when a normal distribution cannot be assumed; it was designed for two independent samples that do not have repeated measures (Field, 2013; Huck, 2011). This makes it a good substitute for the independent-groups t-test (Field, 2013). A benefit of choosing the Mann-Whitney U-test is that it is unlikely to produce a Type II error (a false negative) (Huck, 2011). The null hypothesis is that the two independent samples come from the same population (Field, 2013; Huck, 2011).
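A minimal sketch of how U is computed from ranks, assuming hypothetical data and illustrative helper names (`average_ranks`, `mann_whitney_u`): pool both samples, rank them (tied values share the mean of their ranks), and convert one group’s rank sum into U. Note that the two groups here have different sizes.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1      # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y); the reported U statistic is the smaller of the two."""
    ranks = average_ranks(list(x) + list(y))
    n_x, n_y = len(x), len(y)
    rank_sum_x = sum(ranks[:n_x])        # x occupies the first n_x pooled positions
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    u_y = n_x * n_y - u_x
    return u_x, u_y


# Hypothetical independent groups of unequal size.
group_a = [3, 5, 6, 9]
group_b = [1, 2, 4, 7, 8]
u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b, min(u_a, u_b))           # 13.0 7.0 7.0
```

U_a counts how many (a, b) pairs have a > b, which is why the test only needs ranks, not normally distributed scores. For p-values, SciPy provides `scipy.stats.mannwhitneyu`.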

The nonparametric Wilcoxon signed-rank test is best suited to distributions that are skewed, where homogeneity of variance cannot be assumed, and where a normal distribution cannot be assumed (Field, 2013; Huck, 2011). The Wilcoxon signed-rank test can help compare two related (correlated) samples from the same population (Huck, 2011). Each pair of observations should be selected randomly and independently of the other pairs (Huck, 2011). This makes it a good substitute for the dependent (paired) t-test (Field, 2013; Huck, 2011). The null hypothesis is that the central tendency of the paired differences is 0 (Huck, 2011).
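A minimal sketch of the signed-rank statistic for paired data, under these assumptions: hypothetical before/after scores, illustrative function names, and zero differences dropped (one common convention). Rank the absolute differences, then sum the ranks of the positive and negative differences separately:

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """Return (W_plus, W_minus) for paired scores; zero differences are dropped."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(before, after) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])   # rank the absolute differences
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical before/after scores for five participants.
w_plus, w_minus = wilcoxon_signed_rank([10, 12, 14, 16, 18], [11, 15, 13, 20, 18])
print(w_plus, w_minus, min(w_plus, w_minus))        # 8.5 1.5 1.5
```

If the median difference really were 0, W_plus and W_minus would tend to be similar; a large imbalance is evidence against the null hypothesis. SciPy’s `scipy.stats.wilcoxon` returns the statistic together with a p-value.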

The nonparametric Kruskal-Wallis H-test can be used to test whether two or more independent samples come from the same distribution. It is considered the nonparametric counterpart of a one-way analysis of variance (ANOVA) and focuses on central tendencies (Huck, 2011). It is an extension of the Mann-Whitney U-test (Huck, 2011). The null hypothesis is that the medians of all groups are equal (Huck, 2011).
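A minimal sketch of the H statistic, assuming hypothetical scores and illustrative function names: it pools and ranks all groups, as the Mann-Whitney U-test does for two, then compares each group’s rank sum with what equal distributions would produce. The tie correction shown is the standard one; H is referred to a chi-square distribution with k − 1 degrees of freedom.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Return (H, df); H is corrected for tied values."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])   # ranks belonging to this group
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n), t = size of each tie group.
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    return h / correction, len(groups) - 1


# Hypothetical scores for three independent groups.
h, df = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"H = {h:.2f}, df = {df}")   # H = 7.20, df = 2
```

SciPy’s `scipy.stats.kruskal` applies the same tie correction and also returns a p-value.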

References

• Field, A. (2013). Discovering statistics using IBM SPSS Statistics (4th ed.). UK: Sage Publications Ltd. VitalBook file.
• Huck, S. W. (2011). Reading statistics and research (6th ed.). Pearson Learning Solutions. VitalBook file.
• Schumacker, R. E. (2014). Learning statistics using R. California: SAGE Publications, Inc. VitalBook file.

## Quant: Validity and Reliability

The construction process of a survey that would ensure a valid and reliable assessment instrument

Most flaws in research methodology exist because validity and reliability were not established (Gall, Gall, & Borg, 2006). Thus, it is important to ensure a valid and reliable assessment instrument. When using any existing survey as an assessment instrument, one should report the instrument’s development, items, and scales, along with reports on its reliability and validity from past uses (Creswell, 2014; Joyner, 2012). Permission must be secured for using any instrument, and that permission should be placed in the appendix (Joyner, 2012). The validity of the assessment instrument is key to drawing meaningful and useful statistical inferences (Creswell, 2014). Creswell (2014) stated that several types of validity can exist in an instrument: content validity (it measures what we want it to measure), predictive or concurrent validity (its measurements align with other results), and construct validity (it measures the intended constructs or concepts). Establishing validity in the assessment instrument helps ensure that it is the best instrument to use for the situation. An assessment instrument is reliable when its authors report that it has internal consistency and that it has been tested multiple times with stable results each time (Creswell, 2014).
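The cited texts are not quoted here naming a specific internal-consistency coefficient; Cronbach’s alpha is one widely used option, so the sketch below assumes it. It uses hypothetical ratings and an illustrative function name, and compares the sum of the item variances with the variance of respondents’ total scores:

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha; responses has one row per respondent, one column per item."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))                      # transpose: one tuple per item
    item_variances = [statistics.variance(item) for item in items]
    total_variance = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 5-point ratings from five respondents on three items.
ratings = [[4, 5, 4], [3, 4, 3], [5, 5, 4], [2, 3, 2], [4, 4, 5]]
print(f"alpha = {cronbach_alpha(ratings):.3f}")       # alpha = 0.904
```

When items rise and fall together, the total-score variance is large relative to the summed item variances and alpha approaches 1. Items that do not hang together pull it toward 0.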

Unfortunately, adopting an assessment instrument that does not closely match the content being studied will not benefit anyone, nor will its results be accepted by the wider research community. Modifying an assessment instrument that does not fully match can damage the reliability of the new version, and establishing validity and reliability for that new version can take a great deal of time (Creswell, 2014). Likewise, creating a brand-new assessment instrument requires extensive pilot studies and tests, along with an explanation of how it was developed, to help establish the instrument’s validity and reliability (Joyner, 2012).

Selecting a target group for the administration of the survey

By sampling a population and using a valid and reliable survey instrument for assessment, attitudes and opinions about the population can be correctly inferred from the sample (Creswell, 2014). Thus, validity and reliability are important, but selecting the right target group for the survey is also key. Targeting a group for this survey means that the population about which inferences will be made must be stratified; that is, the characteristics of the population are known ahead of time (Creswell, 2014; Gall et al., 2006). Participants should then be randomly sampled from this stratified population so that statistical inferences can be made about that population (Gall et al., 2006). Sometimes a survey instrument does not fit the people in the target group, in which case it will not produce valid or reliable inferences for the targeted population. One must select a target population and determine the size of that stratified population (Creswell, 2014). Finally, one must consider the sample size drawn from the target group.
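A minimal sketch of drawing a random sample from a stratified population. It assumes proportional allocation, where each stratum contributes in proportion to its size, which is only one possible allocation rule. The population, its strata, and the function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def proportional_allocation(stratum_sizes, n):
    """Split a sample of size n across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the population size")
    exact = {s: n * size / total for s, size in stratum_sizes.items()}
    alloc = {s: math.floor(v) for s, v in exact.items()}
    # Largest-remainder rounding: leftover units go to the biggest fractional parts.
    leftover = n - sum(alloc.values())
    for s in sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc


def stratified_random_sample(population, stratum_of, n, seed=None):
    """Draw a simple random sample within each stratum, sized proportionally."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    alloc = proportional_allocation({s: len(units) for s, units in strata.items()}, n)
    sample = []
    for s, units in strata.items():
        sample.extend(rng.sample(units, alloc[s]))
    return sample


# Hypothetical population of 100 people whose role is known ahead of time.
population = ([("teacher", i) for i in range(60)]
              + [("administrator", i) for i in range(30)]
              + [("counselor", i) for i in range(10)])
sample = stratified_random_sample(population, stratum_of=lambda p: p[0], n=20, seed=1)
print(Counter(role for role, _ in sample))  # 12 teachers, 6 administrators, 2 counselors
```

Because the strata are defined before sampling, each known characteristic is represented in the sample in the same proportion as in the population, and selection within each stratum is still random.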

Administrative procedure to maximize the consistency of the survey

Once a stratified population and a random sample from that population have been carefully selected, the survey should be administered as consistently as possible. Researchers must consider how the sample can be reached: mail, email, a website, or other survey tools such as SurveyMonkey.com are all ways to gather data (Creswell, 2014). However, mail has a low rate of return (Miller, n.d.), so face-to-face methods or online providers may be the best way to maximize the consistency of the survey.

References

Creswell, J. W. (2014). Research design: Qualitative, quantitative, and mixed methods approaches (4th ed.). California: SAGE Publications, Inc. VitalBook file.

Gall, M. D., Gall, J. P., & Borg, W. R. (2006). Educational research: An introduction (8th ed.). Pearson Learning Solutions. VitalBook file.

Joyner, R. L. (2012). Writing the winning thesis or dissertation: A step-by-step guide (3rd ed.). Corwin. VitalBook file.

Miller, R. (n.d.). Week 5: Research study construction [Video file]. Retrieved from http://breeze.careeredonline.com/p8v1ruos1j1/?launcher=false&fcsContent=true&pbMode=normal

## Quant: Exploring Data with SPSS

Introduction

The aim of this analysis is to run a distribution analysis on average diastolic blood pressure (DBP58), comparing individuals who have no history of coronary heart disease (CHD) with individuals who have a history of CHD. The variable that records this history is CHD.

From the SPSS outputs the following questions will be addressed:

• What can be determined from the measures of skewness and kurtosis about a normal curve? What are the mean and median?
• Does one seem better than the other to represent the scores?
• What differences can be seen in the pattern of responses of those with history versus those with no history?
• What information can be determined from the box plots?

Methodology

For this project, the electric.sav file is loaded into SPSS (Electric, n.d.). The goal is to look at the relationship between the following variables: DBP58 (Average Diastolic Blood Pressure) and CHD (Incidence of Coronary Heart Disease). To conduct a descriptive analysis, navigate through Analyze > Descriptive Statistics > Explore. The variable DBP58 was placed in the “Dependent List” box, and CHD was placed in the “Factor List” box. Then, in the Explore dialog box, the “Statistics” button was clicked, and in that dialog box “Descriptives” (with a 95% “Confidence Interval for Mean”) was selected along with “Outliers” and “Percentiles”. Back in the Explore dialog box, the “Plots” button was clicked; in that dialog box, only “Factor levels together” was selected under “Boxplots”, both options (Stem-and-leaf and Histogram) were selected under “Descriptive”, and “None” was selected under “Spread vs. Level with Levene Test”. The procedures for this analysis are provided in video tutorial form by Miller (n.d.). The output is shown in the following four tables and five figures.
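For readers without SPSS, the core of the Explore “Descriptives” output can be sketched in plain Python. This is a hedged approximation rather than the SPSS algorithm: it uses hypothetical records that borrow the variable names from the SPSS syntax (`dbp58`, `chd`), an illustrative `explore` function, and it omits the confidence interval, 5% trimmed mean, skewness, and kurtosis.

```python
import statistics
from collections import defaultdict


def explore(records, dependent, factor):
    """Descriptive statistics of `dependent` for each level of `factor`."""
    groups = defaultdict(list)
    for row in records:
        value = row.get(dependent)
        if value is not None:                  # drop cases missing the dependent value
            groups[row[factor]].append(value)
    summary = {}
    for level, values in groups.items():
        sd = statistics.stdev(values)          # needs at least two cases per group
        summary[level] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "std_error": sd / len(values) ** 0.5,
            "median": statistics.median(values),
            "variance": statistics.variance(values),
            "std_dev": sd,
            "minimum": min(values),
            "maximum": max(values),
            "range": max(values) - min(values),
        }
    return summary


# Hypothetical records using the variable names from the SPSS syntax (dbp58, chd).
records = [
    {"chd": "none", "dbp58": 80}, {"chd": "none", "dbp58": 90},
    {"chd": "none", "dbp58": None}, {"chd": "none", "dbp58": 85},
    {"chd": "chd", "dbp58": 70}, {"chd": "chd", "dbp58": 100}, {"chd": "chd", "dbp58": 85},
]
for level, stats in explore(records, "dbp58", "chd").items():
    print(level, stats)
```

The missing `dbp58` value is skipped, mirroring how one case in the “none” group is reported as missing in the Case Processing Summary.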

Results

Table 1: Case Processing Summary.

| Average Diast Blood Pressure 58 | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| Incidence of CHD = none | 119 | 99.2% | 1 | 0.8% | 120 | 100.0% |
| Incidence of CHD = chd | 120 | 100.0% | 0 | 0.0% | 120 | 100.0% |

According to Table 1, 99.2% or more of the data are valid (not missing), both for participants with a history of coronary heart disease (CHD) and for those without. There is one missing data point, in the group with no history of CHD. Each group contains 120 participants (240 in total).

Table 2: Descriptive Statistics on the Incidence of Coronary Heart Disease and the Average Diastolic Blood Pressure.

| Average Diast Blood Pressure 58 | none: Statistic | none: Std. Error | chd: Statistic | chd: Std. Error |
|---|---|---|---|---|
| Mean | 87.66 | 1.005 | 89.92 | 1.350 |
| 95% Confidence Interval for Mean, Lower Bound | 85.66 | | 87.24 | |
| 95% Confidence Interval for Mean, Upper Bound | 89.65 | | 92.59 | |
| 5% Trimmed Mean | 87.31 | | 88.89 | |
| Median | 87.00 | | 87.00 | |
| Variance | 120.312 | | 218.732 | |
| Std. Deviation | 10.969 | | 14.790 | |
| Minimum | 65 | | 65 | |
| Maximum | 125 | | 160 | |
| Range | 60 | | 95 | |
| Interquartile Range | 15 | | 18 | |
| Skewness | .566 | .222 | 1.406 | .221 |
| Kurtosis | .671 | .440 | 3.620 | .438 |

According to Table 2, mean diastolic blood pressure is about 2.3 points higher (89.92 vs. 87.66), and its standard error is 0.345 larger (1.350 vs. 1.005), in participants with CHD than in those without. The median is 87 in both groups. For patients with CHD, the mean (89.92) sits above the median, consistent with a positively skewed distribution, and this is reflected in a skewness of 1.406 and a kurtosis of 3.620. For the cases without CHD, the mean (87.66) is closer to the median, showing only mild positive skewness (0.566) and kurtosis (0.671). Upon further inspection of Figures 1 and 2, the skewness appears to result largely from a few high outliers, and the box plot in Figure 3 confirms these outliers. Both kurtosis values are positive, which indicates leptokurtic distributions: they have higher peaks (and heavier tails) than a normal distribution. The departure from normality is much larger for the CHD group (3.620) than for the group without CHD (0.671).
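One way to judge whether skewness and kurtosis values depart meaningfully from a normal curve is to divide each by its standard error. A common rule of thumb treats |z| > 1.96 as a significant departure at the .05 level. The sketch below assumes the standard bias-adjusted formulas (the same ones used by Excel’s `SKEW` and `KURT`), which reproduce the standard errors in Table 2 (.222/.440 for n = 119 and .221/.438 for n = 120). The function names are illustrative.

```python
import math
import statistics


def skewness(values):
    """Bias-adjusted sample skewness (G1); 0 for a symmetric sample."""
    n = len(values)
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)


def kurtosis(values):
    """Bias-adjusted excess kurtosis (G2); 0 for a normal distribution."""
    n = len(values)
    mean, sd = statistics.mean(values), statistics.stdev(values)
    fourth = sum(((x - mean) / sd) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def se_skewness(n):
    """Standard error of skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * math.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))


# Values reported in Table 2: (group, valid n, skewness, kurtosis).
for label, n, skew, kurt in [("none", 119, 0.566, 0.671), ("chd", 120, 1.406, 3.620)]:
    print(f"{label}: SE skew = {se_skewness(n):.3f}, z = {skew / se_skewness(n):.2f}; "
          f"SE kurt = {se_kurtosis(n):.3f}, z = {kurt / se_kurtosis(n):.2f}")
```

Applied to Table 2, the skewness z-ratios come out at roughly 2.6 (no CHD) and 6.4 (CHD), and the kurtosis z-ratios at roughly 1.5 and 8.3. Under that rule of thumb, the CHD distribution departs clearly from normality on both measures, while the group without CHD departs mainly through its skewness.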

Figure 1: Histogram on the Incidence of Coronary Heart Disease = none and the Average Diastolic Blood Pressure.

Figure 2: Histogram on the Incidence of Coronary Heart Disease = chd and the Average Diastolic Blood Pressure.

Figure 3: Box plots on the Incidence of Coronary Heart Disease and the Average Diastolic Blood Pressure.

Comparing the two histograms in Figures 1 and 2, the data are more positively skewed (with a longer tail toward high values) when there is CHD than when there is not. The standard deviation is about 3.8 points larger when there is CHD (14.790 vs. 10.969). This shows that blood pressure in the sample population can vary greatly if there is CHD, whereas blood pressure is somewhat more stable in the sample population without CHD. The range of average diastolic blood pressure is also larger with CHD (95 vs. 60), which is consistent with the greater standard deviation and can be seen in Figure 3. In the group with no CHD, the interquartile range (which covers the middle 50% of participants) is smaller than in the group with CHD (15 vs. 18). Case 120, with a diastolic blood pressure of 160, appears in the box plot as an extreme value well beyond the whiskers.

Table 3: Percentiles on the Incidence of Coronary Heart Disease and the Average Diastolic Blood Pressure.

| Method (Average Diast Blood Pressure 58) | Incidence of CHD | 5th | 10th | 25th | 50th | 75th | 90th | 95th |
|---|---|---|---|---|---|---|---|---|
| Weighted Average (Definition 1) | none | 71.00 | 75.00 | 80.00 | 87.00 | 95.00 | 102.00 | 105.00 |
| Weighted Average (Definition 1) | chd | 70.05 | 75.00 | 80.00 | 87.00 | 98.00 | 109.90 | 117.95 |
| Tukey’s Hinges | none | | | 80.00 | 87.00 | 94.50 | | |
| Tukey’s Hinges | chd | | | 80.00 | 87.00 | 98.00 | | |

Table 3 maps out the percentiles of average diastolic blood pressure for each CHD group. About 95% of cases fall at or below a diastolic blood pressure of 105 for participants with no history of CHD, and at or below 117.95 for those with a history of CHD. These percentiles show that, where there is no CHD, diastolic blood pressure values are concentrated more tightly around the median value of 87, which is supported by the tables and figures above.
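The “Weighted Average (Definition 1)” row corresponds to the `HAVERAGE` keyword in the SPSS syntax at the end of this section. It interpolates at position (n + 1) × p / 100 in the sorted data. As a hedged check, the sketch below rebuilds the CHD scores from the stem-and-leaf plot in Figure 5 (stem width 10, one case per leaf) plus the two extremes listed in Table 4 (133 and 160). It then applies that definition; the edge handling below the first case and above the last case is an assumption, and the function names are illustrative.

```python
import math


def from_stem_and_leaf(rows, extremes, stem_width=10):
    """Rebuild integer scores from stem-and-leaf rows where each leaf is one case."""
    values = [stem * stem_width + int(leaf) for stem, leaves in rows for leaf in leaves]
    return sorted(values + list(extremes))


def weighted_average_percentile(values, p):
    """Percentile p (0-100) as a weighted average at position (n + 1) * p / 100."""
    xs = sorted(values)
    n = len(xs)
    position = (n + 1) * p / 100        # 1-based; may fall between two cases
    k = math.floor(position)
    fraction = position - k
    if k < 1:                           # assumed edge handling below the 1st case
        return xs[0]
    if k >= n:                          # assumed edge handling at or past the last case
        return xs[-1]
    return xs[k - 1] + fraction * (xs[k] - xs[k - 1])


# CHD group: leaves from Figure 5 plus the two extremes (133, 160) listed in Table 4.
chd = from_stem_and_leaf(
    [(6, "58"), (7, "000012233"), (7, "55555677788899"),
     (8, "00000000000111233333344"), (8, "555556667777777788999999"),
     (9, "00001122223"), (9, "6677788888999"), (10, "02333"), (10, "5557789"),
     (11, "0003"), (11, "578"), (12, "01"), (12, "5")],
    extremes=[133, 160],
)
for p in (5, 10, 25, 50, 75, 90, 95):
    print(p, round(weighted_average_percentile(chd, p), 2))
```

This reproduces the chd row of Table 3 (70.05, 75, 80, 87, 98, 109.90, 117.95). For example, the 5th percentile sits at position 121 × 0.05 = 6.05, so it is the 6th score (70) plus 0.05 of the gap to the 7th score (71).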

Table 4: Extreme Values on the Incidence of Coronary Heart Disease and the Average Diastolic Blood Pressure.

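| Incidence of CHD | Extreme | Rank | Case Number | Average Diast Blood Pressure 58 |
|---|---|---|---|---|
| none | Highest | 1 | 163 | 125 |
| | | 2 | 232 | 119 |
| | | 3 | 144 | 115 |
| | | 4 | 126 | 110 |
| | | 5 | 131 | 109 |
| | Lowest | 1 | 157 | 65 |
| | | 2 | 156 | 65 |
| | | 3 | 175 | 68 |
| | | 4 | 153 | 68 |
| | | 5 | 237 | 69 |
| chd | Highest | 1 | 120 | 160 |
| | | 2 | 56 | 133 |
| | | 3 | 42 | 125 |
| | | 4 | 26 | 121 |
| | | 5 | 111 | 120 |
| | Lowest | 1 | 73 | 65 |
| | | 2 | 34 | 68 |
| | | 3 | 101 | 70 |
| | | 4 | 33 | 70 |
| | | 5 | 7 | 70 (a) |

a. Only a partial list of cases with the value 70 are shown in the table of lower extremes.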

Table 4 lists the five highest and five lowest cases in each group. Where there is no CHD, the lowest diastolic blood pressure value is 65, which is the same as for those with CHD. However, the highest diastolic blood pressure with CHD (160) is 35 points greater than the highest value without CHD (125).

      Frequency    Stem &  Leaf
           .00        6 .
          5.00        6 .  55889
          4.00        7 .  1144
         18.00        7 .  555677777777888899
         21.00        8 .  000000000001122223344
         21.00        8 .  555556666777777888999
         20.00        9 .  00000111111222233334
         14.00        9 .  55666777888899
          8.00       10 .  00012233
          4.00       10 .  5559
          1.00       11 .  0
          1.00       11 .  5
          2.00 Extremes    (>=119)
      Stem width:   10
      Each leaf:        1 case(s)

Figure 4: Stem and leaf plot on the Incidence of Coronary Heart Disease = none and the Average Diastolic Blood Pressure.

      Frequency    Stem &  Leaf
           .00        6 .
          2.00        6 .  58
          9.00        7 .  000012233
         14.00        7 .  55555677788899
         23.00        8 .  00000000000111233333344
         24.00        8 .  555556667777777788999999
         11.00        9 .  00001122223
         13.00        9 .  6677788888999
          5.00       10 .  02333
          7.00       10 .  5557789
          4.00       11 .  0003
          3.00       11 .  578
          2.00       12 .  01
          1.00       12 .  5
          2.00 Extremes    (>=133)
      Stem width:   10
      Each leaf:        1 case(s)

Figure 5: Stem and leaf plot on the Incidence of Coronary Heart Disease = chd and the Average Diastolic Blood Pressure.

Figures 4 and 5 show more detail than the histograms: they give the actual frequency to the left of each stem value and state which values are considered extreme. In the case of CHD, a diastolic blood pressure of 133 or more is listed as an extreme value, and when there is no CHD, the extreme values are diastolic blood pressures of 119 or more.
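The extreme-value cut-offs in Figures 4 and 5 are consistent with a fence placed 1.5 hinge-spreads above the upper Tukey hinge from Table 3: 94.5 + 1.5 × 14.5 = 116.25 for the group without CHD, and 98 + 1.5 × 18 = 125 for the group with CHD. The sketch below rebuilds both groups from the stem-and-leaf plots plus the extremes in Table 4, recomputes the hinges, and flags values beyond the fences. It is a hedged reconstruction, not SPSS’s own code; the function names are illustrative.

```python
import math


def from_stem_and_leaf(rows, extremes, stem_width=10):
    """Rebuild integer scores from stem-and-leaf rows where each leaf is one case."""
    values = [stem * stem_width + int(leaf) for stem, leaves in rows for leaf in leaves]
    return sorted(values + list(extremes))


def tukey_hinges(values):
    """Tukey's hinges: the medians of the lower and upper halves of the data."""
    xs = sorted(values)
    depth = (math.floor((len(xs) + 1) / 2) + 1) / 2   # 1-based depth of each hinge

    def at_depth(seq):
        return (seq[math.floor(depth) - 1] + seq[math.ceil(depth) - 1]) / 2

    return at_depth(xs), at_depth(xs[::-1])


def fences(lower_hinge, upper_hinge, steps=1.5):
    """Cut-offs lying `steps` hinge-spreads below and above the hinges."""
    spread = upper_hinge - lower_hinge
    return lower_hinge - steps * spread, upper_hinge + steps * spread


groups = {
    # Figure 4 leaves plus the extremes 119 and 125 from Table 4.
    "none": from_stem_and_leaf(
        [(6, "55889"), (7, "1144"), (7, "555677777777888899"),
         (8, "000000000001122223344"), (8, "555556666777777888999"),
         (9, "00000111111222233334"), (9, "55666777888899"), (10, "00012233"),
         (10, "5559"), (11, "0"), (11, "5")],
        extremes=[119, 125]),
    # Figure 5 leaves plus the extremes 133 and 160 from Table 4.
    "chd": from_stem_and_leaf(
        [(6, "58"), (7, "000012233"), (7, "55555677788899"),
         (8, "00000000000111233333344"), (8, "555556667777777788999999"),
         (9, "00001122223"), (9, "6677788888999"), (10, "02333"), (10, "5557789"),
         (11, "0003"), (11, "578"), (12, "01"), (12, "5")],
        extremes=[133, 160]),
}
for label, data in groups.items():
    lower, upper = tukey_hinges(data)
    low_cut, high_cut = fences(lower, upper)          # 1.5 hinge-spreads
    flagged = [x for x in data if x < low_cut or x > high_cut]
    print(label, (lower, upper), (low_cut, high_cut), flagged)
```

The flagged values are [119, 125] for the group without CHD and [133, 160] for the group with CHD, matching the “Extremes” rows of the two plots. SPSS box plots conventionally reserve the “extreme” marker for values more than 3 box lengths out. `fences(80, 98, steps=3)` gives an upper cut-off of 152, which only case 120 (160) exceeds.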

Conclusions

There is a difference between the distributions of average diastolic blood pressure for participants who have a history of coronary heart disease (CHD) and those who don’t. This is reflected in the range, skewness, and shape of the distributions of the two groups. Both groups have the same median (87) and lowest value (65), but the group with CHD has a somewhat higher mean (89.92 vs. 87.66), a larger standard deviation (14.790 vs. 10.969), and a much higher maximum (160 vs. 125) diastolic blood pressure.

SPSS Code

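    * Name the active dataset (electric.sav).
    DATASET NAME DataSet1 WINDOW=FRONT.
    * Explore DBP58 within each CHD group.
    * A blank line can end an SPSS command, so the subcommands stay on consecutive indented lines.
    EXAMINE VARIABLES=dbp58 BY chd
      /PLOT BOXPLOT STEMLEAF HISTOGRAM
      /COMPARE GROUPS
      /PERCENTILES(5,10,25,50,75,90,95) HAVERAGE
      /STATISTICS DESCRIPTIVES EXTREME
      /CINTERVAL 95
      /MISSING LISTWISE
      /NOTOTAL.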

References:

## Quant: Crosstabs in SPSS

Introduction

The aim of this analysis is to answer the question of whether people would continue or stop working if they were rich, and how that answer relates to their highest degree earned, gender, and job satisfaction.

Methodology

For this project, the gss.sav file is loaded into SPSS (GSS, n.d.). The goal is to look at the relationships between the following variables: richwork (whether the respondent would continue or stop working if rich), sex (gender), satjob (job satisfaction level), and degree (highest education degree earned). The variable richwork is the dependent variable, and the other three variables are treated as independent variables for this analysis. To conduct a crosstabs analysis, navigate through Analyze > Descriptive Statistics > Crosstabs. The variable richwork was placed in the “Row(s)” box, and the other three variables were placed in the “Column(s)” box. Then, in the Crosstabs dialog box, the “Cells” button was clicked; under the “Counts” section “Observed” was selected, and all three boxes (Row, Column, and Total) were selected under the “Percentages” section. The procedures for this analysis are provided in video tutorial form by Miller (n.d.). The output is shown in the following four tables.

Results

Table 1: Case Processing Summary.

| Crosstabulation | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| IF RICH, CONTINUE OR STOP WORKING * Respondent’s highest degree | 625 | 44.0% | 794 | 56.0% | 1419 | 100.0% |
| IF RICH, CONTINUE OR STOP WORKING * Respondent’s sex | 628 | 44.3% | 791 | 55.7% | 1419 | 100.0% |
| IF RICH, CONTINUE OR STOP WORKING * JOB OR HOUSEWORK | 624 | 44.0% | 795 | 56.0% | 1419 | 100.0% |

According to Table 1, about 44% (~625) of all cases are valid in all three scenarios and about 56% (~793) had missing data, from a total of 1419 respondents.

Table 2: If rich do people continue or stop working with respondent’s highest degree cross tabulation.

| IF RICH, CONTINUE OR STOP WORKING | Statistic | Less than HS | High school | Junior college | Bachelor | Graduate | Total |
|---|---|---|---|---|---|---|---|
| CONTINUE WORKING | Count | 52 | 210 | 39 | 84 | 36 | 421 |
| | % within IF RICH, CONTINUE OR STOP WORKING | 12.4% | 49.9% | 9.3% | 20.0% | 8.6% | 100.0% |
| | % within Respondent’s highest degree | 69.3% | 64.6% | 81.3% | 67.2% | 69.2% | 67.4% |
| | % of Total | 8.3% | 33.6% | 6.2% | 13.4% | 5.8% | 67.4% |
| STOP WORKING | Count | 23 | 115 | 9 | 41 | 16 | 204 |
| | % within IF RICH, CONTINUE OR STOP WORKING | 11.3% | 56.4% | 4.4% | 20.1% | 7.8% | 100.0% |
| | % within Respondent’s highest degree | 30.7% | 35.4% | 18.8% | 32.8% | 30.8% | 32.6% |
| | % of Total | 3.7% | 18.4% | 1.4% | 6.6% | 2.6% | 32.6% |
| Total | Count | 75 | 325 | 48 | 125 | 52 | 625 |
| | % within IF RICH, CONTINUE OR STOP WORKING | 12.0% | 52.0% | 7.7% | 20.0% | 8.3% | 100.0% |
| | % within Respondent’s highest degree | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
| | % of Total | 12.0% | 52.0% | 7.7% | 20.0% | 8.3% | 100.0% |

According to Table 2, 67.4% of respondents would continue working if they were rich, and 32.6% would stop working. In our data, about 12% have less than a high school diploma, 52% have a high school diploma, 7.7% have gone to junior college, 20% have a bachelor’s degree, and 8.3% have a graduate degree. Broken down by highest degree earned, 35.4% of respondents whose highest degree is a high school diploma would choose to stop working if they were rich, the largest share of any degree group; they also make up 56.4% of everyone who would stop. By contrast, 81.3% of those with a junior college degree would stay at their job if they were rich, making them the group most likely to stay in this “what if” scenario. In each of the other degree groups (less than high school, high school, bachelor’s, and graduate), between 64.6% and 69.3% of respondents would continue working if they were rich.
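The three kinds of percentages in these crosstabs are easy to confuse, which is why “35.4%” and “56.4%” above answer different questions. A minimal sketch, assuming an illustrative `crosstab_percentages` function and using the observed counts from Table 2, shows how each is computed from the same cell count:

```python
def crosstab_percentages(table):
    """table[row][col] = count; returns row %, column %, and total % for each cell."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    grand_total = sum(row_totals.values())
    return {
        (r, c): {
            "count": table[r][c],
            "row_pct": 100 * table[r][c] / row_totals[r],    # % within richwork
            "col_pct": 100 * table[r][c] / col_totals[c],    # % within degree
            "total_pct": 100 * table[r][c] / grand_total,    # % of total
        }
        for r in rows for c in cols
    }


# Observed counts from Table 2 (richwork by degree).
counts = {
    "CONTINUE WORKING": {"Less than HS": 52, "High school": 210, "Junior college": 39,
                         "Bachelor": 84, "Graduate": 36},
    "STOP WORKING": {"Less than HS": 23, "High school": 115, "Junior college": 9,
                     "Bachelor": 41, "Graduate": 16},
}
cells = crosstab_percentages(counts)
hs_stop = cells[("STOP WORKING", "High school")]
print(f"{hs_stop['col_pct']:.1f}% of high-school graduates would stop; "
      f"they are {hs_stop['row_pct']:.1f}% of all who would stop")
# 35.4% of high-school graduates would stop; they are 56.4% of all who would stop
```

The column percentage (“% within Respondent’s highest degree”) is the one to compare across degree groups. The row percentage mostly reflects how large each group is in the sample.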

Table 3: If rich do people continue or stop working with respondent’s gender cross tabulation.

| IF RICH, CONTINUE OR STOP WORKING | Statistic | Male | Female | Total |
|---|---|---|---|---|
| CONTINUE WORKING | Count | 214 | 209 | 423 |
| | % within IF RICH, CONTINUE OR STOP WORKING | 50.6% | 49.4% | 100.0% |
| | % within Respondent’s sex | 69.3% | 65.5% | 67.4% |
| | % of Total | 34.1% | 33.3% | 67.4% |
| STOP WORKING | Count | 95 | 110 | 205 |
| | % within IF RICH, CONTINUE OR STOP WORKING | 46.3% | 53.7% | 100.0% |
| | % within Respondent’s sex | 30.7% | 34.5% | 32.6% |
| | % of Total | 15.1% | 17.5% | 32.6% |
| Total | Count | 309 | 319 | 628 |
| | % within IF RICH, CONTINUE OR STOP WORKING | 49.2% | 50.8% | 100.0% |
| | % within Respondent’s sex | 100.0% | 100.0% | 100.0% |
| | % of Total | 49.2% | 50.8% | 100.0% |

According to Table 3, about 49.2% of our sample were male and 50.8% were female. Broken down by gender, 34.5% of women and 30.7% of men would choose to stop working if they were rich. With a difference of only 3.8 percentage points, gender doesn’t seem to be a strong indicator of whether a respondent would continue or stop working if they were rich in this “what if” scenario.

Table 4: If rich would people continue or stop working with respondent’s job satisfaction cross tabulation.

| IF RICH, CONTINUE OR STOP WORKING | Statistic | VERY SATISFIED | MOD. SATISFIED | A LITTLE DISSAT | VERY DISSATISFIED | Total |
|---|---|---|---|---|---|---|
| CONTINUE WORKING | Count | 199 | 172 | 36 | 14 | 421 |
| | % within IF RICH, CONTINUE OR STOP WORKING | 47.3% | 40.9% | 8.6% | 3.3% | 100.0% |
| | % within JOB OR HOUSEWORK | 71.8% | 64.9% | 60.0% | 63.6% | 67.5% |
| | % of Total | 31.9% | 27.6% | 5.8% | 2.2% | 67.5% |
| STOP WORKING | Count | 78 | 93 | 24 | 8 | 203 |
| | % within IF RICH, CONTINUE OR STOP WORKING | 38.4% | 45.8% | 11.8% | 3.9% | 100.0% |
| | % within JOB OR HOUSEWORK | 28.2% | 35.1% | 40.0% | 36.4% | 32.5% |
| | % of Total | 12.5% | 14.9% | 3.8% | 1.3% | 32.5% |
| Total | Count | 277 | 265 | 60 | 22 | 624 |
| | % within IF RICH, CONTINUE OR STOP WORKING | 44.4% | 42.5% | 9.6% | 3.5% | 100.0% |
| | % within JOB OR HOUSEWORK | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
| | % of Total | 44.4% | 42.5% | 9.6% | 3.5% | 100.0% |

Finally, according to Table 4, about 44.4% of respondents are very satisfied at work, 42.5% are moderately satisfied, 9.6% are a little dissatisfied, and 3.5% are very dissatisfied. Broken down by job satisfaction, 40.0% of respondents who are a little dissatisfied would choose to stop working if they were rich, making them the group most likely to leave in this “what if” scenario. In fact, respondents who were anything less than very satisfied with their job were roughly 7 to 12 percentage points more likely to say they would stop working if rich (35.1% to 40.0%, versus 28.2% of the very satisfied). Conversely, 71.8% of those who are very satisfied with their jobs would stay at their job if they were rich, making them the group most likely to stay in this “what if” scenario.

Conclusions

Overall, this analysis suggests that a respondent’s highest degree earned and job satisfaction may contribute to whether they would continue or stop working if they were rich in this “what if” scenario. However, gender may not play an important role in answering this question.


SPSS Code

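    * Name the active dataset (gss.sav).
    DATASET NAME DataSet1 WINDOW=FRONT.
    * Rows: richwork; columns: degree, sex, and satjob.
    * Cells: observed counts plus row, column, and total percentages.
    CROSSTABS
      /TABLES=richwork BY degree sex satjob
      /FORMAT=AVALUE TABLES
      /CELLS=COUNT ROW COLUMN TOTAL
      /COUNT ROUND CELL.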

References:

## Quant: Understanding Variance

If a researcher were to look at a measure of job performance resulting from two different manufacturing processes and found that the mean performance of process A was 82.5 and the mean performance of process B was 78.5, they could not automatically assume that process A will consistently outperform process B. The researcher cannot come to that conclusion until an analysis of variance has been done on the data. There could be variance arising from the uniquely different statements of work that process A and process B require (between-group variance), and there could be variance among the people carrying out the work within each process (within-group variance). These two types of variance feed into the F-statistic, which would then allow the researcher to state whether or not they can reject the null hypothesis that the two mean performances are the same.
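A minimal sketch of how the two kinds of variance become an F-statistic. It uses hypothetical scores chosen so that the group means match the example (82.5 for process A and 78.5 for process B), and an illustrative function name. Between-group variation is measured around the grand mean, within-group variation around each group’s own mean, and F is the ratio of their mean squares.

```python
import statistics


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: spread of scores around their own group mean.
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical job-performance scores (means 82.5 and 78.5).
process_a = [80, 85, 82, 83]
process_b = [77, 80, 78, 79]
f, df_b, df_w = one_way_anova_f(process_a, process_b)
print(f"F({df_b}, {df_w}) = {f:.2f}")   # F(1, 6) = 10.67
```

With the same 4-point gap in means, noisier scores within each process would shrink F. This is why the means alone cannot settle the question. SciPy’s `scipy.stats.f_oneway` returns the same F along with a p-value, and with only two groups F equals the square of the pooled-variance t statistic.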