Adv Quant: General Linear Regression Model in R

Introduction

The goal of this post is to convert the state.x77 dataset to a data frame and then perform a simple linear regression on it.

Results

IP1.5F1.png

Figure 1: Scatter plot matrix of the data frame built from state.x77. The red box marks the relationship selected for further analysis.

IP1.5F2.PNG

IP1.5F3.png

Figure 2: Scatter plot of illiteracy rates versus murder rates across the United States, with the fitted regression line Illiteracy = 0.11607 * Murder + 0.31362 and a correlation of 0.7029752.

Discussion

This post analyzes the state.x77 dataset, which ships with R's built-in datasets package. The dataset was converted into a data frame (see the Code section) and then analyzed. To identify which pair of variables would be interesting to regress, all pairwise relationships in the data frame were plotted in a scatter plot matrix (Figure 1). The relationship between illiteracy and murder stood out, so a simple linear regression of illiteracy on murder was fitted. The two variables have a positive correlation of 0.7029752, and the fitted relationship is

Illiteracy = 0.11607 * Murder + 0.31362 (1)

Equation 1 (plotted in Figure 2) explains 49.42% of the variance in illiteracy, which is the square of the correlation. The regression weight is statistically significant at the 0.01 level and the intercept at the 0.05 level (R output between Figures 1 and 2). For the regression weight, this means that if there were no linear relationship between the two variables, a slope this far from zero would be expected less than 1% of the time. In conclusion, states with lower murder rates tend to have lower illiteracy rates, and vice versa; the regression describes an association and does not by itself establish causation.
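The link between the correlation and the share of variance explained can be checked directly, because in a simple linear regression R-squared equals the squared Pearson correlation. The sketch below is one way to do that check and to pull the coefficient table (estimates, standard errors, t values, p-values) out of the model summary. The object names `fit`, `fit_summary`, `r`, `r_squared`, and `coef_table` are illustrative and not part of the original code.

```r
# Sketch: compare R-squared with the squared correlation for this model.
# state.x77 ships with base R (datasets package), so nothing needs installing.
df <- data.frame(state.x77)

fit <- lm(Illiteracy ~ Murder, data = df)
fit_summary <- summary(fit)

r <- cor(df$Murder, df$Illiteracy)   # Pearson correlation
r_squared <- fit_summary$r.squared   # share of variance explained by the model

# Coefficient table: estimates, standard errors, t values and p-values
coef_table <- coef(fit_summary)

print(c(correlation = r, correlation_squared = r^2, r_squared = r_squared))
print(coef_table)
```

The last column of the coefficient table, `Pr(>|t|)`, holds the p-values used to judge significance for the intercept and the regression weight.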

Code

## Converting a dataset to a data frame for analysis

data() # List the datasets available in the currently loaded packages

data(state) # Load the state datasets; state.x77 ships with R's built-in datasets package

head(state.x77) # Print the first six rows of state.x77

df <- data.frame(state.x77) # Convert the state.x77 matrix into a data frame

## Regression formulation

plot(df) # Scatter plot matrix of all pairwise relationships in df (base graphics, no extra library needed)

stateRegression <- lm(Illiteracy ~ Murder, data = df) # Regress Illiteracy on Murder

summary(stateRegression) # Print a summary of the fitted regression

# Scatter plot of the selected relationship

plot(df$Murder, df$Illiteracy, type = "p", main = "Illiteracy rates vs Murder rates", xlab = "Murder", ylab = "Illiteracy")

abline(stateRegression, col = "red") # Add the fitted regression line in red

cor(df$Murder, df$Illiteracy) # Pearson correlation between the two variables
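The conversion step matters because state.x77 is stored as a numeric matrix, while formula-based functions such as lm() expect a data frame. The following is a minimal sketch of what the conversion changes, assuming the default `check.names = TRUE` behaviour of data.frame(); the object name `fit` is illustrative.

```r
# Sketch: what changes when the state.x77 matrix becomes a data frame.
is.matrix(state.x77)          # state.x77 is stored as a numeric matrix
colnames(state.x77)           # two column names contain spaces

df <- data.frame(state.x77)   # check.names = TRUE (the default) repairs the names
is.data.frame(df)
names(df)                     # spaces are replaced with dots, e.g. "Life.Exp"

# Columns can now be referenced by name in a model formula
fit <- lm(Illiteracy ~ Murder, data = df)
coef(fit)
```

Note that the renamed columns (Life.Exp and HS.Grad) must be written with the dot when they are used in later formulas.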

References

Berkeley Statistics (n.d.). Data Frames and Plotting. Retrieved from http://www.stat.berkeley.edu/~s133/R-4a.html
Dalgaard, P. (2011). [R] state.x77 dataset. Retrieved from https://stat.ethz.ch/pipermail/r-help/2011-March/271280.html
Marin, M. (2013). Linear regression in R (R tutorial 5.1). Retrieved from https://www.youtube.com/watch?v=66z_MRwtFJM
R (n.d.). Plot Method for Data Frames. Retrieved from https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/plot.dataframe.html
R (n.d.). Return the first or last part of an object. Retrieved from https://stat.ethz.ch/R-manual/R-devel/library/utils/html/head.html
Schumacker, R. E. (2014). Learning statistics using R. California, SAGE Publications, Inc, VitalBook file.
