Adv Quant: More on Logistic Regression

Logistic regression is another flavor of multi-variable regression, where one or more independent variables are continuous or categorical which are used to predict a dichotomous/ binary/ categorical dependent variable (Ahlemeyer-Stubbe, & Coleman, 2014; Field, 2013; Gall, Gall, & Borg, 2006; Huck, 2011).  Zheng and Agresti (2000) defines predictive power as a measure that helps compare competing regressions via analyzing the importance of the independent variables.  For linear regression and multiple linear regression, the correlation coefficient and coefficient of determination are adequate for predictive power (Field, 2014; Zheng & Agresti, 2000). The more data that is collected could yield a stronger predictive power (Field, 2014).  Predictive power is used to sell the relationships between variables to management (Ahlemeyer-Stubbe, & Coleman, 2014).

For logistic regression, the predictive power of the independent variables can be evaluated by the concept of the odds ratio for each independent variable (Huck, 2011). Field (2013) and Schumacker (2014) explained that when the logistic regression is calculated, the categorical/binary variables are transformed into ln(odds ratio) and a regression is then performed on this newly scaled variable (scale factor seen in equation 1):

eq1.PNG                                                 (1)

Since the probability of one categorical variable varies between 0à0.999…, the odds ratio value can vary between 0 à 999.999… (Schumacker, 2014). If the value of the odds ratio is greater than 1, it will show that as the independent variable value increases, so do the odds of the dependent variable (Y = n) occurs increases and vice versa (Field, 2013). Thus, the odds ratio measures the constant strength of association between the independent and dependent variables (Huck, 2011; Smith, 2015).  Due to this ln(odds ratio) transformation, logistic regression should be used for binary outcomes.

Field (2013) and Schumacker (2014) further explained that given that this ln(odds ratio) transformation needs to be made on the variables; the way to predict categorical outcomes from the regression formula (2),

  eq2.PNG                                            (2)

is best to explained the probability of the categorical outcome value one is trying to calculate:

eq3                                                (3)

The probability equation (3) can be expressed in multiple ways, through typical algebraic manipulations.  Thus, the probability/likelihood of the dependent variable Y is defined between 0-100% and the odds ratio is used to discuss the strength of these relationships.

References

  • Ahlemeyer-Stubbe, Andrea, Shirley Coleman. (2014). A Practical Guide to Data Mining for Business and Industry, 1st Edition. [VitalSource Bookshelf Online].
  • Gall, M. D., Gall, J. P., Borg, W. R. (2006). Educational Research: An Introduction, 8th Edition. [VitalSource Bookshelf Online].
  • Field, Andy. (2013). Discovering Statistics Using IBM SPSS Statistics, 4th Edition. [VitalSource Bookshelf Online].
  • Huck, Schuyler W. (2011). Reading Statistics and Research, 6th Edition. [VitalSource Bookshelf Online].
  • Schumacker, Randall E. (2014). Learning Statistics Using R, 1st Edition. [VitalSource Bookshelf Online].
  • Smith, M. (2015). Statistical analysis handbook. Retrieved from http://www.statsref.com/HTML/index.html?introduction.html
  • Zheng, B. and Agresti, A. (2000) Summarizing the predictive power of a generalized linear model.  Retrieved from http://www.stat.ufl.edu/~aa/articles/zheng_agresti.pdf