Binary Regression

\[\begin{gather*} y\in \{0,1\}: \text{dependent variable.}\\ x_1,\cdots x_k: \text{predictor variables}\\ x_i=(1,x_{i1},\cdots,x_{ik})^T,\ i=1,\cdots n\\ \beta = (\beta_0,\cdots,\beta_k)^T \end{gather*}\]

Let \(\hat{y_i}=\beta^Tx_i+\varepsilon_i\).

\[ y_i=\begin{cases} 1& \hat{y_i}\ge0\\ 0& \hat{y_i}<0 \end{cases} \]

Probit Model

For Probit Model, \(\varepsilon\sim N(0,1)\). So, \[ P\{y_i=1\}=P\{\hat{y_i}\ge0\}=P\{\varepsilon_i\ge-\beta^Tx_i\}=1-\Phi(-\beta^Tx_i)=\Phi(\beta^Tx_i) \] Therefore Probit Model is shown below.

\[\Phi(u)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{u}e^{-\frac{z^2}{2}}dz\]

Logit Model

For Logit Model, \(\varepsilon \sim \text{Logistic}(0,1)\).

P.D.F of Standard Logistic Distribution: \(f(x)=\frac{e^{-x}}{(1+e^{-x})^2}\).

C.D.F of Standard Logistic Distribution: \(F(x)=\frac{1}{1+e^{-x}}\).

Therefore Probit Model is shown below. Where \(u=\beta_0+\beta_1x_1+\cdots+\beta_kx_k\).

\[ \Lambda(u)=\frac{1}{1+e^{-u}} \]

We can estimate \(\beta\) by maximum likelihood estimation.

\(u\) is also referred to as the log-odds.

\[ u=\ln\frac{p}{1-p} \]

Doctor Data

Read Data

To read “xlsx” files, I use R function - read_excel in package readxl.

library(readxl)
read_excel('../datasets/Doctor.xlsx',) -> doctors

For each patient, variable descriptions are as follows:

y: dependent variable - result of a pregnancy.
x1~x5: other independent variables.

Build Logit Model

In this case, I find that variable x4 might be a categorical variable. The code below estimates a logistic regression model using the glm (generalized linear model) function.

factor(doctors$x4)->doctors$x4
doctors.logit<-glm(y~.,data = doctors,family = binomial(logit))
summary(doctors.logit)

## 
## Call:
## glm(formula = y ~ ., family = binomial(logit), data = doctors)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4599  -0.4412   0.1605   0.4367   1.7655  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -10.70095    3.69973  -2.892  0.00382 **
## x1            0.45959    0.17705   2.596  0.00944 **
## x2            0.81826    0.43646   1.875  0.06083 . 
## x3           -0.37897    0.42125  -0.900  0.36832   
## x42          -2.14618    4.14447  -0.518  0.60457   
## x43           0.22639    0.84413   0.268  0.78855   
## x5           -0.03383    0.02045  -1.654  0.09816 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 75.934  on 57  degrees of freedom
## Residual deviance: 39.251  on 51  degrees of freedom
## AIC: 53.251
## 
## Number of Fisher Scoring iterations: 6

As shown in the above results, x1, x2, x3 and two terms of x4 are statistically significant. The logistic regression coefficients give the change of the log-odds for a one unit increase in the predictor variable. In this case x4=1 is reference category.

From the result, we can see only variable x1 is significant in \(\alpha=0.05\) significance level. The corresponding estimator means that for one unit increase in x1, the log-odds of y=1(Successful pregnancy) increase by \(0.45959\).(Bruin 2011)

The indicator variable for x4 is slightly different. For instance, the log-odds of y=1(Successful pregnancy) with x4=2, verse with x4=1. Change the variable x4 from \(2\) into \(1\), the log-odds would change \(-2.14618\).

Build Probit Model

factor(doctors$x4)->doctors$x4
doctors.probit<-glm(y~.,data = doctors,family = binomial(probit))
summary(doctors.probit)

## 
## Call:
## glm(formula = y ~ ., family = binomial(probit), data = doctors)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4306  -0.4425   0.1226   0.4728   1.7298  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -6.12853    1.94100  -3.157  0.00159 **
## x1           0.26397    0.09484   2.783  0.00538 **
## x2           0.47466    0.23534   2.017  0.04370 * 
## x3          -0.23146    0.23321  -0.992  0.32096   
## x42         -1.23190    1.93357  -0.637  0.52405   
## x43          0.09670    0.47985   0.202  0.84029   
## x5          -0.01906    0.01125  -1.695  0.09014 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 75.934  on 57  degrees of freedom
## Residual deviance: 39.239  on 51  degrees of freedom
## AIC: 53.239
## 
## Number of Fisher Scoring iterations: 7

From the result, we can see variables x1 and x2 are significant in \(\alpha=0.05\) significance level. The corresponding estimator means that for one unit increase in x1, the log-odds of y=1(Successful pregnancy) increase by \(0.26397\), but for x2, it increases by \(0.47466\).

Significance Test (Wald Test)

I use library aod to continue the next step - significance test.

Logit model

library(aod)
wald.test(b=coef(doctors.logit),Sigma = vcov(doctors.logit),Terms = 1:6)

## Wald test:
## ----------
## 
## Chi-squared test:
## X2 = 13.5, df = 6, P(> X2) = 0.036

Probit model

library(aod)
wald.test(b=coef(doctors.probit),Sigma = vcov(doctors.probit),Terms = 1:6)

## Wald test:
## ----------
## 
## Chi-squared test:
## X2 = 17.2, df = 6, P(> X2) = 0.0087

As we can see, p-values are both less than \(0.05\), therefore we can conclude that these two models are both significant at \(95\%\) confidence level. In other words, not all coefficients equal to \(0\).

Find “optimalCutoff” for both models

Compute the optimal cutoff can help us minimizes the misclassification error for both above model. I use package InformationValue to find it.

Logit Model

doctors.logit.predicted <- plogis(predict(doctors.logit,doctors))
library(InformationValue)
doctors.logit.optcutoff <- optimalCutoff(doctors$y,doctors.logit.predicted)
show(doctors.logit.optcutoff)

## [1] 0.5293058

Probit Model

doctors.probit.predicted <- plogis(predict(doctors.probit,doctors))
library(InformationValue)
doctors.probit.optcutoff <- optimalCutoff(doctors$y,doctors.probit.predicted)
show(doctors.probit.optcutoff)

## [1] 0.5234879

Construct the confusion matrices for both model with default and optimal cutoffs

Logit Model

with default

confusionMatrix(doctors$y,doctors.logit.predicted)

##    0  1
## 0 16  3
## 1  5 34

with optimal cufoff

confusionMatrix(doctors$y,doctors.logit.predicted,threshold = doctors.logit.optcutoff )

##    0  1
## 0 18  3
## 1  3 34

Probit Model

confusionMatrix(doctors$y,doctors.probit.predicted)

##    0  1
## 0 17  3
## 1  4 34

with optimal cufoff

confusionMatrix(doctors$y,doctors.probit.predicted,threshold = doctors.probit.optcutoff )

##    0  1
## 0 18  3
## 1  3 34

Plot ROC curve

Logit Model

plotROC(doctors$y,doctors.logit.predicted)

Probit Model

plotROC(doctors$y,doctors.probit.predicted)

Higher the area under curve, better the prediction power of the model. From these figure above, we can condlude that the Logit Model is slightly better than Probit Model.

Exam Data

Read Data

To read “xlsx” files, I use R function - read_excel in package readxl.

library(readxl)
read_excel('../datasets/binary regression.xlsx') -> exams
# Converts data format to numeric value
exams$x1<-as.numeric(exams$x1)
exams$x2<-as.numeric(exams$x2)

For each exam, variable descriptions are as follows:

y: dependent variable - result of a exam
x1~x2: other independent variables.

Build Logit Model

In this case, there is no categorical variable. The code below estimates a logistic regression model using the glm (generalized linear model) function.

exams.logit<-glm(y~.,data = exams,family = binomial(logit))
summary(exams.logit)

## 
## Call:
## glm(formula = y ~ ., family = binomial(logit), data = exams)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2246  -0.1425   0.0042   0.2007   2.1524  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -16.1603     4.0819  -3.959 7.53e-05 ***
## x1            1.7454     0.4308   4.051 5.09e-05 ***
## x2            1.4865     0.3941   3.772 0.000162 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 137.628  on 99  degrees of freedom
## Residual deviance:  43.402  on 97  degrees of freedom
## AIC: 49.402
## 
## Number of Fisher Scoring iterations: 8

As shown in the above results, both x1 and x2 are statistically significant.

From the result, we can see variables x1 and x2 are significant in \(\alpha=0.001\) significance level. The corresponding estimator means that for one unit increase in x1, the log-odds of y=1(Pass Exam) increase by \(1.7454\), but for x2, it increase by \(1.4865\).

Build Probit Model

exams.probit<-glm(y~.,data = exams,family = binomial(probit))
summary(exams.probit)

## 
## Call:
## glm(formula = y ~ ., family = binomial(probit), data = exams)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.20524  -0.08993   0.00000   0.15986   2.11765  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -9.3426     2.1896  -4.267 1.98e-05 ***
## x1            1.0057     0.2291   4.390 1.13e-05 ***
## x2            0.8643     0.2155   4.012 6.03e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 137.628  on 99  degrees of freedom
## Residual deviance:  43.074  on 97  degrees of freedom
## AIC: 49.074
## 
## Number of Fisher Scoring iterations: 9

From the result, we can see both x1 and x2 are significant in \(\alpha=0.001\) significance level. The corresponding estimator means that for one unit increase in x1, the log-odds of y=1(Pass Exam) increase by \(1.0057\), but for x2, it increases by \(0.8643\).

Significance Test (Wald Test)

I use library aod to continue the next step - significance test.

Logit model

library(aod)
wald.test(b=coef(exams.logit),Sigma = vcov(exams.logit),Terms = 1:2)

## Wald test:
## ----------
## 
## Chi-squared test:
## X2 = 16.5, df = 2, P(> X2) = 0.00026

Probit model

library(aod)
wald.test(b=coef(exams.probit),Sigma = vcov(exams.probit),Terms = 1:2)

## Wald test:
## ----------
## 
## Chi-squared test:
## X2 = 19.4, df = 2, P(> X2) = 6e-05

As we can see, p-values are both less than \(0.05\), therefore we can conclude that these two models are both significant at \(99.9\%\) confidence level. In other words, not all coefficients equal to \(0\).

Find “optimalCutoff” for both models

Compute the optimal cutoff can help us minimizes the misclassification error for both above model. I use package InformationValue to find it.

Logit Model

exams.logit.predicted <- plogis(predict(exams.logit,exams))
library(InformationValue)
exams.logit.optcutoff <- optimalCutoff(exams$y,exams.logit.predicted)
show(exams.logit.optcutoff)

## [1] 0.6399997

Probit Model

exams.probit.predicted <- plogis(predict(exams.probit,exams))
library(InformationValue)
exams.probit.optcutoff <- optimalCutoff(exams$y,exams.probit.predicted)
show(exams.probit.optcutoff)

## [1] 0.5798182

Construct the confusion matrices for both model with default and optimal cutoffs

Logit Model

with default

confusionMatrix(exams$y,exams.logit.predicted)

##    0  1
## 0 39  5
## 1  6 50

with optimal cufoff

confusionMatrix(exams$y,exams.logit.predicted,threshold = exams.logit.optcutoff )

##    0  1
## 0 43  7
## 1  2 48

Probit Model

confusionMatrix(exams$y,exams.probit.predicted)

##    0  1
## 0 39  5
## 1  6 50

with optimal cufoff

confusionMatrix(exams$y,exams.probit.predicted,threshold = exams.probit.optcutoff )

##    0  1
## 0 43  7
## 1  2 48

Plot ROC curve

Logit Model

plotROC(exams$y,exams.logit.predicted)

Probit Model

plotROC(exams$y,exams.probit.predicted)

Higher the area under curve, better the prediction power of the model. From these figure above, we can condlude that the Probit Model is slightly better than Logit Model.

Acknowledgements

Thanks for knitr designed by(Xie 2015).

References

Bruin, J. 2011. “Newtest: Command to Compute New Test @ONLINE.” February 2011. https://stats.idre.ucla.edu/stata/ado/analysis/.

Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC. http://yihui.name/knitr/.

Lecture 6 - Binary Regression - Zhao Chi - 19.M09

Zhao Chi

Binary Regression

Probit Model

Logit Model

Doctor Data

Read Data

Build Logit Model

Build Probit Model

Significance Test (Wald Test)

Find “optimalCutoff” for both models

Construct the confusion matrices for both model with default and optimal cutoffs

Logit Model

Probit Model

Plot ROC curve

Exam Data

Read Data

Build Logit Model

Build Probit Model

Significance Test (Wald Test)

Find “optimalCutoff” for both models

Construct the confusion matrices for both model with default and optimal cutoffs

Logit Model

Probit Model

Plot ROC curve

Acknowledgements

References