To read “xlsx” files, I use the R function read_excel from the readxl package.
For each patient, variable descriptions are as follows:
\[Y=X\beta+\epsilon\]
\[J(\beta)=(y-X\beta)^T(y-X\beta)\]
Minimizing the cost function yields the OLS (ordinary least squares) estimator: \[\hat{\beta}=(X^TX)^{-1}X^TY\]
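As a sanity check, the closed-form estimator can be computed directly and compared against lm(). This is a sketch on simulated data; the names X, y, and beta_true are illustrative, not the patients variables:

```r
# Sketch: closed-form OLS on simulated data, checked against lm()
set.seed(1)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))              # design matrix with intercept column
beta_true <- c(2, 0.8, -0.3)
y <- X %*% beta_true + rnorm(n, sd = 0.5)

beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y   # (X^T X)^{-1} X^T y
fit <- lm(y ~ X[, 2] + X[, 3])

max(abs(beta_hat - coef(fit)))                 # agreement up to numerical error
```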
In this case, \(X_1\) is the dependent variable and the other two are the independent variables. I then build a linear regression model.
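The fitting chunk is not echoed; judging from the Call line in the output below, it was presumably:

```r
# Presumed chunk (matches the Call line in the output below)
blood.lr <- lm(X1 ~ ., data = patients)
summary(blood.lr)
```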
##
## Call:
## lm(formula = X1 ~ ., data = patients)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4640 -1.1949 -0.4078 1.8511 2.6981
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.9941 11.9438 2.595 0.03186 *
## X2 0.8614 0.2482 3.470 0.00844 **
## X3 0.3349 0.1307 2.563 0.03351 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.318 on 8 degrees of freedom
## Multiple R-squared: 0.9768, Adjusted R-squared: 0.9711
## F-statistic: 168.8 on 2 and 8 DF, p-value: 2.874e-07
As the derivation shows, we cannot guarantee that the matrix \(X^TX\) is invertible. If \(X^TX\) is singular, the estimator \(\hat{\beta}\) does not exist, and if \(|X^TX|\rightarrow0\), the estimator \(\hat{\beta}\) tends to infinity.
To solve this problem (multicollinearity), we can use ridge regression.
Cost Function of Ridge Regression
\[J(\beta,\lambda)=(y-X\beta)^T(y-X\beta)+\lambda\beta^T\beta\] Minimizing the cost function yields the ridge regression estimator: \[\hat\beta(\lambda)=(X^TX+\lambda I_{k+1})^{-1}X^TY\] For any matrix \(X\) and any \(\lambda > 0\), the matrix \(X^TX+\lambda I_{k+1}\) is invertible, so the estimator always exists.
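A quick numerical illustration of the estimator and its shrinkage effect, again on simulated data rather than the patients set:

```r
# Sketch: closed-form ridge estimator vs OLS on simulated data
set.seed(2)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))
y <- X %*% c(2, 0.8, -0.3) + rnorm(n, sd = 0.5)
lambda <- 0.5

# (X^T X + lambda I)^{-1} is well defined for any lambda > 0
beta_ridge <- solve(t(X) %*% X + lambda * diag(ncol(X))) %*% t(X) %*% y
beta_ols   <- solve(t(X) %*% X) %*% t(X) %*% y

sum(beta_ridge^2) < sum(beta_ols^2)   # ridge shrinks the coefficient norm
```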
Cost Function of Quantile Regression
\[J(\beta,\tau)=\sum p_{\tau}(y_i-(X\beta)_i)=\sum_{i:y_i\ge(X\beta)_i}\tau(y_i-(X\beta)_i)+\sum_{i:y_i\le(X\beta)_i}(1-\tau)((X\beta)_i-y_i),\ \tau \in (0,1)\]
Quantile regression estimator: \(\hat{\beta}(\tau)\); for \(\tau=0.5\) this reduces to the LAD (least absolute deviations) estimator.
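The check function \(\rho_\tau\) underlying this cost is easy to write down directly; a minimal sketch (check_loss is an illustrative name):

```r
# Check function rho_tau: tau*u for u >= 0, (tau - 1)*u for u < 0
check_loss <- function(u, tau) ifelse(u >= 0, tau * u, (tau - 1) * u)

check_loss(2, 0.5)    # 1   (tau = 0.5: symmetric, proportional to |u|)
check_loss(-2, 0.5)   # 1
check_loss(1, 0.8)    # 0.8 (tau = 0.8 penalizes under-prediction more)
check_loss(-1, 0.8)   # 0.2
```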
Fit ridge regressions for \(\lambda\) from \(0\) to \(2\) with step \(0.001\):
## modified HKB estimator is 0
## modified L-W estimator is 0
## smallest value of GCV at 0.56
From the figure and the output of the select function, I conclude that the choice of the ridge parameter \(\lambda\) depends strongly on which selection method is used.
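The HKB, L-W, and GCV figures above are what MASS::select reports for an lm.ridge fit; assuming that is the code behind this output, the chunk would look like:

```r
# Assumed chunk: ridge trace and parameter selection via MASS
library(MASS)
blood.ridge.seq <- lm.ridge(X1 ~ ., data = patients,
                            lambda = seq(0, 2, by = 0.001))
select(blood.ridge.seq)   # prints the HKB, L-W, and GCV-minimizing values
plot(blood.ridge.seq)     # ridge trace over the lambda grid
```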
In this case, let \(\lambda=0.56\). The result is shown below.
## (Intercept)          X2          X3
##  31.9476379   0.7876384   0.3535983
Ridge parameter chosen automatically:
##
## Call:
## linearRidge(formula = X1 ~ ., data = patients)
##
##
## Coefficients:
## Estimate Scaled estimate Std. Error (scaled) t value (scaled)
## (Intercept) 30.9289 NA NA NA
## X2 0.8505 24.5135 6.2148 3.944
## X3 0.3387 18.5445 6.2148 2.984
## Pr(>|t|)
## (Intercept) NA
## X2 8e-05 ***
## X3 0.00285 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Ridge parameter: 0.004737444, chosen automatically, computed using 1 PCs
##
## Degrees of freedom: model 1.917 , variance 1.84 , residual 1.993
Ridge parameter chosen manually:
##
## Call:
## linearRidge(formula = X1 ~ ., data = patients, lambda = 0.56)
##
##
## Coefficients:
## Estimate Scaled estimate Std. Error (scaled) t value (scaled)
## (Intercept) 54.4895 NA NA NA
## X2 0.5914 17.0441 1.7555 9.709
## X3 0.3009 16.4736 1.7555 9.384
## Pr(>|t|)
## (Intercept) NA
## X2 <2e-16 ***
## X3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Ridge parameter: 0.56
##
## Degrees of freedom: model 0.8644 , variance 0.6107 , residual 1.118
LR
blood.predicted.lr <- predict(blood.lr,patients)
blood.compare <- cbind(actual=patients$X1,blood.predicted.lr)
show(blood.compare)
## actual blood.predicted.lr
## 1 132 133.7183
## 2 143 143.4317
## 3 153 153.6716
## 4 162 164.5327
## 5 154 151.7570
## 6 168 168.4078
## 7 137 140.4640
## 8 149 146.4939
## 9 159 156.3019
## 10 128 126.5407
## 11 166 165.6804
## [1] 0.9886923
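The 0.9886923 above is an accuracy score; the later chunks compute it as the mean of the row-wise min/max ratios between actual and predicted values, which can be wrapped as a helper (accuracy is an illustrative name):

```r
# Accuracy as used in this report: mean of row-wise min/max ratios,
# so perfect predictions score exactly 1
accuracy <- function(cmp) mean(apply(cmp, 1, min) / apply(cmp, 1, max))

cmp <- cbind(actual = c(10, 20), predicted = c(10, 20))
accuracy(cmp)    # 1
cmp2 <- cbind(actual = c(10, 20), predicted = c(5, 20))
accuracy(cmp2)   # 0.75
```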
RR with \(\lambda=0.56\)
blood.predicted.rr.m <- predict(blood.rr.m,patients)
blood.compare1 <- cbind(actual=patients$X1,blood.predicted.rr.m)
show(blood.compare1)
## actual blood.predicted.rr.m
## 1 132 137.2896
## 2 143 144.7385
## 3 153 152.4780
## 4 162 161.1408
## 5 154 151.3057
## 6 168 164.4400
## 7 137 142.9852
## 8 149 147.1247
## 9 159 155.2066
## 10 128 131.9362
## 11 166 162.3547
## [1] 0.9792692
RR with \(\lambda = 0.00473\) chosen automatically
blood.predicted.rr.a <- predict(blood.rr.a,patients)
blood.compare2 <- cbind(actual=patients$X1,blood.predicted.rr.a)
show(blood.compare2)
## actual blood.predicted.rr.a
## 1 132 133.7481
## 2 143 143.4272
## 3 153 153.6181
## 4 162 164.4789
## 5 154 151.7440
## 6 168 168.3776
## 7 137 140.5295
## 8 149 146.4830
## 9 159 156.3201
## 10 128 126.6130
## 11 166 165.6605
## [1] 0.9887473
Comparing the three results, model blood.rr.a is more accurate than the other two.
## [1] 0.9887473
We use the function rq in package quantreg, for \(\tau\) from \(0.2\) to \(0.8\) with step \(0.1\). The results are shown below.
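The fitting chunk is not echoed; consistent with the Call lines below, it was presumably:

```r
# Presumed chunk (quantreg attaches SparseM, hence the masking message below)
library(quantreg)
blood.qr <- rq(X1 ~ ., tau = 2:8/10, data = patients)
summary(blood.qr)
```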
##
## Attaching package: 'SparseM'
## The following object is masked from 'package:base':
##
## backsolve
##
## Call: rq(formula = X1 ~ ., tau = 2:8/10, data = patients)
##
## tau: [1] 0.2
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 41.13115 -104.93178 63.27750
## X2 1.14754 -0.20444 1.28465
## X3 0.18033 -0.03971 0.93687
##
## Call: rq(formula = X1 ~ ., tau = 2:8/10, data = patients)
##
## tau: [1] 0.3
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 27.03704 4.76042 57.92169
## X2 0.95062 -0.10568 1.31937
## X3 0.32099 0.07881 0.79637
##
## Call: rq(formula = X1 ~ ., tau = 2:8/10, data = patients)
##
## tau: [1] 0.4
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 29.30435 1.80309 68.14027
## X2 0.79710 0.01156 1.31406
## X3 0.36232 0.16714 0.78413
##
## Call: rq(formula = X1 ~ ., tau = 2:8/10, data = patients)
##
## tau: [1] 0.5
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 27.36782 0.88287 61.27839
## X2 0.64368 0.09425 1.53764
## X3 0.42529 0.00393 0.72892
##
## Call: rq(formula = X1 ~ ., tau = 2:8/10, data = patients)
##
## tau: [1] 0.6
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 27.36782 12.20839 59.54008
## X2 0.64368 0.20217 1.67670
## X3 0.42529 -0.12433 0.58920
##
## Call: rq(formula = X1 ~ ., tau = 2:8/10, data = patients)
##
## tau: [1] 0.7
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 15.18654 -13.49924 62.10933
## X2 0.49847 0.07013 1.59303
## X3 0.53823 -0.05272 0.60927
##
## Call: rq(formula = X1 ~ ., tau = 2:8/10, data = patients)
##
## tau: [1] 0.8
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 26.37870 -90.45791 56.34419
## X2 0.83432 -0.30389 1.70043
## X3 0.37870 -0.15552 0.77708
blood.predicted.qr <- predict(blood.qr,patients)
blood.compare.ql = cbind(actual=patients$X1,blood.predicted.lr,blood.predicted.qr)
show(blood.compare.ql)
## actual blood.predicted.lr tau= 0.2 tau= 0.3 tau= 0.4 tau= 0.5 tau= 0.6
## 1 132 133.7183 132.0000 132.0000 133.4348 134.4138 134.4138
## 2 143 143.4317 142.0164 142.1852 143.0000 143.5977 143.5977
## 3 153 153.6716 153.0000 153.0000 153.0000 153.0000 153.0000
## 4 162 164.5327 162.9508 164.1605 163.9420 164.0920 164.0920
## 5 154 151.7570 149.9180 150.7901 151.3333 151.9195 151.9195
## 6 168 168.4078 165.7213 168.0000 168.0000 168.5632 168.5632
## 7 137 140.4640 137.0000 138.7160 140.4638 142.0805 142.0805
## 8 149 146.4939 145.0328 145.3704 146.0435 146.5862 146.5862
## 9 159 156.3019 153.0492 155.2716 156.1159 157.2414 157.2414
## 10 128 126.5407 124.0328 124.3704 126.4783 128.0000 128.0000
## 11 166 165.6804 162.8852 165.1358 165.3188 166.0000 166.0000
## tau= 0.7 tau= 0.8
## 1 134.2202 135.2781
## 2 143.6300 145.2840
## 3 153.0000 155.7456
## 4 165.1407 167.1893
## 5 152.5810 154.0000
## 6 170.4832 171.4320
## 7 143.2905 142.6272
## 8 146.7798 148.4675
## 9 159.0000 159.0000
## 10 128.0000 128.0000
## 11 167.8716 168.6272
To read “xlsx” files, I use the R function read_excel from the readxl package.
For dataset cars, variable descriptions are as follows:
cars_data_subset <- subset(cars_data,
                           select = c(Price, Mileage, Liter, Cruise, Sound, Leather))
cars.lr <- lm(Price ~ ., data = cars_data_subset)
summary(cars.lr)
##
## Call:
## lm(formula = Price ~ ., data = cars_data_subset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12328 -5660 -1700 4546 38203
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.318e+03 1.184e+03 6.182 1.01e-09 ***
## Mileage -1.744e-01 3.265e-02 -5.342 1.20e-07 ***
## Liter 3.864e+03 2.636e+02 14.658 < 2e-16 ***
## Cruise 6.251e+03 6.741e+02 9.273 < 2e-16 ***
## Sound -2.126e+03 5.832e+02 -3.646 0.000284 ***
## Leather 3.436e+03 6.121e+02 5.613 2.74e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7574 on 798 degrees of freedom
## Multiple R-squared: 0.4165, Adjusted R-squared: 0.4129
## F-statistic: 113.9 on 5 and 798 DF, p-value: < 2.2e-16
As in Task 1, I use the ridge package.
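The fitting chunk is again not shown; matching the Call line in the output below, presumably:

```r
# Presumed chunk (matches the Call line below)
library(ridge)
cars.rr <- linearRidge(Price ~ ., data = cars_data_subset)
summary(cars.rr)
```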
##
## Call:
## linearRidge(formula = Price ~ ., data = cars_data_subset)
##
##
## Coefficients:
## Estimate Scaled estimate Std. Error (scaled) t value (scaled)
## (Intercept) 7.430e+03 NA NA NA
## Mileage -1.724e-01 -4.005e+04 7.492e+03 5.346
## Liter 3.828e+03 1.199e+05 8.125e+03 14.761
## Cruise 6.213e+03 7.602e+04 8.115e+03 9.368
## Sound -2.104e+03 -2.785e+04 7.621e+03 3.655
## Leather 3.398e+03 4.308e+04 7.657e+03 5.626
## Pr(>|t|)
## (Intercept) NA
## Mileage 9.01e-08 ***
## Liter < 2e-16 ***
## Cruise < 2e-16 ***
## Sound 0.000258 ***
## Leather 1.85e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Ridge parameter: 0.01153487, chosen automatically, computed using 5 PCs
##
## Degrees of freedom: model 4.938 , variance 4.876 , residual 4.999
LR
cars.predicted.lr <- predict(cars.lr,cars_data_subset)
cars.compare <- cbind(actual=cars_data_subset$Price,cars.predicted.lr)
show(head(cars.compare,5))
## actual cars.predicted.lr
## 1 17314.10 25423.85
## 2 17542.04 21828.63
## 3 16218.85 21120.36
## 4 16336.91 22698.11
## 5 16339.17 25525.24
## [1] 0.7791548
RR with \(\lambda = 0.012\) chosen automatically
cars.predicted.rr <- predict(cars.rr,cars_data_subset)
cars.compare1 <- cbind(actual=cars_data_subset$Price,cars.predicted.rr)
show(head(cars.compare1,5))
## actual cars.predicted.rr
## 1 17314.10 25385.84
## 2 17542.04 21830.27
## 3 16218.85 21130.06
## 4 16336.91 22691.73
## 5 16339.17 25487.94
## [1] 0.7794884
Comparing the two results, model cars.rr is slightly more accurate than cars.lr.
## [1] 0.7794884
We use the function rq in package quantreg, for \(\tau\) from \(0.2\) to \(0.8\) with step \(0.1\). The results are shown below.
library(SparseM)
library(quantreg)
cars.qr <- rq(Price ~ ., tau = 2:8/10, data = cars_data_subset)
summary(cars.qr)
##
## Call: rq(formula = Price ~ ., tau = 2:8/10, data = cars_data_subset)
##
## tau: [1] 0.2
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 6942.50069 5459.65065 7296.08236
## Mileage -0.11506 -0.13654 -0.09319
## Liter 3594.64481 3378.81399 3868.56366
## Cruise 542.71312 333.14546 928.59901
## Sound -769.24211 -1145.59397 -171.74786
## Leather 509.06156 228.23340 909.23020
##
## Call: rq(formula = Price ~ ., tau = 2:8/10, data = cars_data_subset)
##
## tau: [1] 0.3
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 6913.54754 6300.53997 7468.35597
## Mileage -0.12293 -0.14090 -0.10745
## Liter 3793.26932 3586.63649 3941.50503
## Cruise 924.61467 665.72890 1134.85978
## Sound -971.47254 -1297.00791 -644.73647
## Leather 603.48569 362.15424 1019.67261
##
## Call: rq(formula = Price ~ ., tau = 2:8/10, data = cars_data_subset)
##
## tau: [1] 0.4
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 6640.58054 6204.93794 7378.70988
## Mileage -0.12125 -0.14300 -0.10556
## Liter 4025.16643 3712.31439 4251.18936
## Cruise 1195.48266 915.01362 1573.66440
## Sound -1161.82742 -1840.88750 -748.55659
## Leather 779.70216 250.53125 1356.29465
##
## Call: rq(formula = Price ~ ., tau = 2:8/10, data = cars_data_subset)
##
## tau: [1] 0.5
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 6670.48365 5414.67440 8472.41054
## Mileage -0.12289 -0.16614 -0.09624
## Liter 4383.62510 3561.08396 4721.63290
## Cruise 1360.78031 1022.04912 2674.79926
## Sound -1760.28023 -2525.58686 -886.08973
## Leather 1055.59005 391.66341 1734.42700
##
## Call: rq(formula = Price ~ ., tau = 2:8/10, data = cars_data_subset)
##
## tau: [1] 0.6
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 9471.32531 7725.86917 11763.82545
## Mileage -0.14851 -0.20021 -0.10058
## Liter 3429.83459 2861.32636 4122.94348
## Cruise 5296.83351 3888.21959 7159.46934
## Sound -4090.53479 -6138.48532 -2905.45510
## Leather 3516.73942 2017.71036 4711.76740
##
## Call: rq(formula = Price ~ ., tau = 2:8/10, data = cars_data_subset)
##
## tau: [1] 0.7
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 10264.78566 8537.39228 12231.31142
## Mileage -0.14267 -0.19072 -0.09341
## Liter 2572.85577 2236.60727 3136.25310
## Cruise 11640.27168 10590.45354 12161.87968
## Sound -3517.99528 -4887.65352 -2767.39458
## Leather 4572.27575 3018.65572 6120.76708
##
## Call: rq(formula = Price ~ ., tau = 2:8/10, data = cars_data_subset)
##
## tau: [1] 0.8
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 11194.34749 10027.85473 13410.30640
## Mileage -0.15380 -0.20443 -0.09938
## Liter 2947.35812 2429.84071 3456.85731
## Cruise 13993.36377 13263.49899 15233.27569
## Sound -2702.86035 -4105.50144 -1658.69236
## Leather 2490.54804 356.93850 3669.42748
cars.predicted.qr <- predict(cars.qr,cars_data_subset)
cars.compare.ql = cbind(actual=cars_data_subset$Price,cars.predicted.lr,cars.predicted.qr)
head(cars.compare.ql,5)
## actual cars.predicted.lr tau= 0.2 tau= 0.3 tau= 0.4 tau= 0.5 tau= 0.6
## 1 17314.10 25423.85 17422.54 18218.67 18935.15 19905.50 23605.93
## 2 17542.04 21828.63 16808.32 17502.83 18044.62 18737.59 19953.46
## 3 16218.85 21120.36 16341.07 17003.59 17552.22 18238.51 19350.35
## 4 16336.91 22698.11 16748.34 17588.31 18332.59 19612.17 22973.67
## 5 16339.17 25525.24 16855.85 17762.76 18689.13 20238.86 25972.10
## tau= 0.7 tau= 0.8
## 1 29762.30 32847.80
## 2 25059.63 30216.68
## 3 24480.25 29592.08
## 4 27549.40 31811.08
## 5 31623.76 33764.86
To read “txt” files, I use the R function read.table().
cigarettes_data <- read.table('../datasets/cigarettes.dat.txt', header = FALSE,
                              dec = '.', na.strings = 'NA')
For dataset cigarettes, variable descriptions are as follows:
##
## Call:
## lm(formula = V5 ~ V2 + V3 + V4, data = cigarettes_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.89261 -0.78269 0.00428 0.92891 2.45082
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.2022 3.4618 0.925 0.365464
## V2 0.9626 0.2422 3.974 0.000692 ***
## V3 -2.6317 3.9006 -0.675 0.507234
## V4 -0.1305 3.8853 -0.034 0.973527
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.446 on 21 degrees of freedom
## Multiple R-squared: 0.9186, Adjusted R-squared: 0.907
## F-statistic: 78.98 on 3 and 21 DF, p-value: 1.329e-11
As in Task 1, I use the ridge package.
##
## Call:
## linearRidge(formula = V5 ~ V2 + V3 + V4, data = cigarettes_data)
##
##
## Coefficients:
## Estimate Scaled estimate Std. Error (scaled) t value (scaled)
## (Intercept) 2.735032 NA NA NA
## V2 0.703002 19.513000 3.731162 5.230
## V3 1.382849 2.398586 3.751193 0.639
## V4 -0.007041 -0.003026 1.619055 0.002
## Pr(>|t|)
## (Intercept) NA
## V2 1.7e-07 ***
## V3 0.523
## V4 0.999
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Ridge parameter: 0.0193945, chosen automatically, computed using 2 PCs
##
## Degrees of freedom: model 2.508 , variance 2.223 , residual 2.793
LR
cigarettes.predicted.lr <- predict(cigarettes.lr,cigarettes_data)
cigarettes.compare <- cbind(actual=cigarettes_data$V5,cigarettes.predicted.lr)
show(head(cigarettes.compare,5))
## actual cigarettes.predicted.lr
## 1 13.6 14.382689
## 2 16.6 15.671090
## 3 23.5 26.392608
## 4 10.2 9.018481
## 5 5.4 5.972616
cigarettes.accuracy <- mean(apply(cigarettes.compare,1,min)/
apply(cigarettes.compare,1,max))
cigarettes.accuracy
## [1] 0.9026594
RR with \(\lambda = 0.019\) chosen automatically
cigarettes.predicted.rr <- predict(cigarettes.rr,cigarettes_data)
cigarettes.compare1 <- cbind(actual=cigarettes_data$V5,cigarettes.predicted.rr)
show(head(cigarettes.compare1,5))
## actual cigarettes.predicted.rr
## 1 13.6 13.829671
## 2 16.6 15.441180
## 3 23.5 26.483468
## 4 10.2 9.279022
## 5 5.4 6.163817
cigarettes.accuracy1 <- mean(apply(cigarettes.compare1,1,min)/
apply(cigarettes.compare1,1,max))
cigarettes.accuracy1
## [1] 0.8974755
Comparing the two results, model cigarettes.lr is more accurate than cigarettes.rr; here ridge regularization does not improve the fit.
## [1] 0.9026594
We use the function rq in package quantreg, for \(\tau\) from \(0.2\) to \(0.8\) with step \(0.1\). The results are shown below.
library(SparseM)
library(quantreg)
cigarettes.qr <- rq(V5 ~ V2 + V3 + V4, tau = 2:8/10, data = cigarettes_data)
summary(cigarettes.qr)
##
## Call: rq(formula = V5 ~ V2 + V3 + V4, tau = 2:8/10, data = cigarettes_data)
##
## tau: [1] 0.2
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 3.65249 -4.69097 14.76883
## V2 0.54733 0.53517 1.31290
## V3 4.61991 -8.31325 5.43916
## V4 -2.93206 -16.03331 14.87044
##
## Call: rq(formula = V5 ~ V2 + V3 + V4, tau = 2:8/10, data = cigarettes_data)
##
## tau: [1] 0.3
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 3.91020 -7.49230 12.85782
## V2 0.84280 0.54215 0.98870
## V3 -0.59492 -3.50732 5.43159
## V4 -1.82595 -10.37780 12.09846
##
## Call: rq(formula = V5 ~ V2 + V3 + V4, tau = 2:8/10, data = cigarettes_data)
##
## tau: [1] 0.4
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 2.46006 -7.72342 10.93642
## V2 0.81179 0.73404 1.44977
## V3 0.41790 -16.01431 3.65981
## V4 -0.67556 -6.87898 11.40989
##
## Call: rq(formula = V5 ~ V2 + V3 + V4, tau = 2:8/10, data = cigarettes_data)
##
## tau: [1] 0.5
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 1.84400 -5.88455 12.93210
## V2 1.00921 0.51541 1.57522
## V3 -2.37725 -13.95511 6.75901
## V4 0.39012 -8.28515 8.31436
##
## Call: rq(formula = V5 ~ V2 + V3 + V4, tau = 2:8/10, data = cigarettes_data)
##
## tau: [1] 0.6
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 0.84449 -9.97981 13.63223
## V2 1.17639 0.33371 1.55546
## V3 -5.34000 -13.32838 9.46835
## V4 2.37129 -9.99074 11.59710
##
## Call: rq(formula = V5 ~ V2 + V3 + V4, tau = 2:8/10, data = cigarettes_data)
##
## tau: [1] 0.7
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 0.19785 -8.38333 17.38390
## V2 1.38739 0.26367 1.42203
## V3 -9.08348 -9.41974 9.95496
## V4 4.28449 -13.39838 11.08982
##
## Call: rq(formula = V5 ~ V2 + V3 + V4, tau = 2:8/10, data = cigarettes_data)
##
## tau: [1] 0.8
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) -4.90728 -8.97867 13.50917
## V2 0.93496 0.27969 1.49528
## V3 -1.14784 -10.40564 9.99748
## V4 8.52023 -8.66037 12.21889
cigarettes.predicted.qr <- predict(cigarettes.qr,cigarettes_data)
cigarettes.compare.ql = cbind(actual=cigarettes_data$V5,
cigarettes.predicted.lr,cigarettes.predicted.qr)
head(cigarettes.compare.ql,5)
## actual cigarettes.predicted.lr tau= 0.2 tau= 0.3 tau= 0.4 tau= 0.5
## 1 13.6 14.382689 12.454072 13.482943 13.600000 14.413823
## 2 16.6 15.671090 14.099861 14.767163 15.152674 15.898202
## 3 23.5 26.392608 25.925627 25.690724 26.712576 27.547151
## 4 10.2 9.018481 8.405556 8.559524 8.607416 8.686963
## 5 5.4 5.972616 4.970212 5.400000 5.316325 5.400000
## tau= 0.6 tau= 0.7 tau= 0.8
## 1 15.175579 16.169752 15.683453
## 2 16.600000 17.453962 18.154747
## 3 27.823183 28.094011 30.550383
## 4 8.878342 9.187039 9.710097
## 5 5.775388 6.306739 6.528752
This report was generated with knitr, designed by Xie (2015).
Xie, Yihui. 2015. Dynamic Documents with R and knitr. 2nd ed. Boca Raton, Florida: Chapman & Hall/CRC. http://yihui.name/knitr/.