Abstract:
The importance of robust variable selection and regularization as remedies for the adverse effects of collinearity-influential high leverage points in the quantile regression (QR) setting cannot be overemphasized, nor can that of the diagnostic tools that identify these high leverage points. In the literature, researchers have dealt extensively with variable selection and regularization for penalized QR, which generalizes the well-known least absolute deviation (LAD) procedure to all quantile levels.
Unlike least squares (LS) procedures, which are unreliable when deviations from the Gaussian assumptions (outliers) exist, the QR procedure is robust to Y-space (response) outliers. However, it is vulnerable to predictor-space data aberrations (high leverage points and adverse collinearity effects), which may alter the eigen-structure of the predictor matrix. Therefore, in the literature, it is recommended that the problems of collinearity and high
leverage points be dealt with simultaneously. In this thesis, we propose applying the ridge regression (RIDGE), LASSO, elastic net (E-NET), adaptive LASSO, and adaptive elastic net (AE-NET) penalties to weighted QR (WQR) to mitigate the effects of collinearity and collinearity-influential points in the QR setting. The new procedures are the penalized WQR procedures, i.e., the RIDGE penalized WQR (WQR-RIDGE), the LASSO penalized WQR (WQR-LASSO), and the E-NET penalized WQR (WQR-E-NET), and the adaptive penalized QR procedures, i.e., the adaptive LASSO penalized QR (QR-ALASSO) and the adaptive E-NET penalized QR (QR-AE-NET) procedures and their weighted versions. The penalized WQR procedures use weights derived from the computationally intensive, high-breakdown minimum covariance determinant (MCD) estimator, while the adaptive penalized QR procedures use adaptive weights derived from the RIDGE penalized WQR (WQR-RIDGE) estimator. Under regularity conditions, the adaptive penalized procedures
satisfy oracle properties. Although adaptive weights in the LS setting are commonly based on the ridge regression (RR) estimator when regressors are collinear, this estimator may be plausible for symmetric distributions at the ℓ1-estimator (RQ at τ = 0.50) rather than at extreme quantile levels. We carried out simulations and applications to well-known data sets from the literature to assess the finite-sample performance of these procedures in variable selection and regularization under the robust weighting formulation and the adaptive weighting construction. In the collinearity-enhancing
point scenario under the t-distribution, the WQR penalized versions outperformed the
unweighted procedures with respect to average shrunken zero coefficients and correctly fitted models.
Under the Gaussian and t-distributions, for predictor matrices with collinearity-reducing points, the weighted regularized procedures dominated prediction performance (WQR-LASSO performed best). In the collinearity-inducing and collinearity-reducing point scenarios under the Gaussian
distribution, the adaptive penalized procedures outperformed the non-adaptive versions in prediction.
Under the t-distribution, a performance pattern similar to that in the Gaussian scenario emerged, although outliers adversely affected the performance of all models. There, the QR-ALASSO and WQR-ALASSO procedures performed better in their respective categories.