Variable Selection Using the Method of the Broken Adaptive Ridge Regression

dc.contributor.advisor: Lu, Xuewen
dc.contributor.advisor: Long, Quan
dc.contributor.author: Chan, Christian Zhao Yang
dc.contributor.committeemember: Greenberg, Matthew
dc.contributor.committeemember: Liao, Wenyuan
dc.date: 2024-11
dc.date.accessioned: 2024-07-09T20:40:01Z
dc.date.available: 2024-07-09T20:40:01Z
dc.date.issued: 2024-07-08
dc.description.abstract: In this thesis, we consider variable selection methods incorporating the Broken Adaptive Ridge Regression under several model frameworks that deal with joint modelling of recurrent and terminal events, high-dimensional covariates, low-dimensional categorical covariates, and low-dimensional continuous covariates in generalized partly linear models and partly linear Cox proportional hazards models. With data more easily available than ever in the digital era, it is important that only relevant variables are retained when building a statistical model. In Chapter 2, we implement a novel method to simultaneously perform variable selection and estimation in the joint frailty model of recurrent and terminal events using the Broken Adaptive Ridge (BAR) penalty. The BAR penalty can be summarized as an iteratively reweighted squared $L_2$-penalized regression, which approximates $L_0$-regularization. Our method allows the number of covariates to diverge with the sample size. Under certain regularity conditions, we prove that the BAR estimator is consistent and asymptotically normally distributed, properties known as the oracle properties in the variable selection literature. In our simulation studies, we compare our proposed method to the Minimum Information Criterion (MIC) method. We apply our method to the Medical Information Mart for Intensive Care (MIMIC-III) database, with the aim of investigating which variables affect the risks of repeated ICU admissions and death during ICU stay. In Chapter 3, motivated by the CATHGEN data, we develop a new method for simultaneous variable selection and parameter estimation in the context of generalized partly linear models for data with high-dimensional covariates. The method, again referred to as the BAR estimator, approximates the $L_0$-penalized regression by iteratively performing reweighted squared $L_2$-penalized regression.
The generalized partly linear model extends the generalized linear model by including a non-parametric component, giving a flexible model for various types of covariates, including linear and non-linear effects in different dimensions. We employ Bernstein polynomials as the sieve space to approximate the non-parametric functions so that our method can be implemented easily using existing R packages. Extensive simulation studies suggest that the proposed method performs better than other commonly used penalty-based variable selection methods. We apply the method to the CATHGEN data with a binary response from a coronary artery disease study, which motivated our research, and obtain new findings in both high-dimensional genetic and low-dimensional non-genetic covariates. In Chapter 4, we implement the BAR penalty under the partly linear Cox proportional hazards model with right-censored data, where our model framework considers three sets of covariates: high-dimensional covariates, low-dimensional categorical covariates, and low-dimensional continuous covariates. The low-dimensional continuous covariates are allowed to have non-linear effects. Our variable selection method can be easily implemented using existing R packages. In our simulation studies, we observe that our method performs better than other existing variable selection methods. We then apply our method to acute respiratory distress syndrome (ARDS) data to discover relevant metabolites that contribute to the risk of dying in the ICU. Finally, we summarize the results from all three projects in Chapter 5.
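As a rough illustration of the iteratively reweighted squared $L_2$ idea described in the abstract, the following is a minimal Python sketch for an ordinary linear model only. It does not cover the joint frailty, generalized partly linear, or Cox settings treated in the thesis, and it is not the author's implementation. The function name `bar_linear`, the tuning parameters `xi` and `lam`, the stopping rule, and the simulated data are all assumptions made for this example.

```python
import numpy as np


def bar_linear(X, y, xi=1.0, lam=1.0, max_iter=200, tol=1e-8, zero_tol=1e-6):
    """Broken adaptive ridge sketch for a linear model (illustrative only).

    xi  : ridge penalty for the initial estimate (assumed name).
    lam : penalty used in the reweighted ridge iterations (assumed name).
    """
    gram = X.T @ X
    xty = X.T @ y
    p = X.shape[1]
    # Step 0: an ordinary ridge estimate serves as the starting value.
    beta = np.linalg.solve(gram + xi * np.eye(p), xty)
    for _ in range(max_iter):
        # Reweighted ridge step: minimize
        #   ||y - X b||^2 + lam * sum_j b_j^2 / beta_j^2.
        # Solved as G (G X'X G + lam I)^{-1} G X'y with G = diag(beta), which
        # equals (X'X + lam * diag(1 / beta^2))^{-1} X'y algebraically but
        # never divides by a coefficient that has shrunk to zero.
        G = np.diag(beta)
        new_beta = G @ np.linalg.solve(G @ gram @ G + lam * np.eye(p), G @ xty)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    # Coefficients driven numerically to zero are reported as exact zeros.
    beta[np.abs(beta) < zero_tol] = 0.0
    return beta


# Usage on simulated data (hypothetical example, not from the thesis).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
true_beta = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0])
y = X @ true_beta + 0.5 * rng.normal(size=200)
print(bar_linear(X, y))
```

Each iteration is a ridge regression whose per-coefficient weight is the inverse square of the previous estimate, so small coefficients are penalized ever more heavily and shrink toward zero while large ones are left nearly unpenalized. This is the sense in which the procedure approximates an $L_0$ penalty.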
dc.identifier.citation: Chan, C. Z. Y. (2024). Variable selection using the method of the broken adaptive ridge regression (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
dc.identifier.uri: https://hdl.handle.net/1880/119142
dc.identifier.uri: https://doi.org/10.11575/PRISM/46738
dc.language.iso: en
dc.publisher.faculty: Science
dc.publisher.institution: University of Calgary
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject: Variable Selection
dc.subject: Survival Analysis
dc.subject: Nonlinear Approximation
dc.subject.classification: Biostatistics
dc.title: Variable Selection Using the Method of the Broken Adaptive Ridge Regression
dc.type: doctoral thesis
thesis.degree.discipline: Mathematics & Statistics
thesis.degree.grantor: University of Calgary
thesis.degree.name: Doctor of Philosophy (PhD)
ucalgary.thesis.accesssetbystudent: I require a thesis withhold – I need to delay the release of my thesis due to a patent application, and other reasons outlined in the link above. I have/will need to submit a thesis withhold application.
Files
Original bundle
Name: ucalgary_2024_chan_christian.pdf
Size: 2.65 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.62 KB
Format: Item-specific license agreed upon to submission