Bi-level Variable Selection and Dimension-reduction Methods in Complex Lifetime Data Analytics

Date
2019-12
Abstract
In high-dimensional data, the number of covariates can be large and can diverge with the sample size. In many scientific applications, such as biological studies, the predictors or covariates are naturally grouped. In this thesis, we consider bi-level variable selection and dimension-reduction methods in complex lifetime data analytics under various survival models, and study their theoretical properties and finite-sample performance under different scenarios. Specifically, in Chapter 2, we focus on the Andersen-Gill regression model for the analysis of recurrent event data with grouped covariates when the number of covariates is fixed. To study the effects of the covariates on the occurrence of recurrent events, a bi-level penalized group selection method is introduced to address the group selection problem. A general group-bridge penalty function with varying weights is used to achieve this goal. It is shown that the performance of the bi-level selection depends on the weights. To select covariates more efficiently, especially to identify the important covariates within important groups, adaptive weights are required. The asymptotic oracle properties of the proposed method are investigated in the case of a fixed number of covariates. Three methods of tuning parameter selection are proposed. Our simulation studies show that the proposed method performs well in selecting important groups and important individual covariates in these groups simultaneously, and outperforms other popular group selection methods and the traditional unpenalized Wald testing method. In Chapter 3, we extend the proposed method for the recurrent event model to the case of a diverging number of covariates. We demonstrate that the proposed method has selection consistency and that the penalized estimators are asymptotically normal in the case of a diverging number of covariates.
Simulation studies show that the proposed method performs well and the results are consistent with the theoretical properties. We illustrate the method using a real-life data set from medicine. In Chapter 4, by imitating the group variable selection procedure with a bi-level penalty, we propose a new variable selection method for the analysis of multivariate failure time data, with an adaptive bi-level variable selection penalty function. In the regression setting, we treat the coefficients corresponding to the same predictor as a natural group, and then consider variable selection at the group level and the individual level simultaneously. The proposed adaptive bi-level variable selection method can select a predictor at two different levels: the first is the group level, where the predictor is important to all failure types; the second is the individual level, where the predictor is important to only some failure types. An algorithm based on cyclic coordinate descent (CCD) is proposed to carry out the proposed method. Based on the simulation results, our method outperforms the classical penalty methods, especially in terms of removing variables that are unimportant for all failure types. We obtain the asymptotic oracle properties of the proposed variable selection method in the case of a diverging number of covariates. We construct a generalized cross-validation (GCV) method for tuning parameter selection and assess model performance based on model errors. We also illustrate the proposed method using a real-life data set. Sufficient dimension reduction (SDR) is a powerful tool for dimension reduction in regression and classification problems, which replaces the original covariates with a minimal set of their linear combinations. In Chapter 5, we propose a novel penalty function, called adaptive group composite Lasso (AGCL), for the group sparse sufficient dimension reduction problem.
By incorporating this new penalty into the sufficient dimension reduction method, we propose an adaptive group composite Lasso penalized dimension reduction method that simultaneously achieves sufficient dimension reduction and group variable selection in the case of a diverging number of covariates. We investigate the asymptotic properties of the penalized sufficient dimension reduction estimators when the number of covariates diverges with the sample size. We show that the proposed method can select important groups and individual variables simultaneously. We compare the proposed method with other sparse sufficient dimension reduction methods using simulation studies. The results show that the proposed method outperforms the other methods in terms of removing unimportant covariates, especially in removing unimportant groups. A real data example is used for illustration.
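As a rough illustration of the bi-level idea behind the group-bridge penalty mentioned for Chapter 2, the sketch below evaluates a weighted group-bridge penalty of the form lam * sum_j c_j * (sum_{k in A_j} w_k * |beta_k|)^gamma with 0 < gamma < 1. This is only a minimal sketch under stated assumptions: the abstract does not give the exact penalty used in the thesis, so the function name, the per-coefficient weights w_k, and the group constant c_j = |A_j|^(1 - gamma) are illustrative choices, not the author's definition.

```python
import numpy as np


def group_bridge_penalty(beta, groups, lam=1.0, gamma=0.5, weights=None):
    """Evaluate a weighted group-bridge penalty (illustrative form only).

    beta    : 1-D array of regression coefficients.
    groups  : list of index lists, one list per covariate group A_j.
    lam     : overall tuning parameter (assumed name).
    gamma   : bridge exponent, assumed to lie in (0, 1).
    weights : optional per-coefficient weights w_k; defaults to all ones.
    """
    beta = np.asarray(beta, dtype=float)
    if weights is None:
        weights = np.ones_like(beta)
    weights = np.asarray(weights, dtype=float)

    total = 0.0
    for idx in groups:
        idx = np.asarray(idx, dtype=int)
        # Assumed group constant: scales with group size.
        c_j = len(idx) ** (1.0 - gamma)
        # Inner weighted L1 norm acts on individual coefficients;
        # the outer power gamma < 1 acts on the whole group.
        inner = np.sum(weights[idx] * np.abs(beta[idx]))
        total += c_j * inner ** gamma
    return lam * total


if __name__ == "__main__":
    beta = [1.0, -1.0, 0.0, 0.0]
    groups = [[0, 1], [2, 3]]
    print(group_bridge_penalty(beta, groups, lam=1.0, gamma=0.5))
```

The inner weighted L1 term can shrink individual coefficients to zero, while the concave outer power can remove an entire group at once, which is what makes the selection bi-level. Adaptive weights would typically be built from an initial estimate, but how the thesis constructs them is not stated in the abstract.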
Keywords
Variable selection, Group variable selection, Bi-level penalty, Dimension reduction, Recurrent events, Multivariate failure time data, Diverging number of covariates, Oracle property
Citation
Cai, K. (2019). Bi-level variable selection and dimension-reduction methods in complex lifetime data analytics (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.