Browsing by Author "De Leon, Alexander R."
Now showing 1 - 6 of 6
Item Open Access
Analysis of Misclassified Categorical Response via Incomplete Surrogate Variables and Likelihood Method (2020-12-20)
Yu, Zheng; Shen, Hua; De Leon, Alexander R.; Kopciuk, Karen A.
Misclassification of a dependent categorical variable often occurs in observational studies due to imperfect measuring procedures, and it may threaten the validity of the analytic results. We first investigate the consequences of naively ignoring misclassification in the response variable on parameter estimation, using a range of naive and ad hoc methods. We then develop a robust algorithm that uses surrogate variables to estimate covariate effects in regression models, under the framework of latent variable models and in the absence of validation data. The resulting estimates are used in prediction and in estimation of the average treatment effect (ATE). The ATE estimation methods examined include outcome regression, G-computation, propensity score (PS) stratification, inverse probability weighting (IPW) and augmented inverse probability weighting (AIPW). The variance of the ATE estimates is obtained through the bootstrap. Moreover, we extend the algorithm to handle the complication that some of the surrogate measurements are missing. Simulation studies representative of various scenarios are conducted to assess the performance of the proposed methods with a binary latent response variable. Based on the simulation studies, we show that the proposed method outperforms other approaches and corrects for both misclassification and missingness simultaneously for a binary response variable, ensuring valid statistical inferences. An application to the motivating study on breast cancer is given for illustration.
Discussion and future work are outlined at the end.

Item Open Access
Bi-level Variable Selection in Semiparametric Transformation Models for Right Censored Data and Cure Rate Data (2019-01-25)
Zhong, Wenyan; Wu, Jingjing; Lu, Xuewen; Chen, Gemai; De Leon, Alexander R.; Shen, Hua; Kong, Linglong
In this dissertation, I investigated bi-level variable selection in semi-parametric transformation models with right-censored data and in semi-parametric mixture cure models with right-censored and cure rate data, respectively. The transformation models under consideration include the proportional hazards model and the proportional odds model as special cases. In the framework of regularized regression, we proposed a computationally efficient estimation method that selects significant groups and variables simultaneously. Three penalty functions, namely the group bridge, adaptive group bridge and composite group bridge penalties, which can incorporate the grouping structure of covariates, were adopted for bi-level variable selection. In Chapter 2, the objective function, which consists of the negative weighted partial log-likelihood function plus one of the three penalties, has a parametric form and is convex with respect to the parameters. This leads to an easy implementation of the optimization algorithm, for which convergence is guaranteed numerically. We showed that all three proposed penalized estimators achieve group selection consistency and, moreover, that the adaptive group bridge estimator and the composite group bridge estimator enjoy the oracle properties, i.e., both estimators possess group and individual selection consistency simultaneously and are asymptotically normal as if the true unimportant covariates were known. In Chapter 3, we further extended the bi-level variable selection procedure to the semi-parametric mixture cure models.
The semi-parametric mixture cure models are formulated by a logistic regression for modelling the cure fraction and a class of semi-parametric transformation models for modelling the survival function of the remaining uncured individuals. By incorporating a cure fraction, the proposed model is more flexible than the standard survival models, and the proposed approach is capable of distinguishing important covariates and groups from unimportant ones and estimating covariates’ effects simultaneously in both the incidence and the latency parts. We proposed a new iterative E-M algorithm to handle two latent variables. We illustrated the finite sample performance of the proposed methods via simulations and two real data examples. Simulation studies indicated that the proposed methods perform well even when the dimension of the covariates is relatively high.

Item Open Access
Causal Inference with Mismeasured Confounders or Mediators (2021-09-23)
Ren, Mingchen; De Leon, Alexander R.; Yan, Ying; Tekougang, Thierry Chekouo; Shen, Hua; Kopciuk, Karen A.; He, Wenqing
This thesis includes three projects on correcting measurement error in covariates or mediators when estimating causal estimands under survival models, marginal structural models and covariate balancing models. In Chapter 2, we decompose the causal effect on the difference scale with more than one mediator under an additive hazards model, and correct the bias caused by error-prone covariates and mediators. The simulation study shows that the proposed method performs well under various measurement error settings. The method is further applied to a real data study of HIV-infected adults (Hammer et al., 1996), where a causal interpretation of the mediated effects is given. The asymptotic distributions of the estimators are provided in the appendix. In Chapter 3, we develop two estimation methods to correct the bias in the average treatment effect estimated via a marginal structural model when covariates are subject to measurement error.
We consider the scenario in which the confounders and exposures are time-varying and the confounders are error-prone. The first approach depends on a logistic-based correction method, which corrects for the error-prone confounders in the logistic regression model of the treatment variable (Stefanski & Carroll, 1987). The second relies on the simulation-extrapolation-based correction method (Shu & Yi, 2019d), which corrects the error-prone average treatment effect estimate directly and can be used when a closed form of the weight cannot be found. Simulation studies are provided, and the proposed approaches are illustrated by a real data analysis of the Women’s Interagency HIV Study in the United States from 1993 to 2015. In Chapter 4, when pretreatment covariates are subject to measurement error, we apply the augmented simulation extrapolation estimation developed by Shu and Yi (2019d) to correct the estimates of the average treatment effect on the treated obtained via entropy balancing and covariate balancing propensity score methods. The correction method is illustrated by a real data set.

Item Open Access
Cluster analysis of correlated non-Gaussian continuous data via finite mixtures of Gaussian copula distributions (2019-06-12)
Burak, Katherine L.; De Leon, Alexander R.; Wu, Jingjing; Kopciuk, Karen Arlene; Lu, Xuewen
Model-based cluster analysis in non-Gaussian settings is not straightforward due to a lack of standard models for non-Gaussian data. In this thesis, we adopt the class of Gaussian copula distributions (GCDs) to develop a flexible model-based clustering methodology that can accommodate a variety of correlated, non-Gaussian continuous data, where variables may have different marginal distributions and come from different parametric families. Unlike conventional model-based approaches that rely on the assumption of conditional independence, GCDs model conditional dependence among the disparate variables using the matrix of so-called normal correlations.
We outline a hybrid approach to cluster analysis that combines the method of inference functions for margins (IFM) and the parameter-expanded EM (PX-EM) algorithm. We then report simulation results to investigate the performance of our methodology. Finally, we highlight the applications of this research by applying the methodology to a dataset on purchases made by clients of a wholesale distributor.

Item Open Access
Nonparametric Change Point Detection for Univariate and Multivariate Non-Stationary Time Series (2018-12-19)
Guan, Zixiang; Chen, Gemai; Lu, Xuewen; De Leon, Alexander R.; Viveros-Aguilera, Roman; Fung, Tak Shing
This thesis investigates the change point detection problem for non-stationary time series in a nonparametric way. Two topics are studied: nonparametric change point detection for univariate time series and for multivariate time series. In the first topic, we consider a nonparametric method for detecting change points in non-stationary time series. The proposed method divides the time series into several segments so that the normalized spectral density functions of any two adjacent segments are different. The theory is based on the assumption that, within each segment, the time series is a linear process, which means that our method works not only for causal and invertible ARMA processes but also for non-invertible moving average processes. We show that our change point estimators are consistent. Also, a Bayesian information criterion is applied to estimate the number of change points consistently. Simulation results as well as empirical results are presented. A nonparametric method for detecting change points in multivariate non-stationary time series is our second topic.
Under the assumption that a non-stationary time series consists of several stationary ones, the sample is segmented into several stationary parts so that the normalized eigenvalues of the spectral density matrices of any two adjacent segments are different. We also assume that the stationary time series are multivariate linear processes, which means that our method works for several classic multivariate time series models, e.g., ARMA processes and non-invertible moving average processes, the latter of which have received little attention in the literature. We show that our change point estimators are consistent. Also, as in the univariate case, a Bayesian information criterion is applied to estimate the number of change points consistently.

Item Open Access
Treatment Effect Models for Subgroup Analysis with Missing Data (2018-08-31)
Fu, Yunting; Shen, Hua; Kopciuk, Karen Arlene; De Leon, Alexander R.; Sajobi, Tolulope T.
The need for subgroup analysis in clinical trials is increasing in various contexts, and data-driven approaches for subgroup identification based on statistical principles are desired. Among all subgroup identification methods, we focus on the treatment effect models that estimate the treatment contrast, since these models are intuitive and easy to interpret. We evaluate and address the consequences of having missing data when using the Interaction Trees (IT), Qualitative Interaction Trees (QUINT) and Subgroup Identification based on Differential Effect Search (SIDES) methods. Simulation studies are used to assess the accuracy of variable selection and the bias in treatment effects when using complete, incomplete and imputed data across various scenarios in which the sample size, proportion of missingness and imputation methods differ. We also applied these methods to a non-small cell lung cancer (NSCLC) dataset obtained from a retrospective study.
Our results indicate that the IT and QUINT methods perform comparably well in most situations, while the SIDES results are, in general, less comparable because the method works through a different mechanism. The treatment effect model should be chosen based on the objective of the study, the sample size, the number of variables containing missing data, and the data structure. As for the methods for addressing missing data, an assumption about the data structure needs to be made when selecting the method. MissForest is an excellent choice for a dataset with a tree-based structure, while MI methods would be a good fit for the other situations.