Browsing by Author "de Leon, Alexander R."
Now showing 1 - 12 of 12
Item Open Access
A goodness-of-fit test for the bivariate necessary-but-not-sufficient relationship (2020-07-31)
Ilagan, Michael John; de Leon, Alexander R.; Kopciuk, Karen A.; Ngamkham, Thuntida; Godley, Jenny
In the social sciences, theory often casts bivariate relationships between constructs in terms of logical asymmetries. For example, in psychology, one theory is that intelligence is necessary but not sufficient for creativity. But as average-based linear models fail to accommodate nuances of logical asymmetries, a mismatch between theory and method is common in the literature. Recent methodological work proposed the Linear Ceiling and Floor Probability Region (LCFPR) model, which analyzes bivariate relationships in terms of necessity and sufficiency. However, an erroneous treatment of nested models and a lack of a formal goodness-of-fit test remain unaddressed in the LCFPR framework. In this thesis, I propose a goodness-of-fit test for LCFPR that addresses such shortcomings. A simulation study shows that, using a nonparametric quantile, the power and size of the test are largely acceptable. Analyses of real datasets demonstrate the proposed procedure. Conclusions and future directions are outlined in the final chapter.

Item Open Access
A likelihood-based approach to estimating sensitivity and specificity with binocular diagnostic data-application in ophthalmology (2005)
Guo, Meijie; de Leon, Alexander R.

Item Open Access
Analysis of Metabolomics Data via Mixed Models (2020-08)
Ren, Austin Mu Qing; de Leon, Alexander R.; Kopciuk, Karen Arlene; Vogel, Hans J.; Sajobi, Tolulope T.
Generalized linear mixed models have been widely studied and used in many different disciplines, yet very little application of them can be found with metabolomics data analysis. Traditional methods of cancer classification used to determine disease severity, such as biopsies, can be harmful to the health of the patients.
Classification based on metabolomics data analysis demonstrates a main advantage as it only requires non-invasive procedures such as the drawing of a small amount of blood from patients. However, data analysis in cancer research often requires the handling of multiple correlated measurements of disease severity. The methods that are most commonly used with metabolomics data, such as partial least squares discriminant analysis, were traditionally designed to handle univariate data only, and can be very challenging to work with when applied to data with multiple correlated outcomes. Therefore, different methods should be considered for metabolomics data analysis in cancer classification. In this thesis, we proposed bivariate generalized linear mixed models with binary outcomes using the probit link function for the analysis of metabolomics data. The models were specifically designed to handle multiple correlated outcomes via the inclusion of subject-specific random intercepts. Random slopes were not included in the models to reduce complexity. We specifically designed three settings for the random intercept models: shared, independent, and correlated between the outcomes. An extensive number of simulations were carried out to test our models' parameters, including: standard deviation and correlation of the distribution of the random intercepts, correlation between the covariates as well as correlation between the covariates and the outcomes, the proportion of data missing among the covariates, misspecified distribution of the random intercepts, and misspecified conditional correlation between the outcomes. In addition, we also incorporated the nearest neighbors algorithm as a missing values imputation method and LASSO as a feature selection method to our mixed models in order to handle the common issues of high dimensional covariates and missing values in metabolomics data. 
Finally, our proposed mixed models were applied to a real dataset with prostate cancer patients to evaluate our models' performance on outcome predictions.

Item Open Access
ANOVA extensions for mixed discrete and continuous data (2006)
Zhu, Yongtao; de Leon, Alexander R.

Item Open Access
Causal Inference with Misclassified Confounder and Missing Data in the Surrogates (2019-07-05)
Fan, Zheng; Shen, Hua; Li, Haocheng; Wallace, Michael; de Leon, Alexander R.
Causal inference pertains to statistical analyses in which researchers evaluate causal effects based on precisely measured data. In an observational study, interest often lies in estimating causal effects, which are naturally subject to interference from potential confounding factors. However, some confounding variables may be measured with error or classified into an incorrect group or category. This can occur because of the difficulty of tracking a long-term average quantity, unavoidable recall bias in answering a questionnaire, unwillingness to answer sensitive questions, unaffordability of precise measurements, etc. We first investigate the consequences of naively ignoring the misclassification issue in confounding variables on the estimation of the average treatment effect (ATE). We then develop an EM algorithm through the latent variable model for parameter estimation and subsequent removal of the estimation bias of the ATE in the absence of a validation data set. Moreover, we adapt the proposed method to address the additional complication when some surrogates are only partially observed. Variance estimation of the ATE is obtained through the bootstrap method. Simulation studies are reported to assess the performance of the proposed methods with both continuous and discrete outcome variables. The estimation methods we examined include outcome regression, G-computation, propensity score (PS) matching, PS stratification, inverse probability weighting (IPW) and augmented inverse probability weighting (AIPW).
Lastly, we analyze a breast cancer dataset to illustrate the proposed methods. Discussion and future work are outlined at the end.

Item Open Access
COM-Poisson Clustering of Correlated Bivariate Over- and Under-Dispersed Counts (2016)
Raz, Saifa; de Leon, Alexander R.; Alp, Osman; Wu, Jingjing; Qiu, Chao
Ruan (2015) recently proposed using a finite mixture model with components modelled by Gaussian copula with Poisson margins (BP-GCD) as the basis for model-based clustering of bivariate correlated counts. Although the Poisson distribution is a useful model for modelling count data, the distribution is constrained by its equi-dispersion assumption. Motivated by this limitation, the thesis introduces a more flexible model, one with Gaussian copula models as components but with Conway-Maxwell Poisson (COM-Poisson) margins (BCOM-GCD) which allows the accounting of under- and over-dispersion in the correlated count data. We test our proposed method on a variety of simulated settings and on data from the Australian National Health Survey to explore the impact of ignoring the non-equidispersion. Our simulations and real-life data analysis indicate that using BCOM-GCDs as mixture components instead of BP-GCDs provides a better and more flexible approach for performing model-based clustering for under- or over-dispersed counts.

Item Open Access
Conditional Dependence in Joint Modelling of Longitudinal Non-Gaussian Outcomes (2016-01-07)
Roy, Mili; de Leon, Alexander R.; Ambagaspitiya, Rohana; Palacios-Derflingher, Luz Maria; Sun, Bingrui
The thesis is motivated by the limitations of conventional joint modelling strategies based on linear and generalized linear mixed models (LMMs/GLMMs).
The class of so-called Gaussian copula mixed models (GCMMs), introduced by Wu and de Leon (2014) to generalize conventional LMMs/GLMMs to non-Gaussian settings, was adopted, and simulations were conducted to investigate the impact of incorrectly ignoring the conditional dependence between outcomes, given the random effects, on the performance of maximum likelihood estimates (MLEs). A variety of scenarios involving shared or correlated random effects were considered, and implementation of the correct and misspecified joint models was done in SAS’s PROC NLMIXED. Although MLEs of fixed effects were only slightly impacted by the conditional independence misspecification, MLEs based on the correct GCMM yielded generally better performances than those from the incorrect model. Data on pediatric pain (Weiss, 2005; Withanage et al., 2015) were used for illustration.

Item Open Access
Contributions to Copula Modeling of Mixed Discrete-Continuous Outcomes (2013-07-10)
Beilei, Wu; de Leon, Alexander R.
This thesis includes three topics that are concerned with joint modeling and analysis of multiple correlated mixed discrete and continuous outcomes. The first topic is concerned with the analysis of multiple correlated discrete and continuous outcomes that are observed on the same subjects over time in the case of longitudinal studies, or from clustered subjects in cross-sectional settings. Joint analysis of such disparate responses (i.e., mixed discrete and continuous outcomes) is problematic in practice due mainly to the difficulty of defining or constructing a joint model.
Our proposed approach is based on a new generalized linear mixed model (GLMM) that accounts for associations between the outcomes (of the same or of different types) for the same subject at the same time point, and/or at different time points for the longitudinal data, or between mixed outcomes within clusters, including the intrinsic association between the mixed outcomes for the same subject, in clustered settings. A latent-variable approach is adopted to sidestep complications of direct application of copula models to discrete data. The approach yields regression parameters that are marginally meaningful, and permits the adoption of flexible non-Gaussian distributions for the mixed outcomes as well as for the random effects. Special cases of our model include conventional GLMMs previously proposed by a number of authors, among whom are Faes (2013), Gueorguieva (2013), and Lin et al. (2010). Full and pairwise likelihood estimation methods are implemented for the model using PROC NLMIXED in SAS. The proposed methodology is illustrated using individual panel data on the wages, work hours, and union memberships, and data on fetal malformation and weight in a developmental toxicity study on mice. In the second topic, we adopt the "continuous-ation" approach of Machado and Santos Silva (2005) and Denuit and Lambert (2005) to construct a Gaussian copula joint model for mixed discrete and continuous outcomes. The joint model does not require a latent variable formulation of the discrete outcomes, and does not suffer from the complications of directly using discrete margins in copula models (Genest and Neslehova, 2007). A surrogate likelihood approach to estimation is implemented for the model and empirical results concerning the relative bias and efficiency of the resulting estimates are reported. The proposed methodology is illustrated using data on burn injuries.
The third and final topic concerns a methodology for calculating the sample size in clinical trials with multiple mixed binary and continuous co-primary endpoints. The Gaussian copula joint model we proposed permits the adoption of flexible marginal distributions for the mixed endpoints, and includes the conditional grouped continuous model (CGCM) - a popular model for mixed endpoints based on the multivariate Gaussian distribution - as a special case. The proposed methodology adopts a latent variable description of the binary endpoints and makes use of tests on the latent means to test for differences in the binary proportions. This approach results in a simple and streamlined methodology akin to that for multiple continuous co-primary endpoints studied in Sozu et al. (2011). In addition, our approach is more powerful than that recently proposed by Sozu et al. (2012), in that it yields smaller sample sizes at powers comparable to those considered in Sozu et al. (2012). We report the results of empirical comparisons as well as a numerical illustration of our methodology.

Item Open Access
Copula-based regression models for correlated mixed discrete and continuous outcomes (2009)
Wu, Beilei; de Leon, Alexander R.

Item Open Access
Evaluation of binocular screening tests: a copula approach via 'continued' binary outcomes (2010)
Zhu, Yifan; de Leon, Alexander R.; Kim, Hyang Mi

Item Open Access
Joint modeling of clustered binary data with crossed random effects via the Gaussian copula mixed model (2019-07-11)
Jaman, Ajmery; de Leon, Alexander R.; Wu, Jingjing; Bingrui, Cindy Sun; Ngamkham, Thuntida
Models with crossed random effects are common in reader-based diagnostic studies, where the same group of readers evaluate patients for certain diseases; an example is a diabetic retinopathy study in Alberta, Canada.
Although generalized linear mixed models (GLMMs) are well developed for non-Gaussian responses (e.g., binary outcomes) with crossed random effects, evaluation of the marginal likelihood is still technically and computationally demanding and can become prohibitive in applications, since the data cannot be grouped into independent blocks. The available estimation methods are also not free from problems. A recent approach involves application of data cloning (DC) to obtain maximum likelihood (ML) estimates using a Bayesian framework. This approach has been shown to be superior to the two alternatives its authors considered, in terms of providing relatively unbiased and efficient parameter estimates. However, it is based on a multivariate latent Gaussian description of the multiple correlated binary outcomes. In this thesis, we relax this assumption by allowing for disparate non-Gaussian latent variables for the binary responses, and propose joint modeling via the Gaussian copula mixed model (GCMM). We applied maximum pairwise likelihood (PL) estimation instead of a full ML analysis to reduce computational complexity. We conducted simulation studies with a setting analogous to the diabetic retinopathy data to assess the performance of PL estimators for the GCMM with crossed random effects. Simulation results suggest that although estimation of the regression coefficients and the correlation parameter poses no problem, a much larger sample size is required for the other scale parameters to be estimated with reasonable accuracy.
We also analyzed the retinopathy data with the proposed approach considering three different conditional margins.

Item Open Access
QDA Classification for Two-Component Mixture with Data of Rare and Weak Signal (2019-12-20)
Chen, Hanning; Wu, Jingjing; Shen, Hua; de Leon, Alexander R.; Liu, Shawn X.
This thesis deals with the two-class classification problem for data with rare and weak signals, under the modern setup of p >> n (large p, small n). Considering the two-component mixture of Gaussian features with different random mean vectors of rare and weak signals but a common covariance matrix (homoscedastic Gaussian), Fan et al. (2013) discussed the optimality of linear discriminant analysis (LDA) and proposed an efficient variable selection and classification procedure. This thesis is an extension of their work in the sense that we assume the two components have different random covariance matrices (heterogeneous Gaussian) of rare and weak signals. As a start of this research, for simplicity we assume the two population mean vectors are the same in order to assess the pure effect of different covariance matrices. In this thesis, we propose to use quadratic discriminant analysis (QDA) for the classification of data with rare and weak signals. On the theoretical side, we first derive the detection boundary of QDA at the population level, which separates the region of successful classification from the region of unsuccessful classification under the ideal case that the covariance matrix is known. When the covariance matrix is unknown, we then obtain a subregion where successful classification is impossible (for all classifiers), which is also a subregion of the unsuccessful classification region of QDA. For data with rare signals, variable selection will mostly improve the performance of statistical procedures.
Thus, on the implementation side, we propose a variable selection procedure for QDA based on the Higher Criticism Thresholding (HCT) that was proved to be efficient for LDA in Fan et al. (2013). Finally, we conduct extensive simulation studies in order to demonstrate and explore the successful and unsuccessful classification regions of QDA and examine the effectiveness of the proposed HCT procedure.