Browsing by Author "De Leon, Alexander"
Now showing 1 - 2 of 2
Item Open Access
Correlated Data Analysis via Variants of EM Algorithm: Application to Data on Physical Activity and Maternal Health (2024-09-13)
Li, Jia; De Leon, Alexander; Li, Haocheng; Wu, Jingjing; Lu, Xuewen; Chu, Man-Wai; Sheng, Xiaoming
The thesis concerns the analysis of correlated data on multiple variables via the EM algorithm and its variants. Specifically, we focus on (cross-sectional) multivariate iid data comprising a disparate mix of binary and non-Gaussian variables (including the special case of multivariate binary data), and on longitudinal data on multiple Gaussian responses in a regression setting. For correlated data on multiple binary variables, and for mixed data on binary and non-Gaussian continuous variables, we introduced the classes of meta-probit models (MPMs) and extended meta-probit models (XMPMs) as generalizations to non-Gaussian settings of the grouped continuous model (GCM), also known as the multivariate probit model (MVPM), and of its extension to mixed data, the conditional GCM (CGCM). Constructed from Gaussian copula distributions (GCDs), a class of meta-Gaussian distributions based on the Gaussian copula, MPMs and XMPMs broaden the sphere of applications of joint models to settings that involve complex non-standard data on variables with different measurement scales and with marginal distributions, latent and otherwise, from different parametric families. To avoid the computational challenges of maximum likelihood (ML) estimation in MPMs/XMPMs, we adopted the method of inference functions for margins, a two-part estimation method that first estimates the marginal parameters separately via (marginal) ML estimation, and then estimates the joint parameters (i.e., normal correlations) via profile ML estimation based on the full joint likelihood function, with the marginal parameters fixed at their marginal estimates.
The method is especially appropriate for copula models in general, and for MPMs/XMPMs in particular, because in copula models the marginal distributions are specified completely independently of the dependence structure. For joint estimation of the normal correlations, we adopted a parameter-expanded EM (PX-EM) algorithm to simplify the E-step calculations (all done numerically exactly using freely available R packages) and to make possible a closed-form M-step update, allowing us to avoid the complications associated with having to estimate a correlation matrix. We used the standard theory of inference functions to obtain the (joint) asymptotic Gaussian distribution of the resulting maximum pseudo-likelihood estimates (MPLEs). Results of Monte Carlo simulations confirmed the consistency and asymptotic unbiasedness of the MPLEs, with SEs that generally reflected the estimates' true sampling variability. Finally, we generalized the ECME algorithm to the multiple-outcomes setting to implement ML estimation for joint Gaussian LMMs with atypically large numbers of random effects. Monte Carlo simulations show that the resulting estimates are consistent, with efficiencies comparable to those obtained by pairwise methods. We further illustrate our methodology with longitudinal survey data on physical activity collected by ActivPAL™ (www.paltech.plus.com).

Item Open Access
Novel stabilized models to characterize gene-gene interactions by utilizing transcriptome data (2022-09-28)
Kossinna, Thalagala Kossinnage Pathum Subhashana; Long, Quan; Zhang, Qingrun; Arnold, Paul Daniel; De Leon, Alexander
Machine learning models employed in genetics often grapple with issues related to the "curse of dimensionality". Furthermore, due to the inherently noisy nature of most -omics data, most methods suffer from the problem of "stability": i.e., even slight perturbations of the original data may result in wholly different outcomes.
This becomes particularly true when dealing with interactions, as the number of potential interactions is usually astronomical. In this thesis, we present two novel methods, 1) Stabilized COre gene and Pathway Election (SCOPE) and 2) Interaction Bridged Association Study (IBAS), that take two different approaches to analyzing biological interactions. SCOPE employs a stabilized form of the LASSO that is better able to handle highly correlated expression data, together with a co-expression network analysis that identifies "core" genes that may be of interest as well as the underlying biological pathways or mechanisms by which they interact. Stabilizing these results across six cancers of The Cancer Genome Atlas uncovered hallmark cancer pathways as well as a novel potential therapeutic target for kidney cancer, CD63. IBAS utilizes a "data-bridge" composed of dimensionality-reduced pathway-level interactions of the transcriptome to identify genes associated with a phenotype of interest using the Sequence Kernel Association Test (SKAT), in a disentangled form of the Transcriptome Wide Association Study. Application to the Wellcome Trust Case Control Consortium data reveals novel gene candidates, with literature reviews highlighting their potential for further study. In conclusion, we have developed two novel methodologies for analyzing complex interaction patterns in -omics data using stabilized machine learning methods, paving the way to a better understanding of the biological interactions underlying complex disease.