Browsing by Author "Kopciuk, Karen Arlene"

Item Open Access Analysis of Metabolomics Data via Mixed Models (2020-08)
Ren, Austin Mu Qing; de Leon, Alexander R.; Kopciuk, Karen Arlene; Vogel, Hans J.; Sajobi, Tolulope T.
Generalized linear mixed models have been widely studied and used in many disciplines, yet they have rarely been applied to metabolomics data analysis. Traditional methods of cancer classification used to determine disease severity, such as biopsies, can be harmful to patients' health. Classification based on metabolomics data analysis has the main advantage that it requires only non-invasive procedures, such as drawing a small amount of blood from patients. However, data analysis in cancer research often requires the handling of multiple correlated measurements of disease severity. The methods most commonly used with metabolomics data, such as partial least squares discriminant analysis, were traditionally designed to handle univariate data only, and can be very challenging to work with when applied to data with multiple correlated outcomes. Therefore, different methods should be considered for metabolomics data analysis in cancer classification. In this thesis, we propose bivariate generalized linear mixed models with binary outcomes, using the probit link function, for the analysis of metabolomics data. The models were specifically designed to handle multiple correlated outcomes via the inclusion of subject-specific random intercepts. Random slopes were not included in the models, to reduce complexity. We designed three settings for the random intercepts: shared, independent, and correlated between the outcomes.
Extensive simulations were carried out to test our models' parameters, including: the standard deviation and correlation of the distribution of the random intercepts, the correlation between the covariates as well as the correlation between the covariates and the outcomes, the proportion of data missing among the covariates, a misspecified distribution of the random intercepts, and a misspecified conditional correlation between the outcomes. In addition, we incorporated the nearest neighbors algorithm as a missing-value imputation method and LASSO as a feature selection method into our mixed models, in order to handle the common issues of high-dimensional covariates and missing values in metabolomics data. Finally, our proposed mixed models were applied to a real dataset of prostate cancer patients to evaluate the models' performance on outcome prediction.

Item Open Access Bias and Bias-Correction for Individual-Level Models of Infectious Disease (2020-01-30)
Jafari, Behnaz; Deardon, Rob; Chekouo, Thierry T.; Kopciuk, Karen Arlene
Accurate infectious disease models can help scientists understand how an ongoing disease epidemic spreads and help forecast the course of epidemics more effectively (e.g. O'Neill, 2010; Jewell et al., 2009; Deardon et al., 2010). The main purpose of infectious disease modeling is to capture the main risk factors that affect the spread of a disease and to make predictions based on these factors. In real life, populations are generally neither homogeneous nor homogeneously mixing, and various factors affect the spread of a disease (e.g. geographical, social, domestic, and employment networks, and genetic factors). Using individual-level models (ILMs) (Deardon et al., 2010) can help researchers incorporate population heterogeneity. In these models, inferences are made within a Bayesian Markov chain Monte Carlo (MCMC) framework (e.g. Gamerman and Lopes, 2006), obtaining posterior estimates of model parameters.
However, parameter estimation and bias of estimates go hand in hand. The issue of bias in parameter estimates, and methods for bias correction, have been widely studied in the context of many of the most established and commonly used statistical models and their associated methods of parameter estimation. However, these methods are not directly applicable to individual-level infectious disease data. The focus of this thesis is to investigate circumstances in which ILM parameter estimates may be biased in some simple disease system scenarios. Further, we aim to find bias-corrected estimates of ILM parameters using simulation and compare them with the posterior estimates of the model parameters. We also discuss the factors that affect the performance of these estimators.

Item Open Access Cluster analysis of correlated non-Gaussian continuous data via finite mixtures of Gaussian copula distributions (2019-06-12)
Burak, Katherine L.; De Leon, Alexander R.; Wu, Jingjing; Kopciuk, Karen Arlene; Lu, Xuewen
Model-based cluster analysis in non-Gaussian settings is not straightforward due to a lack of standard models for non-Gaussian data. In this thesis, we adopt the class of Gaussian copula distributions (GCDs) to develop a flexible model-based clustering methodology that can accommodate a variety of correlated, non-Gaussian continuous data, where variables may have different marginal distributions and come from different parametric families. Unlike conventional model-based approaches that rely on the assumption of conditional independence, GCDs model conditional dependence among the disparate variables using the matrix of so-called normal correlations. We outline a hybrid approach to cluster analysis that combines the method of inference functions for margins (IFM) and the parameter-expanded EM (PX-EM) algorithm. We then report simulation results to investigate the performance of our methodology.
Finally, we highlight the applications of this research by applying this methodology to a dataset of purchases made by clients of a wholesale distributor.

Item Open Access Effect of Mammography Screening on Incidence and Mortality of Breast Cancer in Alberta (2020-07-13)
Efegoma, Yvonne Chuka; Dickinson, James A.; Kopciuk, Karen Arlene; Shack, Lorraine G.
Background: Breast cancer is the second leading cause of cancer death among Canadian women, and to decrease this burden mammography screening is widespread. If effective, mammography screening should reduce the incidence of late-stage cancer by early detection, allow time for prompt treatment and result in lower mortality. Given Alberta’s universal health system, with organised screening reaching around 63% of the target population annually, we set out to determine how much screening mammography has decreased presentation of late-stage cancer, and potentially reduced mortality from breast cancer, among Alberta women. Methods: We conducted a historical birth-cohort study and trend analysis using data from the Alberta Cancer Registry from 1982 to 2017. We compared stage-specific incidence and mortality over the years and by birth cohort, taking into consideration the introduction and evolution of screening mammography, to measure how much effect screening has had on observed trends. We used Joinpoint regression analysis to test the statistical significance of observed trends. Results: From 2006 to 2017, incidence of early-stage breast cancers among women aged 50 to 79 years increased by 33 per 100,000 women at an average rate of 1.2% annually (p<0.001), while incidence of late-stage cancer decreased by 3 per 100,000 women at a rate of 0.8 annually (p=0.3). From 2001 to 2018, deaths from breast cancer fell by 29 per 100,000 women at 2.3% annually (p<0.001), while all-cause mortality fell by 9 per 100,000 at 0.5% annually (p=0.1) in women previously diagnosed with breast cancer.
Each more recent birth cohort had higher rates of early breast cancer at specific ages, while the incidence of late-stage cancers at specific ages decreased in more recent cohorts. Conclusion: There has been some reduction in the incidence of late-stage breast cancer and breast cancer deaths between 2006 and 2018. This has been associated with an excess increase in early-stage cancers, which may be explained by overdiagnosis. These may be related to changes in screening mammography in that period. Women need to be educated on the effectiveness of screening mammography in order to make informed decisions about their screening practices.

Item Open Access Treatment Effect Models for Subgroup Analysis with Missing Data (2018-08-31)
Fu, Yunting; Shen, Hua; Kopciuk, Karen Arlene; De Leon, Alexander R.; Sajobi, Tolulope T.
The need for subgroup analysis in clinical trials in various contexts is increasing, and data-driven approaches for subgroup identification based on statistical principles are desired. Among all subgroup identification methods, we focus on the treatment effect models that estimate the treatment contrast, since these models are intuitive and useful for interpretation. We evaluate and address the consequences of having missing data when using the Interaction Trees (IT), Qualitative Interaction Trees (QUINT) and Subgroup Identification based on Differential Effect Search (SIDES) methods. Simulation studies are used to demonstrate the accuracy of variable selection and the bias in treatment effects when using complete, incomplete and imputed data across various scenarios in which the sample size, proportion of missingness and imputation methods differ. We also applied these methods to a non-small cell lung cancer (NSCLC) dataset obtained from a retrospective study. Our results indicate that the IT and QUINT methods work equivalently well in most situations, while the SIDES results are, in general, less comparable due to the different mechanisms of the methods.
The treatment effect model should be chosen based on the objective of the study, the sample size, the number of variables containing missing data, and the data structure. As for the methods for addressing missing data, an assumption about the data structure needs to be made when selecting the method. MissForest is an excellent choice for a dataset with a tree-based structure, while multiple imputation (MI) methods would be a good fit for the other situations.