Browsing by Author "Kopciuk, Karen"
Item Open Access
Binary and Ordinal Outcomes: Considerations for the Generalized Linear Model with the Log Link and with the Identity Link (2017)
Singh, Gurbakhshash; Fick, Gordon; Kopciuk, Karen; Sajobi, Tolulope; Lu, Xuewen; Horrocks, Julie
There are gaps in the current literature on Generalized Linear Models (GLM) for binary outcomes with the log link. This dissertation explores a number of these gaps and presents specific results: (1) Uniqueness considerations for the Maximum Likelihood Estimate (MLE) are established from the conditions for the strict concavity of the log-likelihood. The full column rank of certain subsets of the covariate matrix is shown to be a condition for the strict concavity of the log-likelihood. (2) Conditions are established for the finiteness of components of the MLE. A method is proposed to address the possibility of non-finite components for the MLE, and it is based on determining directions of recession of the log-likelihood. In addition, it is established when the MLE will be in the interior of the parameter space and when the MLE will possibly be on a boundary of the parameter space. (3) Examples are presented of closed form expressions for the MLE. For a number of models with indicator variables and measured variables, closed form expressions for the MLE are presented. (4) There are considerations for the construction of confidence intervals when the MLE is close to a boundary of the parameter space. A new metric, called the “fraction within the parameter space”, is introduced for assessing intervals for MLEs close to a boundary. A simulation study is provided that offers support for Bootstrap Percentile Intervals having larger fractions when compared to Relative Likelihood Intervals and Normal Confidence Intervals. This dissertation continues by developing a proportional probability model using the log link for ordinal outcomes. For this model, similar results are presented for topics (1) and (3) above.
In addition, there is the introduction of a score test to assess proportionality. The dissertation concludes with a discussion of future work. In particular, this discussion includes some preliminary work with the identity link GLM for binary and ordinal outcomes. Throughout this dissertation, there are many practical considerations and illustrations presented. The use of the log link and the identity link for binary and ordinal outcomes should now become a viable modeling option for researchers.

Item Open Access
Exploring the impact of polygenes on genetic inheritance model identification, with application to Familial Colorectal Cancer Type X (FCCTX) (2017)
Scory, Tayler; Kopciuk, Karen; Lu, Xuewen; de Leon, Alexander; Long, Quan
Although a genetic inheritance pattern has not yet been identified, there seems to be a hereditary component for some types of cancer. The focus of this thesis is on identifying the factors that enable correct identification of genetic inheritance models. Exploring this topic involved complex segregation analysis on real FCCTX cancer registry data, then on simulated data (based on the real data characteristics) to determine what caused the model to be identified. If a strong polygenic effect is present, finding evidence for the correct genetic model is more likely. However, the correct model was identified roughly 50% of the time, so more factors should be explored. If the genetic inheritance pattern of a disease is identified, this would facilitate identifying the gene mutation in question, especially with rapidly advancing genomic technology. This work can be applied to other cancers, and can encourage exploration of non-Mendelian genetic inheritance.

Item Open Access
Gender Disparities in NSCLC: A Systematic Review (2018-01-26)
Alsaadoun, Noor Asaad; Bebb, Gwyn; Hollenberg, Morley; Hao, Diseree; Riabowol, Karl; Kopciuk, Karen
Lung cancer is the second most common malignancy in both men and women.
Non-small cell lung cancer (NSCLC) represents 80-85% of cases while the remaining 15-20% are small cell lung cancer. Lung cancer incidence in men has steadily decreased since the mid-1980s while it has increased in women. The sex differences in smoking behavior in the last two decades partly account for this incidence pattern. Interestingly, epidemiological evidence suggests that sex alone impacts most facets of lung cancer including the incidence, susceptibility, severity, and molecular basis of the disease; however, there is a lack of consensus on the etiology of the gender-based differences as well as their magnitude. Therefore, we conducted an evidence-based review of the literature to identify and describe sex-associated characteristics among non-small cell lung cancer patients. We identified all potentially relevant articles published in English by searching Medline between 1996 and 2016, worldwide. Using a systematic review protocol, all abstracts were reviewed for eligibility, and relevant studies meeting inclusion criteria were retained. We included all studies on NSCLC and its main subtypes for both men and women of age over 45. Pooled data were analyzed using a semi-parametric longitudinal regression model and a two-way ANOVA test. A data-visualization tool was used to demonstrate NSCLC incidence distribution and its sex-based disparities around the world. Our data showed sex-based disparities in NSCLC incidence rates, and a possible increase in women's risk of acquiring this disease. In addition, the data reveal that both race and sex have a significant effect on NSCLC incidence rates and that these trends changed with time. Our findings also illustrate that global trends are not always reflective of regional ones. The results also confirm that adenocarcinoma is the most commonly diagnosed histology in women regardless of their race; however, the data indicate that adenocarcinoma is most common among Asian patients.
The objective of this systematic literature review is to more precisely describe this gender disparity among NSCLC patients worldwide and summarize current opinions about the molecular basis for these observations. Our findings serve as a basis to begin to resolve the inherent controversies in the research, and highlight the importance of the inclusion of sex as a risk modifier in the development of screening initiatives and therapies in NSCLC.

Item Open Access
Geographically Dependent Individual-level Models for Infectious Disease Transmission (2022-06)
Mahsin, MD; Deardon, Rob; Brown, Patrick; Kopciuk, Karen; Shen, Hua; Brown, Grant
Infectious disease models can be of great use for understanding the underlying mechanisms that influence the spread of diseases and predicting future disease progression. Modeling has been increasingly used to evaluate the potential impact of different control measures and to guide public health policy decisions. In recent years, there has been rapid progress in developing spatio-temporal modeling of infectious diseases, and an example of such recent developments is the discrete-time individual-level models (ILMs). These models are well developed and provide a common framework for modeling many disease systems; however, they assume the probability of disease transmission between two individuals depends only on their spatial separation and not on their spatial locations. In cases where spatial location itself is important for understanding the spread of emerging infectious diseases and identifying their causes, it would be beneficial to incorporate the effect of spatial location in the model. In this study, we thus generalize the ILMs to a new class of geographically-dependent ILMs (GD-ILMs), to allow for the evaluation of the effect of spatially varying risk factors (e.g., education, social deprivation, environmental), as well as unobserved spatial structure, upon the transmission of infectious disease.
Specifically, we consider a conditional autoregressive (CAR) model to capture the effects of unobserved spatially structured latent covariates or measurement error. This results in flexible infectious disease models that can be used for formulating etiological hypotheses and identifying geographical regions of unusually high risk to formulate preventive action. The reliability of these models is investigated on a combination of simulated epidemic data and Alberta seasonal influenza outbreak data (2009). This new class of models is fitted to data within a Bayesian statistical framework using Markov chain Monte Carlo (MCMC) methods. We also developed the continuous-time GD-ILMs, allowing infection times and infectious periods to be treated as latent variables that are estimated using data-augmented MCMC techniques within a Bayesian framework. This approach results in a flexible infectious disease modeling framework for formulating etiological hypotheses and identifying unusually high-risk geographical regions to develop preventive action. We evaluate the performance of these proposed models on a combination of simulated epidemic data and seasonal influenza data in Alberta in 2009. Finally, we proposed a special case of the GD-ILMs, termed small-area restricted GD-ILMs, for infectious disease modelling. The reliability of these models is investigated through simulation studies based on disease spread through the Canadian city of Calgary, Alberta.

Item Open Access
Group Selection in Semiparametric and Nonparametric Accelerated Failure Time Models (2017)
Huang, Longlong; Lu, Xuewen; Kopciuk, Karen; Deardon, Rob; Sajobi, Tolulope; Yan, Ying; Hu, Joan
In survival analysis, a number of regression models can be used to estimate the effects of covariates on the censored survival outcome. When covariates can be naturally grouped, group selection is important in these models.
Motivated by the group bridge approach for variable selection in a multiple linear regression model, we consider group selection in a semiparametric accelerated failure time (AFT) model using Stute's weighted least squares and a group bridge penalty. This method is able to simultaneously carry out feature selection at both the group and within-group individual variable levels and enjoys the powerful oracle group selection property. Although the group bridge penalized approach can effectively remove unimportant groups, it cannot effectively remove unimportant variables within the important groups. To overcome this limitation, the adaptive group bridge method is proposed. We show that the adaptive group bridge method obtains the oracle property. Simulation studies indicate that the group bridge and adaptive group bridge approaches for the AFT model can correctly identify important groups and variables even with high censoring rates. A real data analysis is provided to illustrate the application of the proposed methods. We further study a nonparametric accelerated failure time additive regression (NP-AFT-AR) model whose covariates have nonparametric effects on the survival time. The proposed model is more flexible than the linear model and can be fitted to high-dimensional censored data when some components are unknown non-linear functions. B-splines are used to approximate the nonparametric components. A group bridge penalized variable selection approach based on the inverse probability-of-censoring weighted least squares is developed to select nonparametric components. The proposed approach is able to distinguish the nonzero components from the zero components and estimate the nonzero components simultaneously. Computational algorithms and theoretical properties of the proposed method are established. Simulation studies indicate that the proposed method has satisfactory performance even with relatively high censoring rates. 
Two real data analyses are used to illustrate the application of the proposed method to survival data analysis.

Item Open Access
Jackknife empirical likelihood for smoothed weighted rank regression with censored data (2012-07-24)
Huang, Longlong; Lu, Xuewen; Kopciuk, Karen
Rank regression is a highly-efficient and robust approach to estimate regression coefficients and to make inference in the presence of outlying survival times. Heller (2007) developed a smoothed weighted rank regression function, which is used to estimate the regression parameter vector in an accelerated failure time model with right censored data. This function can be expressed as a U-statistic. However, since inference is based on a normal approximation approach, it could perform poorly when sample sizes are small and censoring rates are high. To increase inference accuracy and robustness, we propose a jackknife empirical likelihood method for the U-statistic obtained from the estimating function of Heller. The jackknife empirical likelihood ratio is shown to be a standard Chi-squared statistic. Simulations were conducted to compare the proposed method with the normal approximation method. As expected, the new method gives better coverage probability for small samples with high censoring rates. The Stanford Heart Transplant Data, Veterans Administration Lung Cancer Data and Multiple Myeloma Data sets are used to illustrate the proposed method.

Item Open Access
Metabolomic Biomarkers of Response to Systemic Therapy in Colorectal Cancer (2021-07-07)
Rattner, Jodi Ilana; Bathe, Oliver; Vogel, Hans; Kopciuk, Karen; Tang, Patricia
Colorectal Cancer (CRC) is the 2nd leading cause of cancer death in North America. Besides surgery and radiation, chemotherapy administration is one of the mainstay treatment options used to improve CRC prognosis, and ultimately patient outcomes.
Chemotherapies are selected empirically by oncologists, and only a fraction of these patients will experience the benefit of these therapies. Response is assessed through radiographic imaging (CT/MRI scans). However, a myriad of challenges, including time delays and obstacles in measuring response (such as in molecularly targeted agents), mean that a new method of assessment is required to prevent unnecessary administration of non-beneficial therapies or undue accumulation of severe toxicities. This work describes the identification and development of metabolomic biomarkers that distinguish the biological changes associated with the development of cancer progression, hypertension, and fatigue. Metabolomic profiling, a method that characterizes biological systems by describing the changes in the molecular components constituting the metabolic state, was used. The metabolome is known to change rapidly within pathophysiological contexts and was chosen for its close relationship to phenotype, and its capacity to detect subtle changes in metabolite concentrations. Chapter 2 describes the protocol methodology using gas chromatography-mass spectrometry (GC-MS) and the multivariate statistical analysis used in the development of plasma biomarkers. Chapter 3 describes a large study in which serial plasma samples from 220 CRC patients in an international clinical trial were analyzed by GC-MS, resulting in the development of a metabolomic biomarker distinguishing progression from partial response within one week of chemotherapy administration. In Chapter 4, using 70 CRC patients treated with cetuximab and brivanib, we aimed to establish a signature identifying differences in metabolomic changes signifying the development of hypertension within 12 weeks of treatment initiation. Chapter 5 was dedicated to the exploration of metabolomic changes accompanying the complexities associated with severe fatigue in 72 CRC patients.
While the concepts presented in this work are not validated, the novel approach and careful considerations taken provide a proof of concept that has the possibility of substantiating and improving upon current clinical methods used. Therefore, this thesis is focused on the understanding of metabolomic perturbations of systemic therapy in CRC and the adaptation of this knowledge for the development of signatures of progression for the use of clinically viable biomarkers for therapeutic assessment.

Item Open Access
Mixture Model Analysis with Misclassified Covariates: Methods and Applications (2024-09-20)
Zhang, Ruixuan; Shen, Hua; Kopciuk, Karen; Liu, Juxin; Lu, Xuewen
Mixture models are crucial for analyzing data with underlying sub-populations. Misclassification introduces discrepancies between observations and true values, which can severely bias parameter estimation, especially for mixture models when subgroups are not easily identifiable. We propose a method to enhance parameter estimation within the framework of mixture models, and mitigate the impact of misclassified covariates by utilizing them as surrogates in the Expectation-Maximization algorithm. Simulations consider both non-differential and differential misclassification with varying sample sizes, sensitivities, specificities, subgroup proportions and misclassified covariate proportions. Results demonstrate robust performance compared to naive or ad hoc approaches ignoring the misclassification issue, even under challenging conditions, such as low sensitivity and specificity for the misclassified covariate, or small sample sizes. For illustration, we apply our method to the 2015 Behavioral Risk Factor Surveillance System data.
We conclude with a discussion of the implications of our findings and directions for future research.

Item Open Access
Performance of variable selection methods using stability-based selection (2017-04-04)
Lu, Danny; Weljie, Aalim; de Leon, Alexander R; McConnell, Yarrow; Bathe, Oliver F; Kopciuk, Karen
Background: Variable selection is frequently carried out during the analysis of many types of high-dimensional data, including those in metabolomics. This study compared the predictive performance of four variable selection methods using stability-based selection, a new secondary selection method that is implemented in the R package BioMark. Two of these methods were evaluated using the more well-known false discovery rate (FDR) as well. Results: Simulation studies varied factors relevant to biological data studies, with results based on the median values of 200 partial areas under the receiver operating characteristic curve. There was no single top performing method across all factor settings, but the Student t test based on stability selection or with FDR adjustment and the variable importance in projection (VIP) scores from partial least squares regression models obtained using a stability-based approach tended to perform well in most settings. Similar results were found with a real spiked-in metabolomics dataset. Group sample size, group effect size, number of significant variables and correlation structure were the most important factors whereas the percentage of significant variables was the least important. Conclusions: Researchers can improve prediction scores for their study data by choosing VIP scores based on stability variable selection over the other approaches when the number of variables is small to modest and by increasing the number of samples even moderately. When the number of variables is high and there is block correlation amongst the significant variables (i.e., true biomarkers), the FDR-adjusted Student t test performed best.
The R package BioMark is an easy-to-use open-source program for variable selection that had excellent performance characteristics for the purposes of this study.

Item Open Access
Plasma hPG80 (Circulating Progastrin) as a Novel Prognostic Biomarker for early-stage breast cancer in a breast cancer cohort (2023-04-04)
Prieur, Alexandre; Harper, Andrew; Khan, Momtafin; Vire, Bérengère; Joubert, Dominique; Payen, Léa; Kopciuk, Karen
Background: Recurrence and metastases are still frequent outcomes after initial tumour control in women diagnosed with breast cancer. Although therapies are selected based on tumour characteristics measured at baseline, prognostic biomarkers can identify those at risk of poor outcomes. Circulating progastrin or hPG80 was found to be associated with survival outcomes in renal and hepatocellular carcinomas and was a plausible prognostic biomarker for breast cancer. Methods: Women with incident breast cancers from Calgary, Alberta, Canada enrolled in the Breast to Bone (B2B) study between 2010 and 2016 and provided blood samples prior to any treatment initiation. Plasma from these baseline samples was analysed for circulating progastrin or hPG80. Participant and tumour characteristics were evaluated for their association with hPG80 and survival outcomes (time to recurrence, recurrence-free survival, breast cancer specific survival and overall survival) in Cox proportional hazards regression models. Results: The 464 participants with measurable hPG80 in this study had an average age of 57.03 years (standard deviation of 11.17 years) and were predominantly diagnosed with Stage I (52.2%) and Stage II (40.1%) disease. A total of 50 recurrences and 50 deaths were recorded as of June 2022.
In Cox PH regression models adjusted for chemotherapy, radiation therapy, cancer stage and age at diagnosis, log hPG80 (pmol/L) significantly increased the risks for recurrence (Hazard Ratio (HR) = 1.330, 95% Confidence Interval (CI) = (0.995 – 1.777), p = 0.054), recurrence-free survival (HR = 1.399, 95% CI = (1.106 – 1.770), p = 0.005) and overall survival (HR = 1.385, 95% CI = (1.046 – 1.834), p = 0.023) but not for breast cancer specific survival (HR = 1.015, 95% CI = (0.684 – 1.505), p = 0.942). Conclusions: hPG80 levels measured at diagnosis were significantly associated with the risk of recurrence or death from any cause in women with breast cancer. Since the recurrence rates of breast cancer are still relatively high amongst women diagnosed at an early stage, identifying women at high risk of recurrence at their time of diagnosis is important. hPG80 is a promising new prognostic biomarker that could improve the identification of women at higher risk of poor outcomes.

Item Open Access
Quality of Life After Prostate Cancer Diagnosis: A Longitudinal Prospective Cohort Study in Alberta, Canada (2016)
Farris, Megan; Friedenreich, Christine Marthe; Courneya, Kerry; Kopciuk, Karen; McGregor, S. Elizabeth
OBJECTIVES: First, we examined the associations of post-diagnosis physical activity and change in pre-diagnosis physical activity on quality of life (QoL) in prostate cancer survivors. Then, we identified post-prostate cancer diagnosis QoL trajectories over time in the population. METHODS: A total of 830 prostate cancer survivors were drawn from a prior case-control study in which information at diagnosis was collected; survivors were then re-consented into a follow-up study. Three repeated measurements of physical activity and QoL were undertaken post-diagnosis.
RESULTS: We observed improvements in physical QoL in prostate cancer survivors who maintained or adopted higher levels of physical activity pre- and post-diagnosis, according to the cancer prevention physical activity guidelines, compared to those who were non-exercisers. In the trajectory analysis, three physical and three mental trajectory groups were identified. CONCLUSION: With additional research, these established trajectory groups may help healthcare professionals in improving treatment and follow-up for this population of prostate cancer survivors.

Item Open Access
Sample size estimation methods for metabolomic data (2010)
Wang, Yuan; Kopciuk, Karen

Item Open Access
Semi-Parametric Spatial Individual-level Disease Transmission Models (2024-07-29)
Rahul, Chinmoy Roy; Deardon, Rob; Tekougang, Thierry Chekouo; Kopciuk, Karen; Shen, Hua; Feng, Cindy
Over recent years, there has been a noticeable increase in research activity on spatio-temporal statistical models to describe infectious disease dynamics. Individual-level models (ILMs), fitted in a Bayesian MCMC framework, can be used to understand the underlying mechanisms responsible for the spread of infectious diseases, taking into account population heterogeneity via various individual-level covariates. There has also been a noticeable rise in the use of models that incorporate behavioral change dynamics. In either case, in modeling infectious disease spread, parametric models are frequently employed, often depending on strong underlying assumptions regarding disease transmission mechanisms within the population. However, selecting appropriate parametric assumptions can be challenging in real-world scenarios, and incorrect assumptions may lead to erroneous conclusions. As an alternative, non-parametric approaches offer greater flexibility and robustness against strong assumptions.
The aim of this study is to explore the use of semi-parametric spatial infectious disease transmission models in a Bayesian MCMC framework. This approach will help us to estimate the relationships between explanatory variables and the risk of infection with much more flexible assumptions compared to parametric approaches. To achieve our goal, we begin by considering ILMs that incorporate piecewise constant (step), or piecewise linear spatial functions, which may also have estimated change points. We also investigate the utilization of piecewise constant kernel spatial models for infectious disease transmission that integrate an “alarm function” to account for population behavioral change (BC) resulting from increased infection prevalence over time. All models are fitted within a Bayesian MCMC framework. In this thesis, we explore results derived from both simulated and real-life epidemics, showing the greater flexibility of piecewise constant functions with and without BC effects, as well as piecewise linear spatial functions with both fixed and estimated change points.
We also demonstrate the selection of the number of change points using the Deviance Information Criterion (DIC).

Item Open Access
Serum metabolomic profile as a means to distinguish stage of colorectal cancer (BioMed Central, 2012-05-14)
Bathe, Oliver F.; Farshidfar, Farshad; Weljie, Aalim M.; Kopciuk, Karen; Buie, W Don; MacLean, Anthony; Dixon, Elijah; Sutherland, Francis R; Molckovsky, Andrea; Vogel, Hans J

Item Open Access
Simulated Metabolomics Biomarkers R Code (2016)
Kopciuk, Karen; Lu, Danny

Item Open Access
Spiked-in Data Set for BMC Notes paper (2016)
Kopciuk, Karen; Bathe, Oliver; McConnell, Yarrow; Welje, Aalim; de Leon, Alexander

Item Open Access
Steps involved in designing and creating the spiked-in data set (2016)
Kopciuk, Karen; McConnell, Yarrow; Bathe, Oliver; Weljie, Aalim

Item Embargo
Targeted Serum Metabolomics for Noninvasive Detection of Colorectal Neoplasia (2023-08)
Fitzgerald, Liam Warren; Bathe, Oliver; Orton, Dennis; Kopciuk, Karen; Vogel, Hans
Background: Early detection of colorectal cancer (CRC) and its precursor lesions improves CRC-related mortality and reduces disease incidence. However, due to limitations of currently available screening modalities, patient adherence to screening is low. A reliable and inexpensive blood test represents a promising solution to this problem. Our group has previously demonstrated that there are distinct metabolic perturbations in CRC and adenomatous polyps which are measurable in the blood. However, a targeted metabolomic approach is needed to uncover reliable meta-biomarkers for CRC, adenomatous polyps, and serrated polyps. Methods: A targeted metabolomic assay (Biocrates MxP® Quant 500) was used to analyze our discovery set of sera from patients with CRC of all stages (N=111), adenoma (N=63), sessile serrated adenoma (SSA; N=62), and age- and sex-matched disease-free controls (DFC; N=154).
Orthogonal partial least squares discriminant analysis (OPLS-DA) and machine learning (ML) were used to derive signatures composed of metabolite concentration ratios which discriminate between CRC, adenoma, SSA, and DFCs. A large, representative validation cohort (N=838) was also analyzed to test our meta-biomarkers. Results: Two signatures for CRC were derived from OPLS-DA and ML and comprised 21 and 16 metabolite ratios, respectively. OPLS-DA signatures for adenoma and SSA comprised 23 and 56 metabolite ratios, respectively. All models performed well based on 7-fold internal cross-validation, with receiver operating characteristic (ROC) analysis demonstrating areas under the curve (AUC) greater than 0.85. Importantly, external validation confirmed the reliability of our meta-biomarkers, with sensitivities as high as 92-100% for CRC and 80-85% for adenoma and SSA being possible. Conclusion: Our meta-biomarkers discovered using targeted metabolomics demonstrated excellent performance based on internal and external validation. Of significance, our signatures exhibit superior sensitivity compared to other readily available screening modalities such as stool tests.

Item Open Access
Treatment at Disease-progression in EGFR-mutated NSCLC Patients: Results from a single Canadian Institution (2016)
Tudor, Roxana; Bebb, Gwyn; Kopciuk, Karen; Brenner, Darren; Tremblay, Alain; MacEachern, Paul
Optimal treatment beyond disease-progression (PD) in non-small cell lung cancer (NSCLC) patients harboring activating epidermal growth factor receptor (EGFR) mutations, treated with tyrosine kinase inhibitors (TKIs), is not well-defined.
In this retrospective study, the following aims were set out: 1) compare outcomes and profile of EGFRmut+ NSCLC patients to large cohorts of lung-cancer patients from the Glans-Look lung cancer database (GLD); 2) examine the frequency of continuing TKI treatment beyond PD in EGFRmut+ patients; 3) examine overall survival (OS) and post-progression survival (PPS) according to clinicopathological characteristics; and 4) propose a new PD-scoring model to help guide subsequent treatment formulation. Compared to the GLD-NSCLC cohort without systemic chemotherapy, EGFRmut+ patients were more likely to be younger, female and Asian. Further, continuing TKI treatment beyond PD was associated with improved OS and PPS vs. discontinuation of TKI. A non-independent relationship between EGFR mutation type and smoking history was identified.
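
The last abstract above reports a non-independent relationship between mutation type and smoking history but does not say which test was used. As a rough illustration only, one common way to check independence between two categorical variables is a Pearson chi-square test on a contingency table. The sketch below assumes a 2x2 table; the function name `chi_square_2x2` and the counts are hypothetical and are not taken from the study.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            # Expected count under independence: row total * column total / n
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts, NOT study data:
# rows = two mutation types, columns = never-smoker / ever-smoker
counts = [[30, 10], [15, 25]]
stat = chi_square_2x2(counts)

# With 1 degree of freedom, the 5% critical value is about 3.841
print(round(stat, 3), stat > 3.841)
```

A statistic above the critical value would suggest the two variables are not independent. With small expected cell counts, an exact test is usually preferred over this approximation.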