Browsing by Author "Li, Haocheng"
Item Open Access Causal Inference with Misclassified Confounder and Missing Data in the Surrogates (2019-07-05)
Fan, Zheng; Shen, Hua; Li, Haocheng; Wallace, Michael; de Leon, Alexander R.
Causal inference pertains to statistical analyses in which researchers evaluate causal effects based on precisely measured data. In an observational study, interest often lies in estimating causal effects, which are more naturally interfered with by potential confounding factors. However, some confounding variables may be measured with error or classified into an incorrect group or category. This can occur due to the difficulty of tracking a long-term average quantity, unavoidable recall bias in answering a questionnaire, unwillingness to answer sensitive questions, unaffordability of precise measurements, etc. We first investigate the consequences of naively ignoring the misclassification issue in confounding variables on the estimation of the average treatment effect (ATE). We then develop an EM algorithm through the latent variable model for parameter estimation and subsequent removal of the estimation bias of the ATE in the absence of a validation data set. Moreover, we adapt the proposed method to address the additional complication when some surrogates are only partially observed. Variance estimation of the ATE is obtained through the bootstrap method. Simulation studies are reported to assess the performance of the proposed methods with both continuous and discrete outcome variables. The estimation methods we examined include outcome regression, G-computation, propensity score (PS) matching, PS stratification, inverse probability weighting (IPW) and augmented inverse probability weighting (AIPW). Lastly, we analyze a breast cancer data set to illustrate the proposed methods.
Discussion and future work are outlined at the end.

Item Open Access Correlated Data Analysis via Variants of EM Algorithm: Application to Data on Physical Activity and Maternal Health (2024-09-13)
Li, Jia; De Leon, Alexander; Li, Haocheng; Wu, Jingjing; Lu, Xuewen; Chu, Man-Wai; Sheng, Xiaoming
The thesis concerns the analysis of correlated data on multiple variables via the EM algorithm and its variants. Specifically, we focus on (cross-sectional) multivariate iid data comprising a disparate mix of binary and non-Gaussian variables (including the special case of multivariate binary data), and on longitudinal data on multiple Gaussian responses in a regression setting. For the case with correlated data on multiple binary variables and that with mixed data on binary and non-Gaussian continuous variables, we introduced the classes of meta-probit models (MPMs) and extended meta-probit models (XMPMs) as generalizations to non-Gaussian settings of the grouped continuous model (GCM) – also known as the multivariate probit model (MVPM) – and its extension to mixed data, the conditional GCM (CGCM). Constructed from Gaussian copula distributions (GCDs), a class of meta-Gaussian distributions based on the Gaussian copula, MPMs and XMPMs broaden the sphere of applications of joint models to settings that involve complex non-standard data on variables with different measurement scales and with marginal distributions, latent and otherwise, from different parametric families. To avoid the computational challenges of maximum likelihood (ML) estimation in MPMs/XMPMs, we adopted the method of inference function for margins, a two-part estimation method that first estimates marginal parameters marginally via (marginal) ML estimation, and then estimates joint parameters (i.e., normal correlations) jointly via profile ML estimation based on the full joint likelihood function, with marginal parameters evaluated at their marginal estimates.
The method is especially appropriate for copula models, in general, and MPMs/XMPMs, in particular, because marginal distributions are specified completely independently of their dependence structure in copula models. For joint estimation of the normal correlations, we adopted a parameter expanded EM (PX-EM) algorithm to simplify E-step calculations – all done numerically exactly using freely available R packages – and to make possible a closed-form M-step update, allowing us to avoid the complications associated with having to estimate a correlation matrix. We used the standard theory of inference functions to obtain the (joint) asymptotic Gaussian distribution of the resulting maximum pseudo-likelihood estimates (MPLEs). Results of Monte Carlo simulations confirmed the consistency and asymptotic unbiasedness of MPLEs, with SEs that generally reflected the estimates’ true sampling variability. Finally, we generalized the ECME algorithm to the multiple-outcomes setting to implement ML estimation for the joint Gaussian LMMs with atypically large numbers of random effects. Monte Carlo simulations show that the resulting estimates are consistent, with efficiencies comparable to those obtained by pairwise methods. We further illustrate our methodology with longitudinal survey data on physical activity collected by ActivPAL™ (www.paltech.plus.com).

Item Open Access Determinants of Physical Activity in a Cohort of Prostate Cancer Survivors (2018-07-24)
Stone, Chelsea Rose; Friedenreich, Christine Marthe; Courneya, Kerry S.; McGregor, S. Elizabeth; Li, Haocheng
Background: Physical activity has been shown to improve overall health, improve cancer outcomes and reduce all-cause mortality as well as prostate cancer specific mortality after diagnosis; however, a cancer diagnosis can often cause a change in physical activity patterns.
Current guidelines recommend 150 minutes per week of physical activity, and the determinants of meeting these guidelines before and after a prostate cancer diagnosis are largely unknown. Objectives: Our first objective was to examine the determinants of meeting physical activity guidelines at pre-diagnosis and three time points post-diagnosis. Our second objective was to examine the determinants of patterns of meeting guidelines from pre-diagnosis to two years post-diagnosis. The final objective was to examine determinants of patterns of long-term physical activity behaviours from pre-diagnosis to an average measure of post-diagnosis physical activity. Methods: This prospective cohort study included 830 prostate cancer patients who participated in a population-based case-control study between November 1997 and December 2000 in Alberta, Canada. Pre-diagnosis activity levels were self-reported at diagnosis and again at three time points post-diagnosis. Demographic, quality of life and lifestyle variables were collected by questionnaires, while medical chart abstractions were performed to capture clinical variables. Results: Active smoking status, poor physical health and rural living location were commonly found to be statistically significantly associated with failing to meet physical activity guidelines in cross-sectional analyses and in analyses examining patterns of behaviour from pre- to post-diagnosis. Conclusions: Demographic, health and lifestyle variables are associated with meeting or not meeting physical activity guidelines from pre-diagnosis to post-diagnosis.
Programming should be aimed at offering survivors support to overcome determinants associated with decline in physical activity patterns.

Item Open Access Immediate-term cognitive impairment following intravenous (IV) chemotherapy: a prospective pre-post design study (2019-02-14)
Khan, Omar F; Cusano, Ellen; Raissouni, Soundouss; Pabia, Mica; Haeseker, Johanna; Bosma, Nicholas; Ko, Jenny J; Li, Haocheng; Kumar, Aalok; Vickers, Michael M; Tang, Patricia A
Background: Cognitive impairment is commonly reported in patients receiving chemotherapy, but the acuity of onset is not known. This study utilized the psychomotor vigilance test (PVT) and trail-making test B (TMT-B) to assess cognitive impairment immediately post-chemotherapy. Methods: Patients aged 18–80 years receiving first-line intravenous chemotherapy for any stage of breast or colorectal cancer were eligible. Patient symptoms, peripheral neuropathy and Stanford Sleepiness Scale were assessed. A five-minute PVT and TMT-B were completed on a tablet computer pre-chemotherapy and immediately post-chemotherapy. Using a mixed linear regression model, changes in reciprocal-transformed PVT reaction time (mean 1/RT) were assessed. A priori, an increase in median PVT reaction times by > 20 ms (approximating PVT changes with blood alcohol concentrations of 0.04–0.05 g%) was considered clinically relevant. Results: One hundred forty-two cancer patients (73 breast, 69 colorectal, median age 55.5 years) were tested. Post-chemotherapy, mean 1/RT values were significantly slowed compared to pre-chemotherapy baseline (p = 0.01). This corresponded to a median PVT reaction time slowed by an average of 12.4 ms. Changes in PVT reaction times were not correlated with age, sex, cancer type, treatment setting, or use of supportive medications. Median post-chemotherapy PVT reaction time slowed by an average of 22.5 ms in breast cancer patients and by 1.6 ms in colorectal cancer patients.
Post-chemotherapy median PVT times slowed by > 20 ms in 57 patients (40.1%). Exploratory analyses found no statistically significant association between the primary outcome and self-reported anxiety, fatigue or depression. TMT-B completion speed improved significantly post-chemotherapy (p = 0.03), likely due to the test-retest phenomenon. Conclusions: PVT reaction time slowed significantly immediately post-chemotherapy compared to a pre-chemotherapy baseline, and levels of impairment similar to the effects of alcohol consumption in other studies were seen in 40% of patients. Further studies assessing the functional impact of cognitive impairment on patients immediately after chemotherapy are warranted.

Item Open Access Influence of Inflammation, Insulin Resistance and Excess Body Size on Breast Cancer Risk: A Nested Case-Control Study (2020-02-06)
Haig, Tiffany R.; Brenner, Darren R.; Friedenreich, Christine M.; Li, Haocheng; Robson, Paula
Background: Breast cancer is the most common malignancy affecting women in Canada. In 2019, breast cancer represented 25% of all new cancers among Canadian women and 13% of all cancer deaths. Excess body size is associated with postmenopausal breast cancer risk. The mechanisms linking adiposity to breast cancer are unclear. Both inflammation and insulin resistance have been implicated in this association; however, literature to date has been inconsistent. Here, we aim to examine the associations between high-sensitivity C-reactive protein (hsCRP) and hemoglobin A1c (HbA1c), common measures of inflammation and insulin resistance, respectively, with breast cancer risk, while adjusting for measures of excess body size. Methods: We conducted a nested case-control study within the Alberta’s Tomorrow Project cohort (Alberta, Canada) including 197 invasive breast cancer cases and 394 matched controls.
Serum concentrations of hsCRP and HbA1c were measured from blood samples collected prior to diagnosis, along with anthropometric measurements, general health, and lifestyle data. Conditional logistic regression was used to evaluate the associations between hsCRP, HbA1c, and breast cancer risk adjusted for body fat percentage and other risk factors for breast cancer. Results: Participants included in this study had a mean age of 65.1 years and were mostly postmenopausal (147 cases and 293 controls). More than half were categorized as overweight/obese (60.5% for cases; 64.9% for controls), and median values of hsCRP (0.9; interquartile range (IQR) = 1.8) and HbA1c (5.6; IQR = 0.6) were similar between cases and controls. Higher concentrations of hsCRP were associated with elevated breast cancer risk (odds ratio [OR] = 1.27; 95% confidence interval [CI] = 1.03, 1.55). The observed associations were unchanged with adjustment for body fat percentage. Higher HbA1c concentrations were not significantly associated with an increased risk of incident breast cancer relative to controls (OR = 1.22; 95% CI = 0.17, 8.75). Conclusion: These data suggest that hsCRP, a marker of inflammation, may be associated with elevated breast cancer risk, independent of body fat percentage. However, elevated concentrations of HbA1c did not appear to increase breast cancer risk in this group of women in Alberta.

Item Open Access Minimum Hellinger Distance Estimation for a Two-component Mixture Model (2017)
Zhou, Xiaofan; Wu, Jingjing; Liu, Shawn; Li, Haocheng; Wu, Jingjing
Over the last two decades, semiparametric mixture models have received increasing attention, simply because mixture models arise frequently in real life. In this thesis we consider a semiparametric two-component location-shifted mixture model. We propose to use the minimum Hellinger distance estimator (MHDE) to estimate the two location parameters and the mixing proportion.
An MHDE is obtained by minimizing the Hellinger distance between an assumed parametric model and a nonparametric estimate of the model. The MHDE has been proved to have asymptotic efficiency and excellent robustness against small deviations from the assumed model. To construct the MHDE, we propose to use a bounded linear operator introduced by Bordes et al. (2006) to estimate the unknown nuisance parameter (an unknown function). To facilitate the calculation of the MHDE, we develop an iterative algorithm and propose a novel initial estimate of the parameters of interest. To assess the performance of the proposed estimators, we carry out simulation studies and a real data analysis and compare the results with those of the minimum profile Hellinger distance estimator (MPHDE) proposed by Wu et al. (2017) for the same model. The results show that the MHDE is very competitive with the MPHDE in terms of bias and mean squared error, while the MHDE is on average about 2.7 times computationally faster than the MPHDE. The simulation studies also demonstrate that the proposed initial estimate is much more robust than the one used in Wu et al. (2017).

Item Open Access Minimum Profile Hellinger Distance Estimation for Semiparametric Simple Linear Regression Model (2021-01-06)
Li, Jiang; Wu, Jingjing; Li, Haocheng; Wu, Jingjing; Li, Haocheng; Lu, Xuewen; Zhang, Qingrun
The simple linear regression model is essential for analyzing the relation between a response variable and a covariate, and the importance of the simple linear regression model for statistical analysis of data is well documented. This thesis focuses on the semiparametric simple linear regression model where the distribution of the error term is assumed symmetric but otherwise completely unspecified. Under this model, we constructed a robust estimator of the regression coefficient parameters using the minimum Hellinger distance technique.
Minimum Hellinger Distance Estimation (MHDE) was first introduced by Beran (1977) for fully parametric models and has been shown to have good efficiency and robustness properties. In the past decade, the MHDE has been extended to semiparametric models. Furthermore, Wu and Karunamuni (2015) introduced the Minimum Profile Hellinger Distance Estimation (MPHDE) for semiparametric models, which retains the good efficiency and robustness properties of MHDE in parametric models. In this thesis, we constructed an MPHDE for the semiparametric simple linear regression model. We established in theory the consistency of the proposed MPHDE. Finite-sample performance of the proposed estimator was examined via simulation studies and real data applications. Our numerical results showed that the proposed MPHDE has good efficiency and simultaneously is very robust against outlying observations.

Item Open Access Minimum Profile Hellinger Distance Estimation for Two-Sample Location Models (2017)
Yang, Jian; Wu, Jingjing; Li, Haocheng; Yan, Ying; Sun, Bingrui (Cindy)
Minimum Hellinger distance estimation (MHDE) is obtained by minimizing the Hellinger distance between an assumed parametric model and a nonparametric estimate of the model. This estimation has received increasing attention over the past decades due to its asymptotic efficiency and excellent robustness against small deviations from the assumed model. Minimum profile Hellinger distance estimation (MPHDE), proposed by Wu and Karunamuni (2015), is an extension of MHDE particularly for semiparametric models. In this thesis, we investigate two-sample symmetric location models and propose to use MPHDE to estimate the unknown location parameters. Asymptotic normality and robustness properties of the estimation are discussed and a comparison with LSE and MLE is carried out through Monte Carlo simulation studies.
The results show that MPHDE is very competitive with LSE and MLE in terms of efficiency, while it appears to be much more robust than LSE and MLE against outlying observations. We also demonstrate the application of the estimation to a breast cancer data set.
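
The definition of minimum Hellinger distance estimation quoted in the abstracts above (minimize the Hellinger distance between an assumed parametric model and a nonparametric estimate of the density) can be illustrated with a small sketch. This is not the method of any of the listed theses: it assumes a fully parametric normal location model with known unit variance, a Gaussian kernel density estimate with a hand-picked bandwidth, and a plain grid search instead of an iterative algorithm. All function names, the simulated data and the tuning values are hypothetical choices made for illustration only.

```python
import math
import random


def normal_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def kernel_density(grid, data, bandwidth):
    """Gaussian kernel density estimate evaluated at each grid point."""
    n = len(data)
    return [sum(normal_pdf(g, x, bandwidth) for x in data) / n for g in grid]


def squared_hellinger(f, g, step):
    """Riemann-sum approximation of the integral of (sqrt(f) - sqrt(g))^2."""
    return step * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(f, g))


def mhde_location(data, grid, candidates, bandwidth):
    """Grid-search MHDE of the location mu in the assumed N(mu, 1) model."""
    step = grid[1] - grid[0]
    f_hat = kernel_density(grid, data, bandwidth)

    def distance(mu):
        model = [normal_pdf(g, mu) for g in grid]
        return squared_hellinger(model, f_hat, step)

    return min(candidates, key=distance)


# Simulated example (hypothetical): 190 draws from N(2, 1) plus 10 outliers at 15.
random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(190)] + [15.0] * 10

grid = [-5.0 + 0.05 * i for i in range(501)]   # integration grid on [-5, 20]
candidates = [0.05 * i for i in range(81)]     # candidate locations on [0, 4]

mu_mhde = mhde_location(data, grid, candidates, bandwidth=0.4)
mu_mean = sum(data) / len(data)                # sample mean, pulled toward 15

print("MHDE location estimate:", mu_mhde)
print("Sample mean:", mu_mean)
```

In this contaminated sample the sample mean is dragged toward the outliers, while the Hellinger-distance fit stays close to the bulk of the data, which is the robustness property the abstracts refer to. The semiparametric MPHDE variants studied in the theses additionally profile out an unknown nuisance function and are not reproduced here.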