Browsing by Author "Shen, Hua"
Now showing 1 - 20 of 29
Item Open Access: Analysis of Misclassified Categorical Response via Incomplete Surrogate Variables and Likelihood Method (2020-12-20). Yu, Zheng; Shen, Hua; De Leon, Alexander R.; Kopciuk, Karen A.

Misclassification of a dependent categorical variable often occurs in observational studies due to imperfect measuring procedures, and it may threaten the validity of analytic results. We first investigate the consequences of naively ignoring misclassification in the response variable on parameter estimation, using a range of naive and ad hoc methods. We then develop a robust algorithm that utilizes surrogate variables to estimate the covariate effects in regression models, under the framework of latent variable models and in the absence of validation data. The resulting estimates are used in prediction and in estimation of the average treatment effect (ATE). The ATE estimation methods examined include outcome regression, G-computation, propensity score (PS) stratification, inverse probability weighting (IPW) and augmented inverse probability weighting (AIPW). Variance estimation of the ATE is obtained through the bootstrap method. Moreover, we extend the algorithm to cope with the complication that some of the surrogate measurements are missing. Simulation studies representing various scenarios are conducted to assess the performance of the proposed methods with a binary latent response variable. Based on the simulation studies, we show that the proposed method outperforms the other approaches and corrects for both misclassification and missingness simultaneously for a binary response variable, ensuring valid statistical inferences. An application to the motivating study on breast cancer is given for illustration.
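The IPW estimator and bootstrap variance listed in the abstract above can be illustrated with a small simulation. This is a generic sketch, not the thesis's implementation: the data-generating model, sample size and number of bootstrap replicates are invented for illustration, and the propensity score is taken as known rather than estimated.

```python
import math
import random

random.seed(1)

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

# Simulate a confounded study: confounder X drives both treatment A and
# outcome Y; the true average treatment effect is 2.0 by construction.
n = 5000
rows = []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    ps = expit(x)                        # true propensity score P(A=1 | X)
    a = 1 if random.random() < ps else 0
    y = 2.0 * a + x + random.gauss(0.0, 1.0)
    rows.append((a, y, ps))

def ipw_ate(sample):
    """Horvitz-Thompson-style IPW estimator of the ATE."""
    m = len(sample)
    mean1 = sum(a * y / ps for a, y, ps in sample) / m
    mean0 = sum((1 - a) * y / (1.0 - ps) for a, y, ps in sample) / m
    return mean1 - mean0

est = ipw_ate(rows)

# Bootstrap variance: resample subjects with replacement and re-estimate.
boot = [ipw_ate([random.choice(rows) for _ in range(n)]) for _ in range(200)]
mean_boot = sum(boot) / len(boot)
se = (sum((b - mean_boot) ** 2 for b in boot) / (len(boot) - 1)) ** 0.5
```

The same resampling loop yields a standard error for any of the other ATE estimators (G-computation, AIPW, etc.) by swapping the inner estimator function.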
Discussion and future work are outlined at the end.

Item Open Access: Bi-level Variable Selection and Dimension-reduction Methods in Complex Lifetime Data Analytics (2019-12). Cai, Kaida; Lu, Xuewen; Shen, Hua; Tekougang, Thierry Chekouo; Deardon, Rob; Long, Quan; Jin, Zhezhen

In high-dimensional data, the number of covariates can be large and can diverge with the sample size. In many scientific applications, such as biological studies, the predictors or covariates are naturally grouped. In this thesis, we consider bi-level variable selection and dimension-reduction methods in complex lifetime data analytics under various survival models, and study their theoretical properties and finite-sample performance under different scenarios. Specifically, in Chapter 2, we focus on the Andersen-Gill regression model for the analysis of recurrent event data with grouped covariates when the number of covariates is fixed. To study the effects of the covariates on the occurrence of recurrent events, a bi-level penalized group selection method is introduced to address the group selection problem. A general group-bridge penalty function with varying weights is invoked to achieve this goal. It is shown that the performance of the bi-level selection depends on the weights; to select covariates more efficiently, especially for identifying the important covariates in important groups, adaptive weights are required. The asymptotic oracle properties of the proposed method are investigated in the case of a fixed number of covariates. Three methods of tuning-parameter selection are proposed. Our simulation studies show that the proposed method performs well in selecting important groups and important individual covariates within those groups simultaneously, and outperforms other popular group selection methods and the traditional unpenalized Wald testing method.
In Chapter 3, we extend the proposed method for the recurrent event model to the case of a diverging number of covariates. We demonstrate that the proposed method has selection consistency and that the penalized estimators have asymptotic normality when the number of covariates diverges. Simulation studies show that the proposed method performs well and that the results are consistent with the theoretical properties. We illustrate the method using a real-life data set from medicine. In Chapter 4, by imitating the group variable selection procedure with a bi-level penalty, we propose a new variable selection method for the analysis of multivariate failure time data, with an adaptive bi-level variable selection penalty function. In the regression setting, we treat the coefficients corresponding to the same prediction variable as a natural group, and then consider variable selection at the group level and the individual level simultaneously. The proposed adaptive bi-level variable selection method can select a prediction variable at two different levels: the group level, where the predictor is important to all failure types, and the individual level, where the predictor is important to only some failure types. An algorithm based on cyclic coordinate descent (CCD) is proposed to carry out the method. Based on the simulation results, our method outperforms the classical penalty methods, especially in terms of removing variables that are unimportant for all failure types. We obtain the asymptotic oracle properties of the proposed variable selection method in the case of a diverging number of covariates. We construct a generalized cross-validation (GCV) method for tuning-parameter selection and assess model performance based on model errors. We also illustrate the proposed method using a real-life data set.
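The group-bridge-type bi-level penalty described above has, in its generic form, an outer bridge (power) transform applied to each group's L1 norm. The function below is an illustrative sketch of that generic penalty; the names, the example grouping and the unit weights are hypothetical, not code from the thesis (adaptive weights would vary by group).

```python
def group_bridge_penalty(beta, groups, weights, gamma=0.5):
    """Generic group bridge: sum_g w_g * (sum_{j in g} |beta_j|)^gamma, 0 < gamma < 1.

    Shrinking a group's L1 norm to zero removes the whole group, while the
    inner L1 norm still allows individual coefficients within a kept group
    to be zeroed: bi-level selection from a single penalty.
    """
    return sum(
        w * sum(abs(beta[j]) for j in g) ** gamma
        for g, w in zip(groups, weights)
    )

beta = [1.5, 0.0, -2.0, 0.0, 0.0]      # fitted coefficients (illustrative)
groups = [[0, 1], [2], [3, 4]]         # a grouping of the five covariates
weights = [1.0, 1.0, 1.0]              # adaptive weights would differ by group

p = group_bridge_penalty(beta, groups, weights)
# The zeroed-out third group contributes nothing to the penalty.
```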
Sufficient dimension reduction (SDR) is a powerful tool for dimension reduction in regression and classification problems, replacing the original covariates with a minimal set of their linear combinations. In Chapter 5, we propose a novel penalty function, called adaptive group composite Lasso (AGCL), for the group sparse sufficient dimension reduction problem. By incorporating this new penalty into the sufficient dimension reduction method, we propose an adaptive group composite Lasso penalized dimension-reduction method that simultaneously achieves sufficient dimension reduction and group variable selection in the case of a diverging number of covariates. We investigate the asymptotic properties of the penalized sufficient dimension reduction estimators when the number of covariates diverges with the sample size. We show that the proposed method can select important groups and individual variables simultaneously. We compare the proposed method with other sparse sufficient dimension reduction methods using simulation studies. The results show that the proposed method outperforms the others in terms of removing unimportant covariates, especially unimportant groups. A real data example is used for illustration.

Item Open Access: Bi-level Variable Selection in Semiparametric Transformation Models for Right Censored Data and Cure Rate Data (2019-01-25). Zhong, Wenyan; Wu, Jingjing; Lu, Xuewen; Chen, Gemai; De Leon, Alexander R.; Shen, Hua; Kong, Linglong

In this dissertation, I investigated bi-level variable selection in semi-parametric transformation models with right-censored data and in semi-parametric mixture cure models with right-censored and cure rate data, respectively. The transformation models under consideration include the proportional hazards model and the proportional odds model as special cases.
In the framework of regularized regression, we proposed a computationally efficient estimation method that selects significant groups and variables simultaneously. Three penalty functions, i.e., the group bridge, adaptive group bridge and composite group bridge penalties, which can incorporate the grouping structure of covariates, were adopted for bi-level variable selection. In Chapter 2, the objective function, which consists of the negative weighted partial log-likelihood function plus one of the three penalties, has a parametric form and is convex with respect to the parameters. This leads to an easy implementation of the optimization algorithm, for which convergence is guaranteed numerically. We showed that all three proposed penalized estimators achieve group selection consistency and, moreover, that the adaptive group bridge estimator and the composite group bridge estimator enjoy the oracle properties, i.e., both possess group and individual selection consistency simultaneously and are asymptotically normal as if the truly unimportant covariates were known. In Chapter 3, we further extended the bi-level variable selection procedure to semi-parametric mixture cure models. These models are formulated by a logistic regression for modelling the cure fraction and a class of semi-parametric transformation models for modelling the survival function of the remaining uncured individuals. By incorporating a cure fraction, the proposed model is more flexible than standard survival models, and the proposed approach is capable of distinguishing important covariates and groups from unimportant ones and of estimating covariate effects simultaneously in both the incidence and the latency parts. We proposed a new iterative EM algorithm to handle the two latent variables. We illustrated the finite-sample performance of the proposed methods via simulations and two real data examples.
Simulation studies indicated that the proposed methods perform well even with a relatively high dimension of covariates.

Item Open Access: Causal Inference with Misclassified Confounder and Missing Data in the Surrogates (2019-07-05). Fan, Zheng; Shen, Hua; Li, Haocheng; Wallace, Michael; de Leon, Alexander R.

Causal inference pertains to statistical analyses in which researchers evaluate causal effects based on precisely measured data. In an observational study, interest often lies in estimating causal effects, which are naturally subject to distortion by potential confounding factors. However, some confounding variables may be measured with error or classified into an incorrect group or category. This can occur because of the difficulty of tracking a long-term average quantity, unavoidable recall bias in answering a questionnaire, unwillingness to answer sensitive questions, the unaffordability of precise measurements, and so on. We first investigate the consequences of naively ignoring misclassification in confounding variables when estimating the average treatment effect (ATE). We then develop an EM algorithm, through a latent variable model, for parameter estimation and subsequent removal of the estimation bias of the ATE in the absence of a validation data set. Moreover, we adapt the proposed method to address the additional complication where some surrogates are only partially observed. Variance estimation of the ATE is obtained through the bootstrap method. Simulation studies are reported to assess the performance of the proposed methods with both continuous and discrete outcome variables. The estimation methods we examine include outcome regression, G-computation, propensity score (PS) matching, PS stratification, inverse probability weighting (IPW) and augmented inverse probability weighting (AIPW). Lastly, we analyze a breast cancer data set to illustrate the proposed methods.
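The consequence of naively adjusting for a misclassified binary confounder, the phenomenon the abstract above investigates, can be seen in a toy simulation. All parameter values below are invented for illustration; this shows the bias itself, not the thesis's correction method.

```python
import random

random.seed(7)

# Binary confounder X, surrogate S misclassified 20% of the time, treatment A
# depends on X, and outcome Y = A + 2*X + noise, so the true ATE is 1.0.
n = 20000
data = []
for _ in range(n):
    x = 1 if random.random() < 0.5 else 0
    s = x if random.random() < 0.8 else 1 - x
    a = 1 if random.random() < (0.8 if x == 1 else 0.2) else 0
    y = a + 2.0 * x + random.gauss(0.0, 1.0)
    data.append((x, s, a, y))

def stratified_ate(data, key):
    """ATE by stratifying on a binary confounder measurement (column `key`)."""
    est = 0.0
    for level in (0, 1):
        stratum = [r for r in data if r[key] == level]
        y1 = [r[3] for r in stratum if r[2] == 1]
        y0 = [r[3] for r in stratum if r[2] == 0]
        est += (sum(y1) / len(y1) - sum(y0) / len(y0)) * len(stratum) / len(data)
    return est

ate_adjusted = stratified_ate(data, 0)   # adjust on the true confounder X
ate_naive = stratified_ate(data, 1)      # naively adjust on the surrogate S
# ate_naive is biased away from 1.0: S controls for X only imperfectly,
# leaving residual confounding.
```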
Discussion and future work are outlined at the end.

Item Open Access: Causal Inference with Mismeasured Confounders or Mediators (2021-09-23). Ren, Mingchen; De Leon, Alexander R.; Yan, Ying; Tekougang, Thierry Chekouo; Shen, Hua; Kopciuk, Karen A.; He, Wenqing

This thesis comprises three projects that correct for measurement error in covariates or mediators when estimating causal estimands under survival models, marginal structural models and covariate balancing models. In Chapter 2, we decompose the causal effect on the difference scale with more than one mediator under an additive hazard model, and correct the bias caused by error-prone covariates and mediators. The simulation study shows the good performance of the proposed method under various measurement-error settings. The method is further applied to a real data study of HIV-infected adults (Hammer et al., 1996), where a causal interpretation of the mediated effects is given. The asymptotic distributions of the estimators are provided in the appendix. In Chapter 3, we develop two estimation methods to correct the bias of the average treatment effect estimated via a marginal structural model when covariates are subject to measurement error. We consider the scenario in which the confounders and exposures are time-varying and the confounders are error-prone. The first approach depends on a logistic-based correction method, which corrects the error-prone confounders in the logistic regression model of the treatment variable (Stefanski & Carroll, 1987). The second relies on a simulation-extrapolation-based correction method (Shu & Yi, 2019d), which corrects the error-prone average treatment effect directly and can be used when a closed form of the weight cannot be found. Simulation studies are provided, and the proposed approaches are illustrated by a real data analysis of the Women's Interagency HIV Study in the United States from 1993 to 2015.
In Chapter 4, when pretreatment covariates are subject to measurement error, we apply the augmented simulation-extrapolation estimation developed by Shu and Yi (2019d) to correct the estimates of the average treatment effect on the treated obtained via entropy balancing and covariate balancing propensity score methods. The correction method is illustrated by a real data set.

Item Open Access: Causal Inference with Missingness in Confounders (2019-08-15). Bagmar, Md. Shaddam Hossain; Shen, Hua; Wu, Jingjing; Lu, Xuewen

Causal inference is the process of uncovering the causal connection between an effect variable and a disease outcome in epidemiologic research. Confounders that influence both the effect variable and the outcome need to be accounted for when obtaining the causal effect in observational studies. In addition, missing data often arise in the data collection procedure, and working with complete cases only often results in biased estimates. We consider the estimation of the causal effect in the presence of missingness in the confounders under the missing at random assumption. We investigate how different estimators, namely regression, G-estimation, propensity score-based estimators (including matching, stratification, weighting and propensity regression) and, finally, the doubly robust estimator, perform when applying complete-case analysis or multiple imputation. Because of the uncertainty of the imputation model and the computational challenge of a large number of imputations, we propose an expectation-maximization (EM) algorithm to estimate the expected values of the missing confounder and utilize a weighting approach in the estimation of the average treatment effect. Simulation studies are conducted to determine whether there is any gain in estimation efficiency under the proposed method over complete-case analysis and multiple imputation. The analysis identified the EM method as the most efficient and accurate method for dealing with missingness in a confounder, except for the propensity score matching and inverse weighting estimators.
For these two estimators, multiple imputation is found to be efficient; however, EM is efficient for inverse weighting when the outcome is binary. A real-life data application is presented, estimating the effect of adjuvant radiation treatment on patients' survival status 10 years after breast cancer diagnosis. Under the missing completely at random (MCAR) mechanism, EM is found to be a more accurate method for handling missingness in a confounder than multiple imputation.

Item Open Access: Causal Inference With Non-probability Sample and Misclassified Covariate (2022-09). Sevinc, Emir; Shen, Hua; Lu, Xuewen; Deardon, Robert; Badescu, Alexandru

Causal inference refers to the analysis of data to address questions that are explicitly causal. The problems motivating many, if not most, studies in the social and biological sciences tend to be causal rather than associative. A well-defined and systematically representative sample tends to be the basis of such studies. However, a sample may sometimes result from a non-probability process. This poses a unique challenge in estimating the probability that an individual is in the sample, and in generalizing causal conclusions drawn from non-probability samples to the target population. Additionally, because of issues such as the difficulty of precise measurement and human error, certain variables may be classified incorrectly. In this thesis, we address both challenges by implementing causal inferential methods in a setting where we have a main non-probability sample with the response available and a probability sample with auxiliary information only. We deal with the presence of an incorrectly classified confounder in the non-probability sample only, or in both samples. We examine the consequences of naively ignoring misclassification, and develop a latent-variable-based method via an expectation-maximization algorithm to correct for the misclassified confounder.
We incorporate this method with a doubly robust mean estimator that requires the correct specification of only one of the regression model or the non-probability sample selection model to estimate the average treatment effect. We demonstrate the effectiveness of our methodology via simulation studies, and implement it on smoking data from the Centers for Disease Control and Prevention (CDC).

Item Open Access: Competing Risk Analysis with Misclassified Covariates (2020-09-25). Li, Ruoyu; Shen, Hua; Lu, Xuewen; Deardon, Robert

Misclassification of categorical variables and missing data often occur concurrently in medical research. Though there has been extensive research on either topic, relatively little work is available that addresses both issues simultaneously, especially in survival analysis. In this thesis, we first propose a method for competing risk analysis involving a latent categorical covariate where validation data are absent and the latent variable of interest is measured only subject to misclassification via a set of surrogate variables. We then extend it to a more general setting where the latent covariate is not measured by the same number of surrogate variables for all subjects. For example, the decision to take a measurement with an additional surrogate variable may depend on the available faulty measurements of the latent variable from preceding surrogate variables, resulting in a sequential missingness pattern among the surrogates. In both cases we apply the direct approach in the analysis of competing risks, focusing on the cumulative incidence functions of the event of interest and its competing events, and adopt flexible parametric forms for the baseline cumulative incidence functions. We develop likelihood-based methods based on expectation-maximization algorithms and jointly model the competing risks, the surrogate variables and the latent covariate of interest.
The procedures simultaneously allow estimation of the covariate effects on the event of interest, the parameters in the baseline cumulative incidence functions, the regression coefficients in the misclassification model, and the association between the latent covariate and the other completely and precisely observed covariates. We evaluate the empirical performance of the proposed methods in simulation studies. We conclude that they outperform the naive and ad hoc approaches in both cases and are relatively robust to the sample size, the misclassification rate and the missingness proportion of the surrogate variables. Finally, we apply the proposed method to the motivating study on breast cancer. Discussion and future work are outlined at the end.

Item Open Access: Covariate Balancing Using Statistical Learning Methods in the Presence of Missingness in Confounders (2019-09-20). Mason, Levi James; Shen, Hua; Chekouo, Thierry T.; Deardon, Rob

In observational studies, researchers do not have control over treatment assignment. A consequence of such studies is that an imbalance in observed covariates between the treatment and control groups may exist. This imbalance can arise because treatment assignment is frequently influenced by observed covariates (Austin, 2011a). As a result, directly comparing outcomes between these two groups could lead to a biased estimate of the treatment effect (d'Agostino, 1998). The propensity score, defined as the probability of treatment assignment conditional on observed covariates, can be used in matching, stratification and weighting to balance the observed covariates between the treatment and control groups in order to estimate the treatment effect more accurately (Rosenbaum and Rubin, 1983). This study looked at using statistical learning techniques to estimate the propensity score.
The techniques included in this study were logistic regression, classification and regression trees, pruned classification and regression trees, bagged classification and regression trees, boosted classification and regression trees, and random forests. The estimated propensity scores were then used in linearized propensity score matching, stratification, and inverse probability of treatment weighting with stabilized weights to estimate the treatment effect. Comparisons among these methods were made in a simulation study. Both binary and continuous outcomes were analyzed. In addition, a simulation was performed to assess the use of multiple imputation with predictive mean matching when a confounder had data missing at random. The simulation studies demonstrated that the most accurate treatment effect estimates came from inverse probability of treatment weighting with stabilized weights, where the propensity scores were estimated by logistic regression, random forests, or bagged classification and regression trees. These results were then applied to a retrospective cohort data set with a missing confounder to determine the treatment effect of adjuvant radiation in individuals with breast cancer.

Item Open Access: Data Subset-Based Methods of Inference for Spatial Individual Level Epidemic Models (2023-08). Nyein, Thet Htet Chan; Deardon, Rob; Shen, Hua; Kopciuk, Karen A.

Mathematical models are essential for understanding infectious disease dynamics, enabling control of disease spread and preparation of public health measures. Since time and space are important factors affecting the transmission of infectious diseases, spatial individual-level models (ILMs) incorporating both temporal and spatial information have been developed. Typically, Markov chain Monte Carlo (MCMC) methods are utilized for inference in ILMs.
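The stabilized inverse probability of treatment weights used in the covariate-balancing abstract above take the form P(A=a) / P(A=a | X). Below is a minimal sketch with a known propensity model standing in for one fitted by logistic regression or another statistical learner; the data-generating values are invented for illustration.

```python
import math
import random

random.seed(3)

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

# Simulated cohort; the true treatment effect is 1.0 by construction.
n = 4000
rows = []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    ps = expit(0.5 * x)                  # stand-in for a fitted propensity score
    a = 1 if random.random() < ps else 0
    y = a + x + random.gauss(0.0, 1.0)
    rows.append((a, y, ps))

p_treat = sum(a for a, _, _ in rows) / n  # marginal P(A = 1)

def stabilized_weight(a, ps):
    """Marginal treatment probability over the conditional one."""
    return p_treat / ps if a == 1 else (1.0 - p_treat) / (1.0 - ps)

w = [stabilized_weight(a, ps) for a, _, ps in rows]

# Weighted (Hajek-style) difference of means gives the IPTW effect estimate.
num1 = sum(wi * y for (a, y, _), wi in zip(rows, w) if a == 1)
den1 = sum(wi for (a, _, _), wi in zip(rows, w) if a == 1)
num0 = sum(wi * y for (a, y, _), wi in zip(rows, w) if a == 0)
den0 = sum(wi for (a, _, _), wi in zip(rows, w) if a == 0)
ate = num1 / den1 - num0 / den0
```

Stabilized weights average to about 1, which keeps the weighted pseudo-sample near size n and tames the variance of the plain 1/ps weights.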
Nonetheless, this approach can be computationally intensive for complex or large models because it requires repeated likelihood calculations. This thesis explores various spatial and temporal subset methods for conducting statistical inference for spatial epidemic models, aiming to provide appropriate parameter estimates with minimal computational resources. In this thesis, we utilize a spatial ILM whose kernel function is based on the Euclidean distance between susceptible and infectious individuals.

Item Open Access: Efficient Estimation of Partly Linear Transformation Model with Interval-censored Competing Risks Data (2019-09-19). Wang, Yan; Lu, Xuewen; Shen, Hua; Chekouo, Thierry T.

We consider the class of semiparametric generalized odds rate transformation models to estimate the cause-specific cumulative incidence function, an important quantity in the competing risks framework, and to assess the contribution of covariates with interval-censored competing risks data. The model can handle both linear and non-linear components. The baseline cumulative incidence functions and the non-linear components of the different competing risks are approximated with B-spline basis functions or Bernstein polynomials, and the parameter estimates are obtained by sieve maximum likelihood estimation. We designed two examples in the simulation studies, and the simulation results show that the method performs well. We used the proposed method to analyze HIV data obtained from patients in a large cohort study in sub-Saharan Africa.

Item Open Access: Efficient Estimation of the Additive Hazards Model with Bivariate Current Status Data (2020-08-14). Zhang, Ce; Lu, Xuewen; Chekouo, Thierry T.; Shen, Hua

In this thesis, we present sieve maximum likelihood estimators of both the finite- and infinite-dimensional parameters in the marginal additive hazards model with bivariate current status data, where the joint distribution of the bivariate survival times is modeled by a copula.
We assume the two baseline hazard functions and the copula are unknown functions, and use constrained Bernstein polynomials to approximate them. Compared with existing methods for estimating copula models for bivariate survival data, the proposed method has two main advantages. First, it does not need to specify the form of the copula model and is therefore more flexible. Second, the proposed estimators have strong consistency and an optimal rate of convergence, and the regression parameter estimator is asymptotically normal and semiparametrically efficient. Simulation studies reveal that the proposed estimators have good finite-sample properties. Finally, a real data application is provided for illustration.

Item Open Access: Efficient Estimation of the Varying-Coefficient Partially Linear Proportional Hazards Model with Current Status Data (2016). Dong, Yuan; Lu, Xuewen; Singh, Radhey; Shen, Hua; Chen, Guanmin

We consider a semiparametric varying-coefficient proportional hazards model with current status data. This model enables one to assess possibly linear and nonlinear effects of certain covariates on the hazard rate. B-splines are applied to approximate both the unknown baseline hazard function and the varying-coefficient functions. To improve the performance of the model, a ridge penalty is added to the log-likelihood to penalize the roughness of the cumulative hazard function. An efficient sieve maximum likelihood estimation method is used for estimation. Simulation studies with the weighted bootstrap method are conducted to examine the finite-sample properties of the proposed estimators.
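Several items above approximate unknown baseline functions with Bernstein polynomials or B-splines. As a quick illustration of why Bernstein bases suit constrained sieve estimation, the generic sketch below (not thesis code) approximates a monotone target with coefficients f(k/n): nondecreasing coefficients guarantee a nondecreasing polynomial, which is exactly the kind of shape constraint imposed on a baseline cumulative hazard or cumulative incidence function.

```python
from math import comb

def bernstein(coefs, t):
    """Evaluate sum_k c_k * C(n, k) * t^k * (1 - t)^(n - k) on [0, 1]."""
    n = len(coefs) - 1
    return sum(c * comb(n, k) * t ** k * (1 - t) ** (n - k)
               for k, c in enumerate(coefs))

# Target: a monotone function on [0, 1] (think of a rescaled baseline
# cumulative hazard). Coefficients f(k/n) are automatically nondecreasing.
f = lambda t: t ** 2
n = 20
coefs = [f(k / n) for k in range(n + 1)]

# Maximum approximation error on a grid; for f(t) = t^2 the exact error is
# t(1 - t)/n, so it is at most 1/(4n) = 0.0125 here.
err = max(abs(bernstein(coefs, t / 100) - f(t / 100)) for t in range(101))
```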
We also present an analysis of renal function recovery data for illustration.

Item Open Access: Efficient Estimation of the Varying-Coefficient Partially Linear Proportional Odds Models with Current Status Data (2016). Lu, Shanshan; Lu, Xuewen; Wu, Jingjing; Chen, Gemai; Shen, Hua

We consider a varying-coefficient partially linear proportional odds model with current status data. This model enables one to examine the extent to which some covariates interact nonlinearly with an exposure variable, while other covariates have linear effects. A B-spline approach and the sieve maximum likelihood estimation method are used to obtain an integrated estimate of the linear coefficients, the varying-coefficient functions and the baseline function. The proposed parameter estimators are proved to be consistent and asymptotically normal, and the estimators of the nonparametric functions achieve the optimal rate of convergence. Simulation studies and a real data analysis are used for assessment and illustration.

Item Open Access: Estimation and Group Selection in Partially Linear Survival Models (2018-01-17). Afzal, Arfan; Lu, Xuewen; Ambagaspitiya, Rohana; Shen, Hua; Deardon, Rob; Zhao, Yichun

In survival analysis, different regression models are available to estimate the effects of covariates on a censored survival outcome. The proportional hazards (PH) model has been the most popular among them because of its simplicity and desirable theoretical properties. However, the PH model assumes that the hazard ratio is constant over time. When this assumption is not met, or when we are interested in the risk difference, the additive hazards (AH) model is a useful alternative. On the other hand, assuming a linear structure for covariate effects on survival in these models may be too strict.
As a remedy, partially linear survival models are becoming increasingly popular, as they combine the flexibility of nonparametric modeling with the parsimony and easy interpretability of parametric modeling. Nonetheless, building these models becomes challenging when the predictors or covariates are high-dimensional and grouped. Consequently, it becomes crucial to select important groups and important individual variables within groups, by so-called bi-level variable selection, to reduce the dimension of the data and build a sensible and useful semiparametric model for applications, since methods for individual variable selection in such cases may perform inefficiently by ignoring the information present in the grouping structure. To fill gaps in estimation and group selection in partially linear survival models with high-dimensional data, in this thesis we propose new methods for estimation and group selection in two partially linear survival models, namely the partially linear AH model and the partially linear PH model. In the first part of this thesis, we consider estimation in a partially linear AH model with left-truncated and right-censored data when the dimension of covariates is fixed and the risk function has a partially linear structure. We construct a pseudo-score function to estimate the coefficients of the linear covariates and the B-spline basis functions. The proposed estimators are asymptotically normal under the assumption that the true nonlinear functions are B-spline functions whose knot locations and number of knots are held fixed. In the second and third parts, we study group variable selection in the partially linear AH model and the partially linear PH model with right-censored data. In such regression models with a grouping structure among the explanatory variables, variable selection at both the group level and the within-group individual variable level is important for improving model accuracy and interpretability.
Motivated by hierarchical grouped variable selection in the linear PH and linear AH models, we propose a hierarchical bi-level variable selection approach for high-dimensional covariates in the linear part of the partially linear AH model and the partially linear PH model, respectively. The proposed methods can conduct simultaneous group selection and individual variable selection within groups in the presence of nonparametric risk functions of low-dimensional covariates. For group selection in the partially linear AH model, the rates of convergence and selection consistency of the proposed estimators are established using martingale and empirical process theory; after reducing the dimension of the covariates, we suggest using the method from the first part for inference in the partially linear AH model. For group selection in the partially linear PH model, similar theoretical results for the proposed estimators are obtained, and oracle properties such as the asymptotic normality of the estimators are discussed. Finally, computational algorithms and programs are developed for the proposed methods. Simulation studies indicate good finite-sample performance of the methods. For each model, real data examples are provided to illustrate the application of the methods.

Item Open Access: Frequentist, Bayesian and Resampling Estimation of Extremes Based on the Generalized Extreme Value Distribution (2024-09-04). Xue, Yutong; Chen, Gemai; Shen, Hua; Lu, Xuewen; Zhang, Qingrun

Extreme events occur in science, engineering, finance and many related fields. The generalized extreme value (GEV) distribution is often used to model extreme events. In this thesis, we study the estimation of GEV-related parameters and events using three different approaches.
The maximum likelihood approach is a frequentist approach with a fully developed theory for both estimation and inference, subject to the existence of the maximum likelihood estimators and of the expected and/or observed information matrix. The Bayesian approach starts with the likelihood function, chooses appropriate prior distributions for the GEV distribution parameters, and works with the posterior distribution of the parameters for estimation and inference. The resampling approach may or may not use the likelihood function to estimate the GEV parameters; inference is based on the variation generated by resampling the observed data, directly or indirectly, and repeating the estimation procedure. All three approaches are well known in the literature. The main contribution of this thesis is that, to the best of our knowledge, the three approaches are studied and compared under the same setup for the first time; based on extensive comparisons and the criteria used, we recommend that practitioners use the parametric resampling approach based on empirical distribution function (EDF) estimation, with percentile confidence intervals. The use of the maximum likelihood, Bayesian, and resampling approaches is illustrated through a case study.

Item Open Access
Geographically Dependent Individual-level Models for Infectious Disease Transmission (2022-06) Mahsin, MD; Deardon, Rob; Brown, Patrick; Kopciuk, Karen; Shen, Hua; Brown, Grant

Infectious disease models can be of great use for understanding the underlying mechanisms that influence the spread of diseases and for predicting future disease progression. Modeling has been increasingly used to evaluate the potential impact of different control measures and to guide public health policy decisions. In recent years, there has been rapid progress in spatio-temporal modeling of infectious diseases; an example of such recent developments is the class of discrete-time individual-level models (ILMs).
These models are well developed and provide a common framework for modeling many disease systems; however, they assume that the probability of disease transmission between two individuals depends only on their spatial separation and not on their spatial locations. In cases where spatial location itself is important for understanding the spread of emerging infectious diseases and identifying their causes, it is beneficial to incorporate the effect of spatial location into the model. In this study, we therefore generalize the ILMs to a new class of geographically dependent ILMs (GD-ILMs) that allow for the evaluation of the effects of spatially varying risk factors (e.g., education, social deprivation, environmental factors), as well as unobserved spatial structure, on the transmission of infectious disease. Specifically, we consider a conditional autoregressive (CAR) model to capture the effects of unobserved spatially structured latent covariates or measurement error. This results in flexible infectious disease models that can be used for formulating etiological hypotheses and for identifying geographical regions of unusually high risk where preventive action can be taken. The reliability of these models is investigated on a combination of simulated epidemic data and Alberta seasonal influenza outbreak data (2009). This new class of models is fitted to data within a Bayesian statistical framework using Markov chain Monte Carlo (MCMC) methods. We also develop continuous-time GD-ILMs, which allow infection times and infectious periods to be treated as latent variables estimated using data-augmented MCMC techniques within a Bayesian framework. This approach results in a flexible infectious disease modeling framework for formulating etiological hypotheses and identifying unusually high-risk geographical regions for preventive action.
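In the general discrete-time ILM framework, the probability that a susceptible individual i becomes infected at time t is typically of the form P(i, t) = 1 − exp{−Ω_S(i) Σ_{j∈I(t)} Ω_T(j) κ(i, j)}, where κ is a spatial kernel of the distance between i and j; a geographically dependent extension additionally lets the susceptibility Ω_S(i) depend on area-level covariates and a spatially structured random effect. The sketch below simulates one epidemic from a toy model of this form, with a power-law distance kernel and a crude two-area effect standing in for the CAR component; all parameter names and values are illustrative, not those of the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 300, 25
coords = rng.uniform(0, 10, size=(n, 2))         # individual locations on a 10x10 region
area = (coords[:, 0] > 5).astype(int)            # two crude "areas"
u = np.array([0.0, 0.8])                         # illustrative area-level effect
                                                 # (stands in for the CAR random effect)
alpha, beta = 0.4, 2.0                           # baseline susceptibility, kernel decay

# pairwise distances; no self-infection (kernel 0 on the diagonal)
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
np.fill_diagonal(dist, np.inf)
kappa = dist ** (-beta)                          # power-law spatial kernel

infected = np.zeros(n, bool)
infected[rng.choice(n, 3, replace=False)] = True # initial infectives
for t in range(T):
    # infection pressure on each individual from all current infectives
    pressure = alpha * np.exp(u[area]) * kappa[:, infected].sum(axis=1)
    p = 1.0 - np.exp(-pressure)                  # P(i, t); applies to susceptibles
    new = (~infected) & (rng.uniform(size=n) < p)
    infected |= new

attack_rate = infected.mean()
```

Fitting such a model reverses this simulation: the Bernoulli infection events define the likelihood over which the MCMC sampler explores the kernel, covariate, and random-effect parameters.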
We evaluate the performance of these proposed models on a combination of simulated epidemic data and 2009 seasonal influenza data from Alberta. Finally, we propose a special case of the GD-ILMs, termed small-area restricted GD-ILMs, for infectious disease modeling. The reliability of these models is investigated through simulation studies based on disease spread through the Canadian city of Calgary, Alberta.

Item Open Access
Methicillin-Resistant Staphylococcus aureus Carriage among Students at a Historically Black University: A Case Study (2013-01-20) Shen, Hua; Akoda, Eyitayo; Zhang, Kunyan

Background. Black people in the USA are afflicted with a higher rate of methicillin-resistant Staphylococcus aureus (MRSA) infection. This study determined the prevalence of MRSA carriage among Black college students in a university setting. Methods. Hand and nasal swabs were collected and screened for MRSA by mannitol fermentation, coagulase, and DNase activities and their resistance to oxacillin. MRSA isolates were analyzed for antimicrobial resistance pattern, genetic profile for staphylococcal cassette chromosome mec (SCCmec) type, pulsed-field type, multilocus sequence type (ST), and the presence of the Panton-Valentine leukocidin (PVL) gene. Results. MRSA was isolated from 1 of 312 (0.3%) hand swabs and 2 of 310 (0.65%) nasal swabs, respectively. All isolates lacked multidrug resistance and carried type IV SCCmec, characteristic of community-associated MRSA. These isolates were a ST8-MRSA-IVa-PVL(+) (USA300 strain), a ST8-MRSA-IVb-PVL(−), and a new MLST, ST2562-MRSA-IV-PVL(−), identified in this study. These isolates were thus not transmitted among students. Conclusion. We found a low rate of MRSA carriage among students at a historically Black university.
Our finding highlights the need for future studies involving multiple institutions and other ethnic groups to assess the association of Black race with MRSA carriage.

Item Open Access
Minimum Hellinger Distance Estimation of AFT Models with Right-Censored Data (2024-05-06) Huang, Yifu; Wu, Jingjing; Lu, Xuewen; Wu, Jingjing; Lu, Xuewen; Wang, Haixu; Shen, Hua

Accelerated failure time (AFT) models are popular models in survival analysis. AFT models are also an important alternative to the Cox proportional hazards (PH) models, as they have better interpretability and link the survival time (usually on the log scale) directly to the covariates. The unknown coefficient parameters in AFT models are often estimated by the maximum likelihood estimator (MLE); however, the performance of the MLE is severely affected by the presence of outliers. In this thesis, we propose two estimators of parametric AFT models based on minimum Hellinger distance estimation (MHDE). A simulation study and a real data analysis were conducted to examine the performance of the proposed estimators under various scenarios of censoring rate and presence of outliers. Our numerical results demonstrate the excellent robustness of the proposed estimators, which also retain good efficiency in many cases.

Item Open Access
Minimum Hellinger Distance Estimation of ARCH/GARCH Models (2018-05-11) Chen, Liang; Wu, Jingjing; Lu, Xuewen; Shen, Hua

In this thesis, we propose a minimum Hellinger distance estimator (MHDE) and a minimum profile Hellinger distance estimator (MPHDE) for estimating the parameters of ARCH and GARCH models, depending on whether the innovation distribution is specified or not. The asymptotic properties of the MHDE and MPHDE are examined through graphs, as their theoretical investigation is more involved and needs further study in future research.
Moreover, we demonstrate the finite-sample performance of both the MHDE and the MPHDE through simulation studies and compare them with well-established methods, including maximum likelihood estimation (MLE), Gaussian quasi-MLE (GQMLE), and non-Gaussian quasi-MLE (NGQMLE). Our numerical results show that the MHDE and MPHDE perform better in terms of bias, MSE, and coverage probability (CP) when the data are contaminated, which testifies to the robustness of MHD-type estimators.
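As a minimal illustration of the minimum Hellinger distance principle behind the two theses above (not their actual estimators, which target AFT and ARCH/GARCH models), recall that the squared Hellinger distance between densities f and g is H²(f, g) = 1 − ∫ √(f g) dx, and an MHDE minimizes this distance between a parametric density and a nonparametric density estimate of the data. The sketch below estimates a normal location parameter by a grid-search MHDE against a kernel density estimate, showing the robustness to contamination that motivates MHD-type estimators; all settings are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(2)
clean = rng.normal(0.0, 1.0, size=270)
outliers = rng.normal(8.0, 1.0, size=30)           # 10% contamination
data = np.concatenate([clean, outliers])

# nonparametric density estimate of the data
kde = gaussian_kde(data)
xs = np.linspace(-6.0, 12.0, 2000)
dx = xs[1] - xs[0]
f_hat = kde(xs)

def hellinger_sq(mu):
    """Squared Hellinger distance between the KDE and the N(mu, 1) model density."""
    g = norm.pdf(xs, loc=mu, scale=1.0)
    return 1.0 - np.sum(np.sqrt(f_hat * g)) * dx   # 1 - integral of sqrt(f g)

grid = np.linspace(-2.0, 2.0, 401)
mhde_mu = grid[np.argmin([hellinger_sq(m) for m in grid])]
mle_mu = data.mean()                               # MLE of mu under N(mu, 1)
```

The sample mean (the MLE here) is dragged toward the outliers, while the MHDE stays near the location of the clean component, since the square-root affinity downweights the contaminated region of the density.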