Bayesian Variable Selection Model with Semicontinuous Response
dc.contributor.advisor | Chekouo, Thierry | |
dc.contributor.advisor | Sajobi, Tolulope | |
dc.contributor.author | Babatunde, Samuel | |
dc.contributor.committeemember | Zhang, Qingrun | |
dc.contributor.committeemember | Deardon, Robert | |
dc.contributor.committeemember | Bezdek, Karoly | |
dc.date | 2022-01 | |
dc.date.accessioned | 2022-01-18T16:02:06Z | |
dc.date.available | 2022-01-18T16:02:06Z | |
dc.date.issued | 2022-01-14 | |
dc.description.abstract | We propose a novel Bayesian variable selection approach that identifies a set of features associated with a semicontinuous response. We use a two-part model: a logit model for the probability of a zero response and a log-normal model for the positive responses. A Stochastic Search Variable Selection (SSVS) procedure samples the indicator variables for variable selection, thereby searching the space of feature subsets and identifying the most promising features in the model. For the logistic model, a data augmentation approach is used to sample from the posterior density. We impose a spike-and-slab prior on the regression effects: the effects of unselected covariates receive a prior point mass at zero, while the effects of selected covariates (including the intercept and clinical covariates) follow a normal distribution. Because the joint posterior density has no closed form, we use Markov chain Monte Carlo (MCMC) techniques to sample from the posterior distribution. Simulation studies are used to assess the performance of the proposed method. We compute the average area under the receiver operating characteristic curve (AUC) to assess variable selection performance and compare it with that of competing methods. We also assess the convergence of our MCMC algorithm by computing the potential scale reduction factor and the correlations between the marginal posterior probabilities. Finally, we apply our method to coronary artery disease (CAD) data, where the aim is to select important genes associated with the CAD index. These data consist of clinical covariates and gene expression data. | en_US |
dc.identifier.doi | http://dx.doi.org/10.11575/PRISM/39519 | |
dc.identifier.uri | http://hdl.handle.net/1880/114304 | |
dc.language.iso | eng | en_US |
dc.publisher.faculty | Science | en_US |
dc.publisher.institution | University of Calgary | en |
dc.rights | University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. | en_US |
dc.subject | Bayesian variable selection | en_US |
dc.subject | coronary artery disease | en_US |
dc.subject | Markov Chain Monte Carlo | en_US |
dc.subject | Stochastic Search Variable Selection | en_US |
dc.subject.classification | Biostatistics | en_US |
dc.subject.classification | Statistics | en_US |
dc.title | Bayesian Variable Selection Model with Semicontinuous Response | en_US |
dc.type | master thesis | en_US |
thesis.degree.discipline | Mathematics & Statistics | en_US |
thesis.degree.grantor | University of Calgary | en_US |
thesis.degree.name | Master of Science (MSc) | en_US |
ucalgary.item.requestcopy | true | en_US |
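
The two-part model described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it only simulates a semicontinuous response and evaluates the two-part log-likelihood (a logit part for the probability of a zero and a log-normal part for the positive values), and it omits the spike-and-slab prior, SSVS, and MCMC steps. The function names, the parameter names `alpha`, `beta`, and `sigma`, and all numeric values are assumptions made for illustration.

```python
import numpy as np


def simulate_two_part(X, alpha, beta, sigma, rng):
    """Draw a semicontinuous response: exact zeros mixed with log-normal positives."""
    # Logit part: probability that the response is exactly zero.
    p_zero = 1.0 / (1.0 + np.exp(-(X @ alpha)))
    is_zero = rng.random(X.shape[0]) < p_zero
    # Log-normal part: log(y) is normal with mean X @ beta and sd sigma.
    positive = np.exp(X @ beta + sigma * rng.standard_normal(X.shape[0]))
    return np.where(is_zero, 0.0, positive)


def two_part_loglik(y, X, alpha, beta, sigma):
    """Log-likelihood of the two-part (logit + log-normal) model."""
    eta = X @ alpha
    is_zero = y == 0
    # log P(y = 0) and log P(y > 0), written with logaddexp for stability.
    log_p_zero = -np.logaddexp(0.0, -eta)
    log_p_pos = -np.logaddexp(0.0, eta)
    loglik = np.sum(np.where(is_zero, log_p_zero, log_p_pos))
    # Log-normal density contribution from the positive responses only.
    log_y = np.log(y[~is_zero])
    resid = log_y - X[~is_zero] @ beta
    loglik += np.sum(
        -log_y - 0.5 * np.log(2.0 * np.pi * sigma**2) - resid**2 / (2.0 * sigma**2)
    )
    return loglik


# Illustrative usage with made-up parameter values (assumptions, not thesis data).
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # intercept + one covariate
alpha_true = np.array([0.0, 1.0])
beta_true = np.array([1.0, 0.5])
sigma_true = 0.5
y = simulate_two_part(X, alpha_true, beta_true, sigma_true, rng)
loglik_true = two_part_loglik(y, X, alpha_true, beta_true, sigma_true)
```

In a full Bayesian treatment, a log-likelihood of this form would be combined with priors on the regression effects and explored by a sampler; the sketch stops at the likelihood to keep the example self-contained.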
Files
Original bundle
- Name: ucalgary_2022_babatunde_samuel.pdf
- Size: 1.54 MB
- Format: Adobe Portable Document Format
- Description: Main article
License bundle
- Name: license.txt
- Size: 2.62 KB
- Format: Item-specific license agreed upon to submission