Cluster Analysis of Gene Expression Profiles via Flexible Count Models for RNA-seq Data
atmire.migration.oldid | 3301 | |
dc.contributor.advisor | de Leon, Alexander | |
dc.contributor.author | Ruan, Ji | |
dc.date.accessioned | 2015-06-10T15:21:17Z | |
dc.date.available | 2015-11-20T08:00:30Z | |
dc.date.issued | 2015-06-10 | |
dc.date.submitted | 2015 | en |
dc.description.abstract | Clustering RNA-seq data is used to characterize environment-induced (e.g., treatment) differences in gene expression profiles by separating genes into clusters based on their expression patterns. Wang et al. [2013] recently adopted the bi-Poisson distribution, obtained via the trivariate reduction method, as a model for clustering bivariate RNA-seq data. We discuss the inadequacy of the bi-Poisson distribution in modelling the correlation between dependent Poisson counts, and its impact on clustering such data. We introduce an alternative Gaussian copula model that incorporates a flexible dependence structure for the counts, report simulation results to compare the performance of the Gaussian copula and bi-Poisson models, and investigate the impact of misspecified dependence structures on the clustering of Poisson counts. We illustrate our methodology on a lung cancer RNA-seq data set. | en_US |
dc.identifier.citation | Ruan, J. (2015). Cluster Analysis of Gene Expression Profiles via Flexible Count Models for RNA-seq Data (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/25338 | en_US |
dc.identifier.doi | http://dx.doi.org/10.11575/PRISM/25338 | |
dc.identifier.uri | http://hdl.handle.net/11023/2293 | |
dc.language.iso | eng | |
dc.publisher.faculty | Graduate Studies | |
dc.publisher.institution | University of Calgary | en |
dc.publisher.place | Calgary | en |
dc.rights | University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. | |
dc.subject | Statistics | |
dc.subject.classification | RNA-seq Data | en_US |
dc.subject.classification | Clustering | en_US |
dc.subject.classification | EM algorithm | en_US |
dc.title | Cluster Analysis of Gene Expression Profiles via Flexible Count Models for RNA-seq Data | |
dc.type | master thesis | |
thesis.degree.discipline | Mathematics and Statistics | |
thesis.degree.grantor | University of Calgary | |
thesis.degree.name | Master of Science (MSc) | |
ucalgary.item.requestcopy | true |
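
The abstract's point about the bi-Poisson model's limited dependence structure can be made concrete with a small sketch. It assumes the standard trivariate reduction construction, X1 = Y1 + Y0 and X2 = Y2 + Y0 with independent Poisson variables Y0, Y1, Y2; the record itself does not spell the construction out, so the function names and parameterization below are illustrative, not taken from the thesis. Under that assumption the covariance equals the rate of the shared component Y0, so the correlation can never be negative and is capped by the ratio of the marginal means.

```python
import math


def bipoisson_correlation(mu1, mu2, lam0):
    """Correlation of (X1, X2) under trivariate reduction (assumed construction).

    X1 = Y1 + Y0 and X2 = Y2 + Y0, with independent Poisson variables
    Y0 ~ Poisson(lam0), Y1 ~ Poisson(mu1 - lam0), Y2 ~ Poisson(mu2 - lam0).
    mu1 and mu2 are the marginal means of X1 and X2 (hypothetical names).
    """
    if mu1 <= 0 or mu2 <= 0:
        raise ValueError("marginal means must be positive")
    if not 0 <= lam0 <= min(mu1, mu2):
        raise ValueError("lam0 must lie in [0, min(mu1, mu2)]")
    # Cov(X1, X2) = Var(Y0) = lam0; Var(X1) = mu1 and Var(X2) = mu2.
    return lam0 / math.sqrt(mu1 * mu2)


def max_bipoisson_correlation(mu1, mu2):
    """Largest attainable correlation, reached at lam0 = min(mu1, mu2)."""
    if mu1 <= 0 or mu2 <= 0:
        raise ValueError("marginal means must be positive")
    return math.sqrt(min(mu1, mu2) / max(mu1, mu2))
```

For example, with marginal means 4 and 100 the correlation cannot exceed sqrt(4/100), however strongly the two counts are actually associated, and negative association cannot be represented at all. A copula-based model separates the dependence parameter from the marginal means, which is the flexibility the abstract refers to.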