Predicting the Evolutionary and Medical Significance of Human Genetic Variations with Machine Learning

dc.contributor.advisor: De Koning, A. P. Jason
dc.contributor.author: Saha Mandal, Arnab
dc.contributor.committeemember: Bernier, François P.
dc.contributor.committeemember: Wasmuth, James D.
dc.contributor.committeemember: Rodrigue, Nicolas
dc.date: 2019-06
dc.date.accessioned: 2019-05-06T17:54:59Z
dc.date.available: 2019-05-06T17:54:59Z
dc.date.issued: 2019-04-30
dc.description.abstract [en_US]: The advent of inexpensive, high-throughput genome sequencing technologies has facilitated the acquisition of patient exome and genome sequences at a vast scale. One of the primary challenges of such data is its functional interpretation, and specifically, the ability to distinguish functionally important, deleterious, and pathogenic variants from neutral or benign variants (“variant impact prediction” or VIP). Over the last two decades, many approaches have been proposed for VIP, which use patterns of evolutionary conservation, population genomics, protein structures and other sources to inform machine learning classification algorithms. However, existing approaches are fraught with limitations, especially when they are trained on databases of putatively pathogenic variants that may have been identified with reference to existing prediction methods (a type of ‘circularity’). This dissertation identifies shortcomings of existing variant impact prediction methods and discusses how they can be better understood (Chapter 1). Approaches to overcome these shortcomings are presented (Chapter 2), and a new method, TAIGA (Transformation and Integration of Genomic Annotations), is developed. The utility of this method and its accompanying refinements are evaluated (Chapter 3) and later scrutinized (Chapter 4). As part of this work, I have produced TAIGA scores for all protein-coding positions of the human genome, and I show that these have substantially superior performance in distinguishing known pathogenic variants from neutral variants in a number of high-quality datasets. Variant prediction scores from TAIGA are later integrated with clinical information from human phenotypes (Chapter 5), and this extension demonstrated the highest sensitivity and smallest candidate gene search space over a large set of rare genetic disorders.
It is my hope that TAIGA will aid clinicians and researchers alike in the new era of personalized genomic medicine in which we find ourselves.
dc.identifier.citation [en_US]: Saha Mandal, A. (2019). Predicting the Evolutionary and Medical Significance of Human Genetic Variations with Machine Learning (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
dc.identifier.doi: http://dx.doi.org/10.11575/PRISM/36479
dc.identifier.uri: http://hdl.handle.net/1880/110303
dc.language.iso [en_US]: eng
dc.publisher.faculty [en_US]: Cumming School of Medicine
dc.publisher.institution [en]: University of Calgary
dc.rights [en_US]: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject [en_US]: bioinformatics, genomics, machine learning, genomic variants, classification, pathogenic, benign, rare disease, genetics
dc.subject.classification [en_US]: Bioinformatics
dc.subject.classification [en_US]: Genetics
dc.subject.classification [en_US]: Artificial Intelligence
dc.subject.classification [en_US]: Computer Science
dc.title [en_US]: Predicting the Evolutionary and Medical Significance of Human Genetic Variations with Machine Learning
dc.type [en_US]: doctoral thesis
thesis.degree.discipline [en_US]: Medicine – Biochemistry and Molecular Biology
thesis.degree.grantor [en_US]: University of Calgary
thesis.degree.name [en_US]: Doctor of Philosophy (PhD)
ucalgary.item.requestcopy: true
Files
Original bundle
Name: ucalgary_2019_saha-mandal_arnab.pdf
Size: 13.27 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.74 KB
Format: Item-specific license agreed upon to submission