Browsing by Author "Gordon, Paul"
Item: Open Access
Improved Basecalling and Base Modification Detection Through Signal-level Analysis of Nanopore Direct RNA Data (2023-09-14)
Wang, Scott; Long, Quan; Gordon, Paul; Smith, Mike; Anderson, David
Genome sequencing technologies have emerged as essential tools for addressing challenges presented by the natural biological complexity of organisms. Unlike traditional next-generation sequencing (NGS) methods, which yield short reads, third-generation sequencing (TGS) methods can sequence transcripts and complete genomes in single contiguous reads, providing innovative means to address practical topics surrounding viral transmission, evolution, and pathogenesis. TGS alleviates the computational challenges of consensus genome assembly or transcript construction from fragmented reads that NGS libraries require. Despite these advantages, as an emerging technology, TGS faces many technical challenges. High error rates make it difficult to distinguish machine errors from low-frequency mutations in the genome. Some of the best-known and most pervasive diseases in society originate from viruses with ribonucleic acid (RNA) genomes; these include, but are not limited to, influenza and coronaviruses. Progress towards a comprehensive understanding of RNA viruses has been hindered by their unique biology and high levels of diversity, along with quick replication and high mutation rates, which give rise to important viral evolutionary signals in individual viral copies. Part of the high basecalling error rate in TGS can be attributed to the presence of unmodeled signal, e.g. calling just the four canonical nucleobases (A, C, G, T/U) when methylation and other nucleobase modifications also contribute to the signal. Accurately identifying (i.e. signal modeling) the locations of such nucleobase modifications would naturally lead to better nucleobase calling and provide insights into RNA virus biology.
The few extant tools in this area for TGS are based on deep-learning AI methods, chosen for reasons of computational tractability, and are demonstrably biased. In contrast to such opaque methods, in this work, new efficient implementations of theoretically optimal (“dynamic programming”) methods for Oxford Nanopore Technologies (ONT) TGS raw signal segmentation, alignment, clustering, and consensus are deployed. Combined with follow-on statistical analyses of signal deviations within those results, these methods define a minimally biased, statistically grounded procedure for detecting unmodeled signal (i.e. putative nucleobase modifications or mutations), as demonstrated using multiple publicly available raw ONT direct RNA sequencing viral datasets.

Item: Open Access
Interspecies data mining to predict novel ING-protein interactions in human (BioMed Central, 2008)
Gordon, Paul; Soliman, Mohamed A; Bose, Pinaki; Sensen, Christoph W; Riabowol, Karl T.

Item: Open Access
Investigating Pulmonary Vascular Disease in Patients with Long COVID using Methylation Patterns in Cell-free DNA (2024-06-26)
Iqbal, Fatima; Greenway, Steven; Weatherald, Jason; Halloran, Kieran; Fine, Nowell; Gordon, Paul
Introduction: Coronavirus disease 2019 (COVID-19) continues to influence the health and quality of life of Canadians to this day, even after they have recovered from the initial infection. Long COVID is a heterogeneous, multi-organ disease that encompasses a range of symptoms that remain prevalent months after infection, including persistent breathlessness (dyspnea for >12 weeks post-infection). Hypoxia and inflammation are important potential mechanisms for long COVID; they cause endothelial damage and changes to the pulmonary vasculature, which may contribute to unexplained dyspnea. Tissue-specific damage can be characterized using fragments of DNA released into the circulation, known as cell-free DNA (cfDNA).
Importantly, these fragments retain epigenetic information that can be leveraged to determine the tissue of origin as well as disease-specific methylation changes.
Objective: To develop a cfDNA methylation assay to characterize cell-specific damage in pulmonary vascular disease (PVD) groups and delineate the role of PVD in long COVID.
Specific Aims: Aim 1: Identify and validate differentially methylated regions (DMRs) for pulmonary cell types. Aim 2: Use Nanopore sequencing to find tissue- and disease-specific DMRs. Aim 3: Associate levels of DMRs in patients with PVD and long COVID with clinical presentations.
Key Results and Significance: We have validated the specificity of endothelial cell and pulmonary tissue DMRs against a tissue panel to quantify cell-specific injury in patient cfDNA. We have also performed Nanopore sequencing of cfDNA from patients with long COVID, Pulmonary Arterial Hypertension (PAH), and Chronic Thromboembolic Pulmonary Hypertension (CTEPH). We have used these data to demonstrate disease-specific methylation patterning. Our work has also highlighted gaps that must be addressed in order to realize the advantages of PCR-free, bisulfite-conversion-free, absolute quantification of cfDNA methylation via Nanopore sequencing.

Item: Open Access
Parallelization of Bayesian Phylogenetics to Greatly Improve Run Times (2024-03-24)
Yang, David; Zhang, Qingrun; Gordon, Paul; Liao, Wenyuan; van der Meer, Franciscus Johannes
Phylogenetic analyses are invaluable to understanding the transmission of viruses, especially during disease outbreaks. In particular, Bayesian phylogenetics has great potential in modeling viral transmission due to the numerous phylogenetic models that can be incorporated. Currently, the availability of user-friendly software and the accessibility of sequence data make phylogenetic analyses easy to perform.
However, Bayesian phylogenetic analyses are still limited by long computational run times, which are especially unfavorable during ongoing and evolving disease outbreaks that demand real-time phylogeny results. Current optimization methods for Bayesian phylogenetic analysis mainly focus on iteration-level parallelization and mostly overlook the potential of larger-scale parallelization approaches. In this thesis, we provide an in-depth overview of topics including phylogenetic analysis, relevant biological information, and phylogenetic analysis optimization methods. We also propose a novel parallelized Markov Chain Monte Carlo (MCMC) method that greatly improves Bayesian phylogenetic run times, and we integrate the approach into a data pipeline to allow for the direct analysis of viral samples. We demonstrate the validity of our methods by performing phylogenetic analyses on two sets of HIV simulation data and one set of real-world SARS-CoV-2 data. Our results suggest that the parallelization of MCMC in Bayesian phylogenetic analyses reduces run times 29-fold without causing significant deviations in parameter estimates and predicted phylogenetic trees.
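
This listing does not describe how the thesis parallelizes MCMC. Purely as an illustration of chain-level parallelism, as opposed to parallelizing work inside a single iteration, the following minimal Python sketch runs several independent Metropolis-Hastings chains concurrently on a toy standard-normal target and pools their post-burn-in draws. The target density, the function names (`log_target`, `run_chain`, `pooled_draws`), the step size, and the seeding scheme are all assumptions made for this example; none of them is taken from the thesis, and a real phylogenetic sampler would replace the toy target with a tree likelihood.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def log_target(x: float) -> float:
    """Log-density (up to a constant) of a toy standard-normal target."""
    return -0.5 * x * x


def run_chain(seed: int, n_steps: int, burn_in: int, step_size: float = 1.0) -> list[float]:
    """Run one random-walk Metropolis-Hastings chain; return post-burn-in draws."""
    rng = random.Random(seed)  # per-chain generator so chains are independent
    x = 0.0
    draws = []
    for step in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, exp(log_ratio)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        if step >= burn_in:
            draws.append(x)
    return draws


def pooled_draws(n_chains: int, n_steps: int, burn_in: int,
                 base_seed: int = 0, use_processes: bool = True) -> list[float]:
    """Run n_chains chains concurrently and concatenate their draws."""
    # Processes give true CPU parallelism; threads are kept as a fallback
    # for environments where spawning worker processes is inconvenient.
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls(max_workers=n_chains) as pool:
        futures = [
            pool.submit(run_chain, base_seed + i, n_steps, burn_in)
            for i in range(n_chains)
        ]
        pooled = []
        for future in futures:  # collect in submission order for reproducibility
            pooled.extend(future.result())
    return pooled
```

Calling `pooled_draws(4, 6000, 1000)` returns 20,000 pooled draws from four chains. On platforms that start worker processes by spawning, that call should sit under an `if __name__ == "__main__":` guard. Every chain still pays its own burn-in cost, so the speed-up from this simple scheme is bounded by the fraction of each chain spent after burn-in.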