Browsing by Author "Greenberg, Matthew"
Now showing 1 - 18 of 18
Item Open Access
Algorithmic enumeration of quaternionic lattices (2014-10-08)
Chisholm, Sarah; Bauer, Mark; Greenberg, Matthew
In this thesis, we develop and implement an algorithm in Magma for enumerating isometry classes of quaternionic lattices. This algorithm can be viewed as a higher-rank generalization of the one used for enumerating equivalence classes of left ideals of quaternionic orders. Our key technical innovation is an adaptation of Kneser's method of neighbours to the setting of quaternionic lattices. Central to this adaptation is the Morita equivalence between Hermitian spaces over the split quaternion algebra M2(F) and symplectic spaces over F.

Item Open Access
Computing Isogeny Volcanoes of Rank Two Drinfeld Modules (2018-01-19)
Caranay, Perlas; Greenberg, Matthew; Scheidler, Renate; Bauer, Kristine; Bauer, Mark; Dimitrov, Vassil Simeonov; Chen, Imin
Elliptic curves have long been widely studied mathematical objects. They not only feature prominently in established areas of mathematics such as number theory, algebraic geometry, and topology, but have also gained practical importance through applications in coding theory and cryptography. More recently, Drinfeld modules have received increased attention due to their surprising similarity to elliptic curves, objects to which they bear little superficial resemblance. However, little is yet known about real-world applications of Drinfeld modules. Elliptic curves come in two kinds, ordinary and supersingular, and the endomorphism rings of the two kinds, made up of isogenies, are very different. An analogous dichotomy holds for Drinfeld modules. Researchers have recently made major progress in explicitly computing endomorphism rings of elliptic curves using isogeny volcanoes, but little of this kind has yet been done for Drinfeld modules. Our aim here is to study the theoretical and computational aspects of isogeny volcanoes of rank two Drinfeld modules defined over finite fields, and to determine how to compute these structures explicitly. We establish theoretical properties of isogeny volcanoes in the Drinfeld module case. We then design, analyze, and implement algorithms for computing (1) j-invariants and Drinfeld modular polynomials, (2) isogeny volcanoes, and (3) endomorphism rings and explicit isogenies of rank two Drinfeld modules.

Item Open Access
Dementia risk prediction in individuals with mild cognitive impairment: a comparison of Cox regression and machine learning models (2022-11-02)
Wang, Meng; Greenberg, Matthew; Forkert, Nils D.; Chekouo, Thierry; Afriyie, Gabriel; Ismail, Zahinoor; Smith, Eric E.; Sajobi, Tolulope T.
Background: Cox proportional hazards regression models and machine learning models are widely used for predicting the risk of dementia. Existing comparisons of these models have mostly been based on empirical datasets and have yielded mixed results. This study examines the accuracy of various machine learning models and of Cox regression for predicting time-to-event outcomes in people with mild cognitive impairment (MCI), using Monte Carlo simulation. Methods: The predictive accuracy of nine time-to-event regression and machine learning models was investigated: Cox regression, penalized Cox regression (with Ridge, LASSO, and elastic net penalties), survival trees, random survival forests, survival support vector machines, artificial neural networks, and extreme gradient boosting. Simulation data were generated using the study design and data characteristics of a clinical registry and a large community-based registry of patients with MCI. Predictive performance was evaluated based on three-fold cross-validation via Harrell's concordance index (c-index), the integrated calibration index (ICI), and the integrated Brier score (IBS). Results: Cox regression and the machine learning models had comparable predictive accuracy across the three performance metrics and the data-analytic conditions. The estimated c-index values for Cox regression, random survival forests, and extreme gradient boosting were 0.70, 0.69, and 0.70, respectively, when the data were generated from a Cox regression model under large-sample conditions. In contrast, the estimated c-index values for these models were 0.64, 0.64, and 0.65 when the data were generated from a random survival forest under large-sample conditions. Both Cox regression and random survival forests had the lowest ICI values (0.12 for a large sample size and 0.18 for a small sample size) among all the investigated models, regardless of sample size and data-generating model. Conclusion: Cox regression models have comparable, and sometimes better, predictive performance than more complex machine learning models. We recommend that the choice among these models be guided by the research hypotheses, model interpretability, and the type of data.
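Note: the cross-validated Cox workflow described in this abstract is straightforward to sketch. The following is a minimal illustration, not the study's code; it assumes the lifelines library, and the covariates, effect sizes, and censoring scheme of the synthetic MCI-like cohort are invented.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import k_fold_cross_validation

# Hypothetical MCI-like cohort: two covariates, follow-up time, and a
# dementia-conversion indicator (1 = converted, 0 = censored at 8 years).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.normal(75, 6, n),
    "moca": rng.normal(22, 3, n),   # invented cognitive-score covariate
})
hazard = np.exp(0.04 * (df["age"] - 75) - 0.10 * (df["moca"] - 22))
df["time"] = rng.exponential((5.0 / hazard).to_numpy())
df["event"] = (df["time"] < 8.0).astype(int)
df.loc[df["event"] == 0, "time"] = 8.0   # administrative censoring

# Three-fold cross-validated Harrell's c-index for a Cox model.
scores = k_fold_cross_validation(
    CoxPHFitter(), df, duration_col="time", event_col="event",
    k=3, scoring_method="concordance_index",
)
print("c-index per fold:", np.round(scores, 3))
```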
Item Embargo
Developing Novel Supervised Learning Model Evaluation Metrics Based on Case Difficulty (2024-01-05)
Kwon, Hyunjin; Lee, Joon; Josephson, Colin Bruce; Greenberg, Matthew
Performance evaluation is an essential step in the development of machine learning models. The performance of a model informs the direction of its development and provides diverse knowledge to researchers. The most common ways to assess a model's performance are based on counting the numbers of correct and incorrect predictions the model makes. However, this conventional approach is limited in that it does not consider differences in prediction difficulty between individual cases. Although metrics for quantifying the prediction difficulty of individual cases exist, their usefulness is hindered by the fact that they cannot be applied universally across all types of data; that is, each metric requires that specific data conditions be met, which can be a significant obstacle when dealing with real-world data characterized by diversity and complexity. This thesis therefore proposes new metrics for calculating case difficulty that perform well across diverse datasets. These case difficulty metrics were developed using neural networks based on varying definitions of prediction difficulty. In addition, the metrics were validated on various datasets and compared with existing metrics from the literature. New performance evaluation metrics incorporating case difficulty, which reward correct predictions of high-difficulty cases and penalize incorrect predictions of low-difficulty cases, were also developed. A comparison of these case difficulty-based performance metrics with conventional performance metrics revealed that the novel metrics can provide a more detailed explanation and deeper understanding of model performance. We anticipate that researchers will be able to calculate case difficulty in diverse datasets under various data conditions with our proposed metrics and use these values to enhance their studies. Moreover, our newly developed evaluation metrics, which take case difficulty into account, could serve as an additional source of insight for the evaluation of classification model performance.
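Note: the following toy function (hypothetical, not one of the thesis's metrics) illustrates the stated idea of rewarding correct predictions on high-difficulty cases and penalizing errors on low-difficulty ones.

```python
import numpy as np

def difficulty_weighted_accuracy(y_true, y_pred, difficulty):
    """Toy difficulty-aware score in [0, 1] (illustrative only).

    Correct predictions earn more credit on hard cases (high difficulty);
    incorrect predictions lose more credit on easy cases (low difficulty).
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    d = np.clip(np.asarray(difficulty, dtype=float), 0.0, 1.0)
    correct = (y_true == y_pred)
    reward = np.where(correct, d, 0.0)          # credit for hard cases solved
    penalty = np.where(~correct, 1.0 - d, 0.0)  # cost for easy cases missed
    total = reward.sum() + penalty.sum()
    return 1.0 if total == 0 else float(reward.sum() / total)

# Two models with identical plain accuracy (3/5) separate under this score.
y_true = [1, 0, 1, 1, 0]
difficulty = [0.9, 0.1, 0.8, 0.2, 0.3]
print(difficulty_weighted_accuracy(y_true, [1, 0, 1, 0, 1], difficulty))
print(difficulty_weighted_accuracy(y_true, [0, 1, 1, 1, 0], difficulty))
```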
Item Open Access
Divisor Class Group Arithmetic on C3,4 Curves (2020-01-31)
MacNeil, Evan; Scheidler, Renate; Jacobson, Michael J.; Bauer, Mark L.; Greenberg, Matthew
Computing in the divisor class group of an algebraic curve is a non-trivial component of computing L-series. L-series, in turn, are at the heart of the Sato-Tate conjecture and related conjectures. The Sato-Tate conjecture has been proven for elliptic curves with complex multiplication, but remains open for other families of algebraic curves. In order to test these conjectures against other curve families, it is desirable to have efficient algorithms for divisor class group arithmetic. Fast explicit formulas exist for divisor class group arithmetic on genus 1 and genus 2 curves; the picture for genus 3 curves, however, is incomplete. Existing explicit formulas for arithmetic on non-hyperelliptic genus 3 curves (C3,4 curves) were developed with cryptographic applications in mind. They make certain genericity assumptions on their inputs that hold with high probability in cryptographic settings, but are unsuited to number-theoretic use cases. More general algorithms exist that can perform divisor class arithmetic over any curve, but they are slow. In this thesis, that gap is bridged. Fast explicit formulae are developed that may be used to add any pair of reduced divisors on any C3,4 curve. Formulae optimized for the generic case considered by previous authors are produced, allowing one to add divisors in 1I + 111M + 3S + 99A and double divisors in 1I + 135M + 3S + 116A (inversions, multiplications, squarings, and additions in a field). The formulae are implemented in Sage. Benchmark tests find that these new formulae allow one to add and double 13.2% and 11.1% faster, respectively, than the previous state of the art in C3,4 curve arithmetic.
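Note: for orientation, a C3,4 curve is a non-hyperelliptic genus 3 curve with a degree-3 map to the projective line. One common normal form for C_{a,b} curves (the thesis's exact conventions may differ) is sketched below.

```latex
% One common normal form for a C_{3,4} curve over a field k:
% assigning x weight 3 and y weight 4, the leading terms y^3 and x^4
% both have weight 12, all other monomials have smaller weight, and
% the (nonsingular) curve has genus (3-1)(4-1)/2 = 3.
\[
  y^3 \;=\; x^4 \;+\; \sum_{3i + 4j \,<\, 12} c_{ij}\, x^i y^j,
  \qquad c_{ij} \in k .
\]
```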
Item Open Access
Galois actions on ℓ-adic local systems and their nearby cycles: a geometrization of Fourier eigendistributions on the p-adic Lie algebra sl(2) (2012)
Christie, Aaron; Cunningham, Clifton; Greenberg, Matthew
In this thesis, two Q̄ℓ-local systems (Definition 3.2.1) on the regular unipotent subvariety U0,K of p-adic SL(2)K are constructed. Making use of the equivalence between Q̄ℓ-local systems and ℓ-adic representations of the étale fundamental group, we prove that these local systems are equivariant with respect to conjugation by SL(2)K (Proposition 3.3.5) and that their nearby cycles, when taken with respect to appropriate integral models, descend to local systems on the regular unipotent subvariety of SL(2)k, k the residue field of K (Theorem 4.3.1). Distributions on SL(2, K) are then associated to these two local systems (Definition 5.1.4) and we prove properties of these distributions. Specifically, they are admissible distributions in the sense of Harish-Chandra (Proposition 5.2.1) and, after being transferred to the Lie algebra, are linearly independent eigendistributions of the Fourier transform (Proposition 5.3.2). Together, this gives a geometrization of important admissible invariant distributions on a nonabelian p-adic group in the context of the local Langlands program.

Item Open Access
Mean values related to multiplicative orders of elements of finite fields and heuristics for ranks of apparition of primes in divisibility sequences (2011)
Quan, Diane; Greenberg, Matthew

Item Open Access
Measuring a Lack of Engagement in Raging Skies (2022-01)
Mattingly, Peter; Clark, Douglass; Clark, Douglas; Sengupta, Pratim; Greenberg, Matthew
A lack of engagement undermines measurement via assessment (Kane, 2006), and traditional qualitative measures of a lack of engagement are error-prone and expensive (V. Shute & Ventura, 2013). This study therefore takes a quantitative approach, attempting to detect a characteristic behaviour associated with a lack of engagement: Rapid-Guessing Behaviour (RGB; Wise, 2017). The data analyzed come from a project by Dr. Man-Wai Chu that studied learner data collected as participants played a Game-Based Assessment (GBA; Chu & Chiang, 2018); a GBA is a game used as a platform for assessment (Ren, 2019). The analysis failed to reveal a strong link between RGB and a lack of engagement in this study. However, it is believed that learners exhibiting certain behaviour patterns were engaging in Enjoyment-Seeking Behaviour; in short, they disengaged from the assessment to seek enjoyment elsewhere.

Item Open Access
Numerical Tests of Two Conjectures in Fake Real Quadratic Orders (2017)
Wang, Hongyan; Scheidler, Renate; Jacobson, Michael John Jr.; Greenberg, Matthew; Cunningham, Clifton L. R.
A fake real quadratic order is defined based on an imaginary quadratic field and a prime p, but behaves similarly to a real quadratic order. Two conjectures regarding fake real quadratic orders are discussed in this thesis. The first is the Cohen-Lenstra heuristic. Our computations showed that for fixed p, the proportion of fake real quadratic orders for which the odd part of the class number is one converges to C = 0.754458..., which equals exactly the proportion of real quadratic fields for which the odd part of the class number is one. The second is the Ankeny-Artin-Chowla conjecture, which states that D ∤ b, where b is the second coefficient of the fundamental unit of the real quadratic field Q(√D). No counterexamples have been found in real quadratic fields, but we found numerous counterexamples in fake real quadratic orders, and this is evidence that the conjecture is false for real quadratic fields.
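Note: in symbols, the classical conjecture is usually stated as follows (a standard formulation for prime D, not quoted from the thesis).

```latex
% Ankeny--Artin--Chowla conjecture: for a prime D ≡ 1 (mod 4), write the
% fundamental unit of the real quadratic field Q(√D) with half-integral
% coordinates; the conjecture asserts that D does not divide b:
\[
  \varepsilon_D \;=\; \frac{a + b\sqrt{D}}{2} \;>\; 1,
  \qquad a, b \in \mathbb{Z}, \qquad D \nmid b .
\]
```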
Item Open Access
Optimization Approaches for Intensity Modulated Proton Therapy Treatment Planning (2023-07)
Mousazadeh, Bahar; Zinchenko, Yuriy; Greenberg, Matthew; Badescu, Alexandru
Radiation therapy is a critical modality in the field of oncology. Its primary goal is to destroy or control the growth of cancerous cells while minimizing damage to healthy tissue. Intensity Modulated Proton Therapy (IMPT) is a type of radiation therapy that irradiates the tumor with protons. The unique physical properties of protons enable precise control over the radiation dose distribution within the tumor and more effective sparing of healthy tissue. Radiation therapy treatment planning is typically posed as a multi-criteria optimization problem, where the challenge is to find the best possible treatment plan. In this study, we formulate and compare two optimization approaches for IMPT treatment planning. We first explore a linear programming (LP) approach, followed by a moment-based approach in which we incorporate the dose-volume requirements into the fluence map optimization (FMO) problem. The models are evaluated on anonymized patient data for a lung cancer case, with a focus on generating a good-quality initial plan that is amenable to further refinement. The moment-based approach has a drawback in its high memory usage; to mitigate this limitation, we explore several sparsification strategies aimed at reducing memory requirements. Employing an aggressive sparsification method, we demonstrate that the moment-based approach outperforms the LP model in dosimetric outcomes and computational run-time. We highlight a trade-off between treatment plan quality and computational run-time when utilizing different sparsification strategies for the moment-based approach. By adopting a less strict sparsification method, we anticipate achieving higher-quality treatment plans at the expense of increased computational run-time.
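Note: a toy version of the fluence map optimization LP is easy to write down. The sketch below is illustrative only; the random dose-influence matrices, dose bounds, and objective are invented, not the thesis's model, and it assumes SciPy.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n_beamlets, n_tumor, n_oar = 40, 25, 15

# Hypothetical dose-influence matrices: dose[voxel] = D @ x for beamlet
# intensities x >= 0 (one row per voxel, one column per beamlet).
D_tumor = rng.uniform(0.5, 1.5, (n_tumor, n_beamlets))
D_oar = rng.uniform(0.0, 0.3, (n_oar, n_beamlets))

prescription, oar_cap = 60.0, 20.0   # Gy, illustrative numbers

# Minimize total fluence subject to tumor coverage and OAR sparing:
#   D_tumor @ x >= prescription   and   D_oar @ x <= oar_cap.
res = linprog(
    c=np.ones(n_beamlets),
    A_ub=np.vstack([-D_tumor, D_oar]),
    b_ub=np.concatenate([-prescription * np.ones(n_tumor),
                         oar_cap * np.ones(n_oar)]),
    bounds=(0, None),
)
print("feasible plan found:", res.success)
if res.success:
    print("min tumor dose:", (D_tumor @ res.x).min().round(2))
    print("max OAR dose:  ", (D_oar @ res.x).max().round(2))
```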
Item Open Access
Probabilistic Nonlinear Dimensionality Reduction (2022-11-04)
Adams, Matthew; Rios, Cristian; Ware, Antony; Greenberg, Matthew
High-dimensional datasets are present across scientific disciplines, and their analysis requires dimensionality reduction methods whose model parameters admit clear interpretations. Principal components analysis (PCA) has long been a preferred method for linear dimensionality reduction, but it is not recommended for data lying on or near low-dimensional nonlinear manifolds. Neural networks, on the other hand, have been used for dimension reduction, but their model parameters have no clear interpretation. The main contribution of the current work is the introduction of probabilistic piecewise PCA, an interpretable model for approximating nonlinear manifolds embedded in high-dimensional space. Probabilistic piecewise PCA serves as a bridge between linear PCA and highly nonlinear neural network approaches to dimensionality reduction. Our model extends probabilistic PCA and may be used when assuming any member of the natural exponential family of distributions on the observations. The model is explicitly defined for Gaussian and Poisson distributions, and posterior distributions for prediction and sampling are computed. A full comparative study of probabilistic piecewise PCA and existing dimensionality reduction methods is presented on a real-world bibliometric dataset.
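Note: for reference, the Gaussian probabilistic PCA model that piecewise PPCA builds on (the standard formulation after Tipping and Bishop, not quoted from the thesis) is sketched below.

```latex
% Standard Gaussian probabilistic PCA (Tipping & Bishop): a latent
% z in R^q generates an observation x in R^d, with q < d:
\[
  z \sim \mathcal{N}(0, I_q), \qquad
  x \mid z \sim \mathcal{N}(Wz + \mu,\; \sigma^2 I_d),
  \qquad\text{so}\qquad
  x \sim \mathcal{N}(\mu,\; WW^{\top} + \sigma^2 I_d).
\]
```

Piecewise PPCA fits such a model locally on pieces of the manifold, and the exponential-family extension replaces the Gaussian observation likelihood accordingly.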
Item Open Access
Smooth Integral Models for Certain Congruence Subgroups of Odd Spin Groups (2018-09-13)
Shahabi, Majid; Cunningham, Clifton; Greenberg, Matthew; Bauer, Kristine; Bauer, Mark; Gour, Gilad; Gordon, Julia
In this thesis, we introduce a family of congruence subgroups for general odd spin groups GSpin(2n+1)/Q (see Chapter 1 for the definition of GSpin(2n+1)). We prove that our congruence subgroups for GSpin(2n+1) admit integral models that are smooth group schemes over Z with generic fibre isomorphic to GSpin(2n+1)/Q (Proposition 2.7 and Theorem 2.9), have specific special fibres (Proposition 2.12 and Theorem 2.13), and generalize Hecke's congruence subgroup for GL2(Q) (Proposition 2.15) and the paramodular subgroup for GSp4(Q) (Proposition 2.16), where N is a fixed positive integer giving the level. We also study a family of congruence subgroups for special odd orthogonal groups SO(2n+1)/Q, introduced by Gross and Tsai [G2, T1, T2]. These congruence subgroups are expected to appear in analogs of modularity for abelian varieties predicted by the Langlands program [G2, CD]. Using scheme theory, we prove that these congruence subgroups for SO(2n+1) admit integral models that are smooth group schemes over Z[1/2] with generic fibre isomorphic to SO(2n+1)/Q (Theorems 3.6, 3.16, 3.10, 3.17), have specific special fibres (Propositions 3.11 and 3.18, and Theorems 3.12 and 3.19), and have Zp-points isomorphic to the congruence subgroups of SO(2n+1) defined by Gross and Tsai [G2, T1, T2] (Theorems 3.13 and 3.20). We also prove that there is a close relationship between our integral models for these congruence subgroups of GSpin(2n+1) and SO(2n+1) (Theorem 4.1).

Item Open Access
Solving norm equations over global function fields using compact representations (2023-12-13)
Leem, Sumin; Scheidler, Renate; Jacobson, Michael; Bauer, Mark; Greenberg, Matthew
This thesis contributes improvements to the efficiency of solving norm equations over global function fields F, together with theoretical and empirical evidence of the improvement. We present two new algorithms for solving norm equations over F: an exhaustive search algorithm inspired by the sole existing method in [21], and an algorithm using principal ideal testing via index calculus. We employ compact representations in both of the novel algorithms, both to represent solutions in less space and to improve efficiency. To compare the asymptotic and practical efficiency of the novel algorithms with the existing one, we performed rigorous complexity analyses and empirical analyses, expressing the asymptotic complexities as the number of bit operations required. As part of these analyses, we also analyzed, for the first time, the complexity of the algorithm for computing compact representations in [15]. Compared to the existing algorithm in [21], the asymptotic complexities of both of our novel algorithms are reduced by double exponential factors in the size of F, and the new algorithm with principal ideal testing requires even fewer bit operations than the new exhaustive search algorithm in terms of the size of F. Empirical testing demonstrates that, as expected from the complexity analyses, our novel algorithms are significantly faster than the existing one in practice.

Item Open Access
Some Problems on the Dynamics of Positive Characteristic Tori (2024-05-13)
Gunn, Keira; Bauer, Mark; Nguyen, Dang Khoa; Greenberg, Matthew; Scheidler, Renate
The positive characteristic tori T_F are counterparts to the real torus T = R/Z. In positive characteristic, we define the "integers" as polynomials with coefficients in a finite field F (Z_F := F[t]) and the "reals" as the field of Laurent series with coefficients in F (R_F := F((t))), so that the positive characteristic torus over F is defined analogously: T_F := R_F/Z_F. While T and T_F have some structural and operational similarities, they behave fundamentally differently, particularly with regard to dynamics. In this dissertation we seek analogues of known results for R and T. We find that both settings admit similar ergodicity results, but that orbits of affine maps and other areas of dynamics show significant differences. In particular, we construct an Artin-Mazur zeta function that looks significantly different from its counterpart in T, demonstrate that Furstenberg's orbital density theorem falls apart in positive characteristic, and establish that intersections of orbits of affine maps rely on sets that depend on powers of the characteristic of F rather than on arithmetic progressions. At first glance, the simplicity of working in T_F and its similarities to T suggest that many of the same simple results should be available; in reality, however, the structure of T_F consists of infinitely many sub-structures constructed by shifts of Frobenius maps into itself, and these sub-structures present themselves frequently in a manner that does not occur in T.

Item Open Access
The Bias of Using Cross-Validation in Genomic Predictions and Its Correction (2023-09-21)
Qian, Yanzhao; Greenberg, Matthew; Long, Quan; Shen, Hua; MacDonald, Matthew Ethan
Cross-validation (CV) is a widely used technique in statistical learning for model evaluation and selection. Meanwhile, various statistical learning methods, such as generalized least squares (GLS), linear mixed-effects models (LMM), and regularization methods, are commonly used in genomic prediction, a field that utilizes DNA polymorphisms to predict phenotypic traits. However, due to the high dimensionality, relatively small sample sizes, and sparsity of genomic data, CV in these settings may underestimate the generalization error. In this work, we analyzed the bias of CV in eight methods: ordinary least squares (OLS), GLS, LMM, Lasso, Ridge, elastic net (ENET), and two hybrid methods, one combining GLS with Ridge regularization (GLS+Ridge) and the other combining LMM with Ridge regularization (LMM+Ridge). Leveraging genomic data from the 1,000 Genomes Project and simulated phenotypes, our investigation revealed the presence of bias in all of these methods. To address this bias, we adapted a variance-structure method known as Cross-Validation Correction (CVc), which rectifies the cross-validation error to provide a more accurate estimate of the generalization error. To quantify the performance of our adapted CVc across these methods, we applied the trained models to an independently generated dataset, which served as a gold standard for validating the models and calculating the generalization error. The outcomes show that, by leveraging CVc, we corrected the CV bias for most of the methods mentioned above, with two exceptions, ENET and Lasso, which could not be rectified. Our work reveals substantial bias in the use of CV in genomics, a phenomenon under-appreciated in statistical genomics and medicine. Additionally, we demonstrated that bias-corrected models may be formed by adapting CVc, although more work is needed to cover the full spectrum of methods.
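Note: the qualitative phenomenon, naive cross-validation looking optimistic when samples are genetically related, can be illustrated with a small simulation. This is a hedged sketch assuming scikit-learn; the family structure, dimensions, and effect sizes are invented and far simpler than the thesis's 1,000 Genomes setup.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(7)
p, fam_size = 500, 5
beta = np.zeros(p)
beta[:10] = rng.normal(0, 0.3, 10)       # sparse "causal variant" effects

def simulate(n_fam):
    """Cohort of related individuals: siblings share a founder genotype
    and a family-level effect, mimicking genetic relatedness."""
    X, y = [], []
    for _ in range(n_fam):
        founder = rng.standard_normal(p)
        fam_effect = rng.normal(0.0, 1.0)
        for _ in range(fam_size):
            x = founder + 0.3 * rng.standard_normal(p)
            X.append(x)
            y.append(x @ beta + fam_effect + rng.standard_normal())
    return np.array(X), np.array(y)

X, y = simulate(40)                      # 200 training samples
model = Ridge(alpha=10.0)

# Random folds scatter siblings across train/validation, so CV "sees"
# each family during training and tends to look optimistic.
folds = KFold(n_splits=3, shuffle=True, random_state=0)
cv_mse = -cross_val_score(model, X, y, cv=folds,
                          scoring="neg_mean_squared_error").mean()

# Gold standard: error on an independently simulated cohort (new families).
X_new, y_new = simulate(400)
gen_mse = np.mean((model.fit(X, y).predict(X_new) - y_new) ** 2)
print(f"3-fold CV MSE: {cv_mse:.2f}  vs  independent-cohort MSE: {gen_mse:.2f}")
```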
Item Open Access
Using Brain Topological Features Extracted from Resting State fMRI to Classify Autism Spectrum Disorder (2019-07-03)
Kazeminejad, Amirali; Sotero Diaz, Roberto C.; Bray, Signe; Greenberg, Matthew
Autism Spectrum Disorder (ASD) is a neurodevelopmental disease that manifests in early childhood and hinders the social and behavioral outlook of individuals affected by it. Early identification of the disorder leads to better patient outcomes. Many imaging studies have been conducted to gain insight into the inner workings of the disorder, some using machine learning for autism diagnosis. The success of this approach is heavily dependent on the features used for the classification task. Graph-theoretical measures extracted from resting-state functional MRI have already proven useful in classifying other neurological disorders. I hypothesized that using these features for Autism Spectrum Disorder classification would improve model performance (accuracy) over previously reported imaging-based methods. Furthermore, this approach allowed me to identify possible biomarkers for the disorder based on the importance of the selected features. This thesis shows that graph-theoretical features may help improve classification accuracy and can be used to extract biomarkers relevant to ASD.
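Note: the feature-extraction step can be sketched with NetworkX on a synthetic connectivity matrix (illustrative only; the thesis's atlas, preprocessing, and thresholds are not reproduced here).

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(42)

# Hypothetical stand-in for a resting-state functional connectivity matrix:
# pairwise correlations between 90 brain regions over 200 time points.
n_regions = 90
corr = np.corrcoef(rng.standard_normal((n_regions, 200)))

# Threshold weak connections and build an undirected weighted graph.
adj = np.where(np.abs(corr) > 0.2, np.abs(corr), 0.0)
np.fill_diagonal(adj, 0.0)
G = nx.from_numpy_array(adj)

# Per-region graph-theoretical features that could feed a classifier.
# (Betweenness treats weights as distances; a real pipeline would
# invert correlation-based weights first.)
features = {
    "clustering": nx.clustering(G, weight="weight"),
    "betweenness": nx.betweenness_centrality(G, weight="weight"),
    "degree": dict(G.degree(weight="weight")),
}
feature_matrix = np.column_stack(
    [np.array(list(f.values())) for f in features.values()]
)
print(feature_matrix.shape)   # (90 regions, 3 features) for one subject
```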
Item Embargo
Variable Selection Using the Method of the Broken Adaptive Ridge Regression (2024-07-08)
Chan, Christian Zhao Yang; Lu, Xuewen; Long, Quan; Greenberg, Matthew; Liao, Wenyuan
In this thesis, we consider variable selection methods incorporating broken adaptive ridge (BAR) regression under several model frameworks that deal with joint modelling of recurrent and terminal events, high-dimensional covariates, low-dimensional categorical covariates, and low-dimensional continuous covariates in generalized partly linear models and partly linear Cox proportional hazards models. With data more easily available than ever in the digital era, it is important that only relevant variables be retained when building a statistical model. In Chapter 2, we implement a novel method to simultaneously perform variable selection and estimation in the joint frailty model of recurrent and terminal events using the BAR penalty. The BAR penalty can be summarized as an iteratively reweighted squared $L_2$-penalized regression, which approximates $L_0$-regularization. Our method allows the number of covariates to diverge with the sample size. Under certain regularity conditions, we prove that the BAR estimator is consistent and asymptotically normally distributed, properties known as the oracle properties in the variable selection literature. In simulation studies, we compare our proposed method to the Minimum Information Criterion (MIC) method. We apply our method to the Medical Information Mart for Intensive Care (MIMIC-III) database to investigate which variables affect the risks of repeated ICU admissions and death during ICU stay. In Chapter 3, motivated by the CATHGEN data, we develop a new method for simultaneous variable selection and parameter estimation in generalized partly linear models for data with high-dimensional covariates. The method is referred to as the BAR estimator, which approximates $L_0$-penalized regression by iteratively performing reweighted squared $L_2$-penalized regression. The generalized partly linear model extends the generalized linear model with a non-parametric component, yielding a flexible model for various types of covariates with linear and non-linear effects in different dimensions. We employ Bernstein polynomials as the sieve space to approximate the non-parametric functions, so that our method can be implemented easily using existing R packages. Extensive simulation studies suggest that the proposed method performs better than other commonly used penalty-based variable selection methods. We apply the method to the CATHGEN data, with a binary response from a coronary artery disease study, and obtain new findings in both the high-dimensional genetic and the low-dimensional non-genetic covariates. In Chapter 4, we implement the BAR penalty under the partly linear Cox proportional hazards model with right-censored data, where the framework considers three sets of covariates: high-dimensional covariates, low-dimensional categorical covariates, and low-dimensional continuous covariates, the last of which may have non-linear effects. Our variable selection method can be implemented easily using existing R packages, and our simulation studies show that it performs better than other existing variable selection methods. We then apply the method to acute respiratory distress syndrome (ARDS) data to discover relevant metabolites that contribute to the risk of dying in the ICU. We summarize the results of all three projects in Chapter 5.

Item Open Access
Web-Enabled Software Platform for Identification of Minimum-Cost Power Transmission Lines with Redundancy Constraints (2022-09)
Jha, Amit; Zinchenko, Yuriy; Greenberg, Matthew; Ware, Antony
Path redundancy is essential for safety and reliability in many real-world routing problems, such as the design of networks for power transmission or oil transportation. These problems typically involve finding the shortest path on a weighted graph. In the redundancy-constrained variant, two disjoint shortest paths are desired instead, such that the combined weight of the two paths is minimized while a minimum physical separation between the paths is maintained. Finding two disjoint shortest paths subject to constraints on a graph is in general an NP-hard problem. Conventionally, planning a redundant transmission line requires a large-scale graph and a mixed integer programming (MIP) formulation of the problem; due to the large model dimension, however, the MIP is practically intractable. To overcome the intractability, a new 3D embedding scheme was proposed in [18], which avoids many common pitfalls of available heuristic approaches while maintaining a modest computation time. To enable this scheme, we developed a software package containing (1) an optimized implementation of the 3D embedding scheme and algorithms that use it to identify disjoint shortest paths with a separation constraint on a graph, and (2) a graphical user interface that accepts user inputs and visualizes the solution on Google Maps. We deployed this software on the cloud.
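Note: for contrast with the constrained problem described above, the unconstrained two-disjoint-paths baseline is easy to state. The sketch below (assuming NetworkX) greedily removes the first path's edges and ignores the physical separation requirement; it is a heuristic baseline, not the 3D embedding scheme of [18].

```python
import networkx as nx

def two_edge_disjoint_paths(G, s, t, weight="weight"):
    """Greedy baseline: shortest path, then shortest path in the graph
    with the first path's edges removed. Not optimal in general
    (Suurballe's algorithm handles the min-total-weight version), and it
    enforces no physical separation between the two routes."""
    p1 = nx.shortest_path(G, s, t, weight=weight)
    H = G.copy()
    H.remove_edges_from(zip(p1, p1[1:]))
    p2 = nx.shortest_path(H, s, t, weight=weight)  # raises if none exists
    return p1, p2

# Toy grid network with unit edge weights.
G = nx.grid_2d_graph(4, 4)
nx.set_edge_attributes(G, 1, "weight")
p1, p2 = two_edge_disjoint_paths(G, (0, 0), (3, 3))
print(p1)
print(p2)
```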