QDA Classification for Two-Component Mixture with Data of Rare and Weak Signal
dc.contributor.advisor | Wu, Jingjing | |
dc.contributor.advisor | Shen, Hua | |
dc.contributor.author | Chen, Hanning | |
dc.contributor.committeemember | de Leon, Alexander R. | |
dc.contributor.committeemember | Liu, Shawn X. | |
dc.date | 2020-02 | |
dc.date.accessioned | 2020-01-14T20:59:39Z | |
dc.date.available | 2020-01-14T20:59:39Z | |
dc.date.issued | 2019-12-20 | |
dc.description.abstract | This thesis deals with the two-class classification problem for data with rare and weak signals, under the modern high-dimensional setup p >> n (large p, small n). Considering a two-component mixture of Gaussian features with different random mean vectors carrying rare and weak signals but a common covariance matrix (homoscedastic Gaussian), Fan et al. (2013) discussed the optimality of linear discriminant analysis (LDA) and proposed an efficient variable selection and classification procedure. This thesis extends their work by assuming instead that the two components have different random covariance matrices (heterogeneous Gaussian) carrying rare and weak signals. As a first step, and for simplicity, we assume that the two population mean vectors are equal, so that the effect of the differing covariance matrices can be assessed in isolation. We propose the intuitive choice of quadratic discriminant analysis (QDA) for classifying data with rare and weak signals. On the theoretical side, we first derive the detection boundary of QDA at the population level. This boundary separates the region of successful classification from the region of unsuccessful classification in the ideal case where the covariance matrices are known. When the covariance matrices are unknown, we obtain a subregion in which successful classification is impossible for every classifier; this subregion therefore also lies within the unsuccessful classification region of QDA. For data with rare signals, variable selection typically improves the performance of statistical procedures. On the implementation side, we therefore propose a variable selection procedure for QDA based on Higher Criticism Thresholding (HCT), which Fan et al. (2013) proved to be efficient for LDA. Finally, we conduct extensive simulation studies to demonstrate and explore the successful and unsuccessful classification regions of QDA and to examine the effectiveness of the proposed HCT procedure. | en_US |
dc.identifier.citation | Chen, H. (2019). QDA Classification for Two-Component Mixture with Data of Rare and Weak Signal (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. | en_US |
dc.identifier.doi | http://dx.doi.org/10.11575/PRISM/37452 | |
dc.identifier.uri | http://hdl.handle.net/1880/111495 | |
dc.language.iso | eng | en_US |
dc.publisher.faculty | Science | en_US |
dc.publisher.institution | University of Calgary | en |
dc.rights | University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. | en_US |
dc.subject | high dimensional data | en_US |
dc.subject | quadratic discriminant analysis (QDA) | en_US |
dc.subject | higher criticism | en_US |
dc.subject | classification | en_US |
dc.subject.classification | Education--Sciences | en_US |
dc.title | QDA Classification for Two-Component Mixture with Data of Rare and Weak Signal | en_US |
dc.type | master thesis | en_US |
thesis.degree.discipline | Mathematics & Statistics | en_US |
thesis.degree.grantor | University of Calgary | en_US |
thesis.degree.name | Master of Science (MSc) | en_US |
ucalgary.item.requestcopy | true | en_US |
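The abstract names two ingredients, a QDA rule and HCT-based variable selection, but gives no formulas. The Python sketch below shows one hypothetical way to combine them under stated assumptions. It is not the thesis's procedure. It assumes that both classes have zero mean and diagonal covariance matrices. It scores each feature with an approximate z-statistic for the log ratio of its two sample variances. It then chooses the selection cut-off by maximizing a Higher Criticism objective in the style of Donoho and Jin, and finally fits a diagonal QDA on the kept features. All function names and simulation settings (`p`, `n`, `n_signal`, `sigma`) are illustrative assumptions, not values taken from the thesis.

```python
# Illustrative sketch only: the equal-mean, diagonal-covariance model and the
# per-feature log-variance-ratio statistic are assumptions, not the thesis's method.
import math
import numpy as np


def log_var_ratio_z(X1, X2):
    """Hypothetical per-feature z-score for a variance difference between classes.

    Uses the large-sample Gaussian approximation Var(log s^2) ~ 2 / (n - 1).
    """
    n1, n2 = X1.shape[0], X2.shape[0]
    v1 = X1.var(axis=0, ddof=1)
    v2 = X2.var(axis=0, ddof=1)
    se = math.sqrt(2.0 / (n1 - 1) + 2.0 / (n2 - 1))
    return np.log(v2 / v1) / se


def hc_threshold(z, alpha0=0.5):
    """Return a |z| cut-off chosen by maximizing a Higher Criticism objective.

    alpha0 must be < 1; only the alpha0 * p smallest p-values are searched.
    """
    p = z.size
    # Two-sided p-values under a N(0, 1) null.
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    order = np.argsort(pvals)
    kmax = max(1, int(alpha0 * p))
    frac = np.arange(1, kmax + 1) / p
    ps = pvals[order][:kmax]
    # HC(k) = sqrt(p) * (k/p - p_(k)) / sqrt((k/p) * (1 - k/p))
    hc = math.sqrt(p) * (frac - ps) / np.sqrt(frac * (1.0 - frac))
    khat = int(np.argmax(hc))
    # Keep every feature at least as extreme as the HC maximizer.
    return abs(z[order[khat]])


def diag_qda_scores(X, mu, var, prior):
    """Gaussian log-density (shared constant dropped) plus log prior, diagonal covariance."""
    return (math.log(prior) - 0.5 * np.sum(np.log(var))
            - 0.5 * np.sum((X - mu) ** 2 / var, axis=1))


def simulate_and_classify(p=2000, n=50, n_signal=20, sigma=2.0, seed=0):
    """Hypothetical simulation: equal zero means, rare inflated variances in class 2."""
    rng = np.random.default_rng(seed)
    signal = rng.choice(p, size=n_signal, replace=False)
    scale = np.ones(p)
    scale[signal] = sigma  # class 2 has standard deviation sigma on the signal features

    def draw(m, cls):
        X = rng.standard_normal((m, p))
        return X * scale if cls == 2 else X

    X1, X2 = draw(n, 1), draw(n, 2)
    z = log_var_ratio_z(X1, X2)
    keep = np.flatnonzero(np.abs(z) >= hc_threshold(z))

    # Fit diagonal QDA on the HC-selected features only.
    params = [(X[:, keep].mean(axis=0), X[:, keep].var(axis=0, ddof=1))
              for X in (X1, X2)]

    # Balanced test set: label 0 = class 1, label 1 = class 2.
    Xt = np.vstack([draw(500, 1), draw(500, 2)])
    yt = np.repeat([0, 1], 500)
    S = np.vstack([diag_qda_scores(Xt[:, keep], mu, var, 0.5) for mu, var in params])
    yhat = np.argmax(S, axis=0)
    return {
        "n_selected": int(keep.size),
        "true_hits": int(np.intersect1d(keep, signal).size),
        "accuracy": float(np.mean(yhat == yt)),
    }
```

Calling `simulate_and_classify(seed=0)` returns the number of selected features, the number of planted signal features that were kept, and the test accuracy. Varying `n_signal` (rarity) and `sigma` (strength) is one informal way to probe the regime the abstract describes. The diagonal covariance assumption is a convenience: when p > n, the full sample covariance matrix is singular, so an unrestricted QDA cannot be fitted directly.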