Using machine learning methods to improve chronic disease case definitions in primary care electronic medical records

dc.contributor.advisor: Williamson, Tyler S.
dc.contributor.advisor: Sajobi, Tolulope T.
dc.contributor.author: Lethebe, Brendan Cord
dc.contributor.committeemember: Quan, Hude
dc.contributor.committeemember: Ronksley, Paul Everett
dc.date: 2018-06
dc.date.accessioned: 2018-04-25T14:19:24Z
dc.date.available: 2018-04-25T14:19:24Z
dc.date.issued: 2018-04-23
dc.description.abstract: Background: Chronic disease surveillance at the primary care level is becoming more feasible with the increased use of electronic medical records (EMRs). However, the quality of surveillance information depends directly on the quality of the case definitions that identify the conditions of interest. Purpose: To determine whether machine learning algorithms can produce chronic disease case definitions comparable to committee-created case definitions in a primary care EMR setting. Methods: A chart review was conducted for the presence of hypertension, diabetes, osteoarthritis, and depression in a cohort of 1920 patients from the Canadian Primary Care Sentinel Surveillance Network database. The results of this chart review were used as training data. C5.0, Classification and Regression Tree, and Chi-Squared Automated Interaction Detection decision trees, Forward Stepwise logistic regression, and Least Absolute Shrinkage and Selection Operator penalized logistic regression were compared using 10-fold cross-validation. Sensitivity, specificity, positive predictive value, and negative predictive value were estimated and compared for the four chronic conditions of interest. Results: Validity measures were similar across algorithms. For hypertension, sensitivity ranged from 93.1% to 96.7% and specificity from 88.8% to 93.2%. For diabetes, sensitivity ranged from 93.5% to 96.3% and specificity from 97.1% to 99.0%. For osteoarthritis, sensitivity ranged from 82.0% to 84.4% and specificity from 92.7% to 94.0%. For depression, sensitivity ranged from 81.4% to 88.3% and specificity from 93.4% to 94.9%. On these metrics, the machine learning case definitions performed as well as or better than the committee-created case definitions. Conclusions: Machine learning algorithms produced accurate case definitions comparable to committee-created case definitions. It is possible to use machine learning techniques to develop high-quality case definitions from EMR data. [en_US]
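The evaluation described in the abstract (10-fold cross-validation, then sensitivity, specificity, positive predictive value, and negative predictive value against chart review) can be sketched as follows. This is a minimal illustration and not the thesis code: the function names `k_fold_indices` and `validity_measures`, the fixed seed, and the toy labels are all assumptions made for this example.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: shuffles 0..n-1 once, then deals the indices
    into k folds; each fold serves as the test set exactly once.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def validity_measures(truth, predicted):
    """Compare predicted case status with the reference standard.

    truth and predicted are equal-length sequences of booleans, where
    truth would be the chart-review result for each patient.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true cases that were detected
        "specificity": _ratio(tn, tn + fp),  # non-cases correctly excluded
        "ppv": _ratio(tp, tp + fp),          # predicted cases that are true
        "npv": _ratio(tn, tn + fn),          # predicted non-cases that are true
    }


# Toy labels for illustration only; these are not data from the thesis.
toy_truth = [True, True, True, False, False, False, False, True]
toy_predicted = [True, True, False, False, False, True, False, True]
toy_result = validity_measures(toy_truth, toy_predicted)
```

In a full run, a classifier would be fitted on each `train` split and its predictions on the matching `test` split pooled before calling `validity_measures`; with the 1920-patient cohort and 10 folds, each test fold holds 192 patients.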
dc.identifier.citation: Lethebe, B. C. (2018). Using machine learning methods to improve chronic disease case definitions in primary care electronic medical records (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/31824 [en_US]
dc.identifier.doi: http://dx.doi.org/10.11575/PRISM/31824
dc.identifier.uri: http://hdl.handle.net/1880/106538
dc.language.iso: eng
dc.publisher.faculty: Cumming School of Medicine
dc.publisher.faculty: Graduate Studies
dc.publisher.institution: University of Calgary [en]
dc.publisher.place: Calgary [en]
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject: Machine Learning
dc.subject: Prediction
dc.subject: Statistics
dc.subject: Case Definition
dc.subject: Surveillance
dc.subject: Chronic Disease
dc.subject.classification: Biostatistics [en_US]
dc.subject.classification: Public Health [en_US]
dc.subject.classification: Statistics [en_US]
dc.subject.classification: Computer Science [en_US]
dc.title: Using machine learning methods to improve chronic disease case definitions in primary care electronic medical records
dc.type: master thesis
thesis.degree.discipline: Community Health Sciences
thesis.degree.grantor: University of Calgary
thesis.degree.name: Master of Science (MSc)
ucalgary.item.requestcopy: true
Files
Original bundle
Name: ucalgary_2018_lethebe_brendan.pdf
Size: 1.25 MB
Format: Adobe Portable Document Format
Description: Thesis Document
License bundle
Name: license.txt
Size: 1.74 KB
Format: Item-specific license agreed upon to submission
Description: