Browsing by Author "Xu, Yuan"
Item Open Access
The association between 'compliance with colonoscopy surveillance' after primary treatment and healthcare utilization (2020-09-23)
Qaedi, Atena; Lu, Mingshan; Yuan, Lasheng; Xu, Yuan
Choosing Wisely Canada recommends surveillance with colonoscopy for colorectal cancer patients undergoing curative-intent treatment. Although surveillance with colonoscopy after surgery is beneficial in terms of early detection of recurrence and survival, there is limited real-world evidence on compliance with recommended colonoscopy surveillance, and on the healthcare utilization and costs associated with it. This retrospective study uses existing administrative data sets from Alberta Health Services and Alberta Health, which include 7,120 observations for the 2004-2015 period. The study sample consisted of colorectal cancer patients at stages I and II who underwent curative-intent surgery. This project compared healthcare utilization (measured by cost) and health outcomes (measured by survival) for patients who complied with colonoscopy surveillance versus those who did not comply. Cost and survival analyses were conducted, employing multivariate analyses via Cox and logistic regressions. For the purposes of this study, cost data were calculated using the physician claims or the physician’s payment. In total, 6,962 patients were eligible for analysis. The median age was 67 years (range: 18-104). The proportions of patients with stage Ⅰ and stage Ⅱ colorectal cancer were 42.46% and 57.54%, respectively. A total of 2,812 patients (40.39%) had one-year compliance, and 275 patients (3.95%) had two-to-five-year compliance. The average healthcare utilization per person for one-year and two-to-five-year compliance was CAD 3,762 and CAD 4,758, respectively. Compliance with colonoscopy surveillance after primary treatment was associated with lower age, earlier cancer stage (stage Ⅰ), lower cancer grade (grade 1), a lower Charlson comorbidity index (CCI), and higher income.
In addition, the overall death ratio and the cancer-related death ratio were lower for patients with compliance in each category (one-year and two-to-five-year follow-up) than for those with no compliance. The results of this study suggest that colonoscopy surveillance compliance following primary treatment for early-stage colorectal cancer is associated with lower healthcare utilization and better cancer-specific survival.

Item Open Access
Characterization of Stability of Non-Negative Matrix Factorization Models: An Application to Single-Cell Data (2023-08-21)
Liu, Alexander EJ; Zhang, Qingrun; Wu, Jingjing; Xu, Yuan; Zhang, Qingrun
Non-negative matrix factorization (NMF) is a powerful machine learning technique used in mathematics, computer science, and data science. This technique has applications in a wide range of fields including recommender systems, image processing, signal processing, machine learning and genetics. Recently, NMF has gained popularity in the analysis of single-cell gene expression data to identify cell types and gene expression patterns. In this thesis, we have studied NMF, its rank estimation, classification, and stability using both simulated data and real single-cell gene expression data. We have designed two simulated data sets with desired features and tested two seeding methods, eight NMF algorithms and five rank estimation criteria. Additionally, a real single-cell gene expression data set has been used to further characterize the NMF algorithms. We have also investigated the stability of NMF, first with respect to sample size and then with respect to initialization.
The detailed conditions revealed by this thesis may have practical impact in guiding the appropriate use of NMF in analyzing single-cell gene expression data.

Item Open Access
Comparison of risk adjustment methods in patients with liver disease using electronic medical record data (2017-01-07)
Xu, Yuan; Li, Ning; Lu, Mingshan; Dixon, Elijah; Myers, Robert P; Jolley, Rachel J; Quan, Hude
Background: Risk adjustment is essential for valid comparison of patients’ health outcomes or performances of health care providers. Several risk adjustment methods for liver diseases are commonly used but the optimal approach is unknown. This study aimed to compare the common risk adjustment methods for predicting in-hospital mortality in cirrhosis patients using electronic medical record (EMR) data. Methods: The sample was derived from Beijing YouAn hospital between 2010 and 2014. Previously validated EMR extraction methods were applied to define liver disease conditions, Charlson comorbidity index (CCI), Elixhauser comorbidity index (ECI), Child-Turcotte-Pugh (CTP), model for end-stage liver disease (MELD), MELD sodium (MELDNa), and five-variable MELD (5vMELD). The performance of the common risk adjustment models, as well as models combining disease severity and comorbidity indexes, for predicting in-hospital mortality was compared using the c-statistic. Results: Of 11,121 cirrhotic patients, 69.9% were male and 15.8% were aged 65 or older. The c-statistics across compared models ranged from 0.785 to 0.887. All models significantly outperformed the baseline model with age, sex, and admission status (c-statistic: 0.628). The c-statistics for the CCI, ECI, MELDNa, and CTP were 0.808, 0.825, 0.849, and 0.851, respectively. The c-statistic was 0.887 for the combination of CTP and ECI, and 0.882 for the combination of MELDNa score and ECI.
Conclusions: The liver disease severity indexes (i.e., CTP and MELDNa score) outperformed the CCI and ECI for predicting in-hospital mortality among cirrhosis patients using Chinese EMRs. Combining liver disease severity and comorbidity indexes could improve the discrimination power of predicting in-hospital mortality.

Item Open Access
Development and validation of case-finding algorithms for recurrence of breast cancer using routinely collected administrative data (2019-03-08)
Xu, Yuan; Kong, Shiying; Cheung, Winson Y; Bouchard-Fortier, Antoine; Dort, Joseph C; Quan, Hude; Buie, Elizabeth M; McKinnon, Geoff; Quan, May L
Background: Recurrence is not explicitly documented in cancer registry data that are widely used for research. Patterns of events after initial treatment such as oncology visits, re-operation, and receipt of subsequent chemotherapy or radiation may indicate recurrence. This study aimed to develop and validate algorithms for identifying breast cancer recurrence using routinely collected administrative data. Methods: The study cohort included all young (≤ 40 years) breast cancer patients (2007–2010), and all patients receiving neoadjuvant chemotherapy (2012–2014) in Alberta, Canada. Health events (including mastectomy, chemotherapy, radiation, biopsy and specialist visits) were obtained from provincial administrative data. The algorithms were developed using classification and regression tree (CART) models and validated against primary chart review. Results: Among 598 patients, 121 (20.2%) had recurrence after a median follow-up of 4 years. The high sensitivity algorithm achieved 94.2% (95% CI: 90.1–98.4%) sensitivity, 93.7% (91.5–95.9%) specificity, 79.2% (72.5–85.8%) positive predictive value (PPV), and 98.5% (97.3–99.6%) negative predictive value (NPV). The high PPV algorithm had 75.2% (67.5–82.9%) sensitivity, 98.3% (97.2–99.5%) specificity, 91.9% (86.6–97.3%) PPV, and 94% (91.9–96.1%) NPV.
Combining the high PPV and high sensitivity algorithms with additional (7.5%) chart review to resolve discordant cases resulted in 94.2% (90.1–98.4%) sensitivity, 98.3% (97.2–99.5%) specificity, 93.4% (89.1–97.8%) PPV, and 98.5% (97.4–99.6%) NPV. Conclusion: The proposed algorithms based on routinely collected administrative data achieved favorably high validity for identifying breast cancer recurrences in a universal healthcare system in Canada.

Item Open Access
Development of machine learning models for the detection of surgical site infections following total hip and knee arthroplasty: a multicenter cohort study (2023-09-02)
Wu, Guosong; Cheligeer, Cheligeer; Southern, Danielle A.; Martin, Elliot A.; Xu, Yuan; Leal, Jenine; Ellison, Jennifer; Bush, Kathryn; Williamson, Tyler; Quan, Hude; Eastwood, Cathy A.
Background: Population-based surveillance of surgical site infections (SSIs) requires precise case-finding strategies. We sought to develop and validate machine learning models to automate the detection of complex (deep incisional/organ space) SSI cases. Methods: This retrospective cohort study included adult patients (age ≥ 18 years) admitted to Calgary, Canada acute care hospitals who underwent primary total elective hip (THA) or knee (TKA) arthroplasty between January 1, 2013 and August 31, 2020. True SSI conditions were judged by the Alberta Health Services Infection Prevention and Control (IPC) program staff. Using the IPC cases as labels, we developed and validated nine XGBoost models to identify deep incisional SSIs, organ space SSIs and complex SSIs using administrative data, electronic medical records (EMR) free text data, and both. The performance of machine learning models was assessed by sensitivity, specificity, positive predictive value, negative predictive value, F1 score, the area under the receiver operating characteristic curve (ROC AUC) and the area under the precision–recall curve (PR AUC).
In addition, a bootstrap 95% confidence interval (95% CI) was calculated. Results: There were 22,059 unique patients with 27,360 hospital admissions resulting in 88,351 days of hospital stay. This included 16,561 (60.5%) TKA and 10,799 (39.5%) THA procedures. There were 235 ascertained SSIs. Of them, 77 (32.8%) were superficial incisional SSIs, 57 (24.3%) were deep incisional SSIs, and 101 (42.9%) were organ space SSIs. The incidence rates were 0.37 for superficial incisional SSIs, 0.21 for deep incisional SSIs, 0.37 for organ space SSIs and 0.58 for complex SSIs per 100 surgical procedures, respectively. The optimal XGBoost models using administrative data and text data combined achieved a ROC AUC of 0.906 (95% CI 0.835–0.978), PR AUC of 0.637 (95% CI 0.528–0.746), and F1 score of 0.79 (0.67–0.90). Conclusions: Our findings suggest that machine learning models derived from administrative data and EMR text data achieved high performance and can be used to automate the detection of complex SSIs.

Item Open Access
Examining and predicting outcomes among early-onset breast cancer patients in Alberta using real-world and genomic data (2023-11-23)
Basmadjian, Robert Barkev; Brenner, Darren; Cheung, Winson; Quan, May Lynn; Lupichuk, Sasha; Xu, Yuan
Background: It is well accepted that patients with early-onset breast cancer (EoBC), defined by a diagnosis <40 years of age, are at greater risk of recurrence and mortality compared to later-onset cases (≥40 years). However, robust evidence of tailored treatment approaches in EoBC is lacking. This thesis intersected causal inference methodology, outcomes prediction research, and bioinformatics to better understand the effectiveness of real-world treatments and decision support tools in EoBC, as well as discover biological drivers of poor prognosis.
Methods: Three manuscripts were produced using population-based data of adult breast cancer diagnoses <40 years in Alberta from 2004 to 2020 and whole-exome sequence data from 100 tumour samples in this population. In Manuscript One, we described treatment patterns of ovarian function suppression (OFS) and applied the target trial emulation framework to estimate two treatment effects: 1) the 2-year per-protocol effect of tamoxifen alone (TAM) vs. TAM + OFS (T-OFS) vs. aromatase inhibitor + OFS (AI-OFS); and 2) the effect of remaining on hormone therapy + OFS (H-OFS) for ≥2 years vs. <2 years on recurrence-free survival (RFS). In Manuscript Two, we assessed the performance of PREDICT v2.1 for predicting 10-year all-cause mortality in EoBC and developed 10-year mortality prediction models using machine learning. In Manuscript Three, we characterized somatic mutational signatures in 100 EoBC tumour samples and examined their association with clinicopathological variables and survival outcomes. Results: In a target trial that included 2647 premenopausal hormone receptor-positive breast cancer patients, RFS tended to be better in the AI-OFS group (HR=0.76; 95% CI: 0.41-1.37) and T-OFS group (HR=0.87; 95% CI: 0.50-1.45) compared to TAM. Patients on H-OFS for ≥2 years had significantly better RFS compared to those on H-OFS for <2 years (HR=0.69; 95% CI: 0.54-0.90). In data from 1414 EoBC patients, PREDICT showed good discrimination (AUC=0.76) but tended to overestimate 10-year mortality in patients with high predicted risk. Building a 10-year mortality prediction model on EoBC patient data using penalized multivariable Cox regression showed better discrimination and calibration statistics versus using random survival forests. Among 100 EoBC tumour samples, we extracted five single-base substitution (SBS) and two insertion-deletion signatures. The SBS13-like signature was more common in the HER2 subtype.
Higher than median expression of the SBS13-like signature may be associated with better RFS (HR=0.29; 95% CI: 0.08-1.06). Conclusions: These investigations contribute knowledge of tailored approaches in the clinical management of EoBC in Alberta. Our findings provide a clearer understanding of the effectiveness of real-world treatments and the performance of routinely used prediction models in EoBC. We also provide insights on how additional routinely collected variables and novel mutational variables may improve outcome prediction.

Item Open Access
Identification of multiple isoforms of glucocorticoid receptor in nasal polyps of patients with chronic rhinosinusitis (2022-06-11)
Shao, Shan; Wang, Yue; Zhao, Yan; Xu, Yuan; Wang, Tie; Du, Kun; Bao, Shiping; Wang, Xiangdong; Zhang, Luo
Background: The conventional belief that glucocorticosteroid (GC) acts through a single brand glucocorticoid receptor (GR)α protein has changed dramatically with the discovery of multiple GR isoforms. We aimed to evaluate whether multiple GR protein isoforms are expressed in chronic rhinosinusitis with nasal polyps (CRSwNP) and whether GR protein isoform expression profiles differ between different endotypes of CRSwNP. Methods: Thirty-eight patients with CRSwNP and ten healthy volunteers were included. The protein expression of multiple GR isoforms in nasal polyp (NP) tissue and control mucosae was examined by western blot analysis with different GR antibodies. Results: Five bands, including three bands for known proteins (GRα-A/B, GRα-C, and GRα-D) and two bands for unidentified proteins at 67 kilodaltons (kDa) and 60 kDa, were identified with both total GR antibody (PA1-511A) and GRα-specific antibody (PA1-516). GRα-D intensity, which was abundant in nasal mucosa, was significantly increased in the CRSwNP group and was especially elevated in the noneosinophilic CRSwNP (NE-CRSwNP) group (PA1-511A: P < 0.001 and P = 0.0018; PA1-516: P < 0.003 and P = 0.006, respectively).
Additionally, the intensities of the newly recognized 67 kDa and 60 kDa bands were much greater in the NE-CRSwNP subgroup than in the eosinophilic CRSwNP (E-CRSwNP) subgroup; in the E-CRSwNP subgroup, the median intensities were even lower than those in the control group. Conclusions: This study provides evidence that nasal tissues express multiple GR protein isoforms. GR protein isoforms presented disease- and tissue-specific expression profiles that differed between the CRSwNP and control groups and between the E-CRSwNP and NE-CRSwNP subgroups.

Item Open Access
New method for determining breast cancer recurrence-free survival using routinely collected real-world health data (2022-03-16)
Jung, Hyunmin; Lu, Mingshan; Quan, May L.; Cheung, Winson Y.; Kong, Shiying; Lupichuk, Sasha; Feng, Yuanchao; Xu, Yuan
Background: In cancer survival analyses using population-based data, researchers face the challenge of ascertaining the timing of recurrence. We previously developed algorithms to identify recurrence of breast cancer. This is a follow-up study to detect the timing of recurrence. Methods: Health events that signified recurrence and timing were obtained from routinely collected administrative data. The timing of recurrence was estimated by finding the timing of key indicator events using three different algorithms. For validation, we compared algorithm-estimated timing of recurrence with that obtained from chart-reviewed data. We further compared the results of Cox regression models (modeling recurrence-free survival) based on the algorithms versus chart review. Results: In total, 598 breast cancer patients were included. Of these, 121 (20.2%) had recurrence after a median follow-up of 4 years.
Based on the high accuracy algorithm for identifying the presence of recurrence (with 94.2% sensitivity and 79.2% positive predictive value), the majority (64.5%) of the algorithm-estimated recurrence dates fell within 3 months of the corresponding recurrence dates determined by chart review. The Kaplan–Meier (K-M) curves and Cox regression results for recurrence-free survival (hazard ratios and P-values) generated from the algorithm-estimated and chart-reviewed data were very similar. Conclusion: The proposed algorithms for identifying the timing of breast cancer recurrence achieved similar results to the chart review data and were potentially useful in survival analysis.

Item Open Access
Validation of large language models for detecting pathologic complete response in breast cancer using population-based pathology reports (2024-10-03)
Cheligeer, Ken; Wu, Guosong; Laws, Alison; Quan, May L.; Li, Andrea; Brisson, Anne-Marie; Xie, Jason; Xu, Yuan
Aims: The primary goal of this study is to evaluate the capabilities of Large Language Models (LLMs) in understanding and processing complex medical documentation. We chose to focus on the identification of pathologic complete response (pCR) in narrative pathology reports. This approach aims to contribute to the advancement of comprehensive reporting, health research, and public health surveillance, thereby enhancing patient care and breast cancer management strategies. Methods: The study utilized two analytical pipelines, developed with open-source LLMs within the healthcare system’s computing environment. First, we extracted embeddings from pathology reports using 15 different transformer-based models and then employed logistic regression on these embeddings to classify the presence or absence of pCR. Second, we fine-tuned the Generative Pre-trained Transformer-2 (GPT-2) model by attaching a simple feed-forward neural network (FFNN) layer to improve the detection performance of pCR from pathology reports.
Results: In a cohort of 351 female breast cancer patients who underwent neoadjuvant chemotherapy (NAC) and subsequent surgery between 2010 and 2017 in Calgary, the optimized method displayed a sensitivity of 95.3% (95% CI: 84.0–100.0%), a positive predictive value of 90.9% (95% CI: 76.5–100.0%), and an F1 score of 93.0% (95% CI: 83.7–100.0%). The results, achieved through diverse LLM integration, surpassed traditional machine learning models, underscoring the potential of LLMs in clinical pathology information extraction. Conclusions: The study successfully demonstrates the efficacy of LLMs in interpreting and processing digital pathology data, particularly for determining pCR in breast cancer patients post-NAC. The superior performance of LLM-based pipelines over traditional models highlights their significant potential in extracting and analyzing key clinical data from narrative reports. While promising, these findings highlight the need for future external validation to confirm the reliability and broader applicability of these methods.
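
As a reading aid for the metrics in the last abstract: the F1 score is the harmonic mean of the positive predictive value (precision) and the sensitivity (recall). The short Python sketch below recomputes it from the two reported point estimates. The function name `f1_score` is illustrative and not taken from the study, and the inputs are the rounded percentages quoted above rather than the underlying patient counts, so the result can only be expected to agree to about three decimal places.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0  # convention when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Rounded point estimates quoted in the abstract (not raw counts).
sensitivity = 0.953  # recall
ppv = 0.909          # precision

f1 = f1_score(ppv, sensitivity)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.930"
```

The recomputed value is consistent with the reported F1 score of 93.0%.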