Exploring the potential of artificial intelligence and machine learning to advance the diagnosis and treatment of bladder pain syndrome / interstitial cystitis

Inal Gultekin G1, Kalkan H2, Agit S2, Cetin Z3, Mangir N4

Research Type

Pure and Applied Science / Translational

Abstract Category

Pelvic Pain Syndromes

Abstract 659
Open Discussion ePosters
Scientific Open Discussion Session 105
Thursday 24th October 2024
13:45 - 13:50 (ePoster Station 6)
Exhibition Hall
Pathophysiology, Molecular Biology, Painful Bladder Syndrome/Interstitial Cystitis (IC), Physiology, Basic Science
1. Istanbul Okan University Faculty of Medicine Department of Physiology, 2. Gebze Technical University Computer Engineering, 3. Technische Hochschule Bingen University of Applied Sciences, 4. Hacettepe University Faculty of Medicine Department of Urology
Abstract

Hypothesis / aims of study
Bladder Pain Syndrome/Interstitial Cystitis (BPS/IC) is a debilitating pain syndrome of unknown origin. Despite its well-recognized adverse impact on affected patients, society, and healthcare systems, there have been no major advances in the treatment of this chronic condition. Artificial intelligence (AI) is increasingly used to advance the diagnosis and treatment of medical conditions, using powerful computational tools that help handle and fuse several layers of ‘big data’, such as omics and imaging data. Machine learning (ML) is a subset of AI that enables machines to learn from the data provided.
In this study, ML methods and bioinformatics tools were used to identify biomarkers for disease diagnosis from publicly available expression datasets, and BPS/IC disease pathways were investigated using the suggested biomarkers.
Study design, materials and methods
This study addresses the preprocessing, normalization, and integration of three BPS/IC datasets (GSE11783, GSE28242, GSE57560) comprising patients with and without Hunner’s lesion disease and healthy individuals as controls. We present a methodology that facilitates comprehensive analyses of gene expression data. Initially, gene symbols in the datasets were converted to Ensembl IDs, ensuring inter-dataset consistency and addressing the absence of gene symbols. After data cleaning, the samples were given binary labels (patient or non-patient), and each dataset was normalized internally. Subsequently, the three datasets were merged and subjected to a general normalization process to increase consistency across the datasets.
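The two-stage normalization described above can be illustrated with a minimal sketch. It assumes per-gene z-scoring as the normalization and keeps only genes shared by every dataset; the abstract does not specify either choice, and the gene IDs and values below are hypothetical toy data, not values from the GEO series.

```python
from statistics import mean, pstdev


def zscore_per_gene(dataset):
    """Standardise each gene across the samples of one dataset.

    `dataset` maps a gene ID to a list of expression values (one per sample).
    """
    normalised = {}
    for gene, values in dataset.items():
        mu, sigma = mean(values), pstdev(values)
        # A gene with zero variance carries no signal; map it to zeros.
        normalised[gene] = [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]
    return normalised


def merge_on_shared_genes(datasets):
    """Concatenate samples across datasets, keeping genes present in all of them."""
    shared = set(datasets[0])
    for ds in datasets[1:]:
        shared &= set(ds)
    return {gene: [v for ds in datasets for v in ds[gene]] for gene in sorted(shared)}


# Toy data with hypothetical gene IDs (assumption, for illustration only).
study_a = {"ENSG_A": [1.0, 2.0, 3.0], "ENSG_B": [10.0, 10.0, 40.0]}
study_b = {"ENSG_A": [100.0, 300.0], "ENSG_C": [5.0, 7.0]}

# Stage 1: normalise each dataset internally. Stage 2: merge, then normalise again.
merged = merge_on_shared_genes([zscore_per_gene(study_a), zscore_per_gene(study_b)])
merged = zscore_per_gene(merged)
print(merged)
```

Normalizing within each dataset first removes scale differences between platforms (here, values near 2 versus values near 200) before the samples are pooled.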
The LASSO method selected 94 of 7004 genes as the most informative features prior to classifier training. A Random Forest classifier was then trained on the selected genes. In addition to assessing classification accuracy with the 94 top genes, the Shapley explainable AI method was used to reveal the genes on which the classifier relied most.
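How an L1 penalty performs feature selection can be sketched as follows. This is a generic cyclic coordinate-descent LASSO written for illustration, not the authors' implementation; the penalty value `alpha`, the toy matrix, and the function names are all assumptions.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty`; this is what makes LASSO sparse."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    X is a list of sample rows; its columns and y are assumed to be centred.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, alpha) / norm
            residual = [r - (new_wj - w[j]) * c for c, r in zip(col, residual)]
            w[j] = new_wj
    return w


# Toy matrix: 6 samples x 3 hypothetical genes. Column 0 tracks the label;
# columns 1 and 2 are uncorrelated with it.
X = [
    [-1.0, 0.5, -0.2],
    [-1.0, -0.5, 0.2],
    [-1.0, 0.0, 0.0],
    [1.0, 0.5, 0.2],
    [1.0, -0.5, -0.2],
    [1.0, 0.0, 0.0],
]
y = [-0.5, -0.5, -0.5, 0.5, 0.5, 0.5]  # centred 0/1 labels

weights = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print(selected, weights)
```

On this toy input only the first column keeps a non-zero weight. Genes retained this way would then be passed to the classifier; a larger `alpha` retains fewer genes.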
The 26 common genes and the 94 ML-selected features were analysed for gene enrichment with STRING v12.0, and for clinical relevance with the most recent versions of the Enrichr and Reactome v87 pathway analysis bioinformatics tools.
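Enrichment tools of this kind typically rest on an over-representation test. The sketch below computes a hypergeometric tail probability as a generic illustration; it does not reproduce the exact statistics of STRING, Enrichr, or Reactome. The background (7004) and selected-gene (94) counts come from this abstract, while the pathway size and overlap are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical pathway of 300 genes with 15 of the 94 selected genes in it.
p_value = hypergeom_enrichment_p(
    n_background=7004, n_pathway=300, n_selected=94, n_overlap=15
)
print(p_value)
```

With about 4 pathway genes expected by chance (94 × 300 / 7004), an overlap of 15 gives a small p-value. In practice such p-values are corrected for the number of pathways tested.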
Results
The classifier was trained and tested using 5-fold cross-validation, yielding average values of 86.67% accuracy, 90.0% precision, 56.667% recall, 79.333% F1 score, and 97.1429% area under the ROC curve (AUC) (Fig 1). The trained model's feature importance attribute was used to identify the 26 genes most influential in the model's predictions. The gene enrichment and pathway analysis screenings revealed high activity in the immune system.
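The way such fold-averaged metrics are typically computed can be sketched as follows. The fold labels, scores, and the 0.5 decision threshold are hypothetical; they are not the study's data and do not reproduce its results.

```python
from statistics import mean


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = patient)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out folds: (true labels, classifier scores).
folds = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.3, 0.1]),
    ([1, 0, 0, 1], [0.8, 0.6, 0.2, 0.7]),
]

per_fold = []
for y_true, scores in folds:
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold
    metrics = classification_metrics(y_true, y_pred)
    metrics["auc"] = roc_auc(y_true, scores)
    per_fold.append(metrics)

# Each metric is averaged over the folds independently.
averaged = {name: mean(m[name] for m in per_fold) for name in per_fold[0]}
print(averaged)
```

AUC depends only on how the scores rank patients against controls, so it can be high even when recall at a fixed threshold is modest, as in the first toy fold.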
Interpretation of results
This study has successfully combined, harmonized, and normalized three independent human BPS/IC patient and control datasets using data engineering techniques.
Merging the datasets combined their features and filled in the information missing for each feature, resulting in a large pool of more than 7,000 genes.
The AI methods reduced this feature set to 94 selected genes and 26 common genes while retaining high accuracy and precision.
The high AUC value indicates that the ML model separates patients from healthy controls well.
Additionally, gene enrichment analysis of the selected genes highlights the importance of inflammatory defense pathways in BPS/IC.
Concluding message
The curated dataset is now ready for future ML and deep learning analyses. Biomarker identification is crucial for future treatment options in BPS/IC. The identified genes could have crucial roles in biological processes and disease pathophysiology, making them potential targets for investigation in different settings, including in vitro and in vivo models.
Figure 1: ROC curve and area under the curve (AUC) for PBS/IC+ and PBS/IC- classification
Disclosures
Funding: We do not have funding for this study. Clinical Trial: No. Subjects: None.