The Future of Patient Education: A Study on AI-Driven Responses to Urinary Incontinence Inquiries

Rotem R1, Zamstein O2, O'Reilly B1, Weintraub A2

Research Type

Clinical

Abstract Category

Prevention and Public Health

Abstract 242
Best of the Best Conservative Management
Scientific Podium Session 23
Friday 25th October 2024
11:45 - 12:00
Hall N106
Incontinence, Overactive Bladder, Stress Urinary Incontinence, Questionnaire
1. Department of Urogynaecology, Cork University Maternity Hospital, Cork, Ireland, 2. Soroka University Medical Center, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel.

Abstract

Hypothesis / aims of study
This study was designed to evaluate the effectiveness of ChatGPT in providing insights into common urinary incontinence concerns within urogynecology. By analyzing the model's responses against established benchmarks of accuracy, completeness, and safety, the study aimed to quantify its usefulness for informing patients and aiding healthcare providers.
Study design, materials and methods
An expert-driven questionnaire was developed, inviting urogynecologists worldwide to assess ChatGPT's answers to 10 carefully selected questions on urinary incontinence (UI). These assessments focused on the accuracy of the responses, their comprehensiveness, and whether they raised any safety issues. Subsequent statistical analyses determined the average consensus among experts and identified the proportion of responses receiving favorable evaluations (a score of 4 or higher). Participants were unaware that the answers were generated by ChatGPT. To ensure the validity of our findings, one of ChatGPT's responses was deliberately altered to contain a misinformation element; this served as a control question to test the experts' ability to detect inaccuracies.
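As an illustration of the scoring summary described above, the following minimal Python sketch computes the mean expert rating and the proportion of favorable ratings (4 or higher) per criterion. The rating values and variable names are hypothetical and are not taken from the study's actual analysis code.

```python
# Minimal sketch with hypothetical data; not the study's actual analysis code.
# Each expert rates a response on accuracy, comprehensiveness, and safety
# (1-5 scale); we summarise the mean score and the share of ratings >= 4.
from statistics import mean

# Hypothetical expert ratings for a single ChatGPT response.
ratings = {
    "accuracy":          [4, 5, 3, 4, 4],
    "comprehensiveness": [5, 4, 4, 4, 5],
    "safety":            [4, 4, 5, 3, 4],
}

for criterion, scores in ratings.items():
    avg = mean(scores)
    favorable = sum(s >= 4 for s in scores) / len(scores)
    print(f"{criterion}: mean={avg:.1f}, favorable (>=4)={favorable:.0%}")
```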
Results
Among the 50 urogynecologists contacted globally, 37 provided feedback on ChatGPT's UI responses. A substantial 75% of these specialists had pursued advanced fellowship training, and the majority (54%) had more than a decade of experience, with diverse geographical representation from the Middle East (n=10), Europe (n=8), South America (n=7), North America (n=5), and Asia (n=4). Overall, the responses received an encouraging average score of 4.0. The distribution of ratings, detailed in Figure 1, showed favorable evaluations across all but the intentionally altered control question. ChatGPT was particularly strong in response comprehensiveness and safety, with 74% of ratings at 4 or higher, whereas accuracy showed room for improvement, with a slightly lower 71% of ratings at 4 or above. Notably, the control question received an average rating of 2.6, with only 6% of raters awarding a score of 4 or higher.
Interpretation of results
While the results attest to a generally favorable reception in the key areas assessed, they also underscore a need for refinement, especially in the accuracy of the information provided. Future iterations of ChatGPT should strive for greater precision to further enhance its reliability. In addition, ongoing adaptation and learning from user interactions could serve as a catalyst for its evolution. Such progress will be pivotal in maximizing the effectiveness of ChatGPT, ensuring that it remains an up-to-date and trusted resource that can adapt to the dynamic nature of medical knowledge and patient education needs.
Concluding message
This study highlights ChatGPT's proficient handling of UI-related inquiries, reflecting its potential as an informative tool for patients and a supplemental asset for medical professionals. Nonetheless, the study also signals a clear avenue for improvement, particularly in the precision of the provided information.
Figure 1. The distribution of ratings by criterion across ChatGPT's answers.
Disclosures
Funding: None. Clinical Trial: No. Subjects: None
Citation

Continence 12S (2024) 101584
DOI: 10.1016/j.cont.2024.101584
