Improving biomedical literature search engines for medical professionals

Frihat, Sameh

doi:10.17185/duepublico/82698

Dissertation, Wed., 4 Dec. 2024, CC BY-NC 4.0

Published

Medical search engines for experts are concerned with retrieving relevant information to support medical professionals' information-seeking tasks. However, current search engines for medical research publications often fail to provide quick access to accurate and reliable information because they rely on a bag-of-words approach that treats documents as collections of words to be matched against the keywords in the search query.
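The limitation of literal keyword matching can be illustrated with a minimal sketch. The query, the two document snippets, and the function names below are invented for illustration and are not taken from the thesis; the scoring is a deliberately naive word-overlap count, not the ranking function of any particular search engine.

```python
def bag_of_words(text: str) -> set[str]:
    """Lowercase the text and split it on whitespace into a set of words."""
    return set(text.lower().split())


def keyword_score(query: str, document: str) -> int:
    """Count how many distinct query words occur literally in the document."""
    return len(bag_of_words(query) & bag_of_words(document))


# Hypothetical query and documents (assumptions made for this example only).
query = "heart attack treatment"
doc_a = "treatment options after a heart attack"
doc_b = "therapy for acute myocardial infarction"

score_a = keyword_score(query, doc_a)  # shares all three query words
score_b = keyword_score(query, doc_b)  # same topic, but no shared words

print(score_a, score_b)
```

Both snippets concern the same clinical topic, yet the second shares no words with the query and therefore receives a score of zero. Closing this kind of vocabulary gap is what semantic retrieval methods aim for.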

This thesis seeks to improve state-of-the-art search techniques for biomedical research publication repositories by exploring methods of semantic information retrieval and of context modeling that take into account the characteristics of the professional users performing the search. To this end, this work adopts the notion of multidimensional relevance in biomedical publication search and evaluates the performance of retrieval systems in this context by addressing the following factors: personalization, credibility, semantic relevance, interactivity, and integration with Large Language Models.

Specifically, to address personalization, we evaluate methods for identifying document difficulty levels, expressed as the readability and technicality of a document, as well as methods for classifying medical documents into the medical sub-fields they address. Credibility is tackled by proposing methods for classifying biomedical publications according to the Level of Evidence (LoE) they are based on and evaluating how this affects medical document retrieval. Semantic relevance is addressed by identifying bio-concepts, i.e., genes, diseases, and chemicals, in the research papers and evaluating multiple methods of incorporating them into document retrieval. Bio-concepts are also used to enhance the interactivity of a medical search engine: the extracted bio-concepts are visualized and made available to the user during search. Finally, we investigate conversational search settings, in which information retrieval is combined with the capabilities of a Large Language Model (LLM) in a Retrieval Augmented Generation (RAG) setup.

The results of these investigations are implemented in WisPerMed, a new search engine for the Medline database. A user study conducted with 131 medical practitioners demonstrated that this search engine reduces the time spent searching and the number of queries needed to finish a particular task, supports quicker and more accurate decision-making in medicine, and increases user satisfaction with the search process.

Our findings show that (1) incorporating semantic relevance significantly improves the quality of information retrieved from medical literature, (2) the use of bio-concepts and LoE improves both the precision and the trustworthiness of search results, and (3) the developed models for predicting user-specific parameters allow search results to be personalized on aspects such as document difficulty and medical sub-field, yielding more relevant outcomes. As a result, this work contributes to creating more efficient and effective tools for medical professionals, facilitating better patient care, and advancing medical research.