Reshaping free-text radiology notes into structured reports with generative question answering transformers.

In Artificial Intelligence in Medicine, by Laura Bergomi, Tommaso M Buonocore, Paolo Antonazzo, Lorenzo Alberghi, Riccardo Bellazzi, Lorenzo Preda, Chandra Bortolotto, Enea Parimbelli

TLDR

  • The study proposes a method for automatically extracting information from Italian free-text radiology reports and mapping it onto the items of a structured reporting (SR) registry, information that supports clinicians' decisions about patient care. It fine-tunes IT5, the Italian version of the Transformer-based T5 model, as a generative question answering system on a small dataset of 174 reports on CT staging of patients with lymphoma. Human experts rate the model's free-text answers for correctness and completeness. The model can combine information from more than one place in a report and can recognize when a registry item should be left empty. The study suggests that the system could be applied in other clinical settings or to other types of medical reports.

Abstract

Radiology reports are typically written in a free-text format, making clinical information difficult to extract and use. Recently, various medical societies have recommended the adoption of structured reporting (SR) for the advantages it offers, e.g., standardization, completeness, and information retrieval. We propose a pipeline to extract information from Italian free-text radiology reports that matches the items of the reference SR registry proposed by a national society of interventional and medical radiology, focusing on CT staging of patients with lymphoma. Our work aims to leverage Natural Language Processing and Transformer-based models to automate SR registry filling. Using 174 Italian radiology reports, we investigate a rule-free generative Question Answering approach based on IT5, the Italian-specific version of T5. To address information content discrepancies, we focus on the six items most frequently filled in the annotations of the reports: three categorical (multichoice), one free-text (free-text), and two continuous numerical (factual). In the preprocessing phase, we also encode information that is not supposed to be entered. Two strategies (batch-truncation and ex-post combination) are implemented to cope with the IT5 context length limitations. Performance is evaluated in terms of strict accuracy, F1, and format accuracy, and compared with that of the widely used GPT-3.5 large language model. Unlike multichoice and factual answers, free-text answers do not have a 1-to-1 correspondence with their reference annotations. For this reason, we collect human-expert feedback on the similarity between medical annotations and generated free-text answers, using a 5-point Likert scale questionnaire that evaluates correctness and completeness.
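The abstract names the context-length strategies without detailing them. The sketch that follows shows one plausible reading of splitting a report into windows and then combining the answers ex post, under stated assumptions: the report is already tokenized, each window holds at most `max_len` tokens (for a T5-style model this budget must also leave room for the question), `generate` stands in for the fine-tuned IT5 model, and per-window answers are merged by dropping no-answers and duplicates. `split_into_chunks`, `combine_answers`, `answer_report`, and the `N/A` sentinel are hypothetical names, not the authors' code, and batch-truncation is not modeled.

```python
from typing import Callable, List

# Hypothetical sentinel the model emits when an item should stay empty.
NO_ANSWER = "N/A"


def split_into_chunks(tokens: List[str], max_len: int, stride: int) -> List[List[str]]:
    """Split a tokenized report into windows of at most max_len tokens.

    Consecutive windows start `stride` tokens apart, so they overlap by
    max_len - stride tokens when stride < max_len.
    """
    if not 0 < stride <= max_len:
        raise ValueError("expected 0 < stride <= max_len")
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            return chunks
        start += stride


def combine_answers(answers: List[str]) -> str:
    """One possible ex-post combination: drop empty and no-answer outputs,
    remove duplicates, and keep the order of first appearance."""
    kept: List[str] = []
    for answer in answers:
        answer = answer.strip()
        if answer and answer != NO_ANSWER and answer not in kept:
            kept.append(answer)
    return "; ".join(kept) if kept else NO_ANSWER


def answer_report(
    question: str,
    tokens: List[str],
    generate: Callable[[str, str], str],
    max_len: int,
    stride: int,
) -> str:
    """Query the model once per window, then merge the per-window answers."""
    chunks = split_into_chunks(tokens, max_len, stride)
    answers = [generate(question, " ".join(chunk)) for chunk in chunks]
    return combine_answers(answers)


# Toy stand-in for the fine-tuned model (hypothetical, for illustration only).
def fake_generate(question: str, context: str) -> str:
    return "spleen" if "spleen" in context else NO_ANSWER


report = "no enlarged nodes in the neck . enlarged spleen with focal lesions".split()
print(answer_report("Which organs are involved?", report, fake_generate, max_len=4, stride=4))
# prints: spleen
```

Merging by deduplication suits free-text items; for multichoice items, a majority vote across windows would be an equally plausible design choice.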
Combining fine-tuning with batch splitting allows IT5 with ex-post combination to achieve notable results in extracting different types of structured data, performing on par with GPT-3.5. Human assessment scores of free-text answers correlate strongly with the F1 metric (Spearman's correlation coefficients > 0.5, p-values < 0.001) for both IT5 ex-post combination and GPT-3.5. The latter is better at generating plausible, human-like statements, although it systematically provides answers even when none is supposed to be given. In our experimental setting, a fine-tuned Transformer-based model with a modest number of parameters (i.e., IT5, 220 M) performs well as a clinical information extraction system for the automatic SR registry filling task. It can extract information from more than one place in the report and elaborate it in a manner that complies with the response specifications of the SR registry (for multichoice and factual items) or that closely approximates the work of a human expert (for free-text items), and it can discern whether or not an answer should be given to a user query.
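Spearman's coefficient is the Pearson correlation computed on ranks, which suits ordinal Likert scores. The following pure-Python sketch computes it, giving tied values their average rank; the Likert and F1 values in it are made up for illustration and are not data from the study. In practice, `scipy.stats.spearmanr` returns both the coefficient and a p-value, whereas this sketch computes only the coefficient.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative numbers, NOT data from the study.
likert = [5, 4, 4, 2, 1, 3]
f1 = [0.95, 0.80, 0.70, 0.40, 0.10, 0.55]
print(round(spearman_rho(likert, f1), 3))  # prints 0.986
```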

Overview

  • The study proposes a pipeline that uses Natural Language Processing and Transformer-based models to fill a structured reporting (SR) registry automatically from Italian free-text radiology reports, focusing on CT staging of patients with lymphoma. It frames registry filling as rule-free generative question answering with IT5, the Italian-specific version of T5, and evaluates the approach with strict accuracy, F1, and format accuracy against GPT-3.5. Free-text answers are additionally rated by human experts on a 5-point Likert scale for correctness and completeness. With fine-tuning and batch splitting, IT5 with ex-post combination performs on par with GPT-3.5, showing that a fine-tuned model with a modest number of parameters (220 M) can serve as a clinical information extraction system for this task.
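Generative question answering recasts each registry item as a text-to-text pair. The sketch that follows assumes an input layout of the form `question: <q> context: <report>`, an options list appended for multichoice items, and an explicit `N/A` target for items the annotators left empty, which is one possible way to encode information that is not supposed to be entered. The layout, `RegistryItem`, and `build_example` are hypothetical and not taken from the paper; in the study, questions and reports would be in Italian.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical target for items that are not supposed to be filled.
NO_ANSWER = "N/A"


@dataclass(frozen=True)
class RegistryItem:
    question: str                # natural-language question for the SR item
    item_type: str               # "multichoice", "free-text" or "factual"
    options: Sequence[str] = ()  # allowed answers, multichoice only


def build_example(
    item: RegistryItem, report: str, annotation: Optional[str]
) -> Tuple[str, str]:
    """Turn (item, report, annotation) into a text-to-text (source, target) pair."""
    source = f"question: {item.question}"
    if item.item_type == "multichoice" and item.options:
        # Listing the allowed options is meant to steer the model toward valid answers.
        source += " options: " + ", ".join(item.options)
    source += f" context: {report}"
    # Items left empty by the annotators become explicit no-answer targets.
    target = annotation.strip() if annotation and annotation.strip() else NO_ANSWER
    return source, target


item = RegistryItem("Is the spleen involved?", "multichoice", ("yes", "no"))
print(build_example(item, "Spleen of normal size.", None))
```

The source strings would be tokenized as encoder inputs and the targets as decoder labels when fine-tuning a T5-style model.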

Comparative Analysis & Findings

  • The study compares IT5 (220 M parameters), fine-tuned and run with two strategies for handling its context length limit (batch-truncation and ex-post combination), against GPT-3.5. On the six most frequently filled registry items (three multichoice, one free-text, two factual), performance is measured with strict accuracy, F1, and format accuracy, and IT5 with ex-post combination performs on par with GPT-3.5. For free-text items, human-expert Likert ratings correlate with F1 (Spearman's correlation coefficients > 0.5, p-values < 0.001) for both models. GPT-3.5 produces more plausible, human-like statements but systematically answers even when no answer should be given, whereas the fine-tuned IT5 can discern when an item should be left empty.
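The summary names strict accuracy, F1, and format accuracy but does not define them. A minimal sketch, assuming common definitions: strict accuracy as exact match after normalization, F1 as SQuAD-style token overlap, and format accuracy as a check that an answer has the shape the SR item expects (an allowed option for multichoice, a plain number for factual). The function names and the `N/A` sentinel are hypothetical, and the authors' exact definitions may differ.

```python
import re
from collections import Counter
from typing import Sequence

NO_ANSWER = "N/A"  # hypothetical no-answer sentinel


def normalize(text: str) -> str:
    """Lowercase and keep only word characters (Unicode-aware, so accents survive)."""
    return " ".join(re.findall(r"\w+", text.lower()))


def strict_match(pred: str, ref: str) -> bool:
    """Strict accuracy for one answer: exact match after normalization."""
    return normalize(pred) == normalize(ref)


def token_f1(pred: str, ref: str) -> float:
    """SQuAD-style token overlap F1 between prediction and reference."""
    p, r = normalize(pred).split(), normalize(ref).split()
    if not p or not r:
        return float(p == r)  # both empty -> 1.0, only one empty -> 0.0
    common = sum((Counter(p) & Counter(r)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(r)
    return 2 * precision * recall / (precision + recall)


def format_ok(pred: str, item_type: str, options: Sequence[str] = ()) -> bool:
    """Format accuracy for one answer: does it match the item's expected shape?"""
    answer = pred.strip()
    if answer == NO_ANSWER:
        return True  # an explicit no-answer counts as well-formed in this sketch
    if item_type == "multichoice":
        return answer in options
    if item_type == "factual":
        # A plain number; the comma branch accepts Italian decimal notation.
        return re.fullmatch(r"\d+(?:[.,]\d+)?", answer) is not None
    if item_type == "free-text":
        return bool(answer)
    raise ValueError(f"unknown item type: {item_type!r}")
```

Averaging `strict_match`, `token_f1`, or `format_ok` over all prediction and reference pairs of an item gives per-item scores.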

Implications and Future Directions

  • The findings suggest that a fine-tuned, relatively small Transformer-based model can support automatic SR registry filling from free-text reports, here for CT staging of patients with lymphoma: rule-free generative question answering with IT5 extracts different types of structured data on par with GPT-3.5. They also underline the value of human-expert assessment for free-text answers, which lack a 1-to-1 correspondence with their reference annotations. Possible future directions include testing other Transformer-based models or NLP techniques to improve extraction performance, and applying the system in other clinical settings or to other types of medical reports.