2025
- (4 November 2025)
Paper accepted by AAAI 2026 Artificial Intelligence for Social Impact Track – Medical Knowledge Infusion and Multi-Stage Correction Make LLMs Better Proofreaders for Radiology Reports
The increasing complexity and workload of clinical radiology lead to inevitable oversights and mistakes in radiology reports, causing delayed treatments and sometimes life-threatening harm to patients. While large language models (LLMs) have shown remarkable progress in many tasks, their utility in detecting and correcting errors in radiology reporting remains limited. We present a novel dual-knowledge infusion framework that enhances LLMs’ capability for radiology report proofreading through systematic integration of medical expertise. Specifically, our knowledge infusion combines medical knowledge graph distillation (MKGD) with external knowledge retrieval (EXKR), enabling an effective automated approach to tackling mistakes in radiology reporting. By decomposing the complex proofreading task into three specialized stages of detection, localization, and correction, our method mirrors the systematic review process employed by expert radiologists, ensuring both precision and clinical interpretability. The dual-knowledge framework captures intricate medical relationships through structured graph representations while leveraging curated clinical expertise from reference reports. To perform a robust, clinically relevant evaluation, we constructed a comprehensive benchmark using real-world radiology reports with error patterns drawn from clinical practice, including speech recognition confusions, terminology ambiguities, and template-related inconsistencies, all validated by practicing radiologists. Extensive evaluations across multiple LLM architectures demonstrate substantial improvements from our approach, with up to a 31.56% increase in error detection accuracy and a 37.4% reduction in processing time. Human evaluation by radiologists confirms superior clinical relevance and factual consistency compared to existing approaches. Our framework addresses critical needs in clinical practice by enhancing report quality while reducing radiologist burden, particularly benefiting resource-constrained healthcare environments. A minimal illustrative sketch of the three-stage pipeline is included after this item.
Read an earlier preprint version at https://arxiv.org/abs/2406.15045v1.
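Below is a minimal sketch, assuming a generic text-completion interface, of how the detect, localize and correct stages combined with the two knowledge sources might be orchestrated. The prompts, helper functions, and the `llm` callable are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a three-stage proofreading pipeline (detect -> localize -> correct).
# The `llm` callable, prompt wording, and knowledge helpers are illustrative assumptions,
# not the implementation described in the paper.
from typing import Callable, List

def retrieve_reference_snippets(report: str, top_k: int = 3) -> List[str]:
    """Placeholder for external knowledge retrieval (EXKR), e.g. similar reference reports."""
    return []  # a real system would query a curated report corpus here

def graph_facts_for(report: str) -> List[str]:
    """Placeholder for medical knowledge graph distillation (MKGD), e.g. relevant triples."""
    return []  # a real system would distil relations such as (finding, located_in, anatomy)

def proofread(report: str, llm: Callable[[str], str]) -> str:
    context = "\n".join(graph_facts_for(report) + retrieve_reference_snippets(report))

    # Stage 1: detection - does the report contain an error at all?
    has_error = llm(
        f"Context:\n{context}\n\nReport:\n{report}\n\n"
        "Does this radiology report contain an error? Answer yes or no."
    ).strip().lower().startswith("yes")
    if not has_error:
        return report

    # Stage 2: localization - which sentence or phrase is wrong?
    span = llm(
        f"Context:\n{context}\n\nReport:\n{report}\n\n"
        "Quote the erroneous sentence or phrase verbatim."
    ).strip()

    # Stage 3: correction - rewrite only the located span.
    fixed_span = llm(
        f"Context:\n{context}\n\nErroneous text:\n{span}\n\n"
        "Rewrite this text so it is clinically correct, changing as little as possible."
    ).strip()
    return report.replace(span, fixed_span)
```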
- (21 October 2025)
Paper accepted by Nature Machine Intelligence – Towards deployment-centric multimodal AI beyond vision and language – at https://doi.org/10.1038/s42256-025-01116-5
Multimodal artificial intelligence (AI) integrates diverse types of data via machine learning to improve understanding, prediction and decision-making across disciplines such as healthcare, science and engineering. However, most multimodal AI advances focus on models for vision and language data, and their deployability remains a key challenge. We advocate a deployment-centric workflow that incorporates deployment constraints early on to reduce the likelihood of undeployable solutions, complementing data-centric and model-centric approaches. We also emphasize deeper integration across multiple levels of multimodality through stakeholder engagement and interdisciplinary collaboration to broaden the research scope beyond vision and language. To facilitate this approach, we identify common multimodal-AI-specific challenges shared across disciplines and examine three real-world use cases (pandemic response, self-driving car design and climate change adaptation), drawing on expertise from healthcare, social science, engineering, science, sustainability and finance. By fostering interdisciplinary dialogue and open research practices, our community can accelerate deployment-centric development for broad societal impact.
Read the full paper at https://doi.org/10.1038/s42256-025-01116-5.
- (19 September 2025)
Paper accepted by EMNLP 2025 findings – HARE: an entity and relation centric evaluation framework for histopathology reports – at https://arxiv.org/abs/2507.09097
Large Vision-Language Models (LVLMs) have demonstrated promising performance in chest X-ray (CXR) analysis. To enhance human-computer interaction, several studies have incorporated radiologists’ eye gaze, typically through heatmaps or textual prompts. However, these methods often overlook the sequential order of eye movements, which could provide valuable insights by highlighting both the areas of interest and the order in which they are examined. In this work, we propose a novel approach called RadEyeVideo that integrates radiologists’ eye-fixation data as a video sequence, capturing both the temporal and spatial dynamics of their gaze. We evaluate this method in CXR report generation and disease diagnosis using three general-domain, open-source LVLMs with video input capabilities. When prompted with eye-gaze videos, model performance improves by up to 24.6% in the report generation task and by 15.2% on average across both tasks using scaled evaluation metrics. Notably, RadEyeVideo enhanced an open-domain LVLM, LLaVA-OneVision, to surpass task-specific medical LVLMs such as MAIRA-2 and CheXagent, which were trained on large chest X-ray datasets. This work highlights that domain experts’ knowledge (eye-gaze information in this case), when effectively integrated with LVLMs, can significantly enhance general-domain models’ capabilities in clinical tasks. RadEyeVideo is a step toward a scalable, human-centered approach to utilizing LVLMs in medical image analytics. A small illustrative sketch of the gaze-to-video construction follows below.
Read the full paper at https://arxiv.org/abs/2507.09097.
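As a rough illustration of turning an ordered fixation sequence into a video for a video-capable LVLM, here is a minimal sketch; the fixation format, drawing choices, and OpenCV-based encoding are assumptions and not the RadEyeVideo implementation.

```python
# Hypothetical sketch: render an ordered list of eye fixations over a chest X-ray as
# video frames, so a video-capable LVLM can see both where and in what order the
# radiologist looked. The fixation format and drawing choices are assumptions.
from typing import List, Tuple
import numpy as np
import cv2  # OpenCV for drawing and video encoding

Fixation = Tuple[int, int, float]  # (x, y, duration_seconds)

def fixations_to_video(image: np.ndarray,
                       fixations: List[Fixation],
                       out_path: str = "gaze.mp4",
                       fps: int = 5) -> None:
    h, w = image.shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    frame = cv2.cvtColor(image, cv2.COLOR_GRAY2BGR) if image.ndim == 2 else image.copy()
    prev = None
    for x, y, dur in fixations:
        # circle radius roughly proportional to fixation duration
        cv2.circle(frame, (x, y), max(5, int(20 * dur)), (0, 0, 255), 2)
        if prev is not None:
            cv2.line(frame, prev, (x, y), (0, 255, 0), 1)  # connect fixations in scan-path order
        prev = (x, y)
        writer.write(frame)  # one frame per fixation preserves the temporal order
    writer.release()
```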
- (8 August 2025)
Tokeniser work published – Infusing clinical knowledge into language models by subword optimisation and embedding initialisation – now in Computers in Biology and Medicine.
This study introduces a novel knowledge-enhanced tokenisation mechanism, K-Tokeniser, for clinical text processing. Technically, K-Tokeniser builds global representations of tokens from domain-specific concepts (e.g., drugs, diseases) in ontologies such as UMLS or in task-specific corpora. At training or inference time, word- and sentence-level optimisation is used to select the optimal token representations. The study also proposes an embedding initialisation approach for new tokens, eliminating the need to pre-train the language models. The model built with K-Tokeniser achieves a notable 13% increase in Micro F1 score for automated clinical coding, and requires only 50% of the training data for concept extraction and less than 20% for automated coding to outperform the baseline ClinicalBERT model. A brief illustrative sketch of the embedding initialisation idea follows below.
Read the full paper at https://doi.org/10.1016/j.compbiomed.2025.110747.
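To illustrate the embedding-initialisation idea in general terms, here is a minimal sketch, assuming a Hugging Face model and tokenizer, that initialises each newly added clinical token with the mean embedding of the subwords it previously split into. The base model, token list, and exact initialisation rule are assumptions and may differ from K-Tokeniser.

```python
# Hypothetical sketch: add clinical tokens to an existing tokenizer and initialise each
# new token's embedding as the mean of the embeddings of the subwords it previously
# split into. The exact initialisation used by K-Tokeniser may differ.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "emilyalsentzer/Bio_ClinicalBERT"  # example base model, chosen for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

new_tokens = ["pneumothorax", "cardiomegaly"]  # example domain terms
# record how each term is split before it is added as a whole token
old_splits = {t: tokenizer.tokenize(t) for t in new_tokens}

tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))

with torch.no_grad():
    emb = model.get_input_embeddings().weight
    for term in new_tokens:
        sub_ids = tokenizer.convert_tokens_to_ids(old_splits[term])
        new_id = tokenizer.convert_tokens_to_ids(term)
        # initialise the new token with the mean of its former subword embeddings,
        # so no further pre-training is strictly required before fine-tuning
        emb[new_id] = emb[torch.tensor(sub_ids)].mean(dim=0)
```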
- (28 May 2025)
Paper accepted by ACL 2025 findings – Look & Mark: Leveraging Radiologist Eye Fixations and Bounding Boxes in Multimodal Large Language Models for Chest X-ray Report Generation – at https://arxiv.org/abs/2505.22222
This study introduces Look & Mark (L&M), a novel approach to radiology report generation that integrates radiologist fixation cues (Look) with bounding box annotations (Mark) to guide multimodal Large Language Models (LLMs). By combining these complementary grounding strategies, L&M significantly improves the clinical relevance of generated reports, reduces hallucinations, and enhances model alignment with real-world diagnostic workflows. Importantly, L&M achieves these gains without requiring extensive fine-tuning, leveraging in-context learning to adapt both general-purpose and domain-specific models. An illustrative sketch of the prompting setup is included after this item.
Our experiments demonstrate that L&M significantly boosts performance across both lexical and clinical evaluation metrics, with the largest gains observed in clinical metrics such as RaTEScore and RadGraph-XL. For instance, CXR-LLaVA achieved a 1.2% improvement in overall metrics (A.AVG) compared to baseline prompting, while LLaVA-Med demonstrated a remarkable 9.2% boost. General-purpose models also benefited significantly, with LLaVA-OV achieving an 87.3% clinical average (C.AVG), the highest among all tested models, even surpassing domain-specific models trained explicitly for chest X-ray report generation.
Read the full paper at https://arxiv.org/abs/2505.22222.
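For intuition, here is a minimal sketch of how fixation cues and bounding-box annotations might be serialised into an in-context prompt for a multimodal LLM. The prompt wording, coordinate format, and field names are assumptions rather than the exact Look & Mark prompt.

```python
# Hypothetical sketch: serialise radiologist fixations ("Look") and bounding boxes
# ("Mark") into an in-context prompt for a multimodal LLM. The prompt wording and
# input format are assumptions, not the exact L&M prompt.
from typing import Dict, List, Tuple

def build_lm_prompt(fixations: List[Tuple[int, int]],
                    boxes: List[Dict]) -> str:
    look = "; ".join(f"({x}, {y})" for x, y in fixations)
    mark = "; ".join(
        f"{b['label']}: [{b['x1']}, {b['y1']}, {b['x2']}, {b['y2']}]" for b in boxes
    )
    return (
        "You are assisting with chest X-ray reporting.\n"
        f"Fixation points in viewing order: {look}\n"
        f"Annotated regions: {mark}\n"
        "Using the image together with these cues, write the findings section of the report."
    )

# example usage with toy values
prompt = build_lm_prompt(
    fixations=[(312, 240), (180, 410)],
    boxes=[{"label": "right lower lobe opacity", "x1": 150, "y1": 380, "x2": 260, "y2": 470}],
)
```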
Acknowledgement: Logo designed by Yuchen Wu