We are hiring a postdoctoral researcher in Clinical AI and Health Equity. See the job advert for details.
2025
- (28 May 2025)
Paper accepted to ACL 2025 Findings – Look & Mark: Leveraging Radiologist Eye Fixations and Bounding Boxes in Multimodal Large Language Models for Chest X-ray Report Generation – at https://arxiv.org/abs/2505.22222
This study introduces Look & Mark (L&M), a novel approach to radiology report generation that integrates radiologist fixation cues (Look) with bounding box annotations (Mark) to guide multimodal large language models (LLMs). By combining these complementary grounding strategies, L&M significantly improves the clinical relevance of generated reports, reduces hallucinations, and enhances model alignment with real-world diagnostic workflows. Importantly, L&M achieves these gains without extensive fine-tuning, leveraging in-context learning to adapt general-purpose and domain-specific models alike.
Our experiments demonstrate that L&M boosts performance across both lexical and clinical evaluation metrics, with the largest gains on clinical metrics such as RaTEScore and RadGraph-XL. For instance, CXR-LLaVA achieved a 1.2% improvement in overall metrics (A.AVG) compared to baseline prompting, while LLaVA-Med demonstrated a remarkable 9.2% boost. General-purpose models also benefited substantially, with LLaVA-OV achieving an 87.3% clinical average (C.AVG), the highest among all tested models, surpassing even domain-specific models trained explicitly for chest X-ray report generation.
Read the full paper at https://arxiv.org/abs/2505.22222.
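To make the Look & Mark idea concrete, here is a minimal sketch of how fixation cues and bounding-box annotations might be serialized into an in-context prompt for a multimodal LLM. This is an illustration only, not the authors' code: the function name, prompt wording, and data layout are all hypothetical assumptions.

```python
# Illustrative sketch (not the authors' implementation): serializing
# eye fixations ("Look") and bounding-box annotations ("Mark") into
# prompt text for in-context learning. All names are hypothetical.

def format_grounding_prompt(fixations, boxes, instruction):
    """Build a grounded prompt from gaze and region annotations.

    fixations: list of (x, y, duration_ms) gaze points
    boxes: list of (label, x1, y1, x2, y2) region annotations
    """
    look = "; ".join(
        f"gaze at ({x}, {y}) for {d} ms" for x, y, d in fixations
    )
    mark = "; ".join(
        f"{label} in region ({x1}, {y1}, {x2}, {y2})"
        for label, x1, y1, x2, y2 in boxes
    )
    return (
        f"{instruction}\n"
        f"Radiologist fixations (Look): {look}\n"
        f"Annotated regions (Mark): {mark}\n"
        f"Generate the findings section of the report."
    )

prompt = format_grounding_prompt(
    fixations=[(210, 340, 850), (480, 360, 620)],
    boxes=[("opacity", 180, 300, 260, 400)],
    instruction="You are drafting a chest X-ray report.",
)
```

Because the grounding signal is carried entirely in the prompt, this style of approach needs no gradient updates to the underlying model, which is what allows it to work with both general-purpose and domain-specific LLMs.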
- (28 May 2025)
Paper accepted to ACL 2025 Findings – BioHopR: A Benchmark for Multi-Hop, Multi-Answer Reasoning in Biomedicine – at https://arxiv.org/abs/2505.22240
This paper introduces BioHopR, a benchmark for evaluating multi-hop, multi-answer reasoning in the biomedical domain. Built on the PrimeKG knowledge graph, BioHopR captures the complexity of real-world biomedical queries through one-to-many and many-to-many relationships, rigorously assessing reasoning over 1-hop and 2-hop tasks. Evaluation results highlight that O3-mini, a proprietary model with a reasoning step, outperforms open-source models, including biomedical models like HuatuoGPT-o1. Across all models, the performance drop from 1-hop to 2-hop tasks underscores the difficulty of aligning intermediate reasoning steps, especially in bridging entities.
Read the full paper at https://arxiv.org/abs/2505.22240.
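The distinction between 1-hop and 2-hop multi-answer queries can be sketched on a toy knowledge graph. This is not the BioHopR construction itself: the triples and helper functions below are invented purely to illustrate how multi-answer gold sets arise from one-to-many relations and intermediate bridge entities.

```python
# Illustrative sketch (not the BioHopR code): deriving multi-answer
# gold sets for 1-hop and 2-hop queries from a toy knowledge graph
# of PrimeKG-style (head, relation, tail) triples. All data invented.

from collections import defaultdict

triples = [
    ("DrugA", "treats", "Disease1"),
    ("DrugA", "treats", "Disease2"),  # one-to-many: one drug, many diseases
    ("Disease1", "associated_with", "GeneX"),
    ("Disease2", "associated_with", "GeneX"),
    ("Disease2", "associated_with", "GeneY"),
]

# Index tails by (head, relation) for fast traversal.
index = defaultdict(set)
for head, rel, tail in triples:
    index[(head, rel)].add(tail)

def one_hop(head, rel):
    """All answers one relation away from `head` (multi-answer)."""
    return index[(head, rel)]

def two_hop(head, rel1, rel2):
    """Union of answers reached via every intermediate bridge entity."""
    return {
        tail
        for bridge in index[(head, rel1)]
        for tail in index[(bridge, rel2)]
    }

one_hop("DrugA", "treats")  # {"Disease1", "Disease2"}
two_hop("DrugA", "treats", "associated_with")  # {"GeneX", "GeneY"}
```

The 2-hop case shows why bridging entities are hard: a model must keep every intermediate disease in play before taking the second hop, and dropping any bridge entity silently shrinks the answer set.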
- (18 April 2025)
Published in American Journal of Hypertension – A Transformer-Based Framework for Counterfactual Estimation of Antihypertensive Treatment Effect on COVID-19 Infection Risk - A Proof-of-Concept Study – at https://doi.org/10.1093/ajh/hpaf055
A new study in the American Journal of Hypertension investigates the relationship between antihypertensive medications and COVID-19 infection risk. The research employed a transformer-based framework to analyze real-world data from over 300,000 patients.
Key findings indicate that while ACE inhibitors showed a negligible effect on COVID-19 risk, beta-blockers and calcium channel blockers were associated with a protective effect. Statins and thiazides showed a slight increase in risk.
This study demonstrates the potential of advanced causal inference models in evaluating treatment outcomes in complex healthcare scenarios and offers important insights for clinical consideration.
Read the full paper at https://doi.org/10.1093/ajh/hpaf055.
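The core quantity behind this kind of counterfactual analysis is the average treatment effect: the mean difference between each patient's predicted outcome under treatment and under no treatment. The sketch below shows that computation with a toy stand-in predictor; in the actual study the predictor is a transformer over patient records, and everything here is an invented illustration.

```python
# Illustrative sketch (not the paper's framework): estimating an
# average treatment effect (ATE) from a model's factual and
# counterfactual risk predictions. The predictor is a toy stand-in.

def average_treatment_effect(patients, predict_risk):
    """Mean difference in predicted risk: treated minus untreated.

    predict_risk(patient, treated) -> probability of infection.
    A negative ATE suggests a protective effect of the treatment.
    """
    diffs = [
        predict_risk(p, treated=True) - predict_risk(p, treated=False)
        for p in patients
    ]
    return sum(diffs) / len(diffs)

# Toy stand-in model: treatment lowers risk by a fixed 5 points.
def toy_model(p, treated):
    return p["base_risk"] - (0.05 if treated else 0.0)

patients = [{"base_risk": 0.20}, {"base_risk": 0.30}]

average_treatment_effect(patients, toy_model)  # close to -0.05
```

The practical difficulty, which motivates the transformer-based framework, is that `predict_risk` must produce credible counterfactuals from observational data, where treatment assignment is confounded with patient characteristics.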
- (8 April 2025)
Paper published in Journal of Imaging Informatics in Medicine – How Do Radiologists Currently Monitor AI in Radiology and What Challenges Do They Face? – at https://doi.org/10.1007/s10278-025-01493-8
As AI tools become more common in radiology, monitoring their performance is increasingly important—but still underdeveloped. A recent qualitative study interviewed 16 radiologists across Europe and the U.S., revealing that many AI systems are still in early validation phases. Current monitoring typically involves manual, retrospective comparisons to radiology reports—effective, but labor-intensive.
Key barriers include a lack of standardized monitoring guidelines, limited technological tools, and constrained resources. The study recommends mixed-method monitoring, dedicated governance teams, and long-term resource planning.
This research highlights the need for clearer frameworks and investment to ensure AI improves clinical workflows.
Read it at https://doi.org/10.1007/s10278-025-01493-8.
- (4 April 2025)
Preprint published on arXiv – Towards Deployment-Centric Multimodal AI Beyond Vision and Language – at https://arxiv.org/abs/2504.03603
This work introduces a deployment-centric workflow for multimodal AI, emphasizing real-world applicability beyond just vision and language models. While multimodal systems hold immense potential in areas like healthcare and engineering, deployment challenges are often an afterthought. This paper pushes for a proactive approach—integrating data readiness, model robustness, and system integration into early development.
By shifting focus from just performance benchmarks to deployment feasibility, this research bridges the gap between prototypes and practical implementation.
It’s a compelling case for aligning AI innovation with real-world impact.
Read it at https://arxiv.org/abs/2504.03603.
Acknowledgement: Logo designed by Yuchen Wu