2025
- (8 August 2025)
Tokeniser paper published - Infusing clinical knowledge into language models by subword optimisation and embedding initialisation - now published in Computers in Biology and Medicine.
This study introduces a novel knowledge-enhanced tokenisation mechanism, K-Tokeniser, for clinical text processing. Technically, the study proposes a tokenisation method that uses global representations of tokens based on domain-specific concepts (e.g., drugs, diseases) drawn from ontologies such as UMLS or from task-specific corpora. At training or inference time, word- and sentence-level optimisation is used to select the optimal token representations. The study also proposes an embedding initialisation approach for new tokens, eliminating the need to pre-train the language models. Models built with K-Tokeniser achieve a notable 13% increase in Micro F1 score for automated clinical coding, and require only 50% of the training data for concept extraction and less than 20% for automated coding to outperform the baseline ClinicalBERT model.
Read the full paper at https://doi.org/10.1016/j.compbiomed.2025.110747.
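As a minimal illustration of the idea behind initialising embeddings for new tokens without pre-training, the sketch below averages the embeddings of the subword pieces a base tokeniser would split a new clinical token into. This is a common heuristic, not necessarily the paper's exact method; the function names and the toy vocabulary are hypothetical.

```python
# Illustrative sketch only: initialise a new domain token's embedding as the
# mean of the embeddings of the subwords the base tokeniser splits it into,
# so no further pre-training of the language model is required.
# The toy vocabulary and tokeniser below are invented for the example.

def mean_init(new_token, subword_tokenize, embeddings):
    """Return an initial vector for `new_token` by averaging the vectors
    of its constituent subwords."""
    pieces = subword_tokenize(new_token)
    vecs = [embeddings[p] for p in pieces]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Toy base vocabulary that splits "nephropathy" into two known pieces.
toy_embeddings = {"nephro": [1.0, 0.0], "##pathy": [0.0, 1.0]}

def toy_tokenize(word):
    return ["nephro", "##pathy"]

vec = mean_init("nephropathy", toy_tokenize, toy_embeddings)
# vec == [0.5, 0.5]
```

The averaged vector places the new whole-word token near its pieces in embedding space, which is why such initialisations can be used directly at fine-tuning time.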
- (28 May 2025)
Paper accepted by ACL 2025 Findings – Look & Mark: Leveraging Radiologist Eye Fixations and Bounding Boxes in Multimodal Large Language Models for Chest X-ray Report Generation – at https://arxiv.org/abs/2505.22222
This study introduces Look & Mark (L&M), a novel approach to radiology report generation that integrates radiologist fixation cues (Look) with bounding box annotations (Mark) to guide multimodal Large Language Models (LLMs). By combining these complementary grounding strategies, L&M significantly improves the clinical relevance of generated reports, reduces hallucinations, and enhances model alignment with real-world diagnostic workflows. Importantly, L&M achieves these gains without requiring extensive fine-tuning, leveraging in-context learning to adapt both general-purpose and domain-specific models alike.
Our experiments demonstrate that L&M significantly boosts performance across both lexical and clinical evaluation metrics, with the largest gains observed in clinical metrics such as RaTEScore and RadGraph-XL. For instance, CXR-LLaVA achieved a 1.2% improvement in overall metrics (A.AVG) compared to baseline prompting, while LLaVA-Med demonstrated a remarkable 9.2% boost. General-purpose models also benefited significantly, with LLaVA-OV achieving an 87.3% clinical average (C.AVG), the highest among all tested models, even surpassing domain-specific models trained explicitly for chest X-ray report generation.
Read the full paper at https://arxiv.org/abs/2505.22222.
- (28 May 2025)
Paper accepted by ACL 2025 Findings – BioHopR: A Benchmark for Multi-Hop, Multi-Answer Reasoning in Biomedicine – at https://arxiv.org/abs/2505.22240
This paper introduces BioHopR, a benchmark for evaluating multi-hop, multi-answer reasoning in the biomedical domain. Built on the PrimeKG knowledge graph, BioHopR captures the complexity of real-world biomedical queries through one-to-many and many-to-many relationships, rigorously assessing reasoning over 1-hop and 2-hop tasks. Evaluation results highlight that O3-mini, a proprietary model with a reasoning step, outperforms open-source models, including biomedical models such as HuatuoGPT-o1. Across all models, the performance drop from 1-hop to 2-hop tasks underscores the difficulty of aligning intermediate reasoning steps, especially in bridging entities.
Read the full paper at https://arxiv.org/abs/2505.22240.
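To make the 1-hop versus 2-hop distinction concrete, the sketch below runs multi-answer queries over a toy triple store. The triples and relation names are invented for illustration and are not taken from PrimeKG.

```python
# Hedged sketch of 1-hop vs 2-hop, multi-answer queries over a toy knowledge
# graph. The triples and relation names below are hypothetical examples.

triples = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "treats", "fever"),
    ("fever", "symptom_of", "influenza"),
    ("headache", "symptom_of", "migraine"),
]

def hop(entity, relation):
    """All objects reachable from `entity` via `relation` (one-to-many)."""
    return {o for s, r, o in triples if s == entity and r == relation}

def two_hop(entity, rel1, rel2):
    """Chain two hops, collecting every answer through every bridge entity."""
    return {o for bridge in hop(entity, rel1) for o in hop(bridge, rel2)}

one_hop_answers = hop("aspirin", "treats")
# {'headache', 'fever'}
two_hop_answers = two_hop("aspirin", "treats", "symptom_of")
# {'influenza', 'migraine'}
```

The 2-hop query only succeeds if every bridge entity ("headache" and "fever" here) is resolved correctly, which mirrors why models degrade from 1-hop to 2-hop tasks in the benchmark.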
- (12 May 2025)
Paper accepted by Journal of Medical Internet Research – Evaluation and Bias Analysis of Large Language Models in Generating Synthetic Electronic Health Records - Comparative Study – at https://doi.org/10.2196/65317
Larger models, such as Yi-34B, Qwen-14B, and Llama 2 13B, showed improved performance in generating more comprehensive EHRs, as reflected in higher EPS values. However, this increased performance was accompanied by a notable escalation in both gender and racial biases, highlighting a performance-bias trade-off. The study identified four key findings: (1) as model size increased, EHR generation improved, but demographic biases also became more pronounced; (2) biases were observed across all models, not just the larger ones; (3) gender bias closely aligned with real-world disease prevalence, while racial bias was evident in only a subset of diseases; and (4) racial biases varied, with some diseases showing overrepresentation of White or Black populations and underrepresentation of Hispanic and Asian groups. These findings underline the need for effective bias mitigation strategies and the development of benchmarks to ensure fairness in artificial intelligence applications for health care.
Read the full paper at https://doi.org/10.2196/65317.
- (18 April 2025)
Published in American Journal of Hypertension – A Transformer-Based Framework for Counterfactual Estimation of Antihypertensive Treatment Effect on COVID-19 Infection Risk - A Proof-of-Concept Study – at https://doi.org/10.1093/ajh/hpaf055
A new study in the American Journal of Hypertension investigates the relationship between antihypertensive medications and COVID-19 infection risk. The research employed a transformer-based framework to analyze real-world data from over 300,000 patients.
Key findings indicate that while ACE inhibitors showed a negligible effect on COVID-19 risk, beta-blockers and calcium channel blockers were associated with a protective effect. Statins and thiazides showed a slight increase in risk.
This study demonstrates the potential of advanced causal inference models in evaluating treatment outcomes in complex healthcare scenarios and offers important insights for clinical consideration.
Read the full paper at https://doi.org/10.1093/ajh/hpaf055.
Acknowledgement: Logo designed by Yuchen Wu