2024
- (3 August 2024)
Paper accepted by the International Workshop on Trustworthy Artificial Intelligence for Healthcare 2024 - Human-in-the-Loop Chest X-Ray Diagnosis - Enhancing Large Multimodal Models with Eye Fixation Inputs - at DOI:10.1007/978-3-031-67751-9_6
In the realm of artificial intelligence-assisted diagnostics, recent advances in foundational models have shown great promise, particularly in medical image computing. However, the current scope of human-computer interaction with these models is often limited to inputting images and text prompts. In this study, we propose a novel human-in-the-loop approach for chest X-ray diagnosis with a large language and vision assistant using eye fixation prompts. The eye fixation prompts contain the location and duration of a radiologist's attention during chest X-ray analysis. The assistant interacts with a radiologist in two ways: recommending diagnoses of possible diseases and confirming diagnosis reports. The results show that the enhanced human-computer interaction enabled by the eye fixation prompt significantly improves the large multimodal model's accuracy in differential diagnosis and report confirmation. Fine-tuning with just 658 reports containing fixation information further boosted the performance of LLaVA-1.5, surpassing the previous state-of-the-art model LLaVA-ERR, which was trained on 17k MIMIC reports, by 5%. Our study highlights that this novel approach can better assist radiologists in clinical decision-making in a reciprocal interaction where the models also benefit from the domain expertise of radiologists.
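The paper does not publish its exact prompt encoding, but the idea of serialising gaze data into a text prompt can be sketched as follows. The function name, the `(x, y, duration)` tuple layout, and the wording of the prompt are all illustrative assumptions, not the paper's actual format.

```python
def fixations_to_prompt(fixations):
    """Turn (x, y, duration_ms) eye fixations into a textual prompt fragment.

    Coordinates are assumed normalised to [0, 1] over the chest X-ray;
    duration serves as a proxy for how much attention a region received.
    Fixations are listed longest-dwell first.
    """
    lines = ["The radiologist's gaze dwelled on the following regions:"]
    for i, (x, y, dur) in enumerate(sorted(fixations, key=lambda f: -f[2]), 1):
        lines.append(f"{i}. region at ({x:.2f}, {y:.2f}) for {dur} ms")
    return "\n".join(lines)
```

Such a fragment would then be concatenated with the image and the diagnostic question before being passed to the multimodal model.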
Read it at DOI:10.1007/978-3-031-67751-9_6.
- (1 August 2024)
Paper accepted by BMJ Medicine - Multimorbidity and adverse outcomes following emergency department attendance - population based cohort study - at DOI:10.1136/bmjmed-2023-000731
As part of the NIHR-funded AIM-CISC project, this collaborative work led by University of Edinburgh colleagues aims to describe the effect of multimorbidity on adverse patient centred outcomes in people attending the emergency department. Multimorbidity was defined as at least two conditions from the Elixhauser comorbidity index. Multivariable logistic or linear regression was used to assess associations of multimorbidity with 30 day mortality (primary outcome), hospital admission, reattendance at the emergency department within seven days, and time spent in the emergency department (secondary outcomes). The primary analysis was stratified by age (<65 v ≥65 years). 451,291 people made 1,273,937 attendances to emergency departments during the study period. 43,504 (9.6%) had multimorbidity, and people with multimorbidity were older (median 73 v 43 years), more likely to arrive by emergency ambulance (57.8% v 23.7%), and more likely to be triaged as very urgent (23.5% v 9.2%) than people without multimorbidity. After adjusting for other prognostic covariates, multimorbidity, compared with no multimorbidity, was associated with higher 30 day mortality (8.2% v 1.2%, adjusted odds ratio 1.81 (95% confidence interval (CI) 1.72 to 1.91)), a higher rate of hospital admission (60.1% v 20.5%, 1.81 (1.76 to 1.86)), higher reattendance at an emergency department within seven days (7.8% v 3.5%, 1.41 (1.32 to 1.50)), and longer time spent in the department (adjusted coefficient 0.27 h (95% CI 0.26 to 0.27)). The associations between multimorbidity and all outcomes were stronger in younger patients: for example, the adjusted odds ratio of 30 day mortality was 3.03 (95% CI 2.68 to 3.42) in people younger than 65 years versus 1.61 (95% CI 1.53 to 1.71) in those 65 years or older. Almost one in ten patients presenting to the emergency department had multimorbidity as defined by Elixhauser index conditions.
Multimorbidity was strongly associated with adverse outcomes and these associations were stronger in younger people. The increasing prevalence of multimorbidity in the population is likely to exacerbate strain on emergency departments unless practice and policy evolve to meet the growing demand.
Read it at DOI:10.1136/bmjmed-2023-000731.
- (1 August 2024)
Paper accepted by the 1st Workshop on Language + Molecules (L+M 2024) - Knowlab's Submission to L+M Shared Task - All you need is continued pretraining of chemistry texts even for molecule captioning - at https://aclanthology.org/2024.langmol-1.11
Led by Yunsoo, this paper presents our submission to the L+M-24 shared task, focused on translating molecular structures into natural language descriptions, known as the molecule captioning task. We selected a small language model (SLM), Phi-3-mini-4k, to evaluate the impact of continued pretraining and instruction tuning for domain-specific chemical knowledge. The Phi-3 model underwent continued pretraining on 90M chemistry textbooks and abstracts, followed by instruction tuning on 150K question-answering sets covering SMILES and general chemistry knowledge. Although the continued pretraining phase included no direct exposure to SMILES representations, it significantly enhanced the Phi-3 model's performance in the molecule captioning task, with a 300% increase in BLEU scores. The code and model are released at https://github.com/bluesky333/Phi3KnowChem to facilitate research in chemical small language modeling.
Read it at https://aclanthology.org/2024.langmol-1.11.
- (21 June 2024)
Preprint - Harnessing Knowledge Retrieval with Large Language Models for Clinical Report Error Correction - at arXiv:2406.15045v1
Led by Jinge and Zhaolong, this study proposes an approach for error correction in clinical radiology reports, leveraging large language models (LLMs) and retrieval-augmented generation (RAG) techniques. The proposed framework employs internal and external retrieval mechanisms to extract relevant medical entities and relations from the report and external knowledge sources. A three-stage inference process is introduced, decomposing the task into error detection, localization, and correction subtasks, which enhances the explainability and performance of the system. The effectiveness of the approach is evaluated using a benchmark dataset created by corrupting real-world radiology reports with realistic errors, guided by domain experts. Experimental results demonstrate the benefits of the proposed methods, with the combination of internal and external retrieval significantly improving the accuracy of error detection, localization, and correction across various state-of-the-art LLMs. The findings contribute to the development of more robust and reliable error correction systems for clinical documentation.
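The three-stage decomposition described above can be sketched as a chain of separate LLM calls: detect, localise, then correct. This is a minimal illustration assuming a generic `llm` callable and free-form prompts; the actual prompts, retrieval pipeline, and function names in the paper will differ.

```python
def correct_report(report, llm, retrieved_facts):
    """Three-stage error correction: detection -> localisation -> correction.

    `llm` is a stand-in for any text-in/text-out model call;
    `retrieved_facts` holds strings from internal and external retrieval.
    """
    context = "\n".join(retrieved_facts)

    # Stage 1: error detection - does the report contain an error at all?
    has_error = llm(f"Facts:\n{context}\nReport:\n{report}\n"
                    "Does this report contain a factual error? Answer yes/no.")
    if has_error.strip().lower().startswith("no"):
        return report

    # Stage 2: error localisation - pin down the erroneous span.
    span = llm(f"Facts:\n{context}\nReport:\n{report}\n"
               "Quote the erroneous text verbatim.")

    # Stage 3: correction - rewrite only the localised span.
    fixed = llm(f"Facts:\n{context}\nErroneous text:\n{span}\n"
                "Rewrite this text so it is consistent with the facts.")
    return report.replace(span.strip(), fixed.strip())
```

Splitting the task this way is what makes the system's decisions inspectable: each stage's intermediate output (the yes/no verdict, the quoted span) can be checked by a clinician before the correction is applied.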
Read it at arXiv:2406.15045v1.
- (20 June 2024)
New preprint - Infusing clinical knowledge into tokenisers for language models - on arXiv:2406.14312
This study introduces a novel knowledge-enhanced tokenisation mechanism, K-Tokeniser, for clinical text processing. Technically, at the initialisation stage, K-Tokeniser populates global representations of tokens based on the semantic types of domain concepts (such as drugs or diseases), drawn either from a domain ontology like the Unified Medical Language System or from the training data of the task-related corpus. At the training or inference stage, sentence-level localised context is utilised to choose the optimal global token representation, realising semantic-based tokenisation. To avoid pretraining with the new tokeniser, an embedding initialisation approach is proposed to generate representations for new tokens. Using three transformer-based language models, a comprehensive set of experiments was conducted on four real-world datasets to evaluate K-Tokeniser across a wide range of clinical text analytics tasks, including clinical concept and relation extraction, automated clinical coding, clinical phenotype identification, and clinical research article classification. Overall, our models demonstrate consistent improvements over their counterparts in all tasks. In particular, substantial improvements are observed in the automated clinical coding task, with a 13% increase in Micro F1 score. Furthermore, K-Tokeniser also shows significant capacity to facilitate quicker convergence of language models. Specifically, using K-Tokeniser, the language models require only 50% of the training data to achieve the best performance of the baseline tokeniser using all training data in the concept extraction task, and less than 20% of the data for the automated coding task. It is worth mentioning that all these improvements require no pretraining process, making the approach generalisable.
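The two key ideas above can be sketched in a few lines: keep known domain concepts as whole tokens rather than splitting them into subwords, and initialise a new token's embedding as the mean of the subword embeddings it replaces, so no pretraining is needed. This is a simplified illustration, not the released K-Tokeniser; the vocabulary lookup, averaging scheme, and function names are assumptions.

```python
import numpy as np

def semantic_tokenise(text, base_tokenise, domain_vocab):
    """Keep domain concepts (e.g. drug/disease names from UMLS) whole;
    fall back to the base subword tokeniser for everything else."""
    tokens = []
    for word in text.lower().split():
        if word in domain_vocab:
            tokens.append(word)            # one token per clinical concept
        else:
            tokens.extend(base_tokenise(word))
    return tokens

def init_new_embedding(word, base_tokenise, emb_table):
    """Initialise a new whole-word token's embedding as the mean of the
    subword embeddings the base tokeniser would have produced for it."""
    pieces = base_tokenise(word)
    return np.mean([emb_table[p] for p in pieces], axis=0)
```

Because the new embedding starts near the centroid of its old subword pieces, the model's existing weights remain usable from the first training step, which is consistent with the faster convergence reported above.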
Read it at arXiv:2406.14312.
Acknowledgement: Logo designed by Yuchen Wu