(28 October 2024) Invited talk at KnowLab - The world of non-clinical safety, artificial intelligence, and foundation models
Speaker - Dr Arijit Patra
Title - The world of non-clinical safety, artificial intelligence, and foundation models
Abstract - There has been a significant amount of interest in artificial intelligence in the pharmaceutical industry, particularly with respect to drug discovery. While drug discovery has been established as a domain where algorithmic involvement can create substantial efficiency gains, other aspects of the pharmaceutical development process such as non-clinical and clinical safety are of critical importance towards creating patient value. This talk will explore the drug development process, the concepts of non-clinical safety with a particular focus on toxicologic pathology, and recent insights on building ML pipelines and foundation models therein.
Bio - Dr Arijit Patra is a Senior Principal Scientist at UCB Biopharma UK. He holds a PhD in machine learning for healthcare imaging from the University of Oxford, where he was a Rhodes Scholar (India & Exeter, 2016). Prior to that, he completed a dual degree in Mechanical Engineering from the Indian Institute of Technology (IIT) at Kharagpur, India. He has also been associated with AstraZeneca, Shell, Microsoft Research and CSIR-South Africa at various points in his career and has been actively involved in the AI4SG (AI for Social Good) community. He has authored several publications around machine learning and medical imaging and is a reviewer for multiple peer reviewed venues such as NeurIPS, ICML, MICCAI and several journals. Arijit has been appointed as a Rising Leaders fellow of the Aspen Institute UK, and as an International Strategy Forum fellow in 2024.
Date/Time - 13.30-14.30 14th November 2024
(10 October 2024) Tutorial on Foundation Models For Medical Imaging - with MICCAI 2024, Morocco
Generative AI and large-scale self-supervised foundation models are poised to have a profound impact on human decision making across occupations. Healthcare is one such area where such models have the capacity to impact patients, clinicians, and other care providers.
As part of MICCAI 2024, Yunsoo, Jinge and Honghan worked with colleagues from IBM, ÉTS Montréal, and Stanford University to organize a tutorial on Foundation Models For Medical Imaging. Yunsoo and the rest of the KnowLab team took the lead on the "multimodal LLMs in medicine" section.
Tutorial website at https://sites.google.com/view/miccai-2024-tutorial/home.
(8 October 2024) Paper published on BMC Medical Informatics and Decision Making - A hybrid framework with large language models for rare disease phenotyping - at DOI:10.1186/s12911-024-02698-7
Rare diseases pose significant challenges in diagnosis and treatment due to their low prevalence and heterogeneous clinical presentations. Unstructured clinical notes contain valuable information for identifying rare diseases, but manual curation is time-consuming and prone to subjectivity. This study aims to develop a hybrid approach combining dictionary-based natural language processing (NLP) tools with large language models (LLMs) to improve rare disease identification from unstructured clinical reports. The proposed hybrid approach demonstrates superior performance compared to traditional NLP systems and standalone LLMs. LLaMA3 and Phi3-mini achieve the highest F1 scores in rare disease identification. Few-shot prompting with 1-3 examples yields the best results, while knowledge-augmented generation shows limited improvement. Notably, the approach uncovers a significant number of potential rare disease cases not documented in structured diagnostic records, highlighting its ability to identify previously unrecognized patients.
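For readers curious how such a hybrid pipeline can be wired together, here is a minimal sketch (not the paper's code): a dictionary pass proposes candidate mentions and a pluggable LLM call verifies each one with a few-shot prompt. The toy dictionary, the few-shot examples, and the `llm_generate` callable are all illustrative assumptions.

```python
# Hybrid rare-disease identification sketch: dictionary candidates + LLM verification.
from typing import Callable, List

RARE_DISEASE_DICT = {"fabry disease", "pompe disease", "gaucher disease"}  # toy dictionary

FEW_SHOT = (
    "Report: 'Enzyme assay consistent with Fabry disease.' Candidate: Fabry disease -> YES\n"
    "Report: 'Family history of Gaucher disease, patient unaffected.' Candidate: Gaucher disease -> NO\n"
)

def dictionary_candidates(note: str) -> List[str]:
    """First stage: cheap surface matching against a rare-disease dictionary."""
    lowered = note.lower()
    return [term for term in RARE_DISEASE_DICT if term in lowered]

def verify_with_llm(note: str, candidate: str, llm_generate: Callable[[str], str]) -> bool:
    """Second stage: ask the LLM whether the candidate is truly asserted for this patient."""
    prompt = (
        "Decide whether the candidate rare disease is asserted for the patient (YES/NO).\n"
        f"{FEW_SHOT}"
        f"Report: '{note}' Candidate: {candidate} ->"
    )
    return llm_generate(prompt).strip().upper().startswith("YES")

def identify_rare_diseases(note: str, llm_generate: Callable[[str], str]) -> List[str]:
    return [c for c in dictionary_candidates(note) if verify_with_llm(note, c, llm_generate)]

if __name__ == "__main__":
    stub = lambda prompt: "YES"  # stand-in for a real LLaMA3/Phi3-mini call
    print(identify_rare_diseases("Findings suggestive of Pompe disease.", stub))
```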
Read it at DOI:10.1186/s12911-024-02698-7.
(3 August 2024) Paper accepted by the International Workshop on Trustworthy Artificial Intelligence for Healthcare 2024 - Human-in-the-Loop Chest X-Ray Diagnosis - Enhancing Large Multimodal Models with Eye Fixation Inputs - at DOI:10.1007/978-3-031-67751-9_6
In the realm of artificial intelligence-assisted diagnostics, recent advances in foundational models have shown great promise, particularly in medical image computing. However, the current scope of human-computer interaction with these models is often limited to inputting images and text prompts. In this study, we propose a novel human-in-the-loop approach for chest X-ray diagnosis with a large language and vision assistant using eye fixation prompts. The eye fixation prompts contain the location and duration of a radiologist’s attention during chest X-ray analysis. This assistant interacts with a radiologist in two ways - diagnosis recommendations of possible diseases and diagnosis report confirmation. The results show the enhanced human-computer interaction with the eye fixation prompt significantly improves the accuracy of the large multimodal model’s performance in differential diagnosis and report confirmation. Fine-tuning with just 658 reports with fixation information further boosted the performance of the LLaVA-1.5, surpassing the previous state-of-the-art model LLaVA-ERR, which was trained on 17k MIMIC reports, by 5%. Our study highlights that this novel approach can better assist radiologists in clinical decision-making in a reciprocal interaction where the models also benefit from the domain expertise of radiologists.
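As a rough illustration of how fixation data might be expressed as a textual prompt, the sketch below converts (x, y, duration) fixations into a sentence that can accompany the image and question. The field names and the restriction to the longest fixations are assumptions, not the paper's exact encoding.

```python
# Turn radiologist eye fixations into a textual prompt for a multimodal assistant.
from typing import Dict, List

def fixations_to_prompt(fixations: List[Dict], top_k: int = 3) -> str:
    """Keep the longest fixations and describe their location and dwell time in words."""
    longest = sorted(fixations, key=lambda f: f["duration_ms"], reverse=True)[:top_k]
    parts = [
        f"region near (x={f['x']:.2f}, y={f['y']:.2f}) for {f['duration_ms']} ms"
        for f in longest
    ]
    return "The radiologist looked mainly at: " + "; ".join(parts) + "."

fixations = [
    {"x": 0.42, "y": 0.31, "duration_ms": 820},   # normalised image coordinates
    {"x": 0.61, "y": 0.70, "duration_ms": 240},
]
question = "What is the most likely diagnosis?"
prompt = fixations_to_prompt(fixations) + " " + question
print(prompt)  # text that would accompany the chest X-ray image in the model input
```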
Read it at DOI:10.1007/978-3-031-67751-9_6.
(1 August 2024) Paper accepted by BMJ Medicine - Multimorbidity and adverse outcomes following emergency department attendance - population based cohort study - at DOI:10.1136/bmjmed-2023-000731
As part of the NIHR funded AIM-CISC project, this collaborative work led by University of Edinburgh colleagues aims to describe the effect of multimorbidity on adverse patient-centred outcomes in people attending the emergency department. Multimorbidity was defined as at least two conditions from the Elixhauser comorbidity index. Multivariable logistic or linear regression was used to assess associations of multimorbidity with 30 day mortality (primary outcome), hospital admission, reattendance at the emergency department within seven days, and time spent in the emergency department (secondary outcomes). The primary analysis was stratified by age (<65 v ≥65 years). 451,291 people had 1,273,937 attendances to emergency departments during the study period. 43,504 (9.6%) had multimorbidity, and people with multimorbidity were older (median 73 v 43 years), more likely to arrive by emergency ambulance (57.8% v 23.7%), and more likely to be triaged as very urgent (23.5% v 9.2%) than people without multimorbidity. After adjusting for other prognostic covariates, multimorbidity, compared with no multimorbidity, was associated with higher 30 day mortality (8.2% v 1.2%, adjusted odds ratio 1.81 (95% confidence interval (CI) 1.72 to 1.91)), a higher rate of hospital admission (60.1% v 20.5%, 1.81 (1.76 to 1.86)), higher reattendance at an emergency department within seven days (7.8% v 3.5%, 1.41 (1.32 to 1.50)), and longer time spent in the department (adjusted coefficient 0.27 h (95% CI 0.26 to 0.27)). The associations between multimorbidity and all outcomes were larger in younger patients - for example, the adjusted odds ratio for 30 day mortality was 3.03 (95% CI 2.68 to 3.42) in people younger than 65 years versus 1.61 (95% CI 1.53 to 1.71) in those 65 years or older. Almost one in ten patients presenting to the emergency department had multimorbidity using Elixhauser index conditions. Multimorbidity was strongly associated with adverse outcomes, and these associations were stronger in younger people. The increasing prevalence of multimorbidity in the population is likely to exacerbate strain on emergency departments unless practice and policy evolve to meet the growing demand.
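For readers unfamiliar with how an adjusted odds ratio is obtained, the toy example below fits a multivariable logistic regression on synthetic data with statsmodels and exponentiates the coefficient for multimorbidity; the variables and effect sizes are invented for illustration and bear no relation to the study data.

```python
# Toy adjusted-odds-ratio calculation on synthetic data (not the study's data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "multimorbidity": rng.integers(0, 2, n),
    "age": rng.normal(55, 18, n).clip(18, 95),
    "male": rng.integers(0, 2, n),
})
# Simulate a 30-day mortality outcome where multimorbidity and age raise risk.
logit_p = -6 + 0.6 * df["multimorbidity"] + 0.05 * df["age"] + 0.1 * df["male"]
df["died_30d"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

model = smf.logit("died_30d ~ multimorbidity + age + male", data=df).fit(disp=False)
adjusted_or = np.exp(model.params)     # exponentiated coefficients = adjusted odds ratios
ci = np.exp(model.conf_int())          # 95% confidence intervals on the OR scale
print(adjusted_or["multimorbidity"], ci.loc["multimorbidity"].values)
```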
Read it at DOI:10.1136/bmjmed-2023-000731.
(1 August 2024) Paper accepted by the 1st Workshop on Language + Molecules (L+M 2024) - KnowLab's Submission to the L+M Shared Task - All you need is continued pretraining of chemistry texts even for molecule captioning - at https://aclanthology.org/2024.langmol-1.11
Led by Yunsoo, this paper presents our submission to the L+M-24 shared task, focused on translating molecular structures into natural language descriptions, known as the molecule captioning task. We selected a small language model (SLM), Phi-3-mini-4k, to evaluate the impact of continued pretraining and instruction tuning for domain-specific chemical knowledge. The Phi-3 model underwent continued pretraining on 90M chemistry textbooks and abstracts, followed by instruction tuning on 150K question answering sets covering SMILES and general chemistry knowledge. Despite the continued pretraining phase not including direct exposure to SMILES representations, it significantly enhanced the Phi-3 model's performance in the molecule captioning task, with a 300% increase in BLEU scores. The code and model are released at https://github.com/bluesky333/Phi3KnowChem to facilitate research in chemical small language modeling.
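A minimal sketch of what "continued pretraining" on raw chemistry text looks like with Hugging Face Transformers is shown below; the checkpoint name, hyperparameters, and the two-sentence corpus are placeholders rather than the settings used in the paper.

```python
# Continued (causal LM) pretraining sketch with Hugging Face Transformers.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

checkpoint = "microsoft/Phi-3-mini-4k-instruct"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token    # ensure the collator can pad
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Stand-in for the chemistry textbook/abstract corpus.
corpus = Dataset.from_dict({"text": [
    "Benzene is an aromatic hydrocarbon with delocalised pi electrons.",
    "Esterification of a carboxylic acid with an alcohol yields an ester and water.",
]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM objective

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="phi3-chem-cpt", per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()  # instruction tuning on SMILES/chemistry QA pairs would follow as a second stage
```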
Read it at https://aclanthology.org/2024.langmol-1.11.
(21 June 2024) Preprint - Harnessing Knowledge Retrieval with Large Language Models for Clinical Report Error Correction - at arXiv:2406.15045v1
Led by Jinge and Zhaolong, this study proposes an approach for error correction in clinical radiology reports, leveraging large language models (LLMs) and retrieval-augmented generation (RAG) techniques. The proposed framework employs internal and external retrieval mechanisms to extract relevant medical entities and relations from the report and external knowledge sources. A three-stage inference process is introduced, decomposing the task into error detection, localization, and correction subtasks, which enhances the explainability and performance of the system. The effectiveness of the approach is evaluated using a benchmark dataset created by corrupting real-world radiology reports with realistic errors, guided by domain experts. Experimental results demonstrate the benefits of the proposed methods, with the combination of internal and external retrieval significantly improving the accuracy of error detection, localization, and correction across various state-of-the-art LLMs. The findings contribute to the development of more robust and reliable error correction systems for clinical documentation.
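The sketch below illustrates the three-stage idea (detect, localise, correct) with retrieved knowledge injected into each prompt; the toy word-overlap retriever, the knowledge snippets, and the `llm` callable are stand-ins, not the framework's actual components.

```python
# Three-stage report correction sketch: retrieve context -> detect -> localise -> correct.
from typing import Callable, List

def retrieve(report: str, knowledge: List[str], k: int = 2) -> List[str]:
    """Toy external retrieval: rank knowledge snippets by word overlap with the report."""
    words = set(report.lower().split())
    scored = sorted(knowledge, key=lambda s: -len(words & set(s.lower().split())))
    return scored[:k]

def correct_report(report: str, knowledge: List[str], llm: Callable[[str], str]) -> str:
    context = "\n".join(retrieve(report, knowledge))
    detect = llm(f"Context:\n{context}\nDoes this report contain an error? Report: {report}")
    if "no" in detect.lower():
        return report
    span = llm(f"Context:\n{context}\nQuote the erroneous span only. Report: {report}")
    return llm(f"Context:\n{context}\nRewrite the report, correcting '{span}'. Report: {report}")

knowledge_base = ["Pneumothorax appears as absent lung markings.",
                  "Cardiomegaly means an enlarged cardiac silhouette."]
stub_llm = lambda p: "no"  # replace with a real LLM client
print(correct_report("Heart size is normal. No pneumothorax.", knowledge_base, stub_llm))
```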
Read it at arXiv:2406.15045v1.
(20 June 2024) New preprint - Infusing clinical knowledge into tokenisers for language models - on arXiv:2406.14312
This study introduces a novel knowledge-enhanced tokenisation mechanism, K-Tokeniser, for clinical text processing. Technically, at the initialisation stage, K-Tokeniser populates global representations of tokens based on semantic types of domain concepts (such as drugs or diseases) from either a domain ontology like the Unified Medical Language System or the training data of the task-related corpus. At the training or inference stage, sentence-level localised context is utilised for choosing the optimal global token representation to realise semantic-based tokenisation. To avoid pretraining with the new tokeniser, an embedding initialisation approach is proposed to generate representations for new tokens. Using three transformer-based language models, a comprehensive set of experiments is conducted on four real-world datasets to evaluate K-Tokeniser in a wide range of clinical text analytics tasks, including clinical concept and relation extraction, automated clinical coding, clinical phenotype identification, and clinical research article classification. Overall, our models demonstrate consistent improvements over their counterparts in all tasks. In particular, substantial improvements are observed in the automated clinical coding task, with a 13% increase in Micro F1 score. Furthermore, K-Tokeniser also shows significant capacity to facilitate quicker convergence of language models. Specifically, using K-Tokeniser, the language models require only 50% of the training data to achieve the best performance of the baseline tokeniser using all training data in the concept extraction task, and less than 20% of the data for the automated coding task. It is worth mentioning that all these improvements require no pre-training process, making the approach generalisable.
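One common way to realise the embedding-initialisation idea is sketched below, under the assumption that new domain tokens can inherit the mean of their original subword embeddings, avoiding any further pretraining; the base model and the two example tokens are illustrative, and the paper's exact procedure may differ.

```python
# Initialise embeddings of newly added domain tokens from their original subword embeddings.
import torch
from transformers import AutoModel, AutoTokenizer

base = "bert-base-uncased"  # stand-in for a clinical transformer
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModel.from_pretrained(base)

new_tokens = ["myocardial_infarction", "metformin"]  # domain concepts from an ontology
subword_ids = [tokenizer(t.replace("_", " "), add_special_tokens=False)["input_ids"]
               for t in new_tokens]

tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))
emb = model.get_input_embeddings().weight

with torch.no_grad():
    for tok, ids in zip(new_tokens, subword_ids):
        new_id = tokenizer.convert_tokens_to_ids(tok)
        emb[new_id] = emb[torch.tensor(ids)].mean(dim=0)  # mean of the original subword vectors
```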
Read it at arXiv:2406.14312.
(18 June 2024) Paper accepted by MICCAI 2024 - Enhancing Human-Computer Interaction in Chest X-ray Analysis using Vision and Language Model with Eye Gaze Patterns - on arxiv
Yunsoo and colleagues' paper on utilising eye gaze data for chest X-ray analysis has now been accepted by MICCAI 2024.
Read the preprint version at arXiv:2404.02370.
(10 June 2024) New preprint - MedExQA - Medical Question Answering Benchmark with Multiple Explanations - on arXiv:2406.06331
Led by Yunsoo, this paper introduces MedExQA, a novel benchmark in medical question-answering, to evaluate large language models’ (LLMs) understanding of medical knowledge through explanations. By constructing datasets across five distinct medical specialties that are underrepresented in current datasets and further incorporating multiple explanations for each question-answer pair, we address a major gap in current medical QA benchmarks which is the absence of comprehensive assessments of LLMs’ ability to generate nuanced medical explanations. Our work highlights the importance of explainability in medical LLMs, proposes an effective methodology for evaluating models beyond classification accuracy, and sheds light on one specific domain, speech language pathology, where current LLMs including GPT4 lack good understanding. Our results show generation evaluation with multiple explanations aligns better with human assessment, highlighting an opportunity for a more robust automated comprehension assessment for LLMs. To diversify open-source medical LLMs (currently mostly based on Llama2), this work also proposes a new medical model, MedPhi-2, based on Phi-2 (2.7B). The model outperformed medical LLMs based on Llama2-70B in generating explanations, showing its effectiveness in the resource-constrained medical domain. We will share our benchmark datasets and the trained model.
Read it at arXiv:2406.06331.
(5 June 2024) Paper accepted by the NAACL 2024 Workshop on Clinical NLP - Chain-of-Thought (CoT) prompting strategies for medical error detection and correction - on arXiv:2406.09103
Zhaolong and colleagues' work has been accepted by the NAACL 2024 Workshop on Clinical NLP. This paper describes our submission to the MEDIQA-CORR 2024 shared task for automatically detecting and correcting medical errors in clinical notes. We report results for three methods of few-shot In-Context Learning (ICL) augmented with Chain-of-Thought (CoT) and reason prompts using a large language model (LLM). In the first method, we manually analyse a subset of the training and validation datasets to infer three CoT prompts by examining error types in the clinical notes. In the second method, we utilise the training dataset to prompt the LLM to deduce reasons about their correctness or incorrectness. The constructed CoTs and reasons are then augmented with ICL examples to solve the tasks of error detection, span identification, and error correction. Finally, we combine the two methods using a rule-based ensemble method. Across the three sub-tasks, our ensemble method ranks 3rd in both sub-tasks 1 and 2, and 7th in sub-task 3, among all submissions.
Read it at arXiv:2406.09103.
(5 June 2024) New preprint - RadBARTsum - Domain Specific Adaption of Denoising Sequence-to-Sequence Models for Abstractive Radiology Report Summarization - on arXiv:2406.03062
Radiology report summarization is a crucial task that can help doctors quickly identify clinically significant findings without the need to review detailed sections of reports. This study proposes RadBARTsum, a domain-specific and ontology-facilitated adaptation of the BART model for abstractive radiology report summarization. The approach involves two main steps: 1) re-training the BART model on a large corpus of radiology reports using a novel entity masking strategy to improve biomedical domain knowledge learning, and 2) fine-tuning the model for the summarization task using the Findings and Background sections to predict the Impression section. Experiments are conducted using different masking strategies. Results show that the re-training process with domain-knowledge-facilitated masking improves performance consistently across various settings. This work contributes a domain-specific generative language model for radiology report summarization and a method for utilising medical knowledge to realise an entity-masking language model. The proposed approach demonstrates a promising direction for enhancing the efficiency of language models by deepening their understanding of clinical knowledge in radiology reports.
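The snippet below sketches what an entity-masking strategy can look like in practice: known medical entities, rather than random spans, are replaced with BART's mask token before denoising re-training. The tiny entity list and regex-based matching are simplifications for illustration.

```python
# Entity masking sketch for denoising pre-training of a BART-style model.
import re
from typing import List

MEDICAL_ENTITIES = ["pleural effusion", "cardiomegaly", "pneumothorax"]  # toy ontology terms
MASK = "<mask>"  # BART's mask token

def mask_entities(report: str, entities: List[str] = MEDICAL_ENTITIES) -> str:
    """Replace known medical entities with the mask token, leaving other text untouched."""
    masked = report
    for ent in entities:
        masked = re.sub(re.escape(ent), MASK, masked, flags=re.IGNORECASE)
    return masked

target = "There is a small left pleural effusion. No pneumothorax."
source = mask_entities(target)
print(source)   # "There is a small left <mask>. No <mask>."
# (source, target) would form one denoising training pair for re-training BART.
```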
Read it at arXiv:2406.03062.
(20 May 2024) Our survey paper titled A Unified Review of Deep Learning for Automated Medical Coding has just been accepted by ACM Surveys
Automated medical coding, an essential task for healthcare operation and delivery, makes unstructured data manageable by predicting medical codes from clinical documents. Recent advances in deep learning and natural language processing have been widely applied to this task. However, deep learning-based medical coding lacks a unified view of the design of neural network architectures. This review proposes a unified framework to provide a general understanding of the building blocks of medical coding models and summarizes recent advanced models under the proposed framework. Our unified framework decomposes medical coding into four main components, i.e., encoder modules for text feature extraction, mechanisms for building deep encoder architectures, decoder modules for transforming hidden representations into medical codes, and the usage of auxiliary information. Finally, we introduce the benchmarks and real-world usage and discuss key research challenges and future directions.
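To make the decomposition concrete, here is a compact sketch of one instance of the framework - a toy encoder followed by a label-wise attention decoder that maps hidden states to per-code probabilities. The architecture, dimensions, and layer choices are generic placeholders, not any specific model from the review.

```python
# Generic medical coding skeleton: encoder modules + label-wise attention decoder.
import torch
import torch.nn as nn

class MedicalCoder(nn.Module):
    def __init__(self, vocab_size: int, num_codes: int, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)                       # encoder: token embeddings
        self.encoder = nn.GRU(dim, dim, batch_first=True)                # encoder: contextualiser
        self.label_queries = nn.Parameter(torch.randn(num_codes, dim))   # decoder: one query per code
        self.classifier = nn.Linear(dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.encoder(self.embed(token_ids))                  # (B, T, dim)
        attn = torch.softmax(self.label_queries @ hidden.transpose(1, 2), dim=-1)  # (B, codes, T)
        label_repr = attn @ hidden                                       # (B, codes, dim)
        return torch.sigmoid(self.classifier(label_repr)).squeeze(-1)    # per-code probabilities

model = MedicalCoder(vocab_size=5000, num_codes=50)
probs = model(torch.randint(0, 5000, (2, 64)))  # a batch of 2 notes, 64 tokens each
print(probs.shape)                               # torch.Size([2, 50])
```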
Read the paper at doi:10.1145/3664615.
(16 May 2024) New preprint - Retrieving and Refining - A Hybrid Framework with Large Language Models for Rare Disease Identification - on arxiv
The infrequency and heterogeneity of clinical presentations in rare diseases often lead to underdiagnosis and their exclusion from structured datasets. This necessitates the utilization of unstructured text data for comprehensive analysis. However, the manual identification from clinical reports is an arduous and intrinsically subjective task. This study proposes a novel hybrid approach that synergistically combines a traditional dictionary-based natural language processing (NLP) tool with the powerful capabilities of large language models (LLMs) to enhance the identification of rare diseases from unstructured clinical notes. We comprehensively evaluate various prompting strategies on six large language models (LLMs) of varying sizes and domains (general and medical). This evaluation encompasses zero-shot, few-shot, and retrieval-augmented generation (RAG) techniques to enhance the LLMs’ ability to reason about and understand contextual information in patient reports. The results demonstrate effectiveness in rare disease identification, highlighting the potential for identifying underdiagnosed patients from clinical notes.
Read it at arXiv:2405.10440.
(15 May 2024) News - Our research has been cited by the World Health Organization and a news outlet
Our work "Automated Clinical Coding - What, Why, and Where We Are." (npj Digital Medicine 5, 159 (2022)) has been cited by a World Health Organization policy paper, Road traffic death coding quality in the WHO Mortality Database, and a news article, Despite AI advancements, human oversight remains essential.
(1 May 2024) Dr Honghan Wu joins the University of Glasgow as a Professor of Health Informatics and Data Science at the School of Health and Wellbeing. Honghan continues to hold an honorary associate professorship at University College London to continue his funded research projects and supervision.
(5 April 2024) New preprint - Enhancing Human-Computer Interaction in Chest X-ray Analysis using Vision and Language Model with Eye Gaze Patterns - on arxiv
Recent advancements in Computer Assisted Diagnosis have shown promising performance in medical imaging tasks, particularly in chest X-ray analysis. However, the interaction between these models and radiologists has been primarily limited to input images. This work proposes a novel approach to enhance human-computer interaction in chest X-ray analysis using Vision-Language Models (VLMs) enhanced with radiologists' attention by incorporating eye gaze data alongside textual prompts. Our approach leverages heatmaps generated from eye gaze data, overlaying them onto medical images to highlight areas of intense radiologist focus during chest X-ray evaluation. We evaluate this methodology in tasks such as visual question answering, chest X-ray report automation, error detection, and differential diagnosis. Our results demonstrate that the inclusion of eye gaze information significantly enhances the accuracy of chest X-ray analysis. The impact of eye gaze on fine-tuning was also confirmed, as the model outperformed other medical VLMs in all tasks except visual question answering. This work marks the potential of leveraging both the VLM's capabilities and the radiologist's domain knowledge to improve AI models in medical imaging, paving a novel way for Computer Assisted Diagnosis with human-centred AI.
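The sketch below shows, with numpy only, how fixation points might be turned into a duration-weighted Gaussian heatmap and alpha-blended onto an image; the parameters and blending weights are illustrative assumptions rather than the paper's exact pipeline.

```python
# Eye-gaze heatmap overlay sketch for a chest X-ray-like image.
import numpy as np

def gaze_heatmap(shape, fixations, sigma=20.0):
    """Sum duration-weighted Gaussians centred on each (row, col, duration_ms) fixation."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros(shape, dtype=float)
    for r, c, dur in fixations:
        heat += dur * np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2 * sigma ** 2))
    return heat / heat.max() if heat.max() > 0 else heat

image = np.random.rand(224, 224)                               # stand-in for a chest X-ray
heat = gaze_heatmap(image.shape, [(100, 120, 800), (60, 180, 300)])
overlay = 0.6 * image + 0.4 * heat                             # alpha-blended attention overlay
print(overlay.shape)                                           # (224, 224), ready for the VLM input
```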
Read it at arXiv:2404.02370.
(19 February 2024) New preprint - Enhancing Patient Outcome Prediction through Deep Learning with Sequential Diagnosis Codes from Structured EHR - A systematic review - on ResearchGate
Dr Tuankasfee Hama led a systematic review to identify and summarise existing deep learning studies that predict patient outcomes using sequences of diagnosis codes as a key part of their predictors. The review also investigates the challenges of generalisability and explainability of the predictive models.
Briefly, the main conclusion is that applying deep learning to sequences of diagnosis codes has demonstrated remarkable promise in predicting patient outcomes. Using multiple types of features and integrating time intervals were found to improve predictive performance. Addressing challenges related to generalisation and explainability will be instrumental in unlocking the full potential of DL for enhancing healthcare outcomes and patient care.
Read it here.
(15 February 2024) Tiny paper - Hallucination Benchmark in Medical Visual Question Answering accepted by ICLR 2024
Many congratulations to Jinge Wu and Yunsoo Kim, PhD students at KnowLab, on the acceptance of a paper to ICLR 2024 - one of the top AI/ML conferences! Not just an acceptance, but also a super positive review from the area chair - "This particular work is worth presenting at a notable level, as it introduces a dataset that people in the field should be aware of – it is a substantial contribution that can spur further advancements in the VLM field."
Read it at arXiv:2401.05827.
(11 January 2024) New preprint - Hallucination Benchmark in Medical Visual Question Answering - on arxiv
The recent success of large language and vision models on visual question answering (VQA), particularly their applications in medicine (Med-VQA), has shown great potential for realizing effective visual assistants for healthcare. However, these models are not extensively tested for the hallucination phenomenon in clinical settings. Here, we created a hallucination benchmark of medical images paired with question-answer sets and conducted a comprehensive evaluation of the state-of-the-art models. The study provides an in-depth analysis of current models' limitations and reveals the effectiveness of various prompting strategies.
Read it at arXiv:2401.05827.
(20 December 2023) New preprint - Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation - a focused study on chemical entities of biological interest - on arxiv
We had a task to implement an automated approach to knowledge curation for a biomedical ontology - ChEBI (Chemical Entities of Biological Interest). We asked ourselves which NLP paradigm is best suited to such curation tasks and decided to compare and analyse three of them - in-context learning, fine-tuning, and supervised learning. We broke the general question down into four specific questions. After comprehensive experiments and analysis covering 3 GPT models (including GPT-4), the domain-specific PubMedBERT, and 6 embedding models for supervised learning, across 15 experiment setups on >1.8M triples, we believe we obtained good evidence for answering them properly.
Read it at arXiv:2312.12989 or this short LinkedIn post.
(20 December 2023) New preprint - Exploring Multimodal Large Language Models for Radiology Report Error-checking - on arxiv
Given all the exciting developments in generative AI and foundation models, in the context of radiology, Jinge Wu and Yunsoo Kim set out to ask the question "can these models be good assistants in spotting errors in radiology reports by cross-checking radiographs?" To answer this question, they conducted a study on using multimodal large language models (LLMs) to assist radiologists in checking errors in their reports. 1,000 reports with "synthetic errors" were created using two real-world chest X-ray datasets. Two types of tasks were introduced - binary (is there an error?) vs multiclass (what type of error?) classification.
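As a flavour of how "synthetic errors" can be injected into reports for such an evaluation, the sketch below corrupts a report with two example error types (laterality swap, negation flip); these error categories and the corruption logic are illustrative and not necessarily those used in the study.

```python
# Inject a synthetic error into a radiology report for error-checking evaluation.
import random
from typing import Tuple

def corrupt_report(report: str, rng: random.Random) -> Tuple[str, str]:
    """Return (corrupted_report, error_type); 'none' if no corruption was applied."""
    if "left" in report and rng.random() < 0.5:
        return report.replace("left", "right", 1), "laterality"
    if "no evidence of" in report:
        return report.replace("no evidence of", "evidence of", 1), "negation"
    return report, "none"

rng = random.Random(7)
clean = "There is no evidence of left pleural effusion."
corrupted, error_type = corrupt_report(clean, rng)
print(corrupted, "|", error_type)
# Binary task: does the report contain an error?  Multiclass task: which error_type was injected?
```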
Read it at arXiv:2312.13103 or this short LinkedIn post.
(7 December 2023) New paper - Applying contrastive pre-training for depression and anxiety risk prediction in type 2 diabetes patients based on heterogeneous electronic health records - a primary healthcare case study - published on JAMIA
Due to heterogeneity and limited medical data in primary healthcare services (PHS) in China, assessing the psychological risk of type 2 diabetes mellitus (T2DM) patients in PHS is difficult. Using unsupervised contrastive pre-training, we proposed a deep learning framework named depression and anxiety prediction (DAP) to predict depression and anxiety in T2DM patients. The main aim was to use good quality EHR data (covering 85,085 T2DM in-patients) from a secondary care provider, the First Affiliated Hospital of Nanjing Medical University, to pre-train a foundation model with transfer learning capabilities via unsupervised contrastive learning. This was then fine-tuned for depression prediction using 149,596 T2DM patients' EHRs in the Nanjing Health Information Platform (NHIP). Experiments showed the approach had great utility in predicting post-discharge depression and anxiety in T2DM patients at PHS, with much higher performance than baseline models.
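For intuition, the snippet below shows a minimal InfoNCE-style contrastive objective over two noisy "views" of the same patient-record vectors; the toy encoder and the way positive pairs are formed are simplified assumptions, not the DAP framework itself.

```python
# Minimal contrastive (InfoNCE-style) pre-training step over patient-record embeddings.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1[i] and z2[i] are embeddings of two views of the same patient's record."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature        # similarity of every record pair in the batch
    targets = torch.arange(z1.size(0))        # the matching index is the positive pair
    return F.cross_entropy(logits, targets)

encoder = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 32))
records = torch.randn(16, 64)                 # toy EHR feature vectors
view1 = records + 0.05 * torch.randn_like(records)   # simple noise "augmentation"
view2 = records + 0.05 * torch.randn_like(records)
loss = info_nce(encoder(view1), encoder(view2))
loss.backward()                               # one pre-training step; a prediction head is fine-tuned later
print(float(loss))
```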
Read it at DOI:10.1093/jamia/ocad228
(23 November 2023) New paper - Term-BLAST-Like Alignment Tool for Concept Recognition in Noisy Clinical Texts - published on Bioinformatics
Texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other non-standard ways of representing clinical concepts. Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT), leverages a gold standard corpus of typographical errors to implement a sequence-alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely used tools. Experimental results show a 10% increase in recall on scientific publications and a 20% increase in recall on EHR records (when compared against the next best method), hence supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches.
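The toy example below captures the core screening idea: count shared character k-mers between a (possibly misspelt) mention and dictionary terms, and keep the best-scoring candidates for finer scoring downstream. The tiny dictionary, k value, and thresholds are illustrative only.

```python
# k-mer based candidate screening for noisy clinical mentions (BLAST-inspired sketch).
from collections import Counter

def kmers(text: str, k: int = 3) -> Counter:
    text = text.lower()
    return Counter(text[i:i + k] for i in range(max(len(text) - k + 1, 1)))

def screen(mention: str, dictionary: list, top_n: int = 3) -> list:
    """Rank dictionary terms by the number of shared k-mers with the (possibly misspelt) mention."""
    q = kmers(mention)
    scored = [(sum((q & kmers(term)).values()), term) for term in dictionary]
    return [term for score, term in sorted(scored, reverse=True)[:top_n] if score > 0]

concepts = ["myocardial infarction", "pneumonia", "hypertension", "hyperthyroidism"]
print(screen("myocardail infraction", concepts))  # survives the spelling errors
```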
Read it at DOI:10.1093/bioinformatics/btad716
(27 October 2023) Two papers published on Frontiers in Digital Health
[1] Casey, Arlene, Emma Davidson, Claire Grover, Richard Tobin, Andreas Grivas, Huayu Zhang, Patrick Schrempf, Alison Q. O’Neil, Liam Lee, Michael Walsh, Freya Pellie, Karen Ferguson, Vera Cvoro, Honghan Wu, Heather Whalley, Grant Mair, William Whiteley, Beatrice Alex. “Understanding the performance and reliability of NLP tools - a comparison of four NLP tools predicting stroke phenotypes in radiology reports.” Frontiers in Digital Health 5 (2023), 1184919. Read it here
[2] Zhang, Huayu, Arlene Casey, Imane Guellil, Víctor Suárez-Paniagua, Clare Macrae, Charis Marwick, Honghan Wu, Bruce Guthrie, and Beatrice Alex. “FLAP - A framework for linking free-text addresses to the Ordnance Survey Unique Property Reference Number database.” Frontiers in Digital Health 5 (2023), 1186208. Read it here
(25 September 2023) Three new starters at KnowLab!
Yunsoo Kim is a new PhD student working on multimodal large language models for health data. His research interests also include applications of these models in the diagnosis and prognosis of neurological diseases such as dementia.
Yusuf Abdulle is a Research Assistant based at the Institute of Health Informatics at University College London. He is currently using Graph Neural Networks and Knowledge Graphs for the early diagnosis of rare neurodegenerative diseases.
Yue Gao is a PhD student from Beijing University of Posts and Telecommunications. Funded by CSC, Yue is visiting KnowLab for one year doing research on human in the loop AI models for automated clinical coding.
(11 September 2023) New grant! KnowLab is awarded £649,218 by MRC for Quantifying and Mitigating Bias affecting and induced by AI in Medicine!
Artificial Intelligence (AI) has demonstrated exciting potential in improving healthcare. However, these technologies come with a big caveat - they often do not work effectively for minority groups. A recent study published in Science shows that a widely used AI tool in the US concludes Black patients are healthier than equally sick White patients. Using this tool, a health system would favour White people when allocating resources, such as hospital beds. AI models like this would do more harm than good for health equity. Funded by the Medical Research Council, KnowLab is leading a 30-month research project focused on using data science and machine learning to quantify and mitigate data-embedded and AI-induced bias and inequality. Clearly, this is a challenge too grand to be tackled by a single institute. We will be working closely with the BHF Data Science Centre, University of Edinburgh, University of Birmingham, Nanjing Medical University (China), and wider communities including Health Data Research UK, the Alan Turing Institute and beyond.
Check the Project Page at UKRI
(23 July 2023) Hard exudate plays an important role as a critical indicator in grading diabetic retinopathy (DR). Therefore, the accurate segmentation of hard exudates is of clinical importance. However, the percentage of hard exudates in the whole fundus image is relatively small, their shapes are often irregular, and the contrast is usually not high enough. Hence, they are prone to misclassification, e.g., as part of the optic disc structure or as cotton wool spots, which results in low segmentation accuracy and efficiency. This paper proposes a novel neural network, RMCA U-net, to accurately segment hard exudates in fundus images. The network features a U-shape framework combined with a residual structure to capture the subtle features of hard exudates. A multi-scale feature fusion (MSFF) module and an improved channel attention (CA) module are designed and incorporated to effectively segment sparse small lesions. The proposed method has been trained and evaluated on three data sets - IDRID, Kaggle and one local data set. Experiments indicate that RMCA U-net is superior to other convolutional neural networks, with 6% higher PR-MAP than U-net on the IDRID dataset, 10% higher recall than U-net on the Kaggle dataset, and 20% higher F1-score than U-net on the local dataset.
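For readers who want a concrete picture of a channel attention block, here is a squeeze-and-excitation-style sketch as one plausible form of the CA module; the exact design in the paper may differ, and the reduction ratio and feature sizes are arbitrary.

```python
# Squeeze-and-excitation style channel attention block (illustrative form of a CA module).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                  # squeeze: global spatial average
        self.fc = nn.Sequential(nn.Linear(channels, channels // reduction), nn.ReLU(),
                                nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)  # excitation: per-channel gate
        return x * weights                                    # re-weight the feature maps

features = torch.randn(2, 64, 32, 32)                         # e.g. features from a U-net skip connection
print(ChannelAttention(64)(features).shape)                   # torch.Size([2, 64, 32, 32])
```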
Read it at DOI:10.1016/j.eswa.2023.120987
(15 July 2023) This paper presents our contribution to the RadSum23 shared task organized as part of BioNLP 2023. We compared state-of-the-art generative language models in generating high-quality summaries from radiology reports. A two-stage fine-tuning approach was introduced for utilizing knowledge learnt from different datasets. We evaluated the performance of our method using a variety of metrics, including BLEU, ROUGE, BERTScore, CheXbert, and RadGraph. Our results revealed the potential of different models in summarizing radiology reports and demonstrated the effectiveness of the two-stage fine-tuning approach. We also discussed the limitations and future directions of our work, highlighting the need to better understand the effect of architecture design and the optimal way of fine-tuning accordingly in automatic clinical summarization.
Read it at DOI:10.18653/v1/2023.bionlp-1.54
(5 May 2023) Ontology-driven and weakly supervised rare disease identification from clinical notes published on BMC Medical Informatics and Decision Making
Superb work from Dr Hang Dong and colleagues, demonstrating how weakly supervised NLP and ontology techniques can greatly facilitate the identification of rare disease mentions from electronic health records with >90% accuracy. This uses training data that need no human annotations!
Read it at DOI:10.1186/s12911-023-02181-9
(5 May 2023) New paper titled Prediction of disease comorbidity using explainable artificial intelligence and machine learning techniques - A systematic review published on International Journal of Medical Informatics
Mohanad M. Alsaleh - a PhD student at UCL - did a great systematic review on explainable AI methods for predicting comorbidity from electronic health records. It finds “(a) The use of explainable artificial intelligence (XAI) can improve predictions of comorbidities by providing a transparent understanding of the reasoning behind predictions and helping healthcare providers make informed decisions. (b) There is a great potential to uncover novel disease associations and better understand the mechanisms of diseases by integrating genetic and electronic health record (EHR) data, leading to improved quality of care and earlier diagnoses. (c) The use of AI in healthcare can improve patient outcomes and reduce healthcare costs by identifying disease risks and making personalised treatment plans.”
Read it at DOI:10.1016/j.ijmedinf.2023.105088
(7 April 2023) New paper - a systematic review on antidepressant and antipsychotic drug prescribing and diabetes outcomes
As part of her PhD research, Charlotte led this systematic review to investigate the association between antidepressant or antipsychotic drug prescribing and type 2 diabetes outcomes. It concludes - Studies of antidepressant and antipsychotic drug prescribing in relation to diabetes outcomes are scarce, with shortcomings and mixed findings. Until further evidence is available, people with diabetes prescribed antidepressants and antipsychotics should receive monitoring and appropriate treatment of risk factors and screening for complications as recommended in general diabetes guidelines.
Read it at DOI:10.1016/j.diabres.2023.110649
(20 March 2023) Workshop paper on "Ontology-driven Self-supervision for Adverse Childhood Experiences Identification using Social Media Datasets" now published!
Adverse Childhood Experiences (ACEs) are defined as a collection of highly stressful, and potentially traumatic, events or circumstances that occur throughout childhood and/or adolescence. They have been shown to be associated with increased risks of mental health conditions or other abnormal behaviours in later life. In this paper, Jinge and colleagues present an ontology-driven self-supervised approach (deriving concept embeddings using an auto-encoder from baseline NLP results) for producing a publicly available resource that would support large-scale machine learning (e.g., training transformer-based large language models) on social media corpora. Jinge presented this paper in summer 2022 at the 1st Workshop on Scarce Data in Artificial Intelligence for Healthcare, held with IJCAI 2022 in Vienna. Check - Paper, Github Repo
(7 March 2023) npj Digital Medicine's editorial on automating clinical coding echoes our perspective
Recently, npj Digital Medicine's editor Dr Kvedar and colleagues published a great editorial on automating clinical coding (link), pointing out the main challenges at both the technological and implementation levels: clinical documents are redundant and complex; code sets like ICD-10 are rapidly evolving; training sets do not comprehensively cover the codes; and the logic and rules behind coding decisions are hard to capture. Great to see our perspectives on the automated coding research challenges and future directions echoed in the editorial!
(21 February 2023) New paper - The impact of inconsistent human annotations on AI driven clinical decision making now published by npj Digital Medicine
Annotation inconsistencies commonly occur when even highly experienced clinical experts annotate the same phenomenon (e.g., medical image, diagnostics, or prognostic status), due to inherent expert bias, judgements, and slips, among other factors. While their existence is relatively well-known, the implications of such inconsistencies are largely understudied in real-world settings.
Aneeta Sylolypavan did her MSc with us addressing this hugely important research question using real-world ICU datasets with annotated data from 11 ICU consultants. The results suggest that (a) there may not always be a “super expert” in acute clinical settings; and (b) standard consensus seeking (such as majority vote) consistently leads to suboptimal models. Further analysis, however, suggests that assessing annotation learnability and using only ‘learnable’ annotated datasets for determining consensus achieves optimal models in most cases.
Read the paper from here
(23 January 2023) KnowLab is funded by NIHR/HDR UK to use health data science and machine learning to address NHS winter pressures - Using rare disease phenotype models to identify people at risk of COVID-19 related adverse outcomes
KnowLab is proud to be part of 16 projects funded by HDR UK and NIHR which will use data-driven approaches to pinpoint pressures in the health care system, understand their causes and develop ways to overcome or avoid them. In particular, we will use machine learning and rare disease phenotype models to uncover much-needed information on the added risks of severe COVID-19 in people who are clinically more vulnerable and come from disadvantaged socioeconomic backgrounds. This can then inform policy responses to provide better management and treatment for these most vulnerable groups, who might otherwise have been overlooked. The team are well placed to derive quick actionable findings for the winter pressures as they have been working with CVD-COVID-UK/COVID-IMPACT on rare diseases since October 2021. HDR UK Press Release on the funded projects | HDR UK News on this project | Herald Scotland News
(21 December 2022) Our paper surveying the UK's clinical NLP landscape is now published with npj Digital Medicine
Aiming to survey the landscape of clinical NLP in the UK, we used a relatively unusual approach - starting by finding all relevant funded projects and extracting their interlinked information, then conducting community analysis and a literature review. We described WHO (key players among funders, universities, companies, and researchers), WHAT (techs, applications, disease areas, clinical questions, datasets), WHERE (community developments, tech trends and maturity), and GAPS (barriers to unleashing the full power of NLP in health). While on the community level we focused on the UK, the analyses and discussions of research, tech and developments went beyond the country boundary. In particular, we compared the technological, data, and regulatory similarities and differences of the US and the UK. This is one of the key outputs of the HDR UK funded National Text Analytics Project.
Read it at DOI:10.1038/s41746-022-00730-6
(20 December 2022) KnowLab co-edits a new cross journal collection with BMC Series on Ethics of Artificial Intelligence in Health and Medicine.
As the implementation of artificial intelligence (AI)-based innovations in health and care services becomes more and more common, it is increasingly pressing to address the ethical challenges associated with AI in healthcare and find appropriate solutions. In the cross-journal BMC collection Ethics of Artificial Intelligence in Health and Medicine, we urge the research communities, industry, policy makers and other stakeholders to join forces in tackling the grand challenges of realising ethical and fair AI in health and medicine. Check our blog article with BMC Series on the topic for the what and why. Please spread the word and contribute to the collection; the current deadline is 31 Oct 2023.
(22 November 2022) KnowLab is awarded £5,000 from UCL Global Engagement Funding.
The funding is to extend and deepen our collaboration with iris.ai - a Norway based start-up behind the award-winning AI engine for scientific text understanding. The funded project is titled - Towards self-updatable knowledge base for evidence based medicine - join force with iris.ai (Norway) and beyond.
(27 October 2022) The Alan Turing Health Equity Interest Group - https://www.turing.ac.uk/research/interest-groups/health-equity, co-organised by KnowLab and colleagues, is now officially online!
In an era where AI is expected to improve our daily life - particularly in health, "How can we ensure that developments and applications of data science and AI improve everyone's health?" is a pressing and very challenging question. Please join this multidisciplinary group and help form a formidable synergistic force tackling one of the biggest challenges of AI in medicine.
(22 October 2022) Dr Hang Dong's great perspective piece on automated coding using NLP and knowledge-driven approaches has now been published in npj Digital Medicine. The work illustrates how NLP and AI can help improve the efficiency of clinical coding in healthcare - i.e., assigning ICD/SNOMED codes to hospital visits, which is currently a very inefficient and error-prone process in the NHS and, for that matter, in many other health systems across the world.
(5 October 2022) Study on The Impact of Inconsistent Human Annotations on AI driven Clinical Decision Making.
Annotation inconsistencies commonly occur when even highly experienced clinical experts annotate the same phenomenon (e.g., medical image, diagnostics, or prognostic status), due to inherent expert bias, judgements, and slips, among other factors. While their existence is relatively well-known, the implications of such inconsistencies are largely understudied in real-world settings.
Aneeta Sylolypavan did her MSc with us addressing this hugely important research question using real-world ICU datasets with annotated data from 11 ICU consultants. The results suggest that (a) there may not always be a “super expert” in acute clinical settings; and (b) standard consensus seeking (such as majority vote) consistently leads to suboptimal models. Further analysis, however, suggests that assessing annotation learnability and using only ‘learnable’ annotated datasets for determining consensus achieves optimal models in most cases.
The manuscript is now under review with npj Digital Medicine and preprint is available at doi:10.21203/rs.3.rs-1937575/v1.
(18 August 2022) Our systematic review titled "Artificial intelligence models for predicting cardiovascular diseases in people with type 2 diabetes - a systematic review", led by Minhong Wang, has been accepted by Intelligence-Based Medicine. This study identified and reviewed existing AI models for predicting the risk of cardiovascular diseases in people with type 2 diabetes. We found that AI approaches have the potential to achieve more accurate predictions than risk scores developed using conventional methods. However, none of the reviewed models is directly reusable or reproducible, due to incomplete reporting and lack of transparency. Clinically, none of the AI models includes interventions that may affect risk, such as medications and lifestyle changes, and the studies gave no indication of whether the prediction models might be able to adapt to include these factors.
(29 July 2022) Our collaboration study titled "Prediction of Five-Year Cardiovascular Disease Risk in People with Type 2 Diabetes Mellitus - Derivation in Nanjing, China and External Validation in Scotland, UK", led by Cheng Wan from Nanjing Medical University, has been published by Global Heart. This study shows it is feasible to generate a risk prediction model using routinely collected Chinese hospital data. This indicates there is great potential to make use of large-scale and relatively easily accessible routine data for identifying those at risk of CVD, helping to significantly improve CVD prevention in people with diabetes.
(11 July 2022) Our health inequality studies - one led by Isabel Straw and one with Minhong, Aneeta and Prof Sarah Wild (University of Edinburgh) - are featured in a science piece on i news.
(16 June 2022) Our collaboration study titled “Spine-GFlow - A Hybrid Learning Framework for Robust Multi-tissue Segmentation in Lumbar MRI without Manual Annotation”, led by Dr Teng Zhang from Hong Kong University, has been accepted by Computerized Medical Imaging and Graphics. Results of this study show that our method, without requiring manual annotation, has achieved a segmentation performance comparable to a model trained with full supervision (mean Dice 0.914 vs 0.916).
(10 June 2022) Our work, titled COVID-19 trajectories among 57 million adults in England - a cohort study using electronic health records, is now out with Lancet Digital Health. Our analyses illustrate the wide spectrum of disease trajectories as shown by differences in incidence, survival, and clinical pathways. We have provided a modular analytical framework that can be used to monitor the impact of the pandemic and generate evidence of clinical and policy relevance using multiple EHR sources.
(2 May 2022) Our work Quantifying Health Inequalities Induced by Data and AI Models has been accepted by the IJCAI-ECAI 2022 'AI for Good' track. This work introduced a generic allocation-deterioration framework for detecting and quantifying AI-induced inequality. Extensive experiments were carried out to quantify health inequalities (a) embedded in two real-world ICU datasets, HiRID and MIMIC-III, and (b) induced by AI models trained for two resource allocation scenarios. The results showed that, compared to men, women had up to 33% poorer deterioration in markers of prognosis when admitted to HiRID ICUs. All four AI models assessed were shown to induce significant inequalities (2.45% to 43.2%) for non-White compared to White patients. The models exacerbated data-embedded inequalities significantly in 3 out of 8 assessments, one of which was >9 times worse. preprint, slides, recording, repo.
(26 April 2022) Study led by Isabel Straw, Investigating for bias in healthcare algorithms - a sex-stratified analysis of supervised machine learning models in liver disease prediction, demonstrates a previously unobserved sex disparity present in published machine learning models. It suggests “To ensure sex-based inequalities do not manifest in medical AI, an evaluation of demographic performance disparities must be integrated into model development.” The work has been published on BMJ Health & Care Informatics.
(22 April 2022) Dr Honghan Wu joined the editorial board of BMC Digital Health. BMC Digital Health considers research on all aspects of the development and implementation of digital technology in both medicine and public health, such as mobile health applications, virtual healthcare and wearable technology, as well as the role of social media and other communications technology in digital health.
(25 March 2022) Study led by Huayu, Increased COVID-19 mortality rate in rare disease patients - a retrospective cohort study in participants of the Genomics England 100,000 Genomes project, has shown rare disease patients, especially ones affected by neurology and neurodevelopmental disorders, in the Genomics England cohort had increased risk of COVID-19 related death during the first wave of the pandemic in UK. This work has now been accepted by Orphanet Journal of Rare Diseases.
(20 March 2022) Clinical coding is the task of transforming medical information in a patient's health records into structured codes, such as ICD-10 diagnosis codes; it is a cognitively demanding, time-consuming, and error-prone task. In this preprint, titled Automated Clinical Coding - What, Why, and Where We Are?, Hang introduces the idea of automated clinical coding and summarises its challenges from the perspective of Artificial Intelligence (AI) and Natural Language Processing (NLP), based on the literature, our project experience over the past two and a half years (late 2019 - early 2022), and discussions with clinical coding experts in Scotland and the UK.
(18 January 2022) KnowLab was awarded an enabling grant (£29k) from the British Council to strengthen academic exchanges and deepen our collaborations with two Nanjing based universities: Nanjing Medical University (Prof Yun Liu's group) and Southeast University (Dr Xiang Zhang's group). On the UCL side, in addition to KnowLab colleagues, we have Prof Paul Taylor and Dr Holger Kunz. Our research will focus on Artificial Intelligence in Medicine - tackling challenges of low generalisability and health inequality. This will involve both teaching and research activities.
(15 January 2022) Great work by Shaoxiong Ji and colleagues from Aalto University on reviewing automated coding from free-text clinical notes using a unified/abstract architecture view - now on arXiv. Great that KnowLab is part of this.
(13 January 2022) Minhong's project "COOLNeo - an automated COOLing therapy for NEOnates" has been awarded £9,960.00 in funding from the ACCELERATE Innovation Team Challenge, financially supported through the Wellcome Trust Translational Partnership Award linked to the Translational Research Office.
(8 January 2022) New members! A very warm welcome to Xuezhe Wang and Zhaolong Wu, who join KnowLab to do their MSc projects. Both are MSc students based in the Institute of Health Informatics, UCL. Xuezhe will be working on graph neural networks and Zhaolong will be doing clinical natural language processing.
(3 December 2021) Led by Dr Rebecca Bendayan at King's College London, our work Investigating the Association between Physical Health Comorbidities and Disability in Individuals with Severe Mental Illness has now been accepted by European Psychiatry. We found that individuals with Severe Mental Illness and musculoskeletal, skin/dermatological, respiratory, endocrine, neurological, haematological or circulatory disorders are at higher risk of disability compared to those who do not have those comorbidities. There is a great and urgent need to provide targeted prevention and intervention programmes for these vulnerable people.
(28 November 2021) Huayu’s work on rare disease is now out with the Lancet as a conference abstract. Common conditions are widely recognised as risk factors for COVID-19 death, BUT effects of Rare Diseases are largely unknown. This study on Genomics England data shows significant increased mortality risks (OR 3·47) among rare disease individuals.
(23 November 2021) New members! A very warm welcome to Yun-Hsuan Chang and Hengrui Zhang, who join KnowLab to do their MSc projects. Both are MSc students based in the Institute of Health Informatics, UCL. Yun will be working on Parkinson's Disease modelling using multimodal data and Hengrui will develop deep learning models for automated coding from discharge summaries. Both projects are exciting!
(9 October 2021) Great to know that Emma and Claire's work on COVID-19 subtype identification has been featured on Health Data Research UK's website as a case study. Health Data Research UK is the UK's national institute for health data science.
(5 October 2021) NLP of radiology reports has wide applications. However, the current literature has suboptimal reporting quality. This impedes comparison, reproducibility, and replication. Check our systematic review on reporting quality of NLP on radiology reports on BMC Medical Imaging.
(26 September 2021) We are pleased to announce the grand opening of our KnowLab Blog. We aim to share our research, irregularly, in layman's terms. This is to reach out to the general public and explain what we are doing and why we are doing it.
(6 September 2021) KnowLab is thrilled to be part of a new £3.9m NIHR Research Collaboration on Artificial Intelligence and Multimorbidity, called AIM-CISC. We will lead the objective 4 work in England - use machine learning and natural language processing on multimodal health data for better understanding of disease clusters.
(18 August 2021) Great news - Dr Honghan Wu has become a Turing Fellow at The Alan Turing Institute, UK’s national institute for data science and artificial intelligence.
(1 August 2021) Dr Minhong Wang takes up a new position as a research fellow at IHI, UCL (Top 5 in the world for Public Health according to Shanghai Ranking 2021) to work on exciting projects on health data using NLP + ML! Congratulations, Minhong!
(28 July 2021) A warm welcome to Nickil Maveli who will be working with Dr Hang Dong on the automated medical coding project. Nickil will focus on (a) tackling the shortcomings of BERT models that only deal with 512 tokens or fewer; (b) utilising multiple documents in the task.
(15 July 2021) Paper accepted by EMBC 2021, titled “Rare Disease Identification from Clinical Notes with Ontologies and Weak Supervision”. arXiv:2105.01995
(9 July 2021) Dr Honghan Wu joined the editorial board of BMC Medical Informatics and Decision Making.
(7 July 2021) Exciting news - Dr Honghan Wu has been promoted to Associate Professor at UCL! (effective 1 October 2021).
(5 July 2021) We are recruiting a Research Fellow in Health Data Research to be based at IHI, UCL. Part of the role will be conducting exciting collaborations with iris.ai on The AI Chemist project. Please apply here.
(15 June 2021) COVID-19 subtyping work has now been accepted by the AMIA 2021 Annual Symposium. What great work, Emma and Claire! Both are joint first authors and first-year PhD students of HDR UK, the Turing Institute and the Wellcome CDT, who worked with KnowLab on the COVID-19 project. Axes of Prognosis - Identifying Subtypes of COVID-19 Outcomes. medRxiv. Github Repo
(15 June 2021) Led by Zina, our work “A Knowledge Distillation Ensemble Framework for Predicting Short and Long-term Hospitalisation Outcomes from Electronic Health Records Data” has been published on IEEE Journal of Biomedical and Health Informatics.
(8 June 2021) Our paper “Developing automated methods for disease subtyping in UK Biobank - an exemplar study on stroke” has been accepted by BMC Medical Informatics and Decision Making. This is our first work to combine NLP + reusable domain knowledge (encoded as rules) to derive sub-phenotypes (specific conditions like intracerebral hemorrhage stroke).
(17 May 2021) Our paper “A Systematic Review of Natural Language Processing Applied to Radiology Reports” has been accepted by BMC Medical Informatics and Decision Making. Preprint arXiv. Well done, Arlene!
(5 May 2021) Our work with Dr Adam Levine on Pathology NLP has been accepted for oral presentation at 13th Joint meeting of the BDIAP and The Pathological Society on 6-8 July 2021, titled Natural Language Processing for the Automated Extraction of Tumour Immunohistochemical Profiles from Diagnostic Histopathology Reports. What a great start to work on Pathology NLP!
(21 April 2021) Hang’s work - “Weakly supervised entity linking and ontology matching to enrich patients’ rare disease coding” has been accepted to the 2021 virtual workshop on Personal Health Knowledge Graphs. Well done, Hang!
(12 April 2021) Honghan is invited to speak at Personal Knowledge Graph Workshop 2021 about our work and thoughts on knowledge graph and particularly the “personal” aspect in health data science.
(24 March 2021) Axes of Prognosis - Identifying Subtypes of COVID-19 Outcomes - a work led by Emma and Claire is now on medRxiv. Both are first-year PhD students (HDR UK, The Alan Turing Institute and the Wellcome CDT). Great work, Emma and Claire! Github Repo
(25 February 2021) Our paper “Explainable Automated Coding of Clinical Notes using Hierarchical Label-wise Attention Networks and Label Embedding Initialisation” has been accepted by Journal of Biomedical Informatics. Well done, Hang!
(25 February 2021) Our recent work uses deep learning to automate diagnosis coding (ICD) from discharge summaries: Explainable Automated Coding of Clinical Notes using Hierarchical Label-wise Attention Networks and Label Embedding Initialisation. preprint, GitHub Repo
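To make the core idea concrete, here is a toy label-wise attention layer (a sketch, not the paper's exact HLAN architecture): each ICD code gets its own attention distribution over token representations, which is what makes the predictions explainable per code. Dimensions and initialisation are illustrative; in the paper, label embeddings can be initialised from code descriptions, which the random initialisation below does not do.

```python
# Toy sketch of label-wise attention for explainable ICD coding (not the exact HLAN code).
import torch
import torch.nn as nn

class LabelWiseAttention(nn.Module):
    def __init__(self, hidden_dim: int, n_labels: int):
        super().__init__()
        # One learnable query per ICD code; could be initialised from code-description embeddings.
        self.label_queries = nn.Parameter(torch.randn(n_labels, hidden_dim))
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, token_states):                       # (batch, seq_len, hidden_dim)
        scores = token_states @ self.label_queries.T       # (batch, seq_len, n_labels)
        attn = torch.softmax(scores, dim=1)                # per-label attention over tokens
        label_docs = attn.transpose(1, 2) @ token_states   # (batch, n_labels, hidden_dim)
        logits = self.classifier(label_docs).squeeze(-1)   # (batch, n_labels)
        return torch.sigmoid(logits), attn                 # code probabilities + attention weights
```

The returned attention weights highlight, for every predicted code, which tokens in the discharge summary contributed most - the basis of the explainability.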
(20 February 2021) Invited talk by Jose Manuel Gomez-Perez - "On the Role of Knowledge Graphs and Language Models in Machine Understanding of Scientific Documents".
(1 February 2021) Excited to kick off our project with NHS Lothian to use Natural Language Processing + Machine Learning to automatically triage patients on the long dermatology waiting list (a consequence of COVID-19). This study is funded by Data-Driven Innovation.
(1 February 2021) Many congratulations to Minhong for having successfully passed her viva with minor corrections! Huge achievement, Dr Wang!
(21 January 2021) Our paper "Evaluation of NEWS2 for predicting severe COVID outcome" is now published in BMC Medicine. Key findings - NEWS2 had poor-to-moderate discrimination for severe outcomes (ICU admission/death) at 14 days, but discrimination improved when blood and physiological parameters were added. Mentions in news stories - dailymail / eurekalert / KCL
(4 January 2021) Our paper “Benchmarking network-based gene prioritization methods for cerebral small vessel disease” has been accepted by Briefings in Bioinformatics.
(16 December 2020) Evaluation and Improvement of the National Early Warning Score (NEWS2) for COVID-19 - a multi-hospital study. The medRxiv preprint has now been accepted by BMC Medicine.
(10 December 2020) Excess deaths in people with cardiovascular diseases during the COVID-19 pandemic. The medRxiv preprint has now been accepted by the European Journal of Preventive Cardiology.
(11 November 2020) Our ensemble learning for COVID-19 work has been accepted by the Journal of the American Medical Informatics Association. This study synergises seven multinational prediction models to realise a robust and high-performing prediction model. It is the first work to use ensemble learning for COVID-19 risk prediction, and the validation cohorts form one of the most diverse international COVID-19 datasets (4 cohorts with mortality rates of 2.4-45%). The ensemble model consistently outperformed every single model in all aspects validated. DOI:10.1093/jamia/ocaa295, GitHub Repo
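A minimal sketch of the ensembling idea (illustrative only, not the published model): combine risk scores from several already-fitted prediction models by weighted averaging and evaluate the combined score on an external cohort. The variable names and the usage snippet are assumptions.

```python
# Illustrative weighted-average ensemble of per-model COVID-19 risk scores.
import numpy as np
from sklearn.metrics import roc_auc_score

def ensemble_risk(model_scores: np.ndarray, weights=None) -> np.ndarray:
    """model_scores: (n_models, n_patients) array of predicted risks from each model."""
    w = np.ones(model_scores.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ model_scores        # (n_patients,) combined risk score

# Hypothetical usage on an external validation cohort:
# scores = np.vstack([m.predict_proba(X_val)[:, 1] for m in fitted_models])
# print("Ensemble AUC:", roc_auc_score(y_val, ensemble_risk(scores)))
```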
(16 October 2020) Many congratulations to Victor and Hang, both are awarded £1,000 by The iTPA Translational Innovation Competition. Details
(23 September 2020) We are hiring! Two posts (£33,797 - £40,322) in NLP / Health Data research. Deadline 14 Oct 2020, based at the Usher Institute, Edinburgh Medical School.
(12 September 2020) Our new study uses ensemble learning to synergise seven multinational prediction models into a robust and high-performing prediction model. It is the first work to use ensemble learning for COVID-19 risk prediction, and the validation cohorts form one of the most diverse international COVID-19 datasets (4 cohorts with mortality rates of 2.4-45%). The ensemble model consistently outperformed every single model in all aspects validated. preprint, GitHub Repo
(11 August 2020) Great news - Huayu Zhang's project, titled "Towards data-driven fine management of COVID-19 hospitalization risk for rare-disease patients", is awarded £1,000 by The iTPA Translational Innovation Competition. Quite an achievement for an early-stage postdoctoral researcher!
(11 June 2020) Our study shows “Adding age and a minimal set of blood parameters to NEWS2 improves the detection of patients likely to develop severe COVID-19 outcomes” - “Evaluation and Improvement of the National Early Warning Score (NEWS2) for COVID-19 - a multi-hospital study”. MedRxiv.
(11 June 2020) Our study shows “CVD services have dramatically reduced across countries, leading to potential (probably avoidable) excess mortality during and after the COVID-19 pandemic” - “Excess deaths in people with cardiovascular diseases during the COVID-19 pandemic”. MedRxiv.
(3 May 2020) Our COVID-19 risk prediction preprint is out on medRxiv - "Risk prediction for poor outcome and death in hospital in-patients with COVID-19 - derivation in Wuhan, China and external validation in London, UK". MedRxiv.
(1 May 2020) Dr Honghan Wu started his new job as a Lecturer in Health Informatics at IHI, UCL. He will continue his personal fellowship project across both the University of Edinburgh and UCL.
(27 February 2020) Paper accepted by HealTAC 2020 - “Identifying physical health comorbidities in a cohort of individuals with severe mental illness - An application of SemEHR”
(21 February 2020) Paper accepted by ECAI 2020 - “Modeling Rare Interactions in Time Series Data Through Qualitative Change - Application to Outcome Prediction in Intensive Care Units”
(27 January 2020) Delighted to co-develop an NLP work package in the exciting Advanced Care Research Centre programme, a £20m investment dedicated to the field of ageing and care.
(20 January 2020) Great to have a short visit to the Department of Orthopaedics and Traumatology, Hong Kong University, to discuss the exciting opportunity of personalised pain prediction after spine treatments using multimodal data (free text + imaging).
(17 January 2020) Our paper "On Classifying Sepsis Heterogeneity in the ICU - Insight Using Machine Learning" has been published by JAMIA. https://doi.org/10.1093/jamia/ocz211
(14 January 2020) Delighted to have a kick-off meeting with Edinburgh Innovations team for our project “Towards an AI-driven Health Informatics Platform for supporting clinical decision making in Scotland – a pilot study in NHS Lothian” funded by Wellcome iTPA 2019.
(6 December 2019) Knowledge-driven phenotyping is on medRxiv now - an automated approach to translating phenotypes defined in domain vocabularies into queries executable on heterogeneous and distributed health datasets.
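As a hedged sketch of the general idea (not the paper's implementation), a phenotype defined as a set of vocabulary codes can be expanded into a query runnable on a local EHR table; the table and column names below are hypothetical.

```python
# Illustrative translation of a code-defined phenotype into a SQL query.
# Table and column names are hypothetical placeholders.
from typing import Iterable

def phenotype_to_sql(icd10_codes: Iterable[str],
                     table: str = "diagnoses",
                     code_col: str = "icd10_code",
                     patient_col: str = "patient_id") -> str:
    """Select patients whose recorded diagnoses match any of the phenotype's codes."""
    placeholders = ", ".join(f"'{c}'" for c in sorted(set(icd10_codes)))
    return (f"SELECT DISTINCT {patient_col} "
            f"FROM {table} "
            f"WHERE {code_col} IN ({placeholders});")

# Example: a stroke phenotype defined by two illustrative ICD-10 codes
print(phenotype_to_sql(["I61", "I63"]))
```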
(28 November 2019) Using SemEHR on EHRs to answer an important clinical question – “Association of physical health multimorbidity with mortality in people with schizophrenia spectrum disorders - Using a novel semantic search system that captures physical diseases in electronic patient records” has been accepted by Schizophrenia Research. DOI:10.1016/j.schres.2019.10.061
(11 November 2019) Great to know our paper - “Semantic computational analysis of anticoagulation use in atrial fibrillation from real world data” has been accepted by PLoS One. Preprint
(22 October 2019) Our first NLP transfer learning paper for identifying phenotype mentions has been accepted by JMIR Medical Informatics, a relatively new journal (established in 2013) with an inaugural impact factor of 3.188.
(20 July 2019) Delighted to know our proposal for the HDR UK NLP Implementation Project has been awarded as one of the three National Implementation Projects. We look forward to working on this exciting UK-wide collaboration.
(5 April 2019) The 4th International Workshop on Knowledge Discovery in Healthcare Data will be with IJCAI2019 in Macao, China. Website, submission and dates.
(13 January 2019) Delighted to know our "Sprint" project proposal - "Building the Knowledge Graph for UK Healthcare Data Science" - has been awarded by HDR UK as part of the Digital Innovation Hub Programme, one of ten innovative data solutions chosen to prove the potential of health data to transform lives.
(19 October 2018) Thrilled to be invited to give a talk about our CogStack EHR platform at China's National Centre for Cardiovascular Diseases. Great to learn about their excellent infrastructure, research, datasets and grand vision! We look forward to the first CogStack deployment in China to support EHR-based research.
(15 February 2018) Proudly beginning an MRC/Rutherford Fund Fellowship of HDR UK, hosted by the Centre for Medical Informatics at the University of Edinburgh. My research focuses on "Deriving an actionable patient phenome from healthcare data".
(10 February 2018) An application paper describing our SemEHR toolkit has been accepted by JAMIA, titled "SemEHR - A General-purpose Semantic Search System to Surface Semantic Data from Clinical Notes for Tailored Care, Trial Recruitment and Clinical Research".
(27 November 2017) Our work on using knowledge graph techniques to predict adverse drug reactions has been published in Scientific Reports.
(5 July 2017) Our data harmonisation and search toolkit for EHRs - CogStack - is mentioned in the Annual Report of the Chief Medical Officer 2016 by the UK Government.