Leveraging large language models for rare disease named entity recognition
Author summary

Rare diseases are individually uncommon but together affect many patients. Clinicians often describe rare conditions, physical findings, and patient-reported symptoms in free-text medical notes, which makes it hard to identify patients for research or follow-up. In this study, we ask whether modern large language models can extract these key terms from text when only limited labeled data are available. Using the public RareDis corpus, we evaluate several ways to use these models: giving the model instructions alone, adding a small number of labeled examples, adding short background passages retrieved from a reference source, and further training a smaller model on the same task. We find that a few well-selected examples markedly improve extraction of rare disease names at low cost, and that further training achieves the best overall accuracy. Adding background passages provides limited average gains, but it can sometimes help capture more true mentions in harder categories such as signs and symptoms. Symptom extraction remains the most challenging category because symptom labels are context dependent and can overlap with objective findings. These results support using large language models as decision-support tools, paired with expert review, to speed chart screening and rare disease research.
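To make the prompting strategies described above concrete, the following is a minimal sketch of how one prompt builder could cover instructions alone, instructions plus labeled examples, and instructions plus retrieved background passages. It is not the study's actual prompt: the function name `build_prompt`, the label names in `ENTITY_TYPES`, the output format, and the example sentences are all assumptions made for illustration. Further training of a smaller model is not shown.

```python
from typing import Sequence

# Hypothetical label names for illustration; the corpus's own
# annotation scheme and label spellings may differ.
ENTITY_TYPES = ("rare_disease", "sign", "symptom")


def build_prompt(
    note: str,
    examples: Sequence[tuple[str, str]] = (),
    passages: Sequence[str] = (),
) -> str:
    """Assemble an extraction prompt from instructions, optional
    background passages, and optional labeled examples."""
    parts = [
        "Extract every mention of the following entity types from the text: "
        + ", ".join(ENTITY_TYPES)
        + ". Return one 'type: mention' pair per line."
    ]
    # Retrieval-augmented variant: prepend short background passages.
    if passages:
        parts.append("Background:\n" + "\n".join(f"- {p}" for p in passages))
    # Few-shot variant: show labeled examples before the target text.
    for ex_text, ex_labels in examples:
        parts.append(f"Text: {ex_text}\nEntities:\n{ex_labels}")
    # The target text always comes last, with the answer left open.
    parts.append(f"Text: {note}\nEntities:")
    return "\n\n".join(parts)


# Instructions alone (no examples, no background).
zero_shot = build_prompt("Patient reports muscle weakness.")

# Instructions plus one made-up labeled example.
few_shot = build_prompt(
    "Patient reports muscle weakness.",
    examples=[
        ("She was diagnosed with Fabry disease.", "rare_disease: Fabry disease")
    ],
)
```

Keeping the three prompting variants behind one function means the instruction text stays identical across conditions, so any difference in extraction quality can be attributed to the added examples or passages rather than to wording changes.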