Data enrichment using unstructured clinical reports
Numerous studies have demonstrated the vast reservoir of clinical insights held in unstructured text, such as clinical letters and pathology reports (1). Much of this information, however, cannot be analysed directly. At Arcturis, our researchers have pioneered the development of the ArcTEX (Arcturis Text Enrichment and Extraction) model. ArcTEX is a flexible Natural Language Processing (NLP) framework engineered to systematically extract biomarker and disease-specific data from unstructured clinical reports. It is also versatile: it can be readily fine-tuned to suit the needs of individual projects. Moreover, the model has been carefully optimised to underpin high-quality real-world evidence (RWE) initiatives, supporting robust and reliable outcomes across multiple disease areas.
Introduction
Many leading pharmaceutical companies strengthen their clinical development and post-launch strategies by integrating real-world data. These strategies range from retrospective cohort studies to optimising patient selection criteria and incorporating external control arms. A significant hurdle, however, is that a substantial portion of the healthcare data these initiatives require resides in unstructured text, which cannot be accessed directly for analysis.
Consider, for instance, the status of critical biomarkers such as human epidermal growth factor receptor 2 (HER2) or the oestrogen and progesterone receptors (ER and PR). These biomarkers profoundly influence the treatment pathway of breast cancer patients, yet this information is often embedded in unstructured pathology reports whose style and content vary across hospital sites and pathologists. The same applies to more nuanced concepts such as ‘response to treatment’ or ‘disease progression’, which further illustrate how much unstructured data can inform treatment pathways.
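To make the variability problem concrete, the sketch below is a deliberately naive rule-based extractor for receptor status. It is a hypothetical illustration, not ArcTEX: the synonym lists, the status vocabulary and the function name `extract_receptor_status` are all assumptions made for this example. Even this toy has to handle several names for each biomarker and several ways of writing a result, and it will still miss phrasings that real reports use.

```python
import re

# Hypothetical toy extractor for breast cancer receptor status.
# Illustrative assumption only: this is NOT ArcTEX's method, and the
# synonym lists and status mapping below are deliberately simplified.

# A few of the ways different sites and pathologists may name each biomarker.
MARKER_NAMES = {
    "ER": r"ER|oestrogen receptor|estrogen receptor",
    "PR": r"PR|PgR|progesterone receptor",
    "HER2": r"HER-?2|c-erbB-?2",
}

# A few of the ways a result may be written, mapped to one normalised label.
# IHC scores follow the usual HER2 convention (0/1+ negative, 2+ equivocal,
# 3+ positive); applying it to every marker is a simplification.
STATUS_WORDS = {
    "positive": "positive", "pos": "positive", "3+": "positive",
    "negative": "negative", "neg": "negative", "0": "negative", "1+": "negative",
    "equivocal": "equivocal", "2+": "equivocal",
}

# Try longer tokens first so that "positive" is preferred over "pos".
_STATUS_ALT = "|".join(
    re.escape(token) for token in sorted(STATUS_WORDS, key=len, reverse=True)
)


def extract_receptor_status(report: str) -> dict[str, str]:
    """Return a normalised status for each biomarker found in a free-text report."""
    results = {}
    for marker, names in MARKER_NAMES.items():
        # Marker name, then a short gap that may not cross a clause boundary,
        # then a status token that is not part of a longer word or number.
        pattern = rf"\b(?:{names})\b[^.,;\n]{{0,20}}?(?<![\w+])({_STATUS_ALT})(?![\w+])"
        match = re.search(pattern, report, flags=re.IGNORECASE)
        if match:
            results[marker] = STATUS_WORDS[match.group(1).lower()]
    return results


print(extract_receptor_status("Oestrogen receptor: Positive; PR negative, HER2 (IHC) 2+"))
```

Every new phrasing (plural receptor names, Allred scores, percentages reported without a status word) would need yet another hand-written rule, which is why purely rule-based approaches tend to scale poorly across sites and report styles.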
Unlocking these insights is a formidable yet essential challenge in advancing pharmaceutical research and patient care. To meet the evolving needs of the research and clinical communities, Arcturis is proud to introduce ArcTEX, a sophisticated NLP model designed to handle a diverse array of clinical reports. Our solution supports real-world evidence studies by automating the extraction of biomarkers and other disease-specific data at scale. ArcTEX reflects our commitment to advancing healthcare research through cutting-edge technology, ensuring that high-quality data is at the forefront of evidence-based decision-making.