Data Curation and Enrichment
We are experts in the curation, enrichment and management of diverse types of real-world health data.
How We Work
Our data engineering, clinical informatics and machine-learning teams have developed a unique data platform that allows us to create research-ready datasets at scale.
By research-ready, we mean that health data accessed through our HRA-approved real-world data (RWD) Network is standardised into our Common Data Model, quality controlled and, where possible, enriched with other data. The platform was built by our Data Engineering and Clinical Informatics teams in close collaboration with our clinical experts, drawing on the latest advances in machine learning.
The data platform is designed so that datasets can be refreshed seamlessly, providing the latest disease information from which real-world evidence (RWE) insights are generated. We have also developed multiple tools for deployment across the RWD Network to support efficient, quality-controlled and scalable curation of anonymised data.
Our Advanced Curation Approach
Data Standardisation
Arcturis has established its own proprietary Common Data Model, which is compatible with the Observational Medical Outcomes Partnership (OMOP) Common Data Model but more expansive, incorporating genetic and functional information that is critical to life sciences research and regulatory submissions.
Our Common Data Model allows us to work with diverse source data accessed through the RWD Network, harmonising it to improve data quality and ease of analysis.
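As a minimal sketch of what harmonisation into a common data model involves, the Python below converts rows from two hypothetical providers, each with its own field names and date format, into one record type. The target field names borrow from the OMOP `condition_occurrence` table (`person_id`, `condition_source_value`, `condition_start_date`); the source schemas, `SOURCE_SPECS` and the `to_common` helper are illustrative assumptions, not Arcturis's Common Data Model.

```python
from dataclasses import dataclass
from datetime import date, datetime

@dataclass(frozen=True)
class ConditionOccurrence:
    """Target record; field names follow the OMOP condition_occurrence table.
    OMOP uses integer IDs; strings keep this hypothetical sketch readable."""
    person_id: str
    condition_source_value: str
    condition_start_date: date

# Hypothetical per-source specs: which source field feeds each target field,
# and how that source formats its dates.
SOURCE_SPECS = {
    "site_a": {"person": "mrn", "term": "diagnosis", "date": "dx_date", "date_format": "%Y-%m-%d"},
    "site_b": {"person": "patient", "term": "problem", "date": "recorded", "date_format": "%d/%m/%Y"},
}

def to_common(source: str, row: dict) -> ConditionOccurrence:
    """Rename fields and parse dates according to the source's spec."""
    spec = SOURCE_SPECS[source]
    return ConditionOccurrence(
        person_id=f"{source}:{row[spec['person']]}",  # prefix avoids ID collisions across sources
        condition_source_value=row[spec["term"]],
        condition_start_date=datetime.strptime(row[spec["date"]], spec["date_format"]).date(),
    )

# Hypothetical raw rows from two providers.
site_a_rows = [{"mrn": "001", "diagnosis": "Type 2 diabetes", "dx_date": "2021-05-14"}]
site_b_rows = [{"patient": "77", "problem": "T2DM", "recorded": "14/05/2021"}]

harmonised = [to_common("site_a", r) for r in site_a_rows] + [
    to_common("site_b", r) for r in site_b_rows
]
for record in harmonised:
    print(record)
```

Both rows end up with the same start date despite their different source formats. Their source terms ("Type 2 diabetes" and "T2DM") still differ, so mapping them to a single standard concept remains a separate step.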
Data Enrichment
A significant amount of clinical information resides within unstructured clinical notes and reports, impeding direct access to data for analysis.
We have developed ArcTEX (Arcturis Text Enrichment and Extraction), a natural language processing (NLP) model that extracts biomarker and other disease-specific information from unstructured reports.
ML-Assisted Concept Mapping
We have developed ArcMAP, an ML-assisted concept mapping tool that generates suggested mappings from any source data to recognised standard concepts, including SNOMED CT and ICD-10. These suggestions are then validated by our clinicians, giving our researchers clean, high-quality concept data.
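A minimal sketch of the suggest-then-validate workflow, in Python. It ranks candidate standard concepts by character-level string similarity (`difflib.SequenceMatcher`) as a simple stand-in for the machine-learning model ArcMAP uses. The concept IDs and names are hypothetical placeholders, not real SNOMED CT or ICD-10 codes, and `suggest_mappings` and `record_review` are illustrative names.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical target vocabulary: IDs are placeholders, not real SNOMED CT or ICD-10 codes.
STANDARD_CONCEPTS = {
    "C001": "type 2 diabetes mellitus",
    "C002": "essential hypertension",
    "C003": "myocardial infarction",
    "C004": "chronic obstructive pulmonary disease",
}

@dataclass
class Suggestion:
    source_term: str
    concept_id: str
    concept_name: str
    score: float
    validated: bool = False  # stays False until a clinician accepts the mapping

def normalise(term: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences do not matter."""
    return " ".join(term.lower().split())

def suggest_mappings(source_term: str, top_k: int = 3) -> list[Suggestion]:
    """Rank standard concepts by string similarity to a local source term."""
    query = normalise(source_term)
    scored = [
        Suggestion(source_term, cid, name, SequenceMatcher(None, query, name).ratio())
        for cid, name in STANDARD_CONCEPTS.items()
    ]
    scored.sort(key=lambda s: s.score, reverse=True)
    return scored[:top_k]

def record_review(suggestion: Suggestion, accepted: bool) -> Suggestion:
    """Record a clinician's decision; only accepted suggestions become mappings."""
    suggestion.validated = accepted
    return suggestion

candidates = suggest_mappings("Type II Diabetes Mellitus")
for s in candidates:
    print(f"{s.concept_id}  {s.concept_name:<40} {s.score:.2f}")
```

Keeping `validated` as False until a reviewer signs off mirrors the workflow described above: the model only proposes a mapping, and a clinician confirms it.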
Regulatory Grade Data Quality Management
By collating guidance from industry regulators (e.g. FDA, EMA, MHRA), health technology appraisers such as NICE and academic research, we have produced a comprehensive Data Quality Framework.
This allows us to report and monitor metrics on the plausibility and completeness of the data, and on whether it conforms to pre-specified data standards.
Free-text PII Redaction
Protecting patient confidentiality is paramount. Our redaction tools use state-of-the-art techniques to remove personally identifiable information (PII) from free text, so that sensitive information is anonymised in compliance with privacy laws and ethical standards.
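A minimal rule-based sketch of free-text redaction, in Python. It replaces email addresses, dd/mm/yyyy dates and 10-digit numbers that pass the NHS number modulus 11 check digit with placeholder tags. This is an illustration under stated assumptions rather than Arcturis's tooling: fixed patterns like these miss names, addresses and many other identifiers, which production systems often handle with trained named-entity recognition models. The sample NHS numbers are made up.

```python
import re

NHS_PATTERN = re.compile(r"\b(\d{3})[ -]?(\d{3})[ -]?(\d{4})\b")
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

def is_valid_nhs_number(digits: str) -> bool:
    """Modulus 11 check: weight the first nine digits 10 down to 2, then compare the check digit."""
    if len(digits) != 10 or not digits.isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - total % 11
    if check == 11:
        check = 0
    # A computed check value of 10 means the number is invalid.
    return check != 10 and check == int(digits[9])

def redact(text: str) -> str:
    """Replace likely identifiers with placeholder tags."""
    def replace_nhs(match: re.Match) -> str:
        digits = "".join(match.groups())
        # Only redact 10-digit strings whose check digit is valid.
        return "[NHS_NUMBER]" if is_valid_nhs_number(digits) else match.group(0)

    text = NHS_PATTERN.sub(replace_nhs, text)
    text = DATE_PATTERN.sub("[DATE]", text)
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    return text

sample = ("Patient NHS no. 999 000 0018 seen on 03/02/2021; "
          "contact j.smith@example.org. Ref 999 000 0019.")
print(redact(sample))
```

The second number is left intact because its check digit fails, which shows how a checksum cuts false positives on arbitrary 10-digit strings.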
Robust Data Security
Our commitment to data security is integral to our data engineering ethos. We employ cutting-edge techniques to meet strict compliance standards while facilitating vital research. Our NLP technologies further extend our capabilities, enabling meaningful insights to be extracted from anonymised unstructured data sources without compromising security.
Arcturis is ISO 27001 and Cyber Essentials Plus certified and is rated as “standards exceeded” for the NHS Data Security and Protection Toolkit.
ArcTEX
Data enrichment through the use of unstructured clinical reports and other documents.
Our team of researchers have pioneered the development of the NLP model, ArcTEX.
ArcTEX is a sophisticated yet flexible Natural Language Processing (NLP) framework designed to cater to a diverse array of clinical reports.
We have engineered ArcTEX to systematically extract biomarker and disease-specific data from unstructured clinical reports and notes at scale. ArcTEX is notably versatile: it can be easily fine-tuned to meet the needs of different projects.
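To illustrate the kind of structured output biomarker extraction produces, here is a minimal rule-based sketch in Python. ArcTEX itself is an NLP model; this regular-expression stand-in, its biomarker list and its status vocabulary are assumptions for illustration only. It links each biomarker to the nearest status term in the same clause and normalises that term to positive or negative.

```python
import re
from dataclasses import dataclass

# Illustrative vocabularies (assumptions): biomarkers to look for and status terms.
BIOMARKERS = ["EGFR", "ALK", "KRAS", "PD-L1", "HER2"]
STATUS_TERMS = ["not detected", "wild-type", "wild type", "negative",
                "positive", "detected", "mutated", "amplified"]
NEGATIVE_TERMS = {"not detected", "wild-type", "wild type", "negative"}

# Biomarker name, then the shortest run of text within the same clause
# (no '.', ';' or newline), then a status term. The lazy *? picks the
# status term that starts earliest, so "not detected" matches from "not".
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, BIOMARKERS)) + r")\b"
    r"[^.;\n]*?"
    r"\b(" + "|".join(map(re.escape, STATUS_TERMS)) + r")\b",
    re.IGNORECASE,
)

@dataclass
class Finding:
    biomarker: str
    status: str    # normalised to "positive" or "negative"
    evidence: str  # the matched text, kept for review

def extract(report: str) -> list[Finding]:
    """Return one finding per biomarker mention that has a status in its clause."""
    findings = []
    for m in PATTERN.finditer(report):
        term = m.group(2).lower()
        status = "negative" if term in NEGATIVE_TERMS else "positive"
        findings.append(Finding(m.group(1).upper(), status, m.group(0)))
    return findings

sample_report = ("Molecular testing: EGFR exon 19 deletion detected. "
                 "ALK rearrangement not detected; KRAS wild-type. PD-L1 TPS 60%.")
findings = extract(sample_report)
for f in findings:
    print(f)
```

PD-L1 yields no finding here because its result is a percentage rather than a status term, and a phrase such as "No EGFR mutation detected" would be labelled positive because the sketch ignores negation before the biomarker. Cases like these are a reason to use a trained NLP model rather than fixed rules.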
ArcMAP
Transforming RWD Standardisation.
AI-Powered Mapping for Faster, Smarter Real-World Evidence Research.
ArcMAP is a powerful new data standardisation tool developed to tackle one of the biggest challenges in real-world data (RWD) research: inconsistent medical terminology across healthcare providers. Built on advanced machine learning and natural language processing, ArcMAP intelligently maps local medical concepts to standard vocabularies such as OMOP, SNOMED CT and LOINC, significantly reducing the need for manual clinical review.
ArcMAP makes it faster and easier to prepare RWD for real-world evidence (RWE) studies, helping researchers unlock insights and improve patient outcomes.