Case study

Learn About ArcMAP

Working with real-world data (RWD) to support research is inherently challenging because the data is collected primarily for patient care, not for research. There are significant variations in how the data is collected and managed between NHS Hospitals, even if the same Electronic Patient Record (EPR) system is used. Without robust data standardisation, RWD cannot be used reliably for large real-world evidence (RWE) studies.

The process of data standardisation involves two key steps: first, aligning the original NHS Hospital’s data schema with a common data model; and second, mapping each individual medical concept such as specific lab tests or medications to a standardised terminology.

This second step is particularly critical for ensuring the quality and consistency of RWE studies, but it is also time-consuming, typically requiring clinical experts to manually map each concept. One widely adopted standard is the OMOP (Observational Medical Outcomes Partnership) common data model, which leverages standardised vocabularies such as LOINC (Logical Observation Identifiers Names and Codes) and SNOMED (Systematised Nomenclature of Medicine).

The complexity increases further when data is sourced from multiple trusts, each with its own documentation style. For instance, in a recent RWE study, we encountered four different medical terms from different NHS Hospitals, all referring to the same standard concept: Serum alanine aminotransferase level. Some of the original terms are very close to the standard concept, such as “Alanine Transaminase”, whereas others, such as “UELFT BONE_XHIT_ALT”, require expert knowledge to interpret.
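The many-to-one nature of this mapping can be sketched in a few lines of Python. This is only an illustration of what a reviewed mapping table might look like, not ArcMAP's implementation: the `REVIEWED_MAPPINGS` dictionary and the `normalise` and `map_term` helpers are hypothetical names, and only the two source terms quoted above are included. The standard concept is stored by name because the text does not give its code.

```python
from typing import Optional

# Standard concept named in the text (its terminology code is not given there).
STANDARD_CONCEPT = "Serum alanine aminotransferase level"

# Hypothetical table of expert-reviewed mappings. Only the two source terms
# quoted in the text are listed; keys are stored in normalised form.
REVIEWED_MAPPINGS = {
    "alanine transaminase": STANDARD_CONCEPT,
    "uelft bone_xhit_alt": STANDARD_CONCEPT,
}


def normalise(term: str) -> str:
    """Lower-case a source term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def map_term(term: str) -> Optional[str]:
    """Return the reviewed standard concept, or None if the term is unmapped."""
    return REVIEWED_MAPPINGS.get(normalise(term))


for source_term in ["Alanine Transaminase", "UELFT BONE_XHIT_ALT", "Unknown test"]:
    print(source_term, "->", map_term(source_term))
```

An exact lookup like this only covers terms that have already been reviewed; a term that has not been seen before returns `None` and still needs an expert or a suggestion model.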

To streamline the data standardisation process, researchers at Arcturis have developed ArcMAP, an advanced machine learning-based tool designed specifically for mapping medical concepts to standard terminologies. ArcMAP transforms the traditional approach to data standardisation by offering several key advantages:

  • High Efficiency: ArcMAP intelligently analyses original medical concepts from individual NHS Hospitals and automatically suggests the most likely standardised equivalents. Clinical experts are only required to review and confirm these suggestions, significantly reducing the manual effort typically involved in the process.
  • Scalability: ArcMAP continuously learns and improves with each reviewed mapping. As new data is introduced, for example from an unfamiliar NHS Hospital, ArcMAP draws on its growing understanding of medical language and documentation styles to make more accurate predictions.
  • Traceability: Every mapping action is recorded in an internal database, capturing details such as the reviewer, timestamp, and any subsequent modifications. This comprehensive audit trail ensures full transparency and supports regulatory compliance, making the data suitable for submissions and longitudinal studies.
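As a rough illustration of the traceability point, an audit trail can be modelled as an append-only list of mapping events. The sketch below is an assumption-laden example, not ArcMAP's internal database schema: the `MappingEvent` and `AuditLog` classes, their field names, and the `"confirmed"`/`"modified"` action labels are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class MappingEvent:
    """One immutable audit record (hypothetical fields)."""
    source_term: str
    standard_concept: str
    reviewer: str
    action: str  # assumed labels: "confirmed" or "modified"
    timestamp: datetime


@dataclass
class AuditLog:
    """Append-only log: events are added, never edited or deleted."""
    events: List[MappingEvent] = field(default_factory=list)

    def record(self, source_term: str, standard_concept: str,
               reviewer: str, action: str) -> MappingEvent:
        event = MappingEvent(source_term, standard_concept, reviewer,
                             action, datetime.now(timezone.utc))
        self.events.append(event)
        return event

    def history(self, source_term: str) -> List[MappingEvent]:
        """All events for one source term, oldest first."""
        return [e for e in self.events if e.source_term == source_term]

    def current(self, source_term: str) -> Optional[str]:
        """The most recently recorded concept for a term, if any."""
        past = self.history(source_term)
        return past[-1].standard_concept if past else None


log = AuditLog()
log.record("Alanine Transaminase", "Serum alanine aminotransferase level",
           reviewer="reviewer_a", action="confirmed")
print(log.current("Alanine Transaminase"))
print(len(log.history("Alanine Transaminase")))
```

Keeping every event instead of overwriting the latest mapping is what allows a later modification to be traced back to who made it and when.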

ArcMAP is built on recent advancements in natural language processing (NLP) and leverages a domain-specific language model that has been further optimised by Arcturis to meet the unique challenges of medical data standardisation.

To evaluate its performance, we conducted a series of experiments comparing ArcMAP with BioLORD [1], another state-of-the-art data standardisation model. In this evaluation, over 14,000 original medical concepts, including laboratory tests and medication names, were manually mapped to standardised codes by Arcturis’ internal informatics team. These concepts were sourced from four NHS trusts.

  • Laboratory test names were mapped to SNOMED CT codes [2]

  • Medication names were mapped to DM+D codes [3]

The experiment simulated the onboarding of a new NHS Hospital by training the models only on data from three trusts and testing performance on the fourth, so that the target data had not been seen previously. Model accuracy was assessed using two metrics:

  • Exact Match: The predicted concept matched the correct standard concept exactly.

  • Top-5 Match: The correct concept appeared within the top five predicted options.
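The two metrics and the hold-out design can be sketched as follows. This is a minimal illustration, not the evaluation code used in the study: the function names, trust labels, and toy predictions are all made up, and the sketch assumes that each trust is held out in turn, which the text does not state outright.

```python
from typing import Iterator, List, Sequence, Tuple


def exact_match(ranked: Sequence[Sequence[str]], truth: Sequence[str]) -> float:
    """Share of terms whose top-ranked prediction is the correct concept."""
    if not truth or len(ranked) != len(truth):
        raise ValueError("ranked and truth must be non-empty and equal in length")
    hits = sum(1 for preds, t in zip(ranked, truth) if preds and preds[0] == t)
    return hits / len(truth)


def top_k_match(ranked: Sequence[Sequence[str]], truth: Sequence[str],
                k: int = 5) -> float:
    """Share of terms whose correct concept is among the first k predictions."""
    if not truth or len(ranked) != len(truth):
        raise ValueError("ranked and truth must be non-empty and equal in length")
    hits = sum(1 for preds, t in zip(ranked, truth) if t in preds[:k])
    return hits / len(truth)


def leave_one_trust_out(trusts: Sequence[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training trusts, held-out trust), holding out each trust once."""
    for held_out in trusts:
        yield [t for t in trusts if t != held_out], held_out


# Toy ranked predictions for three source terms (made-up labels, not study data).
ranked = [["A", "B", "C"], ["C", "D"], ["E", "F"]]
truth = ["A", "D", "Z"]
print("exact match:", exact_match(ranked, truth))
print("top-5 match:", top_k_match(ranked, truth))

for train, test in leave_one_trust_out(["trust_1", "trust_2", "trust_3", "trust_4"]):
    print("train on", train, "test on", test)
```

With `k=1` the top-k metric reduces to exact match, so top-5 accuracy can never be lower than exact-match accuracy on the same predictions.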

The results are summarised in the figure below, which presents mean accuracy and standard deviation for both data types across all four NHS Hospitals.

  • For medication names, both models performed strongly, with ArcMAP achieving 98.20% exact match compared to 96.99% for BioLORD.

  • For laboratory test names, ArcMAP showed a clear advantage in top-5 accuracy, reaching 91.07% versus 85.59% for BioLORD.

These results demonstrate that ArcMAP can standardise medication and laboratory test names from a new NHS Hospital with over 90% top-five accuracy across both modalities. In practical terms, this means that in more than 90% of cases, a clinical reviewer can select the correct standardised concept from a shortlist of five, greatly accelerating the data standardisation process. For medication names specifically, ArcMAP identifies the correct code as its first suggestion in just over 98% of cases.

In conclusion, the integration of ArcMAP into the data standardisation process marks a significant step forward in the effective use of real-world data. By harnessing the power of machine learning and natural language processing, ArcMAP enhances efficiency, supports scalable deployment across diverse data sources, and ensures full traceability, making it a trusted solution for regulatory-grade data preparation. This innovation enables more reliable, high-quality RWE studies and ultimately contributes to advancing both healthcare research and patient care.

 

References

[1] Remy et al., “BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights”, arXiv:2311.16075, 2023

[2] SNOMED CT codes, NHS England, https://digital.nhs.uk/services/terminology-and-classifications/snomed-ct

[3] Dictionary of medicines and devices (DM+D), NHS England, https://digital.nhs.uk/services/terminology-and-classifications/dm-d