Published: March 05, 2018
Anyone au fait with openEHR knows the benefits of using its dual-modeling architecture to underpin an electronic health record. It is, after all, what it was designed for. But I recently came across a paper espousing the virtues of using openEHR as the basis for a data warehouse, which made me think about the wider attributes of the underlying reference model.
The paper in question, "Archetype-based data warehouse environment to enable the reuse of electronic health record data" by Marco-Ruiz et al. (2015) [1], reflects on a real-world implementation of an openEHR-based clinical decision support system in Norway. Their primary goal was to reuse data that was available in multiple systems but, unfortunately, stored in legacy data formats. The challenge was to take this data, transform it and then make it available for the successor to a disease surveillance system called SNOW.
The authors give a great overview of the principal problems with semantic interoperability, and of the need for common syntax and clinical definitions. The paper also introduced me to the concept of the "impedance mismatch" [2] that exists between the information model (i.e. the electronic health record) and the inference model. The latter is needed for decision support and data warehousing to understand clinical guidelines and protocols. Rector et al. [3] define inference models as:
"models that encapsulate knowledge needed to derive the conclusions, decisions, and actions that follow from what is stored."
SNOW facilitated some clever use cases to support microbiology teams. Based on certain factors, staff were able to order additional tests for infectious diseases such as norovirus if they believed that an epidemic or some other form of outbreak was beginning.
But the underlying architecture was problematic and a standards-based approach was sought. Traditional clinical information systems were not usually structured in a way that supported ad-hoc queries (with accompanying services), but openEHR is different thanks to the Archetype Query Language (AQL). This is part of the reason why the new data warehouse architecture was based on the openEHR standard.
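To give a flavour of what an ad-hoc AQL query looks like, here is a minimal Python sketch that simply assembles the query text. The archetype ID and the helper name `build_lab_results_query` are my own illustrative assumptions and are not taken from the paper. Nothing here talks to a real CDR.

```python
# Illustrative only: this archetype ID is an assumption, not necessarily
# the archetype used by Marco-Ruiz et al.
LAB_ARCHETYPE = "openEHR-EHR-OBSERVATION.laboratory_test_result.v1"


def build_lab_results_query(archetype_id: str = LAB_ARCHETYPE) -> str:
    """Return AQL text listing matching observations for one EHR.

    $ehr_id is an AQL query parameter; how it gets bound to a value
    depends on the repository being queried.
    """
    return (
        "SELECT c/uid/value AS composition_uid, "
        "o/name/value AS observation_name "
        "FROM EHR e "
        "CONTAINS COMPOSITION c "
        f"CONTAINS OBSERVATION o[{archetype_id}] "
        "WHERE e/ehr_id/value = $ehr_id"
    )


if __name__ == "__main__":
    print(build_lab_results_query())
```

The point is that the query is written against archetype structures (EHR, COMPOSITION, OBSERVATION) and not against the tables of any particular product.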
The authors used LinkEHR as their "normalization platform" to transform data from the legacy source to the target archetype structures. Marand's Think!EHR was used for the persistent clinical data repository (CDR).
One of the interesting points they make is the importance of the clinical modeling. In most cases, migrating legacy data involves an Extract, Transform and Load (ETL) process. These steps refer to first getting the legacy data out, then converting it to a format or structure that fits the new system, and finally loading it into the new repository. But Marco-Ruiz and friends added modeling as a first stage of proceedings.
Of course, this makes perfect sense, as the archetype structure needed to be clearly defined up front. But they go further and recommend that, where health data is concerned, we should potentially consider a new acronym, Model/Extract/Transform/Load (METL), as standard. Part of the reason for this, and for their support of the openEHR architecture, is that the reference model takes away a lot of the effort of defining a complex target schema in which the transformed data needs to reside.
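The four METL stages can be sketched as a toy Python pipeline. This is only an illustration of the ordering of the steps under my own assumptions: the legacy rows, column names and the dictionary "repository" are all made up, and the real work in the paper was done with LinkEHR and Think!EHR, not with code like this.

```python
# Hypothetical legacy rows standing in for an export from an old system.
LEGACY_ROWS = [
    {"pid": "p1", "req": "r1", "test": "Norovirus PCR", "res": "POS"},
    {"pid": "p1", "req": "r1", "test": "Rotavirus PCR", "res": "NEG"},
    {"pid": "p2", "req": "r2", "test": "Norovirus PCR", "res": "NEG"},
]


def model() -> dict:
    """Model: agree the target fields up front (target name -> legacy column)."""
    return {
        "patient_id": "pid",
        "request_id": "req",
        "test_name": "test",
        "result": "res",
    }


def extract(rows: list) -> list:
    """Extract: copy the rows out of the legacy source."""
    return [dict(row) for row in rows]


def transform(rows: list, mapping: dict) -> list:
    """Transform: rename legacy columns to the modeled target fields."""
    return [
        {target: row[source] for target, source in mapping.items()}
        for row in rows
    ]


def load(records: list, repository: dict) -> dict:
    """Load: group results per patient and per request in the repository."""
    for record in records:
        key = (record["patient_id"], record["request_id"])
        repository.setdefault(key, []).append(
            {"test_name": record["test_name"], "result": record["result"]}
        )
    return repository


def run_metl(rows: list) -> dict:
    """Run Model, Extract, Transform and Load in that order."""
    mapping = model()
    return load(transform(extract(rows), mapping), {})


if __name__ == "__main__":
    for key, results in run_metl(LEGACY_ROWS).items():
        print(key, results)
```

Because the mapping exists before any data is touched, the transform step has a fixed target to aim at, which is the essence of putting the "M" first.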
The paper describes a model based on one composition per patient and per microbiology request. So a single patient may have more than one request but, importantly, the batches (or result profiles) are maintained. This translates roughly to:
<OBSERVATION>
    <BATTERY TESTS>
        <TEST 1>
            <RESULT DATA>
        <TEST 2>
            <RESULT DATA>
        <TEST n>
            <TEST n DATA>
<PATIENT DETAILS>
<TEST REQUESTER DETAILS>
Note: I've greatly simplified the above so please refer to the original article if you are interested in the actual template structure.
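For readers who think in code, the simplified outline above could be expressed as a handful of Python dataclasses. The class and field names are my own shorthand for the outline, not openEHR reference model classes or the template from the paper, and the values are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LabTest:
    """One test within a battery, together with its result data."""
    name: str
    result_data: str


@dataclass
class Battery:
    """A batch (result profile) of tests that belong together."""
    tests: List[LabTest] = field(default_factory=list)


@dataclass
class Observation:
    battery: Battery


@dataclass
class MicrobiologyComposition:
    """One composition per patient and per microbiology request."""
    patient_details: str
    requester_details: str
    observation: Observation


def build_example() -> MicrobiologyComposition:
    """Build a made-up composition holding a battery of two tests."""
    battery = Battery(
        tests=[
            LabTest(name="TEST 1", result_data="RESULT DATA 1"),
            LabTest(name="TEST 2", result_data="RESULT DATA 2"),
        ]
    )
    return MicrobiologyComposition(
        patient_details="PATIENT DETAILS",
        requester_details="TEST REQUESTER DETAILS",
        observation=Observation(battery=battery),
    )


if __name__ == "__main__":
    composition = build_example()
    for test in composition.observation.battery.tests:
        print(test.name, "->", test.result_data)
```

Keeping the tests inside a battery object is what preserves the batch: the results of one request travel together instead of being flattened into independent rows.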
The paper has a lot of positive points to make on the newly developed architecture:
It was not all plain sailing, however. The authors point out some issues that were encountered:
SUBQUERY operators available at the time of writing and they required a more powerful language, such as SPARQL, for some database queries (although this was indicated as potentially requiring complex mappings).
The authors state that the techniques used were mature enough to be easily integrated with new systems. In the round, they found a way to ensure semantic coherence for legacy data and support a new, operational platform in the process. It sounds pretty impressive.
Although openEHR as a basis for a data warehouse seems like an interesting proposition, I am not totally convinced. Without a doubt, a jewel in the crown of openEHR is the ability to use AQL to query across the longitudinal patient record as well as vertically through a composition. This is thanks to the reference model that binds these facets together. However, as Marco-Ruiz et al. state, the language is not necessarily robust enough for detailed data warehouse tasks.
For example, AQL will find it difficult to compete against the R language if you need to undertake some serious data science. There will undoubtedly be better tools available for these tasks. But regardless, using AQL even just to prepare data for manipulation is a significant benefit over competing EHR technologies. This research just makes me think that there may be even more flexibility under the bonnet of openEHR than I already thought.
[1] Marco-Ruiz, L., Moner, D., Maldonado, J. A., Kolstrup, N., & Bellika, J. G. (2015). Archetype-based data warehouse environment to enable the reuse of electronic health record data. International Journal of Medical Informatics. https://doi.org/10.1016/j.ijmedinf.2015.05.016
[2] Schadow, G., Russler, D. C., Mead, C. N., & McDonald, C. J. (2000). Integrating medical information and knowledge in the HL7 RIM. Proceedings of the AMIA Symposium, 764–768.
[3] Rector, A. L., Johnson, P. D., Tu, S., Wroe, C., & Rogers, J. (n.d.). Interface of Inference Models with Concept and Medical Record Models. Retrieved from https://pdfs.semanticscholar.org/0126/abd316a4c7afee2ec9390434087ccb976042.pdf