What is clinical data infrastructure?

As the chief product officer at Q-Centrix, I talk about technology all the time with customers, colleagues, and partners. These conversations have made it obvious that the importance of data infrastructure can be hard to grasp if you're not living it day to day. In a clinical setting, data infrastructure is what operationalizes data so it can be shared and consumed. It is the difference between merely collecting data and using it to understand trends, become more efficient, improve care, and introduce innovations that make clinical interventions more precise and effective.

To help make the message clear, I'd like to share an example that almost all of us know but few of us have ever connected to clinical data or data infrastructure: Costco.

What does Costco have to do with clinical data management?

If you've never been to Costco, the first thing to appreciate is its scale. The sheer variety and volume of products it buys and sells compare readily to the variety and volume of clinical data points collected each year.

When you're in Costco, there's a flow to the store. You enter on one side and circulate around the store until you finish at the checkout. If there were no organization in the store, with the food interspersed among the ladders and the TVs shelved next to the bread and the lotions, it would be chaotic. Even with that organization, I'm sure we've all witnessed a fellow shopper going against the flow… it's very disruptive! Both Costco and its customers benefit from some modicum of organization.

Here comes the point:

Think about clinical data. The healthcare industry collects these data from all sorts of sources using various methods. We get demographics from admission, discharge, and transfer (ADT) feeds. We get extracts from hospital information systems such as electronic medical records and lab, radiology, and pathology systems. We get snippets of clinical text. We get manually abstracted data from our Q-Apps® tools. We get extracts from vendor tools. You get the idea.

All these sources arrive in different digital formats (text files, JSON, image files, spreadsheets, and custom or proprietary formats). They also arrive at different frequencies: daily, weekly, monthly, and yearly. As you can imagine, things get complicated quickly as new data formats are continually added, which is common in healthcare.

Using our Costco metaphor, think of all the ways you can get potato chips: large bags, big boxes full of individual bags, and every flavor imaginable. You could even buy raw potatoes and make them yourself. You need to have an idea of what you want to do with the chips before you buy them. You almost certainly would not get what you wanted if you just told your partner to pick up some chips as they left for a Costco run!

How clinical data warehouses use these principles

To accomplish everything we want to do with clinical data, it must be neat and organized. Like Costco, we need a functional flow for our most important resource.

A clean data warehouse with ginormous capacity is what lets everyone access a wide variety of information easily. It takes all these data, in all these formats, and gives them structure so that we can do everything we want to do with them: build reports, share them with trusted external parties, round-trip our data, analyze them, and do data science. As I mentioned at the start of this post, the data warehouse elevates clinical data to the point of powerful insight generation.
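To make "gives them structure" a little more concrete, here is a minimal Python sketch, assuming two hypothetical sources: a CSV extract from an EHR and a JSON feed from a lab system. The field names (`mrn`, `patientId`, `birthDate`, and so on) and the `PatientRecord` schema are illustrative assumptions, not any real system's format. Each source gets its own small mapping function, and everything lands in one common shape that downstream reports and analyses can rely on.

```python
import csv
import io
import json
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical common schema that every source is mapped into."""
    source: str
    source_id: str
    last_name: str
    first_name: str
    birth_date: date


def from_csv_extract(text: str) -> list[PatientRecord]:
    """Map a hypothetical CSV extract (columns: mrn,last,first,dob as YYYY-MM-DD)."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        PatientRecord(
            source="ehr_csv",
            source_id=row["mrn"],
            last_name=row["last"].strip().upper(),
            first_name=row["first"].strip().upper(),
            birth_date=date.fromisoformat(row["dob"]),
        )
        for row in rows
    ]


def from_json_feed(text: str) -> list[PatientRecord]:
    """Map a hypothetical JSON feed with nested names and MM/DD/YYYY dates."""
    records = []
    for item in json.loads(text):
        month, day, year = (int(part) for part in item["birthDate"].split("/"))
        records.append(
            PatientRecord(
                source="lab_json",
                source_id=str(item["patientId"]),
                last_name=item["name"]["family"].strip().upper(),
                first_name=item["name"]["given"].strip().upper(),
                birth_date=date(year, month, day),
            )
        )
    return records


# Two differently shaped inputs describing the same (fictional) patient.
csv_text = "mrn,last,first,dob\nA100,Smith ,Jane,1980-04-02\n"
json_text = '[{"patientId": 77, "name": {"family": "smith", "given": "jane"}, "birthDate": "04/02/1980"}]'

# After mapping, both records share one structure and can be stored together.
warehouse = from_csv_extract(csv_text) + from_json_feed(json_text)
for rec in warehouse:
    print(rec)
```

Real warehouse pipelines handle far more formats, validation rules, and history tracking; the point here is only that each source is translated once into a shared structure rather than every consumer decoding every format on its own.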

Identifying unique data

There is yet another level of organization at Costco that we rarely think about: universal product codes, or bar codes. Every product has a unique, machine-readable code that is standardized across grocery retailers. (Fun fact: a UPC bar code was first scanned in a store on a pack of Wrigley's gum in Troy, Ohio, in June 1974.) Produce works the same way with price look-up (PLU) codes: if you know the code for an avocado, you can enter it at any grocery store, and the point-of-sale system will recognize it! This is perhaps useful to you as more and more grocers convert to self-checkout.

UPCs are estimated to save the grocery industry $17 billion annually by eliminating the need to enter and manipulate product information manually. The result is greater speed, efficiency, and productivity. This is a massive win for grocery stores and consumers, and it provides a model for those of us in the clinical data space to follow: a whole competitive industry, standardized on one format.

A patient, by contrast, can be represented by data in different ways depending on how each source system identified that person. That one individual will be examined in different contexts and recorded by many different sources, which makes it difficult to match the data and build a comprehensive understanding of that individual. But what if we had a UPC for each unique patient?

We call this general idea an index; in healthcare, we call it an “MPI,” or master patient index. The MPI is the key technique for recognizing that different data records belong to the same patient, which unlocks new information and new ways of interpreting that information across the full spectrum of the patient's care. As clinical data infrastructure matures across the industry, this index and its advancements will shape everything from care delivery to operations, clinical research, and more.
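The idea of a "UPC for each patient" can be sketched in a few lines of Python. This toy `MasterPatientIndex` (a hypothetical class, not a real product API) assigns one enterprise ID per distinct combination of normalized last name, first name, and date of birth, and links each source system's local identifier to that ID. Exact-key matching like this is a deliberate simplification: it would miss typos, name changes, and transposed dates, which is why production MPIs generally rely on more sophisticated probabilistic or referential matching.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SourceRecord:
    source: str        # which system sent the record (hypothetical labels)
    source_id: str     # that system's local identifier, e.g. an MRN
    last_name: str
    first_name: str
    birth_date: date


def match_key(rec: SourceRecord) -> tuple[str, str, date]:
    """Simplification: exact match on normalized name plus date of birth."""
    return (rec.last_name.strip().upper(), rec.first_name.strip().upper(), rec.birth_date)


class MasterPatientIndex:
    """Toy MPI: one enterprise ID per distinct match key."""

    def __init__(self) -> None:
        self._key_to_eid: dict[tuple[str, str, date], str] = {}
        # (source, source_id) -> enterprise ID, so any local ID can be resolved.
        self._links: dict[tuple[str, str], str] = {}

    def register(self, rec: SourceRecord) -> str:
        key = match_key(rec)
        if key not in self._key_to_eid:
            # New person: mint the next sequential enterprise ID.
            self._key_to_eid[key] = f"EID-{len(self._key_to_eid) + 1:06d}"
        eid = self._key_to_eid[key]
        self._links[(rec.source, rec.source_id)] = eid
        return eid

    def lookup(self, source: str, source_id: str) -> Optional[str]:
        return self._links.get((source, source_id))


mpi = MasterPatientIndex()
a = mpi.register(SourceRecord("ehr", "A100", "Smith", "Jane", date(1980, 4, 2)))
b = mpi.register(SourceRecord("lab", "77", " smith", "JANE", date(1980, 4, 2)))
c = mpi.register(SourceRecord("ehr", "A101", "Smith", "John", date(1975, 1, 9)))
print(a, b, c)  # EID-000001 EID-000001 EID-000002
```

The EHR and lab records for Jane Smith resolve to the same enterprise ID even though each system uses its own identifier, which is exactly the kind of linkage that lets her data be read as one story across her care.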

What will it take to organize an entire industry’s data this way?

Costco’s organizational model helps us illustrate the importance of data infrastructure – a tool that allows us to operationalize data and use it to improve the quality of care. While building a similar model for clinical data is a perpetual process, the benefits to healthcare operations, research, and patient care make it all but mandatory.

A functional master patient index is the goal, but it's not the only benefit healthcare will reap from this process. As the industry continues to grow and evolve, the importance of data infrastructure, and the number of uses for it, will only increase.

PS: The PLU (similar to the UPC) for Hass Avocado, medium, is 4046.
