Meaningful AI-Driven Data Curation

Introduction

Artificial intelligence (AI) is fueling innovation throughout the healthcare industry. Whether healthcare facilities are using AI to detect diseases earlier, assist in clinical decision-making, or identify patients suited for clinical trials, AI’s impact is becoming increasingly evident.

However, some often-overlooked limitations highlight the importance of proceeding thoughtfully when exploring the use of AI in healthcare. Models that are improperly trained, fed low-quality data, or given tasks that exceed AI’s capabilities can lead to incorrect diagnoses and potentially harmful clinical decisions. During the COVID-19 pandemic, for example, errors in training or testing AI tools resulted in models that functioned improperly. In one instance, a model picked up on the fonts hospitals used to label scans and falsely treated those fonts as predictors of COVID-19 risk. Diagnostic AI has also revealed biases, such as less accurate skin cancer diagnoses for patients with darker skin due to a lack of diversity in training datasets. Further, an evaluation of a sepsis prediction model found that the model failed to identify two-thirds of patients with sepsis, underscoring the dangers of relying on AI alone.

While AI holds massive potential to transform patient care, the industry is still in the early stages of exploring this technology. AI models cannot work effectively without high-quality clinical data. To develop accurate AI models, we must get the data right first.

This article discusses the value AI-driven data curation offers in hospital settings and clinical research. It explores common pitfalls associated with developing AI models for data curation, along with the factors necessary for successful AI implementation. Lastly, it shares why Q-Centrix is uniquely positioned to explore the use of AI to curate clinical data.

AI is poised to help hospitals derive greater insights from the vast amounts of data they have. Patients generate an average of 50 million gigabytes of data every year, and 97 percent of the clinical data hospitals possess go unused. Because the vast majority of these data are unstructured, often taking the form of doctors’ notes, image scans, and other formats that require interpretation, making sense of them is an impossible undertaking for any person or team. An algorithm, however, can be trained to go through massive datasets quickly and extract valuable insights.

AI also has the potential to make a significant impact in addressing clinical research challenges. Currently, nine in 10 drugs that reach the clinical trial stage fail to receive FDA approval due to challenges in the clinical trial process. Insufficient patient enrollment remains one of the biggest hurdles in clinical trials—and it’s the reason why 20 percent of cancer clinical trials fail. Finding eligible patients for a study often involves combing through electronic medical records (EMRs) and other information systems not built for clinical research purposes, which is very time-consuming.

Even research teams that manage to identify and enroll enough patients for a clinical trial may find that their sample is not representative of the general population. Many racial and ethnic groups are underrepresented in clinical research, emphasizing a need to improve diversity among clinical trial patients. With the aid of AI-powered tools that sift through data dispersed across various information systems and find patients who meet trial criteria, research teams may be able to conduct clinical research more efficiently. This can greatly reduce the time and costs associated with patient recruitment and can help increase diversity in clinical trials.
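As a rough illustration of what screening patient records against trial criteria involves, the sketch below applies simple inclusion and exclusion rules to structured records. It is a minimal example under stated assumptions, not a description of any particular tool: the `Patient` and `TrialCriteria` types, their fields, and the sample values are all hypothetical, and real EMR data would first need to be extracted and normalized before rules like these could be applied.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical, simplified record; real EMR data are far messier.
    patient_id: str
    age: int
    diagnoses: frozenset[str]
    source_system: str  # where the record came from (illustrative only)


@dataclass
class TrialCriteria:
    # Hypothetical inclusion and exclusion criteria, for illustration only.
    min_age: int
    max_age: int
    required_diagnosis: str
    excluded_diagnoses: frozenset[str]


def is_eligible(patient: Patient, criteria: TrialCriteria) -> bool:
    """Return True when every inclusion rule holds and no exclusion rule does."""
    in_age_range = criteria.min_age <= patient.age <= criteria.max_age
    has_required = criteria.required_diagnosis in patient.diagnoses
    has_excluded = bool(patient.diagnoses & criteria.excluded_diagnoses)
    return in_age_range and has_required and not has_excluded


def screen(patients: list[Patient], criteria: TrialCriteria) -> list[str]:
    """Return the IDs of eligible patients, preserving input order."""
    return [p.patient_id for p in patients if is_eligible(p, criteria)]


patients = [
    Patient("p1", 62, frozenset({"heart failure"}), "emr"),
    Patient("p2", 45, frozenset({"heart failure", "chronic kidney disease"}), "registry"),
    Patient("p3", 17, frozenset({"heart failure"}), "billing"),
]
criteria = TrialCriteria(18, 75, "heart failure", frozenset({"chronic kidney disease"}))
candidates = screen(patients, criteria)
```

In this toy dataset only `p1` passes: `p2` carries an excluded diagnosis and `p3` is below the minimum age. In practice, the hard part is producing reliable structured fields like these from unstructured notes, which is where data curation comes in.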

In addition to improving clinical research processes, AI can support research that relies on existing patient data, such as observational studies. These studies can be conducted using the real-world data hospitals and health systems already have (such as data from electronic medical records, billing and claims data, and other sources of information).

Although unstructured data and inconsistent data preparation practices are common challenges associated with observational studies, using AI-enabled techniques to curate these data allows facilities to overcome these barriers and produce custom, high-quality, research-ready datasets. These datasets can be used for facilities’ internal research purposes or for funded opportunities in which healthcare facilities contribute data to retrospective studies for sponsors in the pharmaceutical and life sciences industries. Because observational studies are less expensive to conduct than clinical trials and can be completed much more quickly, AI-driven data curation offers researchers a valuable, cost-effective, and efficient way to gather findings and advance medical research.

Poor data quality. The standards for clinical data quality are very high, and high-quality data are essential for training and refining AI models to ensure their accuracy and reliability. Many failures in AI tools have been linked to the poor quality of data researchers have used to develop these tools.

Outdated data. New drugs and treatment pathways are developed every year, changing how medicine is practiced. Models trained on data that are even a year old may miss crucial insights because of this rapid pace of innovation.

Documentation practices. Documentation practices vary from physician to physician and, on a larger scale, from facility to facility. A model trained on one physician's or one facility's data may not be applicable elsewhere.

Distinct EMR setups. Drastically different data capture practices occur not just across hospitals that use different EMRs, but even among hospitals that use the same EMR. Customized configurations, differing data entry protocols, or unique workflow integrations greatly alter how patient data are entered and stored.

Differences across care settings. The setting of care, whether inpatient, outpatient, academic, or community-based, introduces additional layers of variability. Each care setting may have specific requirements, workflows, and priorities that dictate data capture practices.
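To make the variability described in these pitfalls concrete, the sketch below maps records from two differently configured sources onto one common schema. It is a minimal illustration under stated assumptions: the field names, the alias table, and the sample records are hypothetical, and a production mapping would be far larger and maintained with clinical input.

```python
# Hypothetical field names from differently configured sources,
# mapped to one canonical name each.
FIELD_ALIASES = {
    "dob": "birth_date",
    "date_of_birth": "birth_date",
    "birth_date": "birth_date",
    "sex": "sex",
    "gender": "sex",
    "mrn": "patient_id",
    "patient_id": "patient_id",
}

# Hypothetical value vocabulary for one field.
SEX_VALUES = {"m": "male", "male": "male", "f": "female", "female": "female"}


def normalize_record(raw: dict[str, str]) -> dict[str, str]:
    """Map one source record onto the common schema.

    Fields with no known alias are kept under an 'unmapped:' prefix so a
    reviewer can inspect them instead of the data being silently dropped.
    """
    normalized: dict[str, str] = {}
    for key, value in raw.items():
        canonical = FIELD_ALIASES.get(key.strip().lower())
        cleaned = value.strip()
        if canonical is None:
            normalized[f"unmapped:{key}"] = cleaned
        elif canonical == "sex":
            normalized[canonical] = SEX_VALUES.get(cleaned.lower(), "unknown")
        else:
            normalized[canonical] = cleaned
    return normalized


# Two sites recording the same kinds of facts in different ways.
site_a = {"MRN": "001", "DOB": "1960-02-11", "Sex": "F"}
site_b = {"patient_id": "002", "date_of_birth": "1975-09-30", "gender": "male", "ward": "4B"}

record_a = normalize_record(site_a)
record_b = normalize_record(site_b)
```

Keeping unrecognized fields visible, rather than discarding them, is a deliberate choice in this sketch: it leaves room for an expert to decide whether a site-specific field matters.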

Combining technology with clinical experts. Experts in the technology sector often lack the medical knowledge necessary to interpret complex clinical datasets, a disconnect that ultimately led to the failure of many AI tools created during the pandemic. Because Q-Centrix relies on a combination of proprietary AI-powered software and clinical experts to curate data and perform quality checks, its approach bridges the gap between medical knowledge and technological expertise. Q-Centrix’s more than 1,300 clinical data experts have strong backgrounds in healthcare and abstract millions of cases each year.

Deep understanding of nuanced clinical terms. Clinical concepts can have different definitions depending on the context. For example, some registries define a family history of heart disease as having an immediate family member who died of a heart attack or stroke before age 60 in women and before age 55 in men. Off-the-shelf AI models may react to any mention of a family heart condition, regardless of its severity, the closeness of the relationship, or the relative’s age. This highlights the need for nuanced clinical context in data curation, which only clinical experts can provide.
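The gap between a keyword match and a registry-style definition can be sketched in a few lines. The example below is illustrative only and rests on several assumptions: the age thresholds come from the definition quoted above and are read here as applying to the relative’s sex, the set of relatives counted as "immediate" is a guess, and `FamilyEvent`, `naive_flag`, and `registry_flag` are hypothetical names. A real registry’s data dictionary would govern each of these choices.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds from the definition quoted above; assumed to apply to the
# relative's sex.
AGE_CUTOFF = {"female": 60, "male": 55}
# Assumption: which relatives count as "immediate" varies by registry.
IMMEDIATE_RELATIVES = {"mother", "father", "sister", "brother", "daughter", "son"}
QUALIFYING_CAUSES = {"heart attack", "stroke"}


@dataclass
class FamilyEvent:
    # Hypothetical structured form of one family-history mention.
    relation: str
    sex: str
    cause_of_death: Optional[str]
    age_at_death: Optional[int]


def naive_flag(note: str) -> bool:
    """Keyword match: fires on any mention of a family heart condition."""
    text = note.lower()
    return "family" in text and "heart" in text


def registry_flag(events: list[FamilyEvent]) -> bool:
    """Apply the stricter definition: immediate relative, qualifying cause
    of death, and death before the sex-specific age cutoff."""
    for event in events:
        cutoff = AGE_CUTOFF.get(event.sex)
        if (
            event.relation in IMMEDIATE_RELATIVES
            and event.cause_of_death in QUALIFYING_CAUSES
            and event.age_at_death is not None
            and cutoff is not None
            and event.age_at_death < cutoff
        ):
            return True
    return False


note = "Family history: grandfather had heart murmur; mother died of heart attack at 72."
events = [
    FamilyEvent("grandfather", "male", None, None),
    FamilyEvent("mother", "female", "heart attack", 72),
]

naive_result = naive_flag(note)          # keyword match fires
registry_result = registry_flag(events)  # stricter definition does not
```

Here the keyword match flags the patient while the stricter rule does not, because the grandfather is not an immediate relative and the mother died after the cutoff age. Turning the free-text note into the structured `events` list is itself a step that requires clinical judgment.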

Streamlined processes that prioritize data integrity. Q-Centrix is committed to maintaining the highest data integrity standards in the industry. Q-Centrix implements a series of quality checks throughout the data lifecycle, spending approximately 10,000 hours per month conducting quality-related checks on data.

User-friendly software. Q-Centrix’s offerings go beyond data curation to ensure that healthcare facilities have the tools they need to engage meaningfully with their data. Q-Centrix’s market-leading clinical data management software provides a comprehensive suite of analytics and reporting tools, empowering clinical and quality leaders to uncover valuable insights that drive clinical decision-making and quality improvements.

Experience. Q-Centrix has over a decade of experience managing clinical data for more than 1,200 hospital partners, making its AI-driven technology extremely well-trained in reviewing data to ensure data integrity.

Hospitals need to trust their data. When clinical data are the cornerstone of patient care, groundbreaking research, quality improvement, and so much more, ensuring the integrity of these data is paramount.

For AI-driven data curation to be meaningful and effective, it must be capable of maintaining the highest data quality standards—which today’s technologies can’t yet do alone. Due to the complexities of clinical data curation—and the risks inherent in low-quality data—a combination of clinical data experts, software, and optimized processes must be used alongside AI technologies to curate high-quality data effectively.

Q-Centrix is well positioned to lead in AI-driven data curation given our strong commitment to data quality, our experience curating clinical data for more than 1,200 hospital partners, our 1,300+ clinical data experts, and our investments in technology. Through our efforts, hospitals and health systems can improve data integrity and derive deeper meaning from their data. Moreover, life sciences organizations and research institutions can gain valuable assistance in identifying study patients and overcoming common research roadblocks.

As we move forward, we are excited to advance our use of AI technology while recognizing that, like all new technologies, it will require time, investment, and deliberate effort to ensure high data standards and continued progress in the healthcare industry.

The road to meaningful AI-driven data curation

Introduction

The potential of AI in healthcare

Unlock the value of clinical data

Improve clinical research

Support observational studies

Considerations in using AI for data curation

Common pitfalls in AI model development

Elements for successful AI implementation

Access to large volumes of data.

Ability to normalize large amounts of disparate data.

Consistently training and refining the AI model for accuracy.

Thoughtfully integrating AI technology into the workflow and user dynamics.

Consistent quality validation of the model to ensure continued accuracy.

Cost-effectiveness.

Q-Centrix’s unique role in leading AI-driven data curation

Conclusion