Adding the “Real” to Real-World Data

Collin Labar

Collin has spent his career advising pharmaceutical and biotech companies on their medical, commercial, and market access strategy. After years as a consultant, he joined Lynx.MD to lead partnerships, shaping data strategy for life sciences and AI companies.

The Irony of “Real-World Data”

Have you noticed how many companies now include “AI” as a buzzword on their websites or in their product descriptions? How many do you think actually have advanced “Artificial Intelligence” technology (as opposed to basic machine learning, predictive models, or simply a backend connection to ChatGPT)?

For years, the same has been going on with the term “Real-World Data” (RWD). The life science industry has adopted this buzzword to represent data supporting observational research. Yet, most of the RWD used lacks key variables that are required to understand the holistic patient experience.

New technology and partnerships have enabled direct access to data sources housing valuable clinical information. While many companies have begun to explore these new databases, adoption has been slow. Why is one of the most innovative industries dragging its feet?

Let’s take a step back.

What is “Real-World Data”?

The U.S. Food and Drug Administration (FDA) defines real-world data as: data relating to patient health status and/or the delivery of health care routinely collected from a variety of sources. Examples of RWD include data derived from electronic health records, medical claims data, data from product or disease registries, and data gathered from other sources (such as digital health technologies) that can inform on health status.

Now, go search “real-world data” on the internet (don’t worry, I did it for you). What you will find is a variety of companies offering RWD or RWE (real-world evidence) research services. That sounds great! Until you dive into the limitations of that RWD.

The Limitations of Traditional “Real-World Data”

When exploring RWD access, you often see datasets that contain medical claims data linked to structured EHR data and, if you are lucky, laboratory results for some of the patients.

Sure, this aligns with the FDA’s definition of RWD, and these datasets work great if you want to understand the “whats” about the patient journey:

  • WHAT treatment did an inflammatory bowel disease (IBD) patient switch to after adalimumab?
  • WHAT procedures did the patient undergo?
  • WHAT was the timeline of key events (procedures and treatment switch)?

But what if you wanted to know why a patient was switched from adalimumab? Claims and structured EHR data have holes that limit the answers to important questions such as:

  • How was the patient responding to adalimumab?
  • Were symptoms still occurring and impacting the patient’s well-being?
  • What was the observed inflammation activity of the gut?
  • Was the patient experiencing adverse events (AEs) or comorbidities that required a treatment switch?
  • Did the patient develop antibodies to adalimumab?
  • Or did the patient simply prefer a different route of administration, or struggle to adhere to treatment?

Unlocking the “Why” Behind the “What”

With the help of novel technology, the paradigm of real-world data has experienced a massive shift.

In the past, medical claims data and structured EHR data were the only data sources that were easily accessible, actionable, and compliant. Now, researchers have access to all the data siloed and locked within unstructured data sources.

Think about a patient who is going to a gastroenterologist for an IBD flare. When they visit their doctor, a nurse will take the patient’s vitals and start to type answers to questions about their symptoms, medication adherence, steroid / over-the-counter (OTC) treatment use, overall health, and quality of life. The doctor will then come in to probe further and suggest an endoscopy. During the endoscopy, the physician will record details about the level and location of inflammation, and may even take a biopsy for pathology. The doctor may then recommend a treatment switch or other management interventions and record them in the patient’s record.

How can researchers fully understand the patient experience without accessing all of the clinical details mentioned above? There is key information impacting treatment decisions, such as symptom occurrence and severity, medication compliance, observed levels of inflammation, and pathologic findings.

Advanced technology has unlocked this clinical data, siloed across data sources and locked within unstructured fields, so that it can be ingested, mapped, and de-identified for research. Companies like Lynx.MD, Truveta, and nference have emerged to take “real-world data” to the next level, with deeper data access than ever before.
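
To make the idea concrete, here is a minimal Python sketch of pulling a “why” out of a free-text note. It is an illustration only: the keyword patterns, the function names (`deidentify`, `extract_switch_reasons`), and the sample note are all hypothetical, and it does not represent how any of the companies named above process data. Real pipelines rely on clinical NLP and formal de-identification methods rather than a handful of regular expressions.

```python
import re
from typing import List

# Hypothetical keyword patterns for reasons a patient might be switched
# off a treatment. A production system would use clinical NLP instead.
SWITCH_REASON_PATTERNS = {
    "loss_of_response": re.compile(r"loss of response|persistent symptoms|ongoing flare", re.I),
    "adverse_event": re.compile(r"adverse (event|reaction)|injection[- ]site reaction", re.I),
    "anti_drug_antibodies": re.compile(r"anti[- ]drug antibod|antibodies to adalimumab", re.I),
    "non_adherence": re.compile(r"missed doses|non[- ]?adheren|non[- ]?complian", re.I),
}

DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")
MRN_PATTERN = re.compile(r"\bMRN[:\s]*\d+\b", re.I)


def deidentify(note: str) -> str:
    """Mask dates and record numbers (toy example, not a compliant method)."""
    note = DATE_PATTERN.sub("[DATE]", note)
    return MRN_PATTERN.sub("[MRN]", note)


def extract_switch_reasons(note: str) -> List[str]:
    """Return the labels of every reason pattern found in the note, sorted."""
    return sorted(
        label for label, pattern in SWITCH_REASON_PATTERNS.items()
        if pattern.search(note)
    )


# Invented sample note, written for this sketch only.
note = (
    "MRN: 483920. Seen 03/14/2023 for IBD flare. Persistent symptoms on "
    "adalimumab; labs show antibodies to adalimumab. Plan: switch therapy."
)
clean = deidentify(note)
reasons = extract_switch_reasons(clean)
print(clean)
print(reasons)
```

Even this toy version surfaces two candidate reasons for the switch that a claims record alone would not show.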

“Traditional” Real-World Data Sources

  • Medical Claims (Diagnosis, Procedures)
  • Prescription Claims (Prescription Details)
  • EHR Data (Typically Structured Fields)
  • Linked Lab Results from Outside Party

“New” Real-World Data Sources (Lynx.MD, etc.)

  • Medical Claims (Diagnosis, Procedures)
  • Prescription Claims (Prescription Details)
  • EHR Data (Structured Fields and Unstructured Free Text Notes)
  • Lab Results (PDFs and Reports, Directly from Source)
  • Procedure Reports
  • Pathology Reports
  • Precision Medicine Findings (PDFs and Reports)
  • Images / Videos
  • Referral, Coordinated Care, and Hospitalization Discharge Notes
  • Other Scanned Reports
  • Invoices and Cost Data
  • Internal Communications (Office Notes, Insurance Paperwork, Patient Portal Messages)

Unstructured RWD goes beyond the EHR to move from the “What” to the “Why”

Sure, registries are organized to gain access to deeper clinical data. However, registries are expensive, require doctor and patient recruitment, cover only a limited number of patients, and typically do not include diverse patient populations because of the centers that host them.

These new technology-enabled companies have access to rich clinical data for millions of patients across the US. Each patient has 10,000 data points associated with their health journey, often in unstructured format (free text, PDF documents, audio, scanned reports, images). Variables can be extracted from millions of these files in a matter of days, as opposed to years.

Real-world datasets can be curated, de-identified, and made actionable at the click of a button. No need to spend months to years recruiting patients for clinical studies or working with large health groups just to collect a small, static, real-world dataset.

Expanding the Data Science and Evidence Generation Landscape

Oftentimes, research questions that require this level of data access are left out of evidence generation planning because no data source was available to answer them. Life science companies now have the ability to expand their research and accelerate business activities using this RWD by:

  • Understanding real-world effectiveness and safety of their products (including relative to competitor products or no treatment)
  • Analyzing the impact of clinical outcomes and disease progression on treatment and management decision making
  • Running inclusion / exclusion criteria on patient populations to better identify sites and proactively recruit eligible patients for accelerated enrollment
  • Developing algorithms to analyze clinical data and flag patients eligible for research in real-time, reducing touchpoints needed to engage patients
  • Identifying the impact of precision medicine technology and improving its utilization
  • Training and validating AI models with real-world patient data to generate evidence for FDA submissions (think colonoscopy images for polyp detection and reference reports to test against)
  • Creating digital twins for clinical trials to minimize control groups
  • Improving clinical quality and standardization of care
  • Improving diversity in patient cohorts and building a better understanding of health equity and disparities
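
As a rough illustration of the inclusion / exclusion bullet above, the Python sketch below screens a small list of patient records against a set of criteria. Everything in it is assumed for the example: the `Patient` fields, the criteria themselves (adult, IBD diagnosis, prior adalimumab exposure, no active infection), and the three sample records. Actual trial criteria come from the study protocol and are usually far more detailed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Patient:
    # Hypothetical record layout, invented for this sketch.
    patient_id: str
    age: int
    diagnoses: List[str]
    prior_treatments: List[str] = field(default_factory=list)
    active_infection: bool = False


def is_eligible(p: Patient) -> bool:
    """Apply illustrative inclusion and exclusion criteria to one patient."""
    # Inclusion: adult with an IBD diagnosis and prior adalimumab exposure.
    included = (
        p.age >= 18
        and "IBD" in p.diagnoses
        and "adalimumab" in p.prior_treatments
    )
    # Exclusion: any active infection.
    excluded = p.active_infection
    return included and not excluded


# Invented sample cohort.
patients = [
    Patient("P001", 34, ["IBD"], ["adalimumab"]),
    Patient("P002", 16, ["IBD"], ["adalimumab"]),
    Patient("P003", 52, ["IBD"], ["adalimumab"], active_infection=True),
]

eligible_ids = [p.patient_id for p in patients if is_eligible(p)]
print(eligible_ids)
```

Keeping the inclusion and exclusion checks as separate expressions makes each rule easy to audit, which matters when criteria must be traced back to a protocol.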

This level of real-world data access has its limitations too, typically because data generated by physicians outside of the groups sourcing the data is not captured. These datasets are also smaller than medical claims datasets, which cover millions of patients as opposed to thousands.

While all data sources have their optimal use cases, robust real-world clinical data can expand and accelerate research going on across the industry and take data science to the next level. As awareness of these new data sources increases, more companies and more research teams will start to recognize the increased value they provide.

Holding True to Innovation in Life Sciences

Life sciences companies have a massive opportunity for healthcare research innovation right in front of them. Whether it is discovering new or improved treatments, accelerating the clinical research process, or communicating the impact of key outcomes, we can do a lot better to ultimately improve patient care.

These new companies have laid the groundwork by developing novel technology, proactively forming new networks, and processing massive amounts of data to be actionable. Life science companies just have to ask the right questions, since this data contains all of the answers. Each day life science companies go without leveraging this new technology, they are wasting valuable time. The future is here, and now it is time to accelerate research innovation.
