Healthcare Data and the Emergence of Artificial Intelligence: What’s Next?
Volume XXV, Issue 101


The healthcare industry is generating vast amounts of data, accounting for approximately 30% of the world’s data volume.i Growth in healthcare data has been significant in recent years, and the trend is expected to continue.


Between 2020 and 2025, the total amount of global healthcare data is projected to increase from 2,300 to 10,800 exabytes. This represents an annual growth rate of 36%,ii faster than the growth of data in other industries such as manufacturing, financial services, and media and entertainment.iii
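
The quoted growth rate can be checked with a compound annual growth rate calculation. The sketch below is illustrative only: the function name `cagr` is our own, and the inputs are the two exabyte figures and the five-year span quoted above.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between a start and an end value."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


# Figures quoted in the text: 2,300 exabytes (2020) and 10,800 exabytes (2025).
growth = cagr(2_300, 10_800, 5)
print(f"Implied annual growth: {growth:.0%}")
```

Under these inputs the result rounds to the 36% cited in the text.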

The key reasons for this rapid data expansion are the explosion of institution-recorded medical data (driven by factors such as increased global use of clinical medical records in hospitals, new imaging technologies including video capture and digital pathology, and an increase in the amount of information collected in -omics1 data) and externally recorded data (e.g., through digital personal devices including smartphones, wearables and home monitoring technologies).

This increased recording of data and its use have the potential to improve various aspects of healthcare, including significantly enhancing patient outcomes through upgraded and new drugs, medtech devices, wearables, and clinical services. Some data can also be used to increase the operational productivity and efficiency of the healthcare providers themselves. Typical third-party customers of data and associated services include pharma companies/contract research organisations (CROs), medtech companies, tech companies, data intermediaries and insurance companies.

Healthcare data can also provide new revenue streams for both the data sharer and the data user. Organisations sharing the data, such as healthcare service providers, have the potential to charge fees for access to their data sets and associated services. Our research shows that the price of a patient record varies significantly, from as low as c.€50 for a simple episode medical record to as high as c.€3,000 for a complex clinical data set with detailed genomics informationiv (see Figure 1). For data users, the advantages of acquiring data include faster validation for new drugs and reduced timescales for clinical trials.

With new drug development costing on average nearly US$1bn and the development timeline lasting several years, any insights from data that can speed up the time to market are of high value.

Many healthcare companies are already investing in the development of data strategies and corresponding business models, as the following examples illustrate:

  • In the US in 2021, several healthcare systems collaborated to form a data company called Truveta to pool and analyse patient data for research and drug development. Total funding rose to US$200m, and Truveta now comprises data from 30 health systems and updates data daily from nearly 100m patients from c.800 hospitals and c.20k clinics.

  • Other leading US institutions are also developing healthcare data platforms in conjunction with partners across the value chain. Mayo Clinic allows third parties to contribute their unique data sets to a common platform and all partners and customers have access to de-identified patient data.

  • Owkin, a medical AI company established in 2016, has developed a federated learning system connecting top academic researchers and data scientists with biopharma companies. Owkin had raised over US$250m in funding by the end of 2021, achieving ‘unicorn’ status (>US$1bn valuation) in 2021 and helping with the discovery and assessment of new drugs. Key investors include Sanofi, Mubadala, Bpifrance, Bristol Myers Squibb and others. Owkin also collaborates with renowned providers such as Cleveland Clinic, where it was one of 12 companies originally selected for their Discovery Accelerator program in 2018.

  • Healthcare IT companies are also creating data platforms and exploring new business models. Dedalus, one of Europe’s largest hospital and outpatient electronic medical record (EMR) providers, has created Open Health Connect, which consolidates healthcare data sets from multiple sources. In the US, Epic, a leading global player in the hospital EMR field, is increasingly supporting the exchange of healthcare data. Epic joined the Trusted Exchange Framework and Common Agreement in February 2023, and recently announced that 23 healthcare systems would be using its software to share information within this framework.

The 2023 L.E.K. Data Survey in Healthcare, Medtech and Life Sciences


To better understand the use cases, purchase behaviour, pain points and monetisation opportunities of healthcare data, we conducted a global healthcare data survey and interview campaign in early 2023.

The survey was conducted among 200 executives who are decision-makers or influential in purchasing healthcare data, in organisations ranging in size from fewer than 50 to over 20,000 employees. The survey included respondents from Europe, the Americas and Asia, representing executives from pharma, medtech, tech (including AI startups and established software companies), and health insurance, as well as data intermediaries (see Figure 2).

We evaluated respondents’ interest in a data set sample from one of our clients – a business with a network of outpatient specialist healthcare clinics. The sample data set contained an extract of over 200k patient records and was relevant to a wide range of potential use cases including primary care, cardiology, diabetes, oncology and urology. Our survey results provide valuable insights for companies looking to launch data-related offerings, including the importance of addressing concerns related to privacy, data quality and data standardisation.

Data customers and their use cases

Our survey targeted the following key stakeholders seeking healthcare data:

[Figure: key stakeholders seeking healthcare data]

Key challenges with data use

Though healthcare data applications are varied, data requirements are often consistent across customers and use cases. End users cite three key challenges for data usage:

  • Data privacy and security: 56% of pharma/CRO respondents reported that ensuring data privacy and security was highly challenging.v Ensuring that the data originator acquired the data in a compliant way and that it has been anonymised according to the right norms and regulations is paramount.

  • Insufficient data quality and completeness: The key concern is when data sets have clear gaps and/or lack sufficient detail, reducing their usefulness.

  • Lack of standardisation and/or scope: Data sets with inputs standardised across data originators and geographies allow linkages with other data sets and are highly valued but hard to obtain. Data users are also often interested in longitudinal tracking opportunities, such as linking outpatient and inpatient visits with imaging centre and laboratory results for a more holistic patient view.

Providers looking to offer data-related services need to focus on mitigating these common challenges.

In terms of geographic coverage, our survey also shows that interest in obtaining clinical data from Europe and Asia-Pacific is significantly higher than interest in data from the US, as these regions are typically less represented in data sets. This is not surprising, given that the US has led the world in the digitisation of patient data and is still significantly ahead of most geographies in fully digital clinical medical records.

Sources of healthcare data

Our survey feedback shows that customers typically acquire data from a wide range of vendors (see Figure 3), with an average of two to three data vendors per use case. Outpatient healthcare facilities are the most frequently cited data providers (47%), with health data marketplaces (36%) and private insurance companies (31%) also used by many respondents, highlighting the recognised commercial potential of data sets owned by these organisations.

Some governments provide data too, and in France, healthcare data must be provided for research (which can include private companies) free of charge.

Data regulations

It is important to understand the regulations governing how healthcare data can be shared with third parties. McDermott Will & Emery (McDermott)2 provide their perspective on our summary:

  1. With patient consent: In this scenario, data can be shared without the need for anonymisation, leaving all patient data intact. Consent can often be gained from patients with complex diseases, where sharing information might lead to better future treatment outcomes. Consent is difficult to obtain for legacy healthcare data. In our survey, c.25%-35%3 of data was used with patient consent.
  2. Anonymised: Data is obfuscated or changed to make it impossible to trace back to any individual patient. This process reduces data value in certain use cases because some information is missing from the data set, but the data is often still valid enough for certain types of research. In our survey, c.50%-75% of data was anonymised.
  3. Through federated or swarm access: The data can be used without being transferred to the third party. Data does not leave the healthcare providers’ premises (or their cloud environment, where applicable), and the research algorithm or the analytics are deployed on the premises and ‘learn’ from the data there. When the algorithm/analytics are extracted, they have been trained on the full data from the healthcare provider, but the results cannot be traced back to any individual patient. Swarm access describes a very similar process. In our survey, c.5%-15% of data was already being accessed via federated/swarm technologies.
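
To make the federated access idea more concrete, the following is a minimal sketch of federated averaging on a toy linear model. It is an assumption-laden illustration, not a description of any vendor's system: the function names (`local_update`, `federated_round`), the two "sites" and their synthetic data are all hypothetical, and production systems typically add safeguards such as secure aggregation that are omitted here. The point it shows is that only model parameters leave each site, never the underlying records.

```python
from typing import List, Tuple

Record = Tuple[float, float]      # (feature, outcome) pair held by one provider
Weights = Tuple[float, float]     # (slope, intercept) of a toy linear model


def local_update(weights: Weights, data: List[Record],
                 lr: float = 0.01, epochs: int = 20) -> Weights:
    """Train on one site's own data; the raw records never leave this function."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_round(weights: Weights, sites: List[List[Record]]) -> Weights:
    """Average the locally trained parameters, weighted by each site's record count."""
    total = sum(len(site) for site in sites)
    updates = [(local_update(weights, site), len(site)) for site in sites]
    w = sum(u[0] * n for u, n in updates) / total
    b = sum(u[1] * n for u, n in updates) / total
    return w, b


# Synthetic data for two hypothetical sites, both following outcome = 2 * feature + 1.
site_a = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
site_b = [(x, 2.0 * x + 1.0) for x in (1.5, 2.5, 3.5)]

weights: Weights = (0.0, 0.0)
for _ in range(200):
    weights = federated_round(weights, [site_a, site_b])

print(f"slope={weights[0]:.3f}, intercept={weights[1]:.3f}")
```

In this toy setting the coordinating party only ever sees the averaged slope and intercept, which approach the values used to generate the synthetic data.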

Our survey results show varying levels of data anonymisation to accommodate regulatory constraints and customer requirements. We believe there is likely to be a shift towards federated and swarm access, given the higher perceived privacy safeguards and data security. We expect this transition to be slow, however, since it requires significant investment in technology and human capital. Data obtained with patient consent still retains higher utility, as it can be reused multiple times and analysed in different ways by the acquiring party.

Data as a service and how to leverage it

The market for data-led offerings is large, provided the data pool is interesting enough. Harnessing this opportunity requires careful consideration of the route to market and the target end users.


Our research estimated the total addressable market for our hypothetical outpatient care data to be nearly €2bn annually, with the EBITDA margin expected to be much higher than for routine clinical operations because the data is already captured.

Ultimate success is highly dependent on the route-to-market strategy. Providers have several options, including the direct to end user approach, the data intermediary approach and a mixed approach. Regardless of the approach taken, providers must develop or hire in-house data expertise, build or commission a data technology platform, and set up specialist teams for business development and commercialisation of the data set.

The practicalities vary depending on the route to market selected. Selecting the optimal approach requires a detailed assessment of the expected pricing power, share capture and opportunity for value-added services, as well as the investment required to build the necessary capabilities. The direct to end user approach typically yields a higher price per record but requires significant investment in building a sizeable in-house data and sales team. By contrast, the data intermediary approach typically achieves a lower price per patient record but requires a much smaller data set and sales team and can usually be launched much faster.

Price per patient record also varies significantly across the end user segments, depending on the value to their use case. Pharma/CRO and medtech companies (in our example) represented the greatest opportunity with the highest willingness to pay for this specialist data, with their average price per patient record being c.€900 and c.€700, respectively. By contrast, the average price per patient record for insurance companies is €50.vii Figure 4 below shows price ranges for different care settings and different types of data based on our primary and secondary research.
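
As a rough illustration of how the segment mix drives the overall price achieved, the sketch below computes a blended price per record. The per-record prices are the survey averages quoted above; the record volumes per segment are purely hypothetical placeholders chosen for the arithmetic, not survey results, and the names `PRICE_PER_RECORD_EUR` and `blended_price` are our own.

```python
# Average prices per patient record quoted in the text (EUR).
PRICE_PER_RECORD_EUR = {"pharma_cro": 900, "medtech": 700, "insurance": 50}


def blended_price(volumes: dict) -> tuple:
    """Return (total revenue, average price per record) for a given segment mix."""
    revenue = sum(PRICE_PER_RECORD_EUR[seg] * n for seg, n in volumes.items())
    records = sum(volumes.values())
    return revenue, revenue / records


# Hypothetical volumes of records sold per segment (illustrative assumption only).
example_volumes = {"pharma_cro": 1_000, "medtech": 1_000, "insurance": 2_000}
revenue, average = blended_price(example_volumes)
print(f"Revenue: EUR {revenue:,}; blended price per record: EUR {average:,.0f}")
```

With these placeholder volumes, half of the records go to the lowest-paying segment, so the blended price falls well below the pharma/CRO and medtech averages.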


The implication for healthcare providers is clear: launching a data-led offering has the potential to be a profitable new business area. For impact investors in the healthcare sector, these offerings can also lead to faster new drug, medtech and/or AI development and better patient outcomes in the medium to long term.

However, there are crucial issues to consider:

  • The data set itself must be fit for purpose. A data set that is compliant with privacy and security regulations, complete (ideally longitudinal) and highly detailed, standardised to a format that is transferable across data originators and geographies, and with broad geographic coverage is best.

  • The appropriate route-to-market strategy must be carefully selected through analysis of implementation costs and monetisation profiles. Implementation costs are driven not only by the business development team, but also by the data architecture that needs to be developed, the technology platform/provider that needs to be selected, and the day-to-day running of the business operations.

With knowledge and care, healthcare providers can position themselves to launch a successful new business with data-led offerings that can grow in value as new patient data is continually added to the existing base.

Adjacent companies such as healthcare IT companies providing EMRs, practice management systems or imaging software have opportunities too. They should begin developing their strategies and consider investing in and developing corresponding data platforms so that their clients are also enabled to launch these new data-driven business models and/or can better leverage data and benchmarks to enhance their own operations.

How L.E.K. Consulting can help

Data is becoming an increasingly important part of the healthcare and life sciences landscape. We have extensive experience advising and supporting business leaders and their companies on their data vision and strategy, and can help develop actionable plans to deliver tangible benefits.

The authors would like to thank Amy Owens, Associate; Adorjan Gyarfas, Senior Associate Consultant; and Kunle Olaoye, Consultant, for their valuable contributions to this work.

L.E.K. thanks Sharon Lamb and Deniz Tschammler from McDermott Will & Emery for their contributions on legal analysis in this article.

L.E.K. Consulting is a registered trademark of L.E.K. Consulting. All other products and brands mentioned in this document are properties of their respective owners. © 2023 L.E.K. Consulting


1 Genomics, transcriptomics, metagenomics, proteomics, metabolomics, inflamomics, lipidomics, glycomics, etc.
2 This communication has been prepared for the general information of clients of L.E.K. Consulting. You should not rely on the contents. Although it touches on certain legal and regulatory aspects, it does not constitute legal advice and should not be regarded as a substitute for legal advice or any recommendation.
3 The ranges result from different percentages by stakeholder group, e.g. usage of non-anonymised/pseudonymised data is higher with insurance companies than with life sciences companies.
i-iii RBC Capital Markets
iv Value range of patient clinical data was derived from both primary and secondary L.E.K. research, based on direct survey results and transactions that were valued on data volume and quality
v-vii The 2023 L.E.K. Data Survey in Healthcare, Medtech and Life Sciences
