What is big data?

The term “Big Data” is increasingly used to describe very large collections of data, called repositories.

The term originates from the discipline of computer science, where it was initially used to describe large, rapidly growing and complex databases. Such databases are characterised by a high volume of data, a wide variety of data types, high velocity (the data arrive and change quickly) and a need for veracity (the data must be accurate and trustworthy).


The communities of patients with rare diseases, their health professionals and scientists are trailblazers in how to work together to advance our understanding of disease and the best treatments.


In healthcare, the term Big Data is most often used to mean large healthcare databases (such as electronic health record systems) or networks of interconnected healthcare databases (called ‘linked’ databases) that draw on multiple healthcare organisations. A big data repository might typically contain information on a million or more patients, perhaps reflecting the population of a health region or country, or all the people with a particular condition across Europe.

We need to study data on large numbers of patients in order to conduct certain types of research. For example, we would use big data to identify very specific or unusual patterns of a health condition, to investigate the impact of the different treatments used for a condition, or to discover rare side-effects or long-term health outcomes. These might occur in only a small proportion of those million or more people. Because of this research need, countries and companies are investing more and more money in establishing networks that permit large-scale data analysis. We expect to see a growing number of new insights and medical advances from the analysis of big data, and we hope to see new medicines, new medical devices and smart applications that support healthcare professionals and patients being developed more quickly.
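The need for large patient numbers can be illustrated with a little arithmetic. The sketch below is purely illustrative: it assumes a hypothetical side-effect rate of 1 in 100,000 and treats patients as independent of one another, and the function names are our own rather than part of any study described here. Under those assumptions, a database of 1,000 patients has roughly a 1% chance of containing even one case, while a database of a million patients would be expected to contain about 10 cases.

```python
def expected_cases(rate: float, n_patients: int) -> float:
    """Average number of cases we would expect among n_patients."""
    return rate * n_patients


def prob_at_least_one(rate: float, n_patients: int) -> float:
    """Chance that at least one case appears, assuming independent patients."""
    return 1.0 - (1.0 - rate) ** n_patients


# Hypothetical rate chosen only for illustration: 1 case per 100,000 patients.
ASSUMED_RATE = 1 / 100_000

for n in (1_000, 100_000, 1_000_000):
    print(
        f"{n:>9,} patients: "
        f"expected cases = {expected_cases(ASSUMED_RATE, n):.2f}, "
        f"chance of at least one case = {prob_at_least_one(ASSUMED_RATE, n):.1%}"
    )
```

With only a handful of cases, a side-effect is hard to distinguish from chance, which is why studies of rare events tend to rely on repositories covering a million patients or more.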

Knowledge gained from analysing big data may benefit society by helping to:

  • increase the effectiveness and quality of treatments,

  • identify risk factors and thus prevent diseases or conditions,

  • improve patient safety by delivering patient information directly to healthcare professionals,

  • predict outcomes and identify pathways of disease transmission so that transmission can be prevented,

  • disseminate knowledge,

  • reduce inefficiencies by identifying healthcare systems that do not work well.

Examples of published findings derived from big data, using databases of a million patients or more, include:

  • Validating >200 novel biomarkers predicting cardiovascular risk

  • Investigating the variation between 174,000 observed national prescribing patterns and national guidelines for chronic obstructive pulmonary disease (COPD)

  • Comparing ~8,000 treatment outcomes for leukaemia by age: uncovering a major unmet treatment need

  • Developing new cancer risk stratification algorithms by mining >700 million records

You can find these and other examples of research using big data in our case study collection.