Big data - the big promise of the new digitised world
Big data is a widely used buzzword in today's information era. The use of big data in the digital world presents both an opportunity and a risk. Mass data is now used and analysed in almost all areas of life. Even the healthcare sector is undergoing extensive digitisation.
Searching for the term “big data” on the internet returns around 100 million hits within a fraction of a second. But what does it mean? “Big data” usually has two meanings. Firstly, it reflects a current-day problem, namely the ever-increasing flood of data, which long ago surpassed the capacity of human imagination and is close to exceeding the limits of traditional analysis methods (Klein). Secondly, the term also encompasses the solution to the problem. When people refer to “big data”, they usually also have in mind the techniques and analytical methods used to capture, evaluate and store this flood of data.
Dazzling ambivalence
The term “big data” is relatively new. As recently as 2008, it appeared on the cover of a special issue of the science magazine “Nature”, highlighting an urgent problem in the world of research: how to deal with the growing amount of data generated by experiments and simulations, which exceeds all previously known and manageable orders of magnitude. Two years later, the business magazine “The Economist” looked at the data deluge in terms of its economic value, debating its opportunities and challenges.
In late 2017, the German Ethics Council published a statement – “Big Data und Gesundheit – Datensouveränität als informationelle Freiheitsgestaltung” (“Big data and health – data sovereignty as informational freedom”) – which offers a “working definition” of big data. Based on international discussions and findings, the German Ethics Council concluded that there was no uniform definition of the term (cf. de Mauro). It also attributed to big data a property that is especially important for the healthcare sector, i.e. the possibility of exchanging and recombining data.
According to the German Ethics Council, big data is about using large amounts of data to identify patterns and gain new insights from them. Moreover, because of the vast quantity and diversity of the data and the speed with which it is collected, analysed and recombined, using big data requires innovative IT approaches that are continually evolving (German Ethics Council, 2017, p. 36).
Suddenly, all data can become relevant to health
It has been around three years since the association “Deutsche Hochschulmedizin” (position paper on IT solutions, p. 2) claimed that one third of the data collected and exchanged worldwide will in future be collected and exchanged in the healthcare sector. In its 2017 statement, the German Ethics Council goes one step further, stating that the logic of big data is to constantly recombine personal data from an increasing number of sources, so that almost anything can become relevant to health, including data from so-called wearables, sensors or statements in social media (German Ethics Council, 2017, p. 4ff., 54).
In the healthcare sector, big data comes from a variety of sources: clinical and cohort studies, electronic health records (which are still under development in Germany, but already offered by private companies), registries, biobanks and panomics (i.e. the combined use of all omics data generated with high-throughput molecular technologies). The UK Biobank is an impressive illustration of where this development will take us. In 2020, the UK Biobank intends to make the genomic data of half a million British citizens available for research. Pharmaceutical companies are already paying for access to this data treasure trove, hoping that it will boost drug development and biological research (Technology Review, 8th January 2018).
Big IT companies are beginning to focus their business on human health
However, many areas of healthcare still lack the infrastructure needed to use big data systematically, that is, to bring together, analyse and evaluate health-related data such as ECGs, blood values, images, billing and omics data of all the patients in a particular hospital or hospital chain.
In addition, the age of digitisation affects virtually all areas of life. It is characterised by data generated on the internet and by mobile communications companies, published in scientific papers, and recorded in medical settings or with medical assistance and medical devices, as well as by data from so-called wearables and similar activity-tracking devices. The large number of shareholdings, investments, research projects and collaborations suggests that IT companies such as Alphabet, Google’s parent company, have high expectations for the healthcare data business (Gigerenzer, 2016, p. 34f).
The smart combination of health data is expected to increase dramatically and to contribute to securing medical knowledge in the future. Computers can independently generate new knowledge from these data with methods such as knowledge modelling, machine learning and cognitive computing (see article entitled “Eliciting reliable information from big data with classifiers and multimodal data fusion”). Stakeholders in the healthcare sector are hoping that the integration and analysis of these data will help identify previously unknown patterns and generate new insights and decision-making tools. This opens up opportunities for improving diagnostics (e.g. the panomics-driven identification of biomarkers), therapy and prevention, and for reducing costs.
The field of oncology is pioneering the use of big data
It is in the field of oncology in particular that big data from research is combined with a broad range of clinical data under what is known as precision, individualised or stratified medicine (acatech, p. 48f). Precision medicine is expected to enable more accurate diagnoses and personalised therapies based on the molecular profiles of individual patients (see article entitled “Big data make therapy work better”). However, the necessary conditions will first have to be put in place at university hospitals, both as regards patients as data providers and (non-)medical researchers. With the “Cornerstones for a Heidelberg practice of genome sequencing” (Marsilius-Kolleg of the University Heidelberg and the EURAT group, 2013), researchers from the University of Heidelberg and the Heidelberg-based DKFZ have drawn up their own self-regulatory framework for the era of big data, which seeks to achieve a balance between patient well-being, freedom of research and clinical progress.
In the healthcare sector, big data is used by a large number of stakeholders with partially conflicting interests (German Ethics Council, 2017, p. 11). Biomedical researchers hope to gain a better understanding of scientifically relevant relationships and processes through the application of imaging and molecular biology techniques. Stakeholders in healthcare provision (the so-called primary market) expect the utilisation of big data to lead to more personalised treatments and greater effectiveness and efficiency. Insurers and employers expect big data to provide them with accurate profiles of individuals and groups of people. Global IT and internet companies (the so-called secondary market) strive for better commercial exploitation by linking and analysing health-related data with numerous other pieces of information. Patients and other affected people hope that the growing use of tracking systems will provide them with relevant information on healthy lifestyles and personal well-being (German Ethics Council, 2017, p. 11ff).
Will evidence-based medicine come under attack?
While big data creates opportunities, it is also associated with risks (dual use). Experts believe that in the highly sensitive healthcare sector, the increasing use of big data will challenge traditional data protection rights, especially as far as patient data is concerned. A number of people (see article entitled “More data does not automatically mean more information”), among them philosophers of science such as Prof. Klaus Mainzer (2015), have warned against the danger of replacing theories and laws with correlations and data patterns, thus insidiously abandoning evidence-based medicine as the gold standard and attaching greater importance to correlations than to causalities. The German Ethics Council concludes that big data analyses need to be complemented with other data and verified (German Ethics Council, 2017, chairman Prof. Steffen Augsberg, press conference of the German Ethics Council on 30th November 2017).
The Ethics Council further states that the potential benefit of big data depends on the expertise and integrity of the individuals and institutions that generate, select, link and interpret data (German Ethics Council, 2017, p. 45). In light of the hype about big data fuelled by various interests, the Ethics Council concludes that it would be a misunderstanding to believe that more data automatically leads to more knowledge about causal effects (German Ethics Council, 2017, p. 47).