
Data sharing to accelerate research

20.02.2020
Laura van Knippenberg
Opinion

Academic research needs data, and more of it is available today than ever before. Yet the share of data that can be used for research has never been more limited.1 What good is the vast amount of data being collected if we do not allow researchers to generate insights from it? Shared data and open data can fuel research that helps us better understand the world around us. Much of the data out there is hidden away deep in the bowels of organisations, either purposefully or because its value to others is not understood. Publishing datasets as open data, or making them available for use and re-use under specific conditions (data sharing), can accelerate the way we do research. In turn, this research can help us make more informed decisions.

For example, our ability to understand the state of biodiversity is already substantially shaped by open data, through the vast amount of species data available in sources such as the Global Biodiversity Information Facility (GBIF).2 This open data on biodiversity can feed further developments and research that help policymakers be better informed about biodiversity issues. In turn, better policies can be created for improving biodiversity. This principle holds not only for biodiversity but for a wide array of topics, such as education, healthcare, public services, business, and social sciences.

In Europe, open science policy, which aims to support the publication of scientific research as open data, has developed progressively since 2016. Open science is defined by the European Commission as:

a new approach to the scientific process based on cooperative work and new ways of diffusing knowledge by using digital technologies and new collaborative tools. The idea captures a systemic change to the way science and research have been carried out for the last fifty years: shifting from the standard practices of publishing research results in scientific publications towards sharing and using all available knowledge at an earlier stage in the research process3

One of the ambitions of the European Commission is to have the results of EU-funded scientific research shared as open data by default. Moreover, the development of the European Open Science Cloud (EOSC), a ‘federated ecosystem of research data infrastructures’, will allow the scientific community to share and re-use publicly funded research results and data across borders and scientific domains.4

However, not all data is necessarily suitable to be open data. Publishing datasets as open data means making them available to the public for re-use free of charge, or at least at marginal cost. Often, limited resources and concerns about compliance with legal restrictions, especially for datasets that include personal information, result in measures that could decrease the value of the dataset. Sometimes, alternatives to making data openly available, such as data sharing (e.g. only with registered organisations), can be just as effective without reducing the value of the datasets (see Esther Huyer’s (2020) opinion piece5 for more information).

One such alternative approach was recently developed. On February 13th, Social Science One announced that data from Facebook is now available to academic researchers via a data sharing model. Specifically, the dataset covers about 38 million URLs shared on Facebook, together with aggregated data on the views, shares, likes, reactions, and other interactions with these links. Yet there are some challenges, as Gary King, co-chair of Social Science One, explains in an interview with the Support Centre for Data Sharing (see the upcoming practice example on SCDS). Some organisations do not yet have the legal, engineering, and data science infrastructures in place for a data sharing initiative of this magnitude. Therefore, a new organisational model of industry-academic partnerships has been adopted, in which the interests of both the organisations and the academic researchers are safeguarded. In addition, a regime of “differential privacy” is applied to facilitate data access while complying with applicable privacy laws. This way of sharing data enables academics in the field of social science to gain “unprecedented insights into behaviour and communication on social media”6 which can, in turn, benefit society in several ways.

These types of data sharing can open up new areas of research for which gathering data would not otherwise be feasible. Another benefit of data sharing and open data in academia is more efficient and productive science. Researchers spend a substantial amount of their time producing datasets for their papers.7 When researchers can re-use open or shared data rather than having to produce it themselves, they have more time to produce scientific output or to start other scientific inquiries, leading to increased productivity in science.

All in all, sharing data, whether in the open or between selected parties, is incredibly useful for creating value from data in academic research. The impact and implications of data sharing in academia are underscored by the previously mentioned examples of the Global Biodiversity Information Facility for biodiversity and the Social Science One initiative for the social sciences. At the same time, academics need to step out of their “ivory tower of the university” (Gary King, 2020)8 and be open to these industry-academic partnerships, so that research stays meaningful and is put to practical use. More creative approaches to data sharing are needed that not only mitigate the legal risks but also balance the sometimes opposing interests of commercial organisations and academic researchers. This way, we can accelerate the way we do research and make more informed decisions based on it.

Image credit:
Photo by Junior Teixeira from Pexels