
Social Science One and the Facebook URLs dataset

 

“The ivory tower (of governments and academia as custodian of data) has to be broken down. We actually have to integrate more into the rest of society, understand what it is they're doing, why they're doing it, showing them that they can make a big difference in the world.” 


Prof. Gary King - Social Science One and Harvard University

Soon after Social Science One announced, on 13 February 2020, that its inaugural project, the "Facebook URLs dataset", was ready, the Support Centre for Data Sharing (SCDS) team had the privilege of interviewing Prof. Gary King, one of its founders.

A new form of partnership to share data

As described on the organisation's website, Social Science One implements a new type of data sharing partnership between academic researchers and private industry to advance the goals of social science in understanding and solving society’s challenges. The partnership enables academics to access and analyse information held by private industry in a manner that is responsible and socially beneficial. In addition, it protects the privacy of the people described in the data while society gains value from academic research. Finally, it enables companies willing to offer their data to support research and produce social good without compromising their competitive positions.

The first project

Social Science One's inaugural project, in partnership with Facebook, illustrates its model. Following the 2016 presidential election in the United States, the effects of social media - and of Facebook in particular - on democracy and elections became a hot topic. The platform wanted to collaborate with researchers to limit speculation and fully understand the phenomenon. At the same time, however, there was no obvious way for the company to share the so-called "URLs dataset".

The URLs dataset is a collection of about 38 million URLs shared by Facebook users worldwide that triggered at least 100 interactions (viewed, shared, liked, reacted to, shared without viewing...) between 1 January 2017 and 31 July 2019. It also includes details such as the country in which each URL was shared and whether it was fact-checked or flagged by users as hate speech, as well as aggregated data on the types of people who interacted with the URLs.

How does it work?

Social Science One was created to act as an independent intermediary between the researchers and the data provider. The provider - e.g. Facebook - entrusts Social Science One with evaluating the academics' research proposals on their academic merit and granting authorisations accordingly. The board reviewing the applications is a group of academics who commit not to research these topics themselves for the duration of their participation in the board, as a guarantee of independence. In Prof. King's words, they "take one for the team".

The challenges of the data and the model

It took 20 months from the original idea to the moment Social Science One was ready to start granting authorisations to researchers. There were many challenges: legal, technical, and in how to communicate the project.

From a legal perspective, the challenges included first finding the best form of organisation for Social Science One, and then supporting Facebook's legal department until it felt comfortable sharing the company's data.

From a technical perspective, the main challenge - besides the complication and cost of maintaining a dataset of this size - was to protect the privacy of Facebook users. Conventional anonymisation was unfortunately not an option, as user behaviour on a social platform lends itself to re-identification more easily than other kinds of data. Strong aggregation was also unsuitable, as it would have compromised the detail of the original information. Social Science One opted for "differential privacy": a technique that alters a dataset, typically by adding calibrated random noise, so that its overall statistical characteristics are preserved while individual Facebook users remain anonymous.
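To give a feel for the idea, the sketch below shows one textbook way of achieving differential privacy for a simple count: the Laplace mechanism. It is an illustration only and is not taken from the URLs dataset documentation; the article does not say which mechanism was actually used. The function names, the example count of 1200 and the value of epsilon are all hypothetical, and the code assumes that any single user can change the count by at most 1.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a zero-centred Laplace distribution.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Return a count with Laplace noise added (illustrative sketch).

    Assumption: one user changes the count by at most 1 (sensitivity 1).
    A smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random()
    true_shares = 1200  # hypothetical number of shares for one URL
    print(noisy_count(true_shares, epsilon=0.5, rng=rng))
```

Each released value differs from the true count by a random amount, so no single user's contribution can be inferred with confidence, yet averages over many URLs stay close to the truth. That is the trade-off described above: statistical characteristics are broadly preserved while individual behaviour is masked.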

Finally, communicating the project is itself a challenge, and not a secondary one. During our interview, Prof. King stressed that communication is as important to success as the other elements that make data sharing possible. Differential privacy, for example, is a relatively new technique that most people do not know about. Every presentation of Social Science One's proposition therefore needs to be accompanied by a long explanation of how it was made possible: legally, technically, and in terms of respecting people's privacy.

Is this the future of social science research?

Governments and universities used to hold the best methods and the most valuable data for social science research, Prof. King points out, but this is no longer true. Today, successful digital platforms have better visibility of social behaviour than anyone else. Academia needs to leave its ivory tower and work with them, showing them that they can make a big difference in the world.

 

Name: Social Science One
Sector: Academic research - Social sciences
Region: USA / World
Countries: Any
Time: 2019 - ongoing
URL: https://socialscience.one/
Business model: Non-profit
Participants: Social Science One is a non-profit organisation being incubated at Harvard's Institute for Quantitative Social Science. It was founded by Prof. Gary King and Stanford University's Nathaniel Persily.
Type of organisation: Non-profit. The organisational structure is explained in more detail in the paper “A New Model for Industry-Academic Partnerships” by Gary King and Nathaniel Persily.
Data sharing model(s): Academic-private partnership for the independent administration of access to confidential data for research purposes.
Core impact: The project made available to social science researchers what is likely the largest dataset ever released for studying user behaviour on social networks, the distribution of news on such platforms, disinformation and related topics.
Context: Social Science One implements a new type of partnership between academic researchers and private industry to advance the goals of social science in understanding and solving society’s challenges. The partnership enables academics to analyse the information available to private industry in responsible and socially beneficial ways. It ensures that the public maintains privacy while gaining societal value from scholarly research. And it enables firms to enlist the scientific community to improve their business and produce social good, while protecting their competitive positions.

 

(c) 2020 Support Centre for Data Sharing
