Storing data has costs, both economic and environmental, and both are often invisible to consumers. Many cloud-based companies with freemium or ad-based revenue models give consumers gigabytes of storage for free. Google, for example, provides 15 GB of free storage, so cluttered inboxes can remain untrimmed for years, creating digital waste. With 1.5 billion Gmail users worldwide, just 1 GB of old and ignored email per user would require 1.5 exabytes of storage capacity in data centres.
Of course, even unwanted or forgotten data has value, which is one reason digital platforms are unconcerned about it. In fact, they have a vested interest in keeping old data so that it can be mined to extract as much value as possible. Google conveniently provides search tools so that users do not need to categorize and cull masses of data; they can simply search their own archives to find what might be lost. The incentive to encourage users to save data is high.
As more sectors rely on mining data to create efficiencies, the incentives to create and store data extend well beyond consumers and their own data. Autonomous vehicles generate between 300 gigabytes and 5.4 terabytes of data per hour, while recording the output of all their sensors could produce between 1.4 and around 19 terabytes per hour. Because autonomous vehicles still struggle to make sense of their surroundings, it may be helpful to keep this data for later study: data points that engineers consider unhelpful in early iterations may prove useful in later ones. As another example, the possibility of understanding student performance over a student's educational lifetime could create an incentive to collect numerous data points over multiple decades. And because ongoing research will keep looking for new correlations between students' performance and other aspects of their lives, the incentive is again to keep as much data as possible.
Keeping data of unknown value carries costs beyond storage itself, because it also has data privacy (and even cybersecurity) implications. The more data points are collected and combined, the more vectors exist for that data to be used for unintended purposes. In Europe, the General Data Protection Regulation (GDPR) does provide a framework governing how personal data is collected and processed, which affords some protection by requiring that data be collected with consent and used only for the purposes for which it was intended. Nonetheless, the concept of "legitimate interest" leaves scope to keep data for an indefinite period.
Ever-increasing storage capacity provides a great deal of convenience: rather than deciding which data legitimately needs to be kept, data centres can simply continue to store it without considering its worth. But indiscriminate data collection and storage carry real costs.
About the author
David Regeczi is a managing consultant for the Digital Economy at Capgemini Invent with 13+ years' experience in economic development for high-tech industries in both developed and developing economies, concentrating on innovation and competitiveness across various ICT subsectors. His work helps policymakers better understand how the digital economy continues to transform public policy, with an emphasis on competition, fair relationships, and the economics of data.