IBM and data sharing

“After [the AI] does an engagement, (…) just like the people (…) going to a customer and consulting with them, you walk out of the engagement and you’re a different person, you’ve learnt something from your experience in solving that customer’s problem (…). The machine, basically, is doing something similar, or at least analogous.”

Chris O’Neill, Associate General Counsel for IP at IBM

IBM is no newcomer to open technologies, from quantum and blockchain to containers, AI, operating systems, data and software. The company is actively involved in many mainstream projects, 1 and, more recently, its interest has extended to open data and data sharing. SCDS’s Hans Graux and Gianfranco Cecconi had the opportunity to interview Chris O’Neill, IBM associate general counsel for intellectual property law, who specialises in data-related matters.

Specifying the value of data is a challenge. Experience on the topic is still limited, and is often bound to old-fashioned information technology models. Before the Cloud, companies used to operate resources they physically owned: the servers, the data stored on those servers, the software processing it, the networks. The “borders” surrounding all of these assets were clear.

Today those borders are often immaterial. Every day we use servers, data and software that could be anywhere in the world. Their physical location has lost much of its meaning, except in court, when determining which legislative framework applies. The word “ownership” itself can be misleading with data, and talking about intellectual property rights and licences is much more meaningful than talking about “data ownership”.

The problem becomes even more complex when focusing on the value of the insight generated from data, or of the learning gathered from that insight. What rights does the owner of an artificial intelligence (AI) have, for example, to exploit the learning from data held by a client?

Who owns the insight? Who owns the learning?

IBM faced this riddle when it introduced its Watson family of products. With Watson, the company offers a service that – in short – processes the clients’ data outside of those safe, known borders of the office. The legal aspects related to the data being processed that way, and to the data being produced as the outcome of the process – the insight, the learning – were novel.

To complicate the picture further, with AI there is no clean separation between the software and the data being used to train the AI. The Watson software, owned and operated as a service by IBM, becomes smarter thanks to the data held by the clients, data that is most often proprietary and confidential.

How could this conundrum be solved, and in a way that would withstand legal challenge?

Artificial intelligence like human consultants

One of IBM’s intuitions in dealing with the issue was to consider AIs as similar to human consultants, who learn as they work for their clients and leave their projects as “different persons”, more skilled and capable.

It is commonly accepted that this category of professionals can capitalise on their learning, build on it, and re-use it as they work for the next client. They don’t take with them the source’s commercial secrets, but rather develop language, skills, and experience.

What happens, though, when that next client is a competitor of the first? For human professionals, it is not an issue: work ethics will prevent the consultant from sharing confidential information from the source. However, they will be free to re-use the acquired skills.

Indirectly, the consultancy industry has educated business in a culture of sharing. We all benefit from the accumulated experience of those professionals, as they perform their work across many different industries and organisations.

and the need for “data spaces”

Healthcare and pharmaceuticals were among the first industries for which IBM decided to develop Watson. Chris O’Neill observed that, in most industries, the value associated with data lies in the insights that the data can produce, but healthcare was different. IBM realised that, in this sector, the data itself has commercial value. For example, the data created as a new chemical compound is developed, the history of its evolution, the clinical trials and the medical records of the patients involved are valuable per se, as they capture, as capital, the large investment in research.

This capital can be exploited beyond the company that holds the data. Sharing is necessary to exploit its potential, by combining the data with other data resources held by others. At the same time, sharing the data openly, for anybody to re-use, is hardly commercially acceptable, if not simply impossible because of the need to protect the patients’ privacy. One option is to anonymise or aggregate data before sharing, but that can substantially hinder its potential to support further research. It then becomes inevitable to implement models where access is restricted, e.g. to known researchers only, and where the rights on that data are limited too, e.g. by restricting the research to topics the patients consented to. In Europe, these protected arrangements are often called “data spaces”, e.g. in the European Commission’s latest strategy for data. 2

The role for licences

All of the aforementioned legal elements of data sharing, from model to restrictions to rights to consent, are captured in licences and contracts. Through Chris O’Neill’s work, IBM has already contributed to the specification of licences such as the Linux Foundation’s “Community Data License Agreement” 3 (CDLA). The licence specification captures and acts upon many of the lessons from IBM’s experience with Watson.

Looking to the future, one of the challenges to effective data sharing is the potential fragmentation and extreme specialisation of licensing terms. Fragmentation creates friction that hampers sharing and the exploitation of the opportunities arising from data. Initiatives such as the CDLA and the work of others in the licensing community – including the Support Centre for Data Sharing itself for the EU – will be instrumental in converging efforts and maximising potential.


We recommend watching the full video interview to get Chris’ insight on additional aspects related to data sharing, from the difference between a licence and a contract, to the intellectual property of collections of public domain data, or the “hub and spoke model” for liability and more.




Software and related advisory services


USA / World






Business model

Commercial with open elements


IBM is a multinational corporation active in cloud, information technology, software and AI technologies.

Type of organisation


Data sharing model(s)

Miscellaneous, from open to shared

Core impact

IBM has contributed to the development of licence agreements dedicated to data sharing, such as the “Community Data License Agreement” (CDLA) by the Linux Foundation.


In the context of developing its Watson line of services, IBM needed to explore the legal aspects around the intellectual property rights of its clients and its own, when applying artificial intelligence technology to third parties’ data. This led to IBM’s investment in, and contribution to, the wider licensing community working on standardised terms for data sharing.

For questions and comments, please visit our forum on Futurium.