Data provenance & lineage

With the proliferation of data sharing and data analytics, and the growing role of data in decision-making, a crucial concern has settled in the minds of decision-makers, technologists, and analysts: how can you trust the data that you rely on? Trust in data depends on many factors, such as knowing where data comes from and understanding how it is processed. Knowing this allows data users to assess, for example, whether the claimed origins of data are genuine, whether data has been tampered with, or whether it is being used in ways that are explainable and legitimate. Data provenance and lineage address many of these issues, ensuring that data remains traceable.

This report provides an integrated perspective and guidance on how to ensure the traceability of data in practice. It is meant to be an information primer and guide, aimed at domain experts and decision-makers tasked with defining their organisation’s approach to data traceability. Practitioners should gain a more structured, categorical view of the relevant concepts, challenges, and technologies. To realise this objective, the report provides readers with:

  • an in-depth understanding of data provenance and lineage;
  • a clear sense of common application areas and challenges;
  • and insights into approaches that allow the tracing of data provenance and lineage.