[midPoint] Blog: Data Provenance Prototype Finished

Radovan Semancik radovan.semancik at evolveum.com
Thu Sep 17 16:33:39 CEST 2020


Dear midPoint community,

Development of the data provenance prototype 
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/> 
is finished. The prototype will be part of the midPoint 4.2 release. 
This concludes the first phase of the midPrivacy 
<https://docs.evolveum.com/midpoint/midprivacy/> initiative. There are 
interesting results, both practical and theoretical.

Metadata: data about data. That is the core of data provenance. However, 
metadata have a structure of their own, similar to that of ordinary data. 
The first problem was how to express that structure. None of the 
existing popular data modeling languages had any support for metadata. 
Therefore we had to invent our own language: Axiom 
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/axiom/spec/>. 
Creating a new language is a major task, and we considered all the 
options to avoid reinventing the wheel. But in the end, Axiom was the 
right way to go.

We have used Axiom to create metadata schemas. We have updated all of 
the midPoint core to support metadata. Metadata are stored in the 
repository, there are metadata mappings, and the value consolidation and 
reconciliation algorithms are fully metadata-aware. The midPoint user 
interface was extended to display value metadata.
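
To illustrate what "metadata-aware consolidation" can mean, here is a 
minimal sketch in Python. It is not midPoint code and does not use Axiom 
or any midPoint API. The names Provenance, ValueWithMetadata and 
consolidate, as well as the sample sources and dates, are hypothetical 
and chosen only for this example. The assumption is simply that equal 
values arriving from different sources are merged into one value that 
keeps all of its distinct metadata records.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Provenance:
    """Hypothetical metadata record: where a value came from and when."""
    source: str    # illustrative source name, e.g. "HR system"
    acquired: str  # ISO date kept as plain text for simplicity


@dataclass
class ValueWithMetadata:
    """An ordinary data value together with the metadata attached to it."""
    value: str
    metadata: list = field(default_factory=list)


def consolidate(values):
    """Merge equal values, keeping every distinct metadata record."""
    merged = {}
    for item in values:
        # One consolidated entry per distinct value, in first-seen order.
        target = merged.setdefault(item.value, ValueWithMetadata(item.value))
        for record in item.metadata:
            if record not in target.metadata:
                target.metadata.append(record)
    return list(merged.values())


inputs = [
    ValueWithMetadata("Jack Sparrow", [Provenance("HR system", "2020-09-01")]),
    ValueWithMetadata("Jack Sparrow", [Provenance("self-service", "2020-09-10")]),
    ValueWithMetadata("J. Sparrow", [Provenance("HR system", "2020-09-01")]),
]
result = consolidate(inputs)

for item in result:
    print(item.value, [record.source for record in item.metadata])
```

In this sketch a single value can end up carrying several metadata 
records, one for each source that provided it.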

If you want to see the results of our work, there is a recording of 
our workshop 
<https://docs.evolveum.com/media/2020-09-10-data-provenance-workshop.mp4> 
(and slides 
<https://docs.evolveum.com/talks/files/2020-09-data-provenance-workshop.pdf>) 
that also includes a demo of the metadata functionality. All the other 
details can be found on the project page under the midPrivacy initiative 
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/>. 
If the concept of metadata is new to you, then perhaps the Identity 
Metadata In A Nutshell 
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/identity-metadata-in-a-nutshell/> 
story is a good place to start.

This project was really interesting and enlightening. Metadata are one 
of the fundamental building blocks for data protection functionality. 
But it is also an area that has not been completely explored yet. We 
encountered a lot of challenges during the project. Some of them were 
expected, such as the difficulty of designing Axiom. But other 
challenges came entirely out of the blue, such as the metadata 
multiplicity problem 
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/metadata-multiplicity-problem/>. 
Some of these challenges may perhaps even be classified as discoveries. 
Anyway, we have dealt with them in one way or another. The prototype was 
a success in both ways: it uncovered hidden problems and we have 
working code in the end.

The prototype code is now an integral part of midPoint. It will be 
released in midPoint 4.2, which is planned to happen soon. However, this 
is still a prototype. The entire metadata functionality is marked as 
/experimental/. 
The new implicit /value metadata/ live alongside the old explicit 
metadata. The old metadata as we know them from midPoint 3.x are still 
there and they are fully supported. We have preferred compatibility and 
decided not to use the new experimental code until it is sufficiently 
stable. The new metadata functionality is part of midPoint, but it is 
turned off by default.

Most of the costs of this project were covered by European community 
funding, in the form of the NGI_TRUST initiative. We are more than 
thankful for this opportunity. I would like to thank the mentors, who 
were very helpful, especially given that this was our first 
“Europroject”. However, we felt that we had to go beyond the scope of 
the original project proposal and therefore we have also invested our 
own resources into the project.

Phase 1 of the midPrivacy initiative 
<https://docs.evolveum.com/midpoint/midprivacy/> is done. But we are 
still far from our ultimate goal. There is still a lot to work on to 
develop the data protection and privacy functionality that we need. 
However, data protection is quite a special field in many ways. One of 
the characteristics of data protection is that it is very difficult to 
secure commercial funding for data protection and privacy features. We 
all know that data protection is needed, but it is hard to get anyone to 
actually pay for it. Therefore the major obstacle to continuing the 
midPrivacy initiative is, of course, the funding. We have tried to 
follow up by submitting several proposals for European community 
funding. But sadly, 
none of the proposals to continue midPrivacy was successful. Therefore 
the future of midPrivacy is not certain yet. But one thing is certain: 
data protection and privacy are absolutely necessary and we are not 
giving up!

This project has received funding from the European Union’s Horizon 2020 
research and innovation programme under the NGI_TRUST grant agreement no 
825618.

(Reposted from Evolveum blog 
<https://evolveum.com/data-provenance-prototype-is-finished/>)

-- 
Radovan Semancik
Software Architect
evolveum.com
