[midPoint] Blog: Data Provenance Prototype Finished
Radovan Semancik
radovan.semancik at evolveum.com
Thu Sep 17 16:33:39 CEST 2020
Dear midPoint community,
The development of the data provenance prototype
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/>
is finished. The prototype will be part of the midPoint 4.2 release. This
concludes the first phase of the midPrivacy
<https://docs.evolveum.com/midpoint/midprivacy/> initiative. There are
interesting results, both practical and theoretical.
Metadata, data about data. That is the core of data provenance. However,
metadata have a structure of their own, similar to the structure of ordinary data.
The first problem was how to express that structure. None of the
existing popular data modeling languages had any support for metadata.
Therefore we had to invent our own language: Axiom
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/axiom/spec/>.
Creating a new language is a major task, and we considered all the
options to avoid reinventing the wheel. But in the end, Axiom was the
right way to go.
We have used Axiom to create metadata schemas. We have updated all of
the midPoint core to support metadata. Metadata are stored in the
repository, there are metadata mappings, and the value consolidation and
reconciliation algorithms are fully metadata-aware. The midPoint user
interface was extended to display value metadata.
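As a rough illustration of what metadata-aware value consolidation could mean, here is a minimal sketch in Python. It is not midPoint's actual implementation. The names Provenance, Value and consolidate, and the sample sources and timestamps, are hypothetical and chosen only for this example. The assumption is that equal values arriving from different sources are merged into a single value that keeps every distinct metadata record.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Provenance:
    """Hypothetical metadata record: where a value came from."""
    source: str    # name of the originating system (illustrative)
    acquired: str  # ISO 8601 timestamp, kept as text for simplicity


@dataclass
class Value:
    """A data value together with the metadata records describing it."""
    real_value: str
    metadata: list = field(default_factory=list)


def consolidate(values):
    """Merge equal values, keeping every distinct metadata record."""
    merged = {}
    for value in values:
        # One consolidated entry per distinct real value.
        target = merged.setdefault(value.real_value, Value(value.real_value))
        for record in value.metadata:
            if record not in target.metadata:
                target.metadata.append(record)
    return list(merged.values())


# Illustrative input: the same name reported by two different sources.
incoming = [
    Value("Jack", [Provenance("HR", "2020-09-01T10:00:00Z")]),
    Value("Jack", [Provenance("self-service", "2020-09-05T08:30:00Z")]),
    Value("John", [Provenance("HR", "2020-09-01T10:00:00Z")]),
]
result = consolidate(incoming)
```

In this sketch the two "Jack" values collapse into one value that carries both provenance records, while "John" keeps its single record. A consolidation step that ignored metadata would lose the information that the value was confirmed by two sources.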
If you want to see the results of our work, there is a recording of
our workshop
<https://docs.evolveum.com/media/2020-09-10-data-provenance-workshop.mp4>
(and slides
<https://docs.evolveum.com/talks/files/2020-09-data-provenance-workshop.pdf>)
that also includes a demo of the metadata functionality. All the other
details can be found on the project page under the midPrivacy initiative
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/>.
If the concept of metadata is new to you, then perhaps the Identity
Metadata In A Nutshell
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/identity-metadata-in-a-nutshell/>
story is a good place to start.
This project was really interesting and enlightening. Metadata are one
of the fundamental building blocks for data protection functionality.
But it is also an area that has not been completely explored yet. We
encountered a lot of challenges during the project. Some of them were
fully expected, such as the difficulty of designing Axiom. But other
challenges came entirely out of the blue, such as the metadata multiplicity
problem
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/metadata-multiplicity-problem/>.
Some of these challenges may perhaps even be classified as discoveries.
Anyway, we have dealt with them in one way or another. The prototype was
a success in both ways: it uncovered hidden problems and we have
working code in the end.
The prototype code is now an integral part of midPoint. It will be released
in midPoint 4.2, which is planned to happen soon. However, this is still
a prototype. The entire metadata functionality is marked as /experimental/.
The new implicit /value metadata/ live alongside the old explicit
metadata. The old metadata as we know them from midPoint 3.x are still
there and they are fully supported. We have preferred compatibility and
decided not to use the new experimental code until it is sufficiently
stable. The new metadata functionality is part of midPoint, but it is
turned off by default.
Most of the costs of this project were covered by European community
funding, in the form of the NGI_TRUST initiative. We are more than thankful
for this opportunity. I would like to thank the mentors, who were very
helpful, especially given that this was our first “Europroject”.
However, we felt that we had to go beyond the scope of the original project
proposal and therefore we have also invested our own resources into the
project.
Phase 1 of the midPrivacy initiative
<https://docs.evolveum.com/midpoint/midprivacy/> is done. But we are
still far from our ultimate goal. There is still a lot to work on to
develop the data protection and privacy functionality that we need.
However, data protection is quite a special field in many ways. One of
the characteristics of data protection is that it is very difficult to
secure commercial funding for data protection and privacy features. We
all know that data protection is needed, but it is hard to get anyone to
actually pay for it. Therefore the major obstacle to continuing the midPrivacy
initiative is, of course, funding. We have tried to follow up by
submitting several proposals for European community funding. But sadly,
none of the proposals to continue midPrivacy was successful. Therefore
the future of midPrivacy is not certain yet. But one thing is certain:
data protection and privacy are absolutely necessary and we are not
giving up!
This project has received funding from the European Union’s Horizon 2020
research and innovation programme under the NGI_TRUST grant agreement no
825618.
(Reposted from Evolveum blog
<https://evolveum.com/data-provenance-prototype-is-finished/>)
--
Radovan Semancik
Software Architect
evolveum.com