[midPoint] Blog: Data Provenance, Milestone 2

Thu Jul 16 15:59:04 CEST 2020

Dear midPoint community,

Data provenance 
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/> 
development in midPoint has reached its second milestone. While we are 
not at the end yet, there is already a pile of interesting materials to 
have a look at. There is good news, but there is also some not-so-good news.

First of all, some of you might be wondering what that /provenance/ 
thing is and why it is so important. You are not alone. Data protection 
may look easy, but it is not an easy thing to understand. Therefore I 
have put together “Identity Metadata In A Nutshell” 
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/identity-metadata-in-a-nutshell/>, 
a document that explains the metadata concepts and how we are 
going to implement them in midPoint.

There is good news from the implementation effort. Axiom 
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/axiom/spec/>, 
our data modeling language, is taking shape very nicely. As usual, 
designing the language was much harder than we had anticipated (even 
though we expected that it would not be easy). But I’m very pleased with the 
results so far. Most aspects of the language are designed and it looks 
like it works well for metadata modeling. A significant part of the 
Axiom-processing code is developed and integrated into midPoint. The 
code works for metadata modeling. There is still a long way to go to 
make Axiom a universal data modeling language for midPoint (and other 
uses), but the first results look more than promising.

Having a modeling language is one thing, but designing an actual metadata 
model 
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/metadata-usecases/> 
is quite a different thing. There is no metadata standard and there 
seems to be no general agreement on what identity metadata should look like. 
Therefore we have done our best to create a reasonable set of metadata 
and express them in Axiom 
<https://github.com/Evolveum/midpoint/blob/master/infra/schema/src/main/resources/xml/ns/public/common/common-metadata-3.axiom>. 
It is quite likely that this schema is not final yet, but it allows us 
to go on to the next step of testing and validation.

We have a modeling language and metadata models now. But we still need to 
set up midPoint to use the metadata. For those of you who know us, it is 
perhaps no big surprise that we have reused an existing mechanism. Enter 
metadata mappings. Just as ordinary mappings are applied to ordinary data, 
metadata mappings are applied to metadata. The documentation is not yet 
completely up to date, but the “Nutshell” document 
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/identity-metadata-in-a-nutshell/> 
has some nice examples of metadata mappings.
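
To illustrate the idea only, here is a minimal conceptual sketch in Python. It is not midPoint configuration syntax and not midPoint code; the names Value, apply_mappings, full_name and combined_origin, as well as the "origin" metadata item, are hypothetical and chosen just for this example. It assumes that each value carries its own metadata, that an ordinary mapping computes output data from source data, and that a metadata mapping computes output metadata from source metadata.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Value:
    """A data value together with its metadata (hypothetical structure)."""
    data: str
    metadata: Dict[str, str] = field(default_factory=dict)


def apply_mappings(
    sources: List[Value],
    data_mapping: Callable[[List[str]], str],
    metadata_mapping: Callable[[List[Dict[str, str]]], Dict[str, str]],
) -> Value:
    """Compute output data from the source data and, in parallel,
    output metadata from the source metadata."""
    return Value(
        data=data_mapping([v.data for v in sources]),
        metadata=metadata_mapping([v.metadata for v in sources]),
    )


def full_name(parts: List[str]) -> str:
    # Ordinary mapping: sees only the data.
    return " ".join(parts)


def combined_origin(metas: List[Dict[str, str]]) -> Dict[str, str]:
    # Metadata mapping: sees only the metadata. In this sketch the
    # result lists every distinct origin found on the inputs.
    origins = sorted({m["origin"] for m in metas if "origin" in m})
    return {"origin": ", ".join(origins)}


given = Value("Jack", {"origin": "HR system"})
family = Value("Sparrow", {"origin": "self-service"})
result = apply_mappings([given, family], full_name, combined_origin)
print(result.data)      # Jack Sparrow
print(result.metadata)  # {'origin': 'HR system, self-service'}
```

The point of the sketch is only that the two mappings run side by side: the computed full name ends up with metadata derived from the metadata of the values it was computed from. For the real syntax, see the examples in the “Nutshell” document.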

As usual, we have made the system a bit more generic than strictly 
necessary. The goal of this project phase was identity provenance, but we 
have created a system that can handle almost any kind of metadata. There 
are several built-in metadata types in midPoint schemas and you can 
extend the system with completely custom metadata. Nevertheless, we 
have still kept our primary goal in mind and identity provenance 
metadata play a primary role in the solution. There is a robust schema 
for identity provenance metadata and we have invested a lot of design 
time into that. We especially focused on making the provenance schema 
“future proof”, to make sure it can be extended in the future to support 
advanced data protection functionality.

Of course, everything is harder than it seems 
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/challenges/>. 
Data protection may seem simple enough. Yet, it is anything but 
simple. Identity management was all about moving data around. But data 
protection adds a completely new /dimension/ to that. Data protection is 
all about reasoning /behind/ the data: how the data got here, how we can 
process them, where we can send them, when to delete them. We were aware 
of most of the data protection difficulties, but work on identity 
provenance exposed even deeper issues. We had to make a rough design of 
future data protection functionality 
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/provenance-origin-basis/> 
to make sure that our identity provenance functionality design was right.

This is our second milestone and there is still one final part of the 
project to finish. The final part is mostly focused on testing, 
bugfixing and overall validation of the results. We still have to 
improve the user interface and experiment with user experience, which may 
also lead to adjustments of metadata schemas. The final weeks will be 
focused on demonstrating the results and gathering user feedback.

We are very excited about the development that this project brings. It 
is not just about the metadata. Axiom brings a radical change and we 
have high hopes for the future. However, please keep in mind that the 
goal of this project is an identity provenance /prototype/. Because it is a 
prototype, we are not yet sure how useful it is going to be for 
practical deployments. That is exactly what the prototype has to find 
out. You are more than welcome to test the functionality. Just please 
keep in mind that there may be limitations.

What we are not so happy about is the immediate future after this 
“provenance” phase of midPrivacy is finished. We would absolutely love 
to move data protection functionality out of prototype stage and make it 
production-ready. We have spent a lot of time during the past six months 
trying to secure funding for future development. We have submitted several 
proposals, mostly to NGI open calls. Sadly, none of the proposals were 
successful. Therefore it looks like we will have to put our data 
protection efforts aside, at least for a while. It is a real pity to 
suspend this project, especially after such a promising start. We 
strongly believe that data protection is absolutely necessary for the 
safety of our digital future. Yet it is almost impossible to get funding 
for data protection feature development from our commercial engagements. 
Therefore we will be more than grateful to anyone willing to sponsor our 
data protection efforts or anyone who knows about any other form of 
funding that we could use. We strongly hope that we will be able 
to resume working on midPrivacy 
<https://docs.evolveum.com/midpoint/midprivacy/> as soon as possible.

This project has received funding from the European Union’s Horizon 2020 
research and innovation programme under the NGI_TRUST grant agreement no 
825618.

(Reposted from Evolveum blog 
<https://evolveum.com/data-provenance-milestone-2/>)

-- 
Radovan Semancik
Software Architect
evolveum.com
