<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Dear midPoint community,</p>
<div class="entry-content">
<p>MidPoint is a fully schema-aware system. MidPoint eats and
breathes the schema from the very bottom to the very top.
Therefore we need a language to express the schema. MidPoint was
built on XML Schema Definition (XSD) and we have lived in that
uneasy relationship for years. But now it is the right time to
make a big step forward.</p>
<p>The concept of schema completely permeates midPoint. You cannot
really do anything with midPoint without dealing with schema,
directly or indirectly. Connectors represent attribute names and
types using schema. That schema is used by midPoint mappings to
correctly convert data types. The schema is used by the user
interface to automatically create correct input fields for data.
Schema is used to customize and extend the midPoint data model.
Schema is everywhere. This is one of the fundamental principles of
midPoint. It lowers deployment effort, it makes customization
easier and it provides some guarantees about the correctness of the
configuration.</p>
<p>The midPoint project started in 2011, but some parts of midPoint
design go back even further. XML Schema Definition (XSD) was an
obvious choice for a schema definition language at that time. We
were not happy with XSD and the XML ecosystem even at that early
time, but there was nothing better we could use. MidPoint has
evolved during all these years. XML is no longer the only data
language we support; there are also JSON and YAML. But XSD
has remained the schema language to this day. We considered using
JSON Schema instead, but it does not provide any significant
advantage over XSD. In fact, we <a
href="https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/existing-languages-analysis/">considered
several schema languages</a> at several points in the midPoint
development process. But the result was always the same: there
is no schema language that really suits our needs. Switching from
XSD to any other existing language would mean a lot of work
just to get to the same place where we already are.</p>
<p>The problem with XML Schema is that it describes XML data
structures. The problem with JSON Schema is that it describes
JSON data structures. These languages are designed to describe
data represented in a very specific format. We need something
else. We need a way to describe data structures that can be
used in a wide variety of ways: data in a JSON file, data in
relational database tables, data provided by a RESTful
interface, data displayed in a user interface and so on. This may
seem easy, but the devil is in the details. For example, XML has
namespaces, JSON does not (unless it is JSON-LD, which kind of
has namespaces). XML has attributes, JSON does not. JSON and XML
assume ordering in multivalue data, but such an assumption is a
problem when data are stored in a relational database or LDAP. XML
has XPath, which is overkill, and JSONPath is pretty much the
same. It is all one big mess. One can survive in this world by
making a lot of compromises and violating a couple of standards.
That is what we have done with XSD and it kind of worked. We
have been (ab)using XSD for the purpose of data modelling for
many years. But we got to know all the problems quite
intimately. Nobody can say that we have not <a
href="https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/xsd-keywords-use/">tried
hard enough</a>. What is even worse, JSON Schema, YANG and SCIM
schema are built on the same principles as XSD and therefore
they are not going to solve the fundamental issues either.</p>
<p>What we need to do is to go one level of abstraction up. We do
not want to model XML or JSON data. We want to model <i>data</i>,
regardless of their actual representation or storage mechanism.
That was quite clear as early as 2012, when we designed <a
href="https://wiki.evolveum.com/display/midPoint/Prism+Objects">Prism</a>
as an abstraction layer in midPoint code. Prism was used to
model the <i>data</i>, not just their XML representation. That
decision allowed us to implement JSON and YAML support in
midPoint in quite an elegant way. Prism has evolved during all
these years, but it was always limited in its capabilities. And
XSD played a significant part in these limitations. We knew
for years that we had to do something about it. But solving
this problem properly is not an easy task. And we always managed
to push XSD a bit further, to make it play one more dirty trick.
This worked for more than 6 years.</p>
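<p>The idea of modeling data independently of their representation
can be illustrated with a small sketch. This is not Prism code and
it does not use any midPoint API. It is a minimal Python
illustration, and every name in it (<code>ItemDef</code>,
<code>SCHEMA</code>, the <code>example.com</code> namespace and the
helper functions) is a hypothetical one made up for this example.
It shows a single format-neutral definition driving both a JSON and
an XML rendering, and multivalue items being compared without
relying on ordering.</p>
<pre><code>import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical namespace, used only for this illustration.
NS = "http://example.com/schema/person"


@dataclass(frozen=True)
class ItemDef:
    name: str          # local name of the item
    multivalue: bool   # True if the item may hold several values


# One format-neutral description of the data.
SCHEMA = [ItemDef("fullName", False), ItemDef("alias", True)]


def to_json(data):
    # JSON has no namespaces, so only the local names are written.
    known = {d.name: data[d.name] for d in SCHEMA if d.name in data}
    return json.dumps(known, sort_keys=True)


def to_xml(data):
    # XML qualifies every element with the namespace. Each value of
    # a multivalue item becomes a repeated element.
    root = ET.Element("{%s}person" % NS)
    for d in SCHEMA:
        if d.name not in data:
            continue
        values = data[d.name] if d.multivalue else [data[d.name]]
        for value in values:
            ET.SubElement(root, "{%s}%s" % (NS, d.name)).text = value
    return ET.tostring(root, encoding="unicode")


def same_values(a, b):
    # Multivalue items are compared as unordered collections, because
    # a relational database or LDAP will not preserve their order.
    for d in SCHEMA:
        left, right = a.get(d.name), b.get(d.name)
        if d.multivalue:
            if sorted(left or []) != sorted(right or []):
                return False
        elif left != right:
            return False
    return True


person = {"fullName": "Jack Sparrow", "alias": ["jack", "captain"]}
print(to_json(person))
print(to_xml(person))
</code></pre>
<p>The point of the sketch is only that namespaces, element
repetition and ordering are decided by each representation, not by
the definition of the data.</p>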
<p>Enter <a href="https://docs.evolveum.com/midpoint/midprivacy/">midPrivacy</a>.
We have been <a
href="https://evolveum.com/introducing-midprivacy-initiative/">working
on data protection features</a> for quite some time. But it
was 2019 when we got our chance to take it to the next level. <a
href="https://www.ngi.eu/">NGI</a> has an <a
href="https://www.ngi.eu/ngi-projects/ngi-trust/">NGI_TRUST</a>
project that looked like a perfect opportunity for us. We were
more than aware that data protection is as much about <i>meta-data</i>
as it is about data. You can make proper use of the data only if
you know how reliable the data are, where they come from and
whether you are entitled to use them at all. Meta-data
capability is a basic building block for pretty much any data
protection platform. It provides visibility and accountability.
Obviously, we needed that in midPoint as well. Therefore we
put together a proposal for the NGI_TRUST open call. And we were very
lucky to get the funding.</p>
<p>However, everything gets quite complex when it comes to
meta-data. We need to keep such meta-data for every value of
every data item. And the meta-data are going to be slightly
different for every midPoint deployment. This adds an entirely
new <i>dimension</i> of data modeling, a new dimension of
complexity. This is very hard to do with conventional data
modeling languages. We might try to make XSD do one more dirty
trick, and after all these years of XSD hacking we might
actually succeed. But we have decided that this is the point
where we finally say good-bye to XSD and do it properly. We
started by <a
href="https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/existing-languages-analysis/">double
checking</a> that we are not missing any obvious solution. But
there was no solution that could satisfy our needs.</p>
<p>That is how <a
href="https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/axiom-notes/">Axiom
was born</a>. Axiom is a new data modeling language we are
working on right now. It is still a baby, still wildly evolving.
But it <a
href="https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/axiom/">is
starting to take shape</a>. The first ambition of Axiom is to replace
XSD in midPoint. But that would not be enough to justify
the existence of a new language. We need Axiom to do more than
that. Our goal is to use Axiom to define a <i>meta-data schema</i>.
We want to maintain complex meta-data structures for every data
value. The data will be modeled by an Axiom schema, and the
meta-data will be modeled by an independent Axiom schema. These
schemas will be <i>orthogonal</i>: independently developed,
independently maintained, independently extended and customized
for every deployment. We want to join the schemas inside
midPoint at run-time. This is a way to create a
two-dimensional schema from two simple schemas without ending up
with code of insane complexity. This is the right way to
implement data provenance capabilities.</p>
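<p>The run-time join of two orthogonal schemas can be sketched in a
few lines. The sketch below is written in Python, not in Axiom, and
it does not reflect Axiom syntax or midPoint internals. All names
(<code>DATA_SCHEMA</code>, <code>METADATA_SCHEMA</code>,
<code>make_value</code>, the <code>source</code> and
<code>assurance</code> meta-data items) are assumptions made up for
illustration. Each schema is defined on its own, and they meet only
when a value is created: the value is checked against the data
schema and its meta-data against the meta-data schema.</p>
<pre><code>from dataclasses import dataclass, field


@dataclass(frozen=True)
class ItemDef:
    name: str
    type_name: str


@dataclass
class Value:
    real_value: object
    # Meta-data are attached to every single value.
    metadata: dict = field(default_factory=dict)


def schema(*definitions):
    return {d.name: d for d in definitions}


# Two independent schemas. Neither one refers to the other.
DATA_SCHEMA = schema(ItemDef("fullName", "string"),
                     ItemDef("email", "string"))
METADATA_SCHEMA = schema(ItemDef("source", "string"),
                         ItemDef("assurance", "string"))

PY_TYPES = {"string": str, "int": int}


def check(item_schema, name, raw):
    definition = item_schema.get(name)
    if definition is None:
        raise KeyError("item not defined in schema: " + name)
    if not isinstance(raw, PY_TYPES[definition.type_name]):
        raise TypeError("wrong type for item: " + name)


def make_value(item_name, real_value, **metadata):
    # The join happens here, at run-time.
    check(DATA_SCHEMA, item_name, real_value)
    for meta_name, meta_value in metadata.items():
        check(METADATA_SCHEMA, meta_name, meta_value)
    return Value(real_value, dict(metadata))


email = make_value("email", "jack@example.com", source="HR system")
print(email)
</code></pre>
<p>In this sketch a deployment could add a meta-data item by
extending <code>METADATA_SCHEMA</code> alone, without touching the
data schema or the validation code.</p>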
<p>We are now working on a prototype implementation of the processing
code for Axiom and adjusting the Axiom language specification at
the same time. We believe that something like Axiom cannot be
designed on a drawing board or in a standards committee. This
needs experimentation, prototyping and evolution. We are
proceeding in iterations, using the midPoint code as a test bed.
Therefore we expect that Axiom will be evolving for quite some
time until it is completely ready. But we believe that this is a
step in the right direction. This is more than likely to bring a
lot of long-term benefits.</p>
<p>Finally, we are more than grateful for this opportunity and we
would like to thank everyone in NGI for our chance to make
another step towards a robust and professional data protection
platform that can be used by everybody. We appreciate that the
European Union is not just imposing data protection regulations,
but that it is also contributing to open source technologies
that can be used to implement practical data protection
mechanisms. We are more than happy for this opportunity to push
the technology one small step forward.</p>
<p>This project has received funding from the European Union’s
Horizon 2020 research and innovation programme under the
NGI_TRUST grant agreement no 825618.</p>
</div>
<p>(Reposted from <a moz-do-not-send="true"
href="https://evolveum.com/a-road-to-axiom/">Evolveum blog</a>)</p>
<pre class="moz-signature" cols="72">--
Radovan Semancik
Software Architect
evolveum.com</pre>
</body>
</html>