[midPoint] Blog: A Road To Axiom
Radovan Semancik
radovan.semancik at evolveum.com
Wed May 13 16:51:23 CEST 2020
Dear midPoint community,
MidPoint is a fully schema-aware system. MidPoint eats and breathes the
schema from the very bottom to the very top. Therefore we need a
language to express the schema. MidPoint was built on XML Schema
Definition (XSD) and we have lived in that uneasy relationship for
years. But now is the right time to make a big step forward.
The concept of schema completely permeates midPoint. You cannot really
do anything with midPoint without dealing with schema, directly or
indirectly. Connectors represent attribute names and types using schema.
That schema is used by midPoint mappings to correctly convert data
types. The schema is used by the user interface to automatically create
correct input fields for data. Schema is used to customize and extend
the midPoint data model. Schema is everywhere. This is one of the fundamental
principles of midPoint. It lowers deployment effort, it makes
customization easier and it provides some guarantees about correctness
of the configuration.
The midPoint project started in 2011, but some parts of midPoint design go
back even further. XML Schema Definition (XSD) was an obvious choice for
the schema definition language at that time. We were not happy with XSD and
the XML ecosystem even at that early time, but there was nothing better
we could use. MidPoint has evolved during all these years. XML is no
longer the only data language we support; there are also JSON and YAML.
But XSD has remained our schema language to this day. We considered using
JSON Schema instead, but it does not provide any significant advantage
over XSD. In fact, we considered several schema languages
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/existing-languages-analysis/>
at several points in the midPoint development process. But the result was
always the same: there is no schema language really suiting our needs.
Switching from XSD to any other existing language would mean that we
have to do a lot of work to get to the same place where we already are.
The problem with XML schema is that it describes XML data structures.
The problem with JSON schema is that it describes JSON data structures.
These languages are designed to describe data represented in a very
specific format. We need something else. We need a way to describe
data structures that can be used in a wide variety of ways: data in a JSON
file, data in relational database tables, data provided by a RESTful
interface, data displayed in the user interface and so on. This may seem
easy, but the devil is in the details. For example, XML has namespaces, JSON does
not (unless it is JSON-LD, which kind of has namespaces). XML has
attributes, JSON does not. JSON and XML assume ordering in multivalue
data, but such an assumption is a problem when data are stored in
a relational database or LDAP. XML has XPath, which is overkill, and
JSONPath is pretty much the same. It is all one big mess. One can
survive in this world by making a lot of compromises and violating a
couple of standards. That is what we have done with XSD and it kind of
worked. We have been (ab)using XSD for the purpose of data modelling for
many years. But we got to know all the problems quite intimately. Nobody
can say that we have not tried hard enough
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/xsd-keywords-use/>.
What is even worse, JSON Schema, YANG or SCIM schema are built on the
same principles as XSD and therefore they are not going to solve the
fundamental issues either.
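To make the representation problem concrete, here is a minimal Python sketch. It is not Prism or Axiom code, and every name in it (ItemDefinition, SCHEMA, the example.com namespace, the user data) is a hypothetical chosen for illustration. One format-independent definition drives both a JSON and an XML representation: namespaces survive only in XML, and multivalue items are treated as unordered sets in the model.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical namespace, made up for this illustration.
NS = "https://example.com/schema/user"


@dataclass(frozen=True)
class ItemDefinition:
    """Format-independent definition of one data item."""
    namespace: str
    local_name: str
    multivalue: bool = False


SCHEMA = [
    ItemDefinition(NS, "name"),
    ItemDefinition(NS, "telephoneNumber", multivalue=True),
]


def to_json(data):
    # JSON has no namespaces, so only local names are written.
    out = {}
    for d in SCHEMA:
        if d.local_name in data:
            value = data[d.local_name]
            # Multivalue items are unordered in the model; sort for stable output.
            out[d.local_name] = sorted(value) if d.multivalue else value
    return json.dumps(out)


def from_json(text):
    raw = json.loads(text)
    data = {}
    for d in SCHEMA:
        if d.local_name in raw:
            value = raw[d.local_name]
            data[d.local_name] = frozenset(value) if d.multivalue else value
    return data


def to_xml(data):
    # XML keeps the namespace; each value becomes one child element.
    root = ET.Element(f"{{{NS}}}user")
    for d in SCHEMA:
        if d.local_name in data:
            value = data[d.local_name]
            for v in (sorted(value) if d.multivalue else [value]):
                ET.SubElement(root, f"{{{d.namespace}}}{d.local_name}").text = v
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    root = ET.fromstring(text)
    data = {}
    for d in SCHEMA:
        texts = [e.text for e in root.findall(f"{{{d.namespace}}}{d.local_name}")]
        if texts:
            data[d.local_name] = frozenset(texts) if d.multivalue else texts[0]
    return data


# The same abstract data survives a round trip through either format.
user = {"name": "jack", "telephoneNumber": frozenset({"555-1234", "555-0000"})}
assert from_json(to_json(user)) == user
assert from_xml(to_xml(user)) == user
```

The point of the sketch is only that the item definitions say nothing about XML or JSON. Each serializer decides what to do with the parts of the model its format cannot express.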
What we need to do is to go one level of abstraction up. We do not want
to model XML or JSON data. We want to model /data/, regardless of their
actual representation or storage mechanism. That was quite clear as
early as in 2012 when we designed Prism
<https://wiki.evolveum.com/display/midPoint/Prism+Objects> as an
abstraction layer in midPoint code. Prism was used to model the /data/,
not just their XML representation. That decision allowed us to implement
JSON and YAML support in midPoint in quite an elegant way. Prism has
evolved during all these years, but it was always limited in its
capabilities. And XSD played a significant part in these limitations. We
knew for years that we had to do something about it. But solving
this problem properly is not an easy task. And we always managed to push
XSD a bit further, to make it play one more dirty trick. This worked for
more than 6 years.
Enter midPrivacy <https://docs.evolveum.com/midpoint/midprivacy/>. We
have been working on data protection features
<https://evolveum.com/introducing-midprivacy-initiative/> for quite some
time. But it was 2019 when we got our chance to take it to the next
level. NGI <https://www.ngi.eu/> has an NGI_TRUST
<https://www.ngi.eu/ngi-projects/ngi-trust/> project that looked like a
perfect opportunity for us. We were more than aware that data protection
is as much about /meta-data/ as it is about data. You can make proper
use of the data only if you know how reliable the data are, where they
come from and whether you are entitled to use them at all. Meta-data
capability is a basic building block for pretty much any data protection
platform. It provides visibility and accountability. Obviously, we
needed that in midPoint as well. Therefore we have put together a
proposal to the NGI_TRUST open call. And we were very lucky to get the funding.
However, everything gets quite complex when it comes to meta-data. We
need to keep such meta-data for every value of every data item. And the
meta-data are going to be slightly different for every midPoint
deployment. This adds an entirely new /dimension/ of data modeling, a
new dimension of complexity. This is very hard to do with conventional
data modeling languages. We might try to make XSD do one more dirty trick,
and after all these years of XSD hacking we might actually succeed. But
we have decided that this is the point where we finally say good-bye to
XSD and do it properly. We started by double-checking
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/existing-languages-analysis/>
that we are not missing any obvious solution. But there was no solution
that could satisfy our needs.
That is how Axiom was born
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/axiom-notes/>.
Axiom is a new data modeling language we are working on right now. It is
still a baby, still wildly evolving. But it is starting to take shape
<https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/axiom/>.
The first ambition of Axiom is to replace XSD in midPoint. But that would
not be enough to justify the existence of a new language. We need Axiom to
do more than that. Our goal is to use Axiom to define a /meta-data
schema/. We want to maintain complex meta-data structures for every data
value. The data will be modeled by an Axiom schema, and the meta-data
will be modeled by a separate, independent Axiom schema. These schemas will be
/orthogonal/, independently developed, independently maintained,
independently extended and customized for every deployment. We want to
join the schemas inside midPoint at run-time. This is a way to
create a two-dimensional schema from two simple schemas without ending up with
code of insane complexity. This is the right way to implement data
provenance capabilities.
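The idea of joining two orthogonal schemas at run-time can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Axiom syntax and not midPoint code: the schemas are plain dictionaries mapping item names to Python types, and all names (DATA_SCHEMA, METADATA_SCHEMA, ValueWithMetadata, validate, the "source", "confidence" and "legalBasis" items) are hypothetical.

```python
from dataclasses import dataclass, field

# Two independent schemas: item name -> expected Python type.
# Both are hypothetical stand-ins for real schema definitions.
DATA_SCHEMA = {"name": str, "employeeNumber": int}
METADATA_SCHEMA = {"source": str, "confidence": float}


@dataclass
class ValueWithMetadata:
    """One data value together with the meta-data that describes it."""
    value: object
    metadata: dict = field(default_factory=dict)


def check(schema, name, value):
    # Reject items the schema does not know and values of the wrong type.
    if name not in schema:
        raise KeyError(f"unknown item: {name}")
    if not isinstance(value, schema[name]):
        raise TypeError(f"{name}: expected {schema[name].__name__}")


def validate(obj, data_schema, metadata_schema):
    # The two schemas meet only here, at run-time: every value of every
    # item is checked against the data schema, and its meta-data is
    # checked against the meta-data schema.
    for name, values in obj.items():
        for item in values:
            check(data_schema, name, item.value)
            for meta_name, meta_value in item.metadata.items():
                check(metadata_schema, meta_name, meta_value)


user = {
    "name": [ValueWithMetadata("jack", {"source": "HR", "confidence": 0.9})],
    "employeeNumber": [ValueWithMetadata(1234, {"source": "HR"})],
}
validate(user, DATA_SCHEMA, METADATA_SCHEMA)

# A deployment extends only the meta-data schema; the data schema is untouched.
extended_metadata = {**METADATA_SCHEMA, "legalBasis": str}
tagged = {"name": [ValueWithMetadata("jack", {"legalBasis": "consent"})]}
validate(tagged, DATA_SCHEMA, extended_metadata)
```

Neither schema mentions the other, so each can be extended or replaced per deployment. The combination exists only in the validation step.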
We are now working on a prototype implementation of the processing code for
Axiom and adjusting the Axiom language specification at the same time.
We believe that something like Axiom cannot be designed on a drawing
board or in a standards committee. This needs experimentation,
prototyping and evolution. We are proceeding in iterations, using the
midPoint code as a test bed. Therefore we expect that Axiom will be
evolving for quite some time until it is completely ready. But we
believe that this is a step in the right direction. This is more than
likely to bring a lot of long-term benefits.
Finally, we are more than grateful for this opportunity and we would
like to thank everyone in NGI for the chance to take another step
towards a robust and professional data protection platform that can be
used by everybody. We appreciate that the European Union is not just
imposing data protection regulations, but that it is also contributing
to open source technologies that can be used to implement practical data
protection mechanisms. We are more than happy for this opportunity to
push the technology one small step forward.
This project has received funding from the European Union’s Horizon 2020
research and innovation programme under the NGI_TRUST grant agreement No
825618.
(Reposted from Evolveum blog <https://evolveum.com/a-road-to-axiom/>)
--
Radovan Semancik
Software Architect
evolveum.com