<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Dear midPoint community,</p>


    <div class="entry-content">

      <p>MidPoint is a fully schema-aware system. MidPoint eats and

        breathes the schema from the very bottom to the very top.

        Therefore we need a language to express the schema. MidPoint was

        built on XML Schema Definition (XSD) and we have lived in that

        uneasy relationship for years. But now it is the right time to

        make a big step forward.</p>


      <p>The concept of schema completely permeates midPoint. You cannot

        really do anything with midPoint without dealing with schema,

        directly or indirectly. Connectors represent attribute names and

        types using schema. That schema is used by midPoint mappings to

        correctly convert data types. The schema is used by the user

        interface to automatically create correct input fields for data.

        Schema is used to customize and extend the midPoint data model.

        Schema is everywhere. This is one of the fundamental principles of

        midPoint. It lowers deployment effort, it makes customization

        easier and it provides some guarantees about the correctness of the

        configuration.</p>

      <p>The midPoint project started in 2011, but some parts of midPoint

        design go back even further. XML Schema Definition (XSD) was an

        obvious choice for schema definition language at that time. We

        were not happy with XSD and the XML ecosystem even at that early

        time, but there was nothing better we could use. MidPoint has

        evolved during all these years. XML is no longer the only data

        language we support; there are also JSON and YAML. But XSD

        has remained our schema language to this day. We considered using

        JSON Schema instead, but it does not provide any significant

        advantage over XSD. In fact, we <a

href="https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/existing-languages-analysis/">considered

          several schema languages</a> at several points in the midPoint

        development process. But the result was always the same: there

        is no schema language that really suits our needs. Switching from

        XSD to any other existing language would mean doing

        a lot of work just to get back to where we already are.</p>

      <p>The problem with XML Schema is that it describes XML data

        structures. The problem with JSON Schema is that it describes

        JSON data structures. These languages are designed to describe

        data represented in a very specific format. We need something

        else. We need a way to describe data structures that can be

        used in a wide variety of ways: data in a JSON file, data in

        relational database tables, data provided by a RESTful

        interface, data displayed in a user interface and so on. This may

        seem easy, but the devil is in the details. For example, XML has

        namespaces, JSON does not (unless it is JSON-LD, which kind of

        has namespaces). XML has attributes, JSON does not. JSON and XML

        assume ordering in multivalue data, but such an assumption is a

        problem when data are stored in a relational database or LDAP. XML

        has XPath, which is overkill, and JSONPath is pretty much the

        same. It is all one big mess. One can survive in this world by

        making a lot of compromises and violating a couple of standards.

        That is what we have done with XSD and it kind of worked. We

        have been (ab)using XSD for the purpose of data modeling for

        many years. But we got to know all the problems quite

        intimately. Nobody can say that we have not <a

href="https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/xsd-keywords-use/">tried

          hard enough</a>. What is even worse, JSON Schema, YANG or SCIM

        schema are built on the same principles as XSD and therefore

        they are not going to solve the fundamental issues either.</p>

      <p>What we need to do is to go one level of abstraction up. We do

        not want to model XML or JSON data. We want to model <i>data</i>,

        regardless of their actual representation or storage mechanism.

        That was quite clear as early as 2012 when we designed <a

          href="https://wiki.evolveum.com/display/midPoint/Prism+Objects">Prism</a>

        as an abstraction layer in midPoint code. Prism was used to

        model the <i>data</i>, not just their XML representation. That

        decision allowed us to implement JSON and YAML support in

        midPoint in quite an elegant way. Prism has evolved over

        the years, but it was always limited in its capabilities. And

        XSD played a significant part in these limitations. We knew

        for years that we had to do something about it. But solving

        this problem properly is not an easy task. And we always managed

        to push XSD a bit further, to make it play one more dirty trick.

        This worked for more than 6 years.</p>
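      <p>The idea of modeling the data rather than one representation
        can be illustrated with a minimal sketch. This is not Prism
        code and not Axiom syntax: the <i>user</i> structure and the
        functions <i>to_json</i> and <i>to_xml</i> are hypothetical
        names, and the sketch assumes only the Python standard library.
        It keeps one neutral in-memory model and derives both
        serializations from it.</p>

<pre>
```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, representation-neutral data: item names mapped to lists of
# values. Every item is multi-valued and no ordering of values is implied.
user = {
    "name": ["jack"],
    "email": ["jack@example.com", "sparrow@example.com"],
}


def to_json(data):
    """Serialize the neutral model as JSON; values become sorted arrays."""
    return json.dumps({key: sorted(values) for key, values in data.items()},
                      sort_keys=True)


def to_xml(data, root_name="user"):
    """Serialize the same model as XML; each value becomes one element."""
    root = ET.Element(root_name)
    for key in sorted(data):
        for value in sorted(data[key]):
            ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


json_form = to_json(user)
xml_form = to_xml(user)
print(json_form)
print(xml_form)
```
</pre>

      <p>Sorting the values before writing them is one possible way to
        get stable output from a model that does not define any
        ordering. A real implementation may choose differently.</p>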

      <p>Enter <a href="https://docs.evolveum.com/midpoint/midprivacy/">midPrivacy</a>.

        We have been <a

          href="https://evolveum.com/introducing-midprivacy-initiative/">working

          on data protection features</a> for quite some time. But it

        was 2019 when we got our chance to take it to the next level. <a

          href="https://www.ngi.eu/">NGI</a> has an <a

          href="https://www.ngi.eu/ngi-projects/ngi-trust/">NGI_TRUST</a>

        project that looked like a perfect opportunity for us. We were

        more than aware that data protection is as much about <i>meta-data</i>

        as it is about data. You can make proper use of the data only if

        you know how reliable the data are, where they come from and

        whether you are entitled to use them at all. Meta-data

        capability is a basic building block for pretty much any data

        protection platform. It provides visibility and accountability.

        Obviously, we needed that in midPoint as well. Therefore we

        put together a proposal for the NGI_TRUST open call. And we were very

        lucky to get the funding.</p>

      <p>However, everything gets quite complex when it comes to

        meta-data. We need to keep such meta-data for every value of

        every data item. And the meta-data are going to be slightly

        different for every midPoint deployment. This adds an entirely

        new <i>dimension</i> of data modeling, a new dimension of

        complexity. This is very hard to do with conventional data

        modeling languages. We might try to make XSD do one more dirty

        trick – and after all these years of XSD hacking we might

        actually succeed. But we have decided that this is the point

        where we finally say good-bye to XSD and do it properly. We

        started by <a

href="https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/existing-languages-analysis/">double

          checking</a> that we are not missing any obvious solution. But

        there was no solution that could satisfy our needs.</p>

      <p>That is how <a

href="https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/axiom-notes/">Axiom

          was born</a>. Axiom is a new data modeling language we are

        working on right now. It is still a baby, still wildly evolving.

        But it <a

href="https://docs.evolveum.com/midpoint/midprivacy/phases/01-data-provenance-prototype/axiom/">is starting

          to take shape</a>. The first ambition of Axiom is to replace

        XSD in midPoint. But that would not be enough to justify

        the existence of a new language. We need Axiom to do more than

        that. Our goal is to use Axiom to define a <i>meta-data schema</i>.

        We want to maintain complex meta-data structures for every data

        value. The data will be modeled by one Axiom schema, and the

        meta-data will be modeled by an independent Axiom schema. These

        schemas will be <i>orthogonal</i>, independently developed,

        independently maintained, independently extended and customized

        for every deployment. We want to join the schemas inside

        midPoint at run-time. This is a way to create a

        two-dimensional schema from two simple schemas without ending up with

        code of insane complexity. This is the right way to

        implement data provenance capabilities.</p>
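      <p>The idea of two orthogonal schemas joined at run-time can be
        sketched roughly as follows. This is only an illustration in
        Python, not Axiom syntax and not midPoint code. The schema
        contents (<i>fullName</i>, <i>source</i>, <i>assurance</i>) and
        the functions <i>check</i> and <i>validate</i> are hypothetical.
        The sketch assumes that every value is stored together with its
        own meta-data.</p>

<pre>
```python
# Hypothetical data schema: item name to value type. Not Axiom syntax.
DATA_SCHEMA = {"fullName": str, "employeeNumber": int}

# Hypothetical meta-data schema, maintained independently of DATA_SCHEMA.
METADATA_SCHEMA = {"source": str, "assurance": str}


def check(items, schema):
    """Raise if an item is unknown to the schema or has the wrong type."""
    for name, value in items.items():
        if name not in schema:
            raise KeyError(f"unknown item: {name}")
        if not isinstance(value, schema[name]):
            raise TypeError(f"wrong type for item: {name}")


def validate(obj, data_schema, metadata_schema):
    """Validate an object in which each item holds a list of
    (value, metadata) pairs. The two schemas are joined only here,
    at run time; neither schema refers to the other."""
    for name, entries in obj.items():
        for value, metadata in entries:
            check({name: value}, data_schema)
            check(metadata, metadata_schema)
    return True


user = {
    "fullName": [("Jack Sparrow", {"source": "HR", "assurance": "high"})],
    "employeeNumber": [(1001, {"source": "self-service"})],
}
result = validate(user, DATA_SCHEMA, METADATA_SCHEMA)
```
</pre>

      <p>In this sketch, adding a new meta-data item means changing only
        the meta-data schema. The data schema and the validation code
        stay untouched, which is the kind of independence described
        above.</p>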

      <p>We are now working on a prototype implementation of the processing

        code for Axiom and adjusting the Axiom language specification at

        the same time. We believe that something like Axiom cannot be

        designed on a drawing board or in a standards committee. This

        needs experimentation, prototyping and evolution. We are

        proceeding in iterations, using the midPoint code as a test bed.

        Therefore we expect that Axiom will be evolving for quite some

        time until it is completely ready. But we believe that this is a

        step in the right direction. This is more than likely to bring a

        lot of long-term benefits.</p>

      <p>Finally, we are more than grateful for this opportunity and we

        would like to thank everyone in NGI for our chance to make

        another step towards a robust and professional data protection

        platform that can be used by everybody. We appreciate that the

        European Union is not just imposing data protection regulations,

        but that it is also contributing to open source technologies

        that can be used to implement practical data protection

        mechanisms. We are more than happy for this opportunity to push

        the technology one small step forward.</p>

      <p>This project has received funding from the European Union’s

        Horizon 2020 research and innovation programme under the

        NGI_TRUST grant agreement No 825618.</p>

    </div>

    <p>(Reposted from <a moz-do-not-send="true"

        href="https://evolveum.com/a-road-to-axiom/">Evolveum blog</a>)</p>

    <pre class="moz-signature" cols="72">-- 

Radovan Semancik

Software Architect

evolveum.com</pre>

  </body>

</html>