[midPoint] The Story of MidPoint's Super Scalability

Fri Dec 17 17:01:01 CET 2021

Dear midPoint community,

As we announced earlier, the midScale project was recently completed.
Step into the weekend with a thrilling story of midPoint’s super
scalability possibilities and the challenges we met on the way!

The midScale <https://docs.evolveum.com/midpoint/projects/midscale/>
project was recently completed. The project aimed to increase midPoint
scalability, performance and manageability to support large and complex
midPoint deployments. The project was a success! Yet, it was far from
easy.

When midPoint started a decade ago, the primary target was a mid-size
enterprise with thousands of identities to manage. That made perfect
sense back then: it was the scale we could handle, from both a business
and a technology perspective. However, the world is a different place
now. Deployments with millions of managed identities, and more, are much
more common. As our customers have changed, we have changed as well.
MidPoint had to adapt to the new environment.

We have been working on midPoint performance improvements for years.
The first results were delivered in 2018 when “Watt
<https://docs.evolveum.com/midpoint/release/3.8/>” was released.
At that time, however, we fully realized that one component was
limiting our potential. The midPoint data storage layer (which we call
the “repository”) was built in a generic way, supporting several
database engines. However, every abstraction has its cost. Supporting
many databases with the same code meant that we were doomed to
mediocrity. It was very difficult to take advantage of any
database-specific features. Every improvement we made had to be
implemented and tested for all the supported databases. The effort was
prohibitively high, and the results were somewhat disappointing. We
realized that this was not the way to go.

The way forward was quite clear. As the support for many databases
dragged us down, we had to specialize in a single database. The choice
of the database engine was quite clear as well. MidPoint is an open
source platform, so we had to choose an open source database.
PostgreSQL was an obvious choice. The approach was clear as well. A
decade ago, when midPoint was designed, we anticipated that we might
need to rework our “repository” code. In fact, that had already
happened once. Therefore, the plan was to do it again. This time, we
would take full advantage of PostgreSQL features. We had everything we
needed. Except for two little things, those two notorious
troublemakers: time and money.

Fortune favors the prepared. In 2019 we came across NGI_TRUST
<https://www.ngi.eu/ngi-projects/ngi-trust/>. We had very little
experience with European community funding and, coming from Eastern
Europe, most of the experience we did have was quite negative.
Therefore we did not know what to expect. However, NGI_TRUST looked
good, and we decided to submit a proposal. The proposal was accepted,
and the MidPrivacy: Data Provenance Prototype
<https://docs.evolveum.com/midpoint/projects/midprivacy/phases/01-data-provenance-prototype/>
project started. The project went well, and it was a success. After
that, we were prepared for a bigger challenge. We took the chance and
submitted a proposal for MidScale
<https://docs.evolveum.com/midpoint/projects/midscale/>. The proposal
was not accepted immediately, and the committee kept us in suspense for
quite some time. Fortunately, the proposal was accepted at last, and the
project took off.

The project was a challenge from the beginning. For various reasons,
we got the green light a month later than originally planned. This was a
complication, as the original plan was to synchronize the project with
the midPoint development cycle. Also, midScale was meant to be the very
last project of the funding program, therefore our project had to be
finished exactly on time, not a day later. This shook up the project
plan at the very beginning. Yet, due to the funding rules, we were not
able to change the plan. This created a challenge that rolled through
the entire project, from milestone to milestone. We added a few more
people (including myself) to the project, fully funded by Evolveum, on
top of the original budget. This helped to smooth out the project
progress, and we were back on track. With a good deal of flexibility,
management acrobatics, and a dash of personal heroism, we managed to
keep things going according to plan.

Of course, the repository replacement was the most challenging part of
the project. We never expected this to be easy. However, the amount of
work still surprised us. More flexibility, management acrobatics and
heroism did it, and in the end we had a brand-new, lemon-scented,
native PostgreSQL repository implementation
<https://docs.evolveum.com/midpoint/reference/repository/native-postgresql/>.

While the repository was a crucial part, it would not boost midPoint
scalability by itself. We significantly improved (read: reworked beyond
recognition) the management of distributed tasks, which improved
horizontal scalability. There were performance improvements in almost
every part of midPoint, from the low-level data representation libraries
all the way to the user interface. Error detection and handling were
improved and many bugs were fixed, including those nasty multi-threading
issues, which improved robustness. MidPoint is now a much more scalable,
faster and more reliable system.

However, much more than raw power is needed to run a large-scale
identity management and governance deployment. Identity management is,
quite obviously, all about management of identities. Therefore we had to
improve the manageability and overall visibility of midPoint. There are
numerous diagnostic improvements in many parts of the system, most
notably in the task management subsystem. A brand-new Axiom Query
Language
<https://docs.evolveum.com/midpoint/reference/concepts/query/axiom-query-language/>
was designed and implemented, providing the ability to construct complex
queries in a (reasonably) human-friendly way. The user interface was
improved, providing a much better user experience. On top of the
original project plan, there are improved dashboards and native reports.
New connectors can be auto-loaded now, reducing downtime. Large midPoint
deployments are much easier to manage than they were a year ago.

None of this would be possible without testing. We have had automated
tests for ages. However, the tests mostly focused on functionality.
There were only a handful of performance-oriented tests, and we could
not even do much more in our rudimentary testing environment. The design
and buildup of the new testing environment
<https://docs.evolveum.com/midpoint/projects/midscale/infrastructure/>
was an essential activity in the midScale project. The environment
turned out to be much better than we expected, yet it was also much
harder to build. It took a lot of time, with several improvement rounds.
This was supplemented with major improvements to Schrödinger
<https://docs.evolveum.com/midpoint/tools/schrodinger/>, the framework
for automated testing of the user interface. The midPoint user interface
is quite a big and complicated piece, and Schrödinger was a crucial
component in keeping it in working condition. In the end, we got
excellent testing results
<https://docs.evolveum.com/midpoint/projects/midscale/performance-scalability-test-results/>.
It is officially confirmed that midPoint is much better now and ready
for the future.

The midScale project was finished on time and with excellent results.
Due to the management acrobatics, the project did not end with a
midPoint release; the last milestone was a release candidate. There were
still some bugfixes to do before midPoint could be released. At last,
midPoint 4.4 “Tesla”
<https://docs.evolveum.com/midpoint/release/4.4/> has now been released.
Tesla follows up on the Faraday
<https://docs.evolveum.com/midpoint/release/4.3/> release, which brought
some results of the midScale project to the community. MidPoint 4.4
“Tesla” <https://docs.evolveum.com/midpoint/release/4.4/> will be a
major milestone in midPoint history. It is also a long-term support
<https://docs.evolveum.com/support/long-term-support/> release,
therefore Tesla will be with us for quite a long time.

The midScale project has been completed, yet the work continues. This is
only the start. We will further improve midPoint in the following
releases. There is also a lot of work on the business side,
documentation, practices, and a lot of other things. Software
development never ends.

(Written by Radovan Semancik, reposted from the Evolveum blog
<https://evolveum.com/midscale-is-finished/>)

-- 

Veronika Kolpascikova
Marketing Specialist
evolveum.com