[midPoint] Slow Performance on Bulk Load Using Rest

Radovan Semancik radovan.semancik at evolveum.com
Tue Jul 26 22:01:16 CEST 2016


Hi,

You can use midPoint logging to get performance statistics for
individual midPoint components. Then you can figure out where
exactly the problem is.

But I think the root of this issue is in the architecture. REST
interfaces are strictly object-oriented (or web-resource-oriented, to be
precise). Each REST operation can operate on only a single web resource.
Translated to the midPoint design, this means (at least) one operation
for each object. That means network latency, authentication (REST
explicitly prohibits sessions), request processing, authorization,
execution of the operation, response processing and network latency
again. This happens for every object. The overhead is simply too high.
In other words: RESTful services are absolutely terrible for any kind
of bulk operation. And this is more or less a consequence of the
principles of the REST architectural style. It is not easy to do
anything about it without bending or openly violating the REST
principles.
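
A back-of-envelope model makes the per-object cost visible. The sketch
below assumes objects are sent strictly one request at a time, and every
per-request cost in it is a hypothetical placeholder, not a midPoint
measurement; the field names simply mirror the steps listed above.

```python
# Back-of-envelope model of loading objects with one REST request per object.
# Every cost below is a HYPOTHETICAL placeholder, not a midPoint measurement.
from dataclasses import dataclass


@dataclass
class PerRequestCost:
    """Seconds spent in each step of a single REST request (illustrative)."""
    network_latency: float      # request and response latency combined
    authentication: float       # repeated on every request: no sessions
    request_processing: float   # parsing the request
    authorization: float        # checking what the caller may do
    operation: float            # the actual create/recompute work
    response_processing: float  # building and sending the response

    def total(self) -> float:
        return (self.network_latency + self.authentication
                + self.request_processing + self.authorization
                + self.operation + self.response_processing)


def sequential_load_hours(num_objects: int, cost: PerRequestCost) -> float:
    """Wall-clock hours if objects are sent strictly one request at a time."""
    return num_objects * cost.total() / 3600.0


def overhead_fraction(cost: PerRequestCost) -> float:
    """Share of each request spent on something other than the operation."""
    return 1.0 - cost.operation / cost.total()


if __name__ == "__main__":
    # HYPOTHETICAL per-request costs in seconds; replace with your own numbers.
    assumed = PerRequestCost(network_latency=0.02, authentication=0.03,
                             request_processing=0.01, authorization=0.01,
                             operation=0.05, response_processing=0.01)
    print(f"estimated load time: {sequential_load_hours(450_000, assumed):.1f} h")
    print(f"non-operation overhead: {overhead_fraction(assumed):.0%}")
```

Whatever the real numbers are, everything except `operation` is paid
again for each of the several hundred thousand users.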

Theoretically there is a way around this: we could create a specialized
web resource for bulk operations. But that is complex, ugly and
difficult to use. And it would actually mean doing RPC and disguising
it as REST. Therefore we do not currently plan to do this. We will do
it only if someone explicitly sponsors that feature. And even then I
will personally put big red stickers all over it saying "this may
work, but it is not REST".
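
To see why a bulk web resource would help at all, the sketch below
extends the same kind of cost model to requests that each carry a batch
of objects. The bulk endpoint is hypothetical (midPoint does not offer
one, as explained above), and the overhead and per-object costs are
placeholder assumptions.

```python
# Illustrates why a (hypothetical) bulk web resource would help: the fixed
# per-request overhead is paid once per batch instead of once per object.
import math


def batched_load_hours(num_objects: int, batch_size: int,
                       per_request_overhead_s: float,
                       per_object_work_s: float) -> float:
    """Wall-clock hours for sequential requests carrying batch_size objects each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    num_requests = math.ceil(num_objects / batch_size)
    seconds = (num_requests * per_request_overhead_s
               + num_objects * per_object_work_s)
    return seconds / 3600.0


if __name__ == "__main__":
    # HYPOTHETICAL costs: 0.08 s fixed overhead per request,
    # 0.05 s of real work per object.
    for size in (1, 100, 1000):
        hours = batched_load_hours(450_000, size, 0.08, 0.05)
        print(f"batch size {size:>4}: {hours:.2f} h")
```

Batching only removes the fixed per-request part; the per-object work
stays, which is why a purpose-built bulk mechanism inside midPoint is a
better fit than a batch endpoint bolted onto REST.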

I would suggest that the right way to do bulk operations is not to use
REST at all. It makes no sense to transport water in bottles when you
need water for an entire city, does it? You should use the right tool
for the job.

MidPoint has very good built-in features that support bulk operations.
Simply use the synchronization features of midPoint. These are designed
to handle bulk data. Connect the database as a midPoint resource and
pull in the data using an import or reconciliation task. If this is a
one-off data load, you can delete the resource afterwards. But there is
usually a real benefit in keeping the resource around for a longer time
in case the migration needs to be retried or objects need to be
updated.
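
If the source data exists only as files, as in the question below, one
way to give midPoint a database to read is to stage the files in a table
first and let an import or reconciliation task do the bulk work. The
sketch below assumes CSV input and uses a hypothetical table
(midpoint_user_staging) with hypothetical columns; sqlite3 only keeps it
self-contained. In PostgreSQL the equivalent upsert would be
INSERT ... ON CONFLICT, and COPY is the usual fast path for large loads.

```python
# Minimal sketch: stage file-based source data in a database table that a
# midPoint database resource could then read with an import or reconciliation
# task. Table name, columns and CSV layout are HYPOTHETICAL.
import csv
import io
import sqlite3
from typing import Iterable, Mapping

STAGING_DDL = """
CREATE TABLE IF NOT EXISTS midpoint_user_staging (
    login       TEXT PRIMARY KEY,
    given_name  TEXT,
    family_name TEXT,
    email       TEXT
)
"""

UPSERT_SQL = (
    "INSERT OR REPLACE INTO midpoint_user_staging "
    "(login, given_name, family_name, email) VALUES (?, ?, ?, ?)"
)


def stage_users(conn: sqlite3.Connection,
                rows: Iterable[Mapping[str, str]]) -> int:
    """Write rows into the staging table in one transaction; return the row count.

    INSERT OR REPLACE keys on login, so re-running the load after fixing the
    source files updates existing rows instead of failing on duplicates.
    """
    data = [(r["login"], r["given_name"], r["family_name"], r["email"])
            for r in rows]
    with conn:  # commits on success, rolls back the pending inserts on error
        conn.execute(STAGING_DDL)
        conn.executemany(UPSERT_SQL, data)
    return len(data)


if __name__ == "__main__":
    # Stand-in for one of the source files (HYPOTHETICAL CSV layout).
    source = io.StringIO(
        "login,given_name,family_name,email\n"
        "jdoe,John,Doe,jdoe@example.com\n"
    )
    conn = sqlite3.connect(":memory:")
    print(stage_users(conn, csv.DictReader(source)))
```

Because the staging step is idempotent, it fits the advice above about
keeping the resource around in case the migration has to be retried.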

-- 
Radovan Semancik
Software Architect
evolveum.com



On 07/26/2016 09:13 PM, Martin Marchese wrote:
> Hi,
>
> We have a large database (approx. 400,000-500,000 users, most of them
> linked to 2 platforms).
>
> We are using PostgreSQL and are still loading users with some Python
> scripts we developed that consume data from files and call REST
> services to create and/or recompute users.
>
> With this sizing, we are experiencing very slow performance in the
> bulk load. Is there a way to troubleshoot this or tune the database to
> increase performance?
>
> Thanks
>
> *Ing. Martín Marchese*
> Identicum S.A.
> Anchorena 1357 PB
> Tel: +54 (11) 3526.5509
> mmarchese at identicum.com <mailto:mmarchese at identicum.com>
> www.identicum.com <http://www.identicum.com>
>
>
> _______________________________________________
> midPoint mailing list
> midPoint at lists.evolveum.com
> http://lists.evolveum.com/mailman/listinfo/midpoint
