[midPoint] Slow Performance on Bulk Load Using Rest
Radovan Semancik
radovan.semancik at evolveum.com
Tue Jul 26 22:01:16 CEST 2016
Hi,
You can use midPoint logging to get stats about the performance of
individual midPoint components. Then you can figure out exactly where
the problem is.
But I think the root of this issue is in the architecture. REST
interfaces are strictly object-oriented (or web-resource-oriented to be
precise). Each REST operation can operate only on a single web resource.
When translated to midPoint design this means (at least) one operation
for each object. Which means network latencies, authentication (REST
explicitly prohibits sessions), request processing, authorizations,
executing the operation, response processing and latencies again. This
happens for every object. The overhead is simply too high. In other
words: RESTful services are absolutely terrible for any kind of bulk
operations. And this is more-or-less given by the principles of REST
architectural style. It is not easy to do anything about it without
bending or openly violating the REST principles.
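To put rough numbers on this, here is a back-of-envelope sketch in Python.
The timings are assumptions chosen only for illustration, not measurements
of midPoint, and the function name is made up for this example. It shows
how a fixed per-request cost is multiplied by the number of objects when
every object needs its own REST call.

```python
# Back-of-envelope estimate of a bulk load done with one request per object.
# All timings are illustrative assumptions, not midPoint measurements.

def bulk_load_seconds(objects, per_request_overhead_ms, per_object_work_ms, workers=1):
    """Estimate wall-clock seconds when each object costs one request.

    per_request_overhead_ms: assumed latency + authentication + request and
        response processing paid on every call.
    per_object_work_ms: assumed time for the actual operation on the object.
    workers: number of parallel clients, assuming perfect scaling.
    """
    per_object_ms = per_request_overhead_ms + per_object_work_ms
    return objects * per_object_ms / 1000.0 / workers


if __name__ == "__main__":
    users = 500_000  # upper end of the range mentioned in the original question
    with_overhead = bulk_load_seconds(users, per_request_overhead_ms=40, per_object_work_ms=20)
    without_overhead = bulk_load_seconds(users, per_request_overhead_ms=0, per_object_work_ms=20)
    print(f"one REST call per user:          {with_overhead / 3600:.1f} h")
    print(f"same work, no per-call overhead: {without_overhead / 3600:.1f} h")
```

Whatever the real numbers are in a given deployment, the overhead term
grows linearly with the number of objects, which is why it dominates in a
bulk load.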
Theoretically there is a way around this: we could create a specialized
web resource for bulk operations. But it is complex, ugly and difficult
to use. And it will actually mean doing RPC and disguising that as REST.
Therefore currently we do not plan to do this. We will do that only if
there is someone explicitly sponsoring that feature. And even then I
will personally put big red stickers all over it saying that "this may
work, but it is not REST".
I would suggest that the right way to do bulk operations is not to use
REST at all. It makes no sense to transport water in bottles when you
need water for an entire city, does it? You should use the right tool for
the job.
MidPoint has very good built-in features that support bulk operations.
Simply use the synchronization features of midPoint. These are designed
to handle bulk data. Connect the database as a midPoint resource and pull
in the data using an import or reconciliation task. If this is a one-off
data load you can delete the resource afterwards. But there is usually
a real benefit in keeping the resource around for longer, in case the
migration needs to be retried or objects need to be updated.
--
Radovan Semancik
Software Architect
evolveum.com
On 07/26/2016 09:13 PM, Martin Marchese wrote:
> Hi,
>
> We have a large database (approx. 400000-500000 users, most of them
> linked with 2 platforms).
>
> We are using PostgreSQL and still loading users with some python
> scripts we developed to consume data from files and execute REST
> services to create and/or recompute users.
>
> With this sizing, we are experiencing very slow performance in the
> bulk load. Is there a way to troubleshoot this or tune the database to
> increase performance?
>
> Thanks
>
> *Ing. Martín Marchese*
> Identicum S.A.
> Anchorena 1357 PB
> Tel: +54 (11) 3526.5509
> mmarchese at identicum.com <mailto:mmarchese at identicum.com>
> www.identicum.com <http://www.identicum.com>
>
>
> _______________________________________________
> midPoint mailing list
> midPoint at lists.evolveum.com
> http://lists.evolveum.com/mailman/listinfo/midpoint