[midPoint] Updates can get lost during a running recomputation task (SOLVED)

Wed Feb 7 14:02:02 CET 2018

> Not sure if "fetches objects one-after-another" makes the picture
> clear. As I understand it, the default reading workflow runs as a
> single query: all objects with full details come back in a single
> query/result set that is processed one by one by the handlers. I don't
> know how fetching rows from the result set works.
It is quite easy to explain. Please look here 
<https://github.com/Evolveum/midpoint/blob/54112a0ad266f8cd3f3024111a195fb064f79ae6/repo/repo-common/src/main/java/com/evolveum/midpoint/repo/common/task/AbstractSearchIterativeTaskHandler.java#L289>:

repositoryService.searchObjectsIterative((Class<O>) type, query,
        resultHandler, searchOptions, false, opResult);

The searchObjectsIterative method starts a search operation, and each 
object - as soon as it's returned from the repository - is handled by 
the resultHandler. (Using ScrollableResults - as can be seen here 
<https://github.com/Evolveum/midpoint/blob/42a1a66e93347d8c8b30624a574e7dfaf3743e88/repo/repo-sql-impl/src/main/java/com/evolveum/midpoint/repo/sql/helpers/ObjectRetriever.java#L680>.)
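
To illustrate the handler-per-object flow described above, here is a minimal
sketch. It is not midPoint code: the class IterativeSearchSketch, the method
searchIterative and the use of a plain Iterator in place of a scrollable
result set are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for an iterative search; NOT midPoint's real API.
final class IterativeSearchSketch {

    /**
     * Walks a cursor-like source and hands each object to the handler as
     * soon as it is fetched. The handler returns false to stop the search.
     * Returns the number of objects passed to the handler.
     */
    static <T> int searchIterative(Iterator<T> cursor, Predicate<? super T> handler) {
        int handled = 0;
        while (cursor.hasNext()) {
            T object = cursor.next();      // next row from the open result set
            handled++;
            if (!handler.test(object)) {   // handler asked to stop
                break;
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        List<String> repository = Arrays.asList("user-1", "user-2", "user-3");
        List<String> processed = new ArrayList<>();
        int count = searchIterative(repository.iterator(), oid -> {
            processed.add(oid);            // "processing" happens per object
            return true;                   // keep going
        });
        System.out.println(count + " objects handled: " + processed);
    }
}
```

The point of the sketch is only that no list of all objects is built up
before the first handler call; whether the driver prefetches rows underneath
is a separate question.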

I do not know how this works internally in Hibernate, the JDBC driver and
the DBMS itself. But I suppose that even if there is any
caching/chunking/prefetching there, it does not gather all objects
before processing them.

Anyway, I think we can implement the OID processing. (But it's not me 
who decides about the budgets :))
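
A rough sketch of what OID-based processing could look like, assuming a
repository modeled as a simple in-memory Map from OID to object value. The
names OidProcessingSketch, recomputeByOid and betweenObjects are hypothetical
and exist only for this illustration; nothing here reflects how midPoint
would actually implement it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical illustration of "list OIDs first, read each object late".
final class OidProcessingSketch {

    /**
     * Step 1: collect only the OIDs. Step 2: for each OID, read the current
     * version right before recomputing it, so an update made after the
     * search started is not overwritten by a stale copy.
     * betweenObjects is an illustration-only hook that simulates other
     * activity (e.g. live sync) happening while the task runs.
     */
    static void recomputeByOid(Map<String, String> repository,
                               UnaryOperator<String> recompute,
                               Runnable betweenObjects) {
        List<String> oids = new ArrayList<>(repository.keySet());
        for (String oid : oids) {
            String current = repository.get(oid);   // fresh read by OID
            if (current == null) {
                continue;                            // deleted in the meantime
            }
            repository.put(oid, recompute.apply(current));
            betweenObjects.run();
        }
    }

    public static void main(String[] args) {
        Map<String, String> repository = new LinkedHashMap<>();
        repository.put("a", "v1");
        repository.put("b", "v1");
        // Simulated concurrent update: "b" changes from v1 to v2 while "a"
        // is being processed. replace() only fires while "b" is still v1.
        recomputeByOid(repository,
                value -> value + "+recomputed",
                () -> repository.replace("b", "v1", "v2"));
        System.out.println(repository);
    }
}
```

In this sketch the concurrent change to "b" survives because "b" is read only
when its turn comes. A real implementation would still have a small window
between reading and writing a single object, so this narrows the problem
rather than removing it.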

Pavol Mederly
Software developer
evolveum.com

On 07.02.2018 13:45, Arnošt Starosta - AMI Praha a.s. wrote:
> Hi Pavol,
>
> that unintended workaround saved my life for the moment :)
>
> Not sure if "fetches objects one-after-another" makes the picture
> clear. As I understand it, the default reading workflow runs as a
> single query: all objects with full details come back in a single
> query/result set that is processed one by one by the handlers. I don't
> know how fetching rows from the result set works.
>
> Tweaking the transaction isolation did not really help, even with the
> default set to 'read committed'. That's why I think the object
> 'fetching' happens in larger chunks and may not be affected by weaker
> transaction isolation. Or maybe I just misconfigured it.
>
> Working with oids in iterative tasks would be great! You want the 
> worker threads to process 'that object' not 'this chunk of data'.
>
> The jira is already there - https://jira.evolveum.com/browse/MID-4414
>
> arnost
>
> 2018-02-07 12:17 GMT+01:00 Pavol Mederly <mederly at evolveum.com 
> <mailto:mederly at evolveum.com>>:
>
>     Hello Arnošt,
>
>     this is a good observation.
>
>     To be honest, iterative search by paging was meant as a workaround
>     for databases that do not support search with subsequent modify
>     operations on the returned objects. But, as we see from your
>     message, it can be used to avoid these problems as well :)
>
>     Just a slight correction:
>
>>     Midpoint in default configuration recomputes objects by first
>>     retrieving them ALL from repository, then passing each object to
>>     a worker thread.
>     This is not quite true. MidPoint fetches objects
>     one-after-another, and just after fetching each one from the
>     repository it passes the object to a worker thread (or processes
>     it directly if there are no worker threads defined). However,
>     because of the quite strong transaction isolation setting
>     (serializable), the DBMS ensures that changes made to objects
>     after the transaction started (i.e. after the search was started)
>     are not reflected in the values the search returns.
>
>     I can imagine an option that would make this more optimized. E.g.
>     by retrieving just a list of OIDs and reading each object just
>     before its processing. If you have a second of free time, you
>     could create a jira for this.
>
>     Moreover, in 3.8 we loosen transaction isolation a bit, from
>     serializable to repeatable_read. But I think this will not change
>     this behavior.
>
>     Pavol Mederly
>     Software developer
>     evolveum.com <http://evolveum.com>
>
>     On 29.01.2018 13:22, Arnošt Starosta - AMI Praha a.s. wrote:
>>     *Problem:*
>>
>>     Midpoint in default configuration recomputes objects by first
>>     retrieving them ALL from repository, then passing each object to
>>     a worker thread. If the object was updated meanwhile (e.g.
>>     live-synced or updated from gui) before it is recomputed by the
>>     worker thread, this update can be overwritten by the object
>>     version retrieved when the recompute task started. It happened on
>>     my deployment several times.
>>
>>     *Is your deployment affected? :*
>>
>>     Hard to say, I don't see any relevant log message to check. I had
>>     to check by debugging the running recompute task and verifying
>>     that SqlRepositoryServiceImpl.searchObjectsIterative calls
>>     ObjectRetriever.searchObjectsIterativeByPaging (ok) and not
>>     ObjectRetriever.searchObjectsIterativeAttempt (can lose updates).
>>
>>     Deployments with a MySQL or H2 backend should be ok with the default
>>     configuration (check the source of
>>     SqlRepositoryConfiguration.computeDefaultIterativeSearchParameters).
>>     I did not verify this at runtime.
>>
>>     *Solution:*
>>
>>     Configure iterativeSearchByPaging and
>>     iterativeSearchByPagingBatchSize in the midpoint/repository
>>     element of config.xml. I don't know whether all backends support
>>     this setting, but PostgreSQL (which I use) does.
>>
>>     <configuration>
>>        <midpoint>
>>            <repository>
>>                …
>>                <iterativeSearchByPaging>true</iterativeSearchByPaging>
>>                <iterativeSearchByPagingBatchSize>17</iterativeSearchByPagingBatchSize>
>>                …
>>            </repository>
>>        </midpoint>
>>     </configuration>
>>
>>     After setting these parameters the objects to recompute are read
>>     in 'pages' and fed to worker threads until the request queue
>>     between the reader thread and worker threads is full, then the
>>     reader is blocked. The size of the queue is hardcoded as 2 *
>>     number-of-worker-threads.
>>
>>     Even with iterativeSearchByPagingBatchSize set you can still
>>     lose updates, but the window in which this can happen shrinks
>>     from the time needed to process all objects to the time needed
>>     to process max(page size, 2 * number-of-worker-threads) objects.
>>     Without much thought I set the page size to
>>     (2 * number-of-worker-threads) + 1.
>>
>>     good luck
>>     arnost
>>
>>     -- 
>>
>>     Arnošt Starosta
>>     solution architect
>>
>>     gsm: [+420] 603 794 932 <tel:+420%20603%20794%20932>
>>     e-mail: arnost.starosta at ami.cz <mailto:arnost.starosta at ami.cz>
>>
>>     AMI Praha a.s.
>>     Pláničkova 11
>>     162 00 Praha 6
>>     tel.: [+420] 274 783 239 <tel:+420%20274%20783%20239>
>>     web: www.ami.cz <http://www.ami.cz/>
>>
>>     By the text of this e-mail, the signatory neither promises to
>>     conclude nor concludes any contract on behalf of AMI Praha a.s.
>>     Any contract, if concluded, must be made exclusively in writing.
>>
>>
>>
>>     _______________________________________________
>>     midPoint mailing list
>>     midPoint at lists.evolveum.com <mailto:midPoint at lists.evolveum.com>
>>     http://lists.evolveum.com/mailman/listinfo/midpoint
>>     <http://lists.evolveum.com/mailman/listinfo/midpoint>
>
>
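
The paged reading with a bounded hand-off queue, as described in the original
post quoted above, can be sketched roughly as follows. This is an
illustration under stated assumptions, not midPoint code: the class
PagedReaderSketch, the method process, the STOP marker and the in-memory list
standing in for the repository are all hypothetical. Only the queue capacity
of 2 * number-of-worker-threads is taken from the post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical reader/worker pipeline; names are illustration-only.
final class PagedReaderSketch {
    // Marker telling a worker to finish; assumed never to be a real OID.
    private static final String STOP = "\u0000STOP";

    static List<String> process(List<String> repository, int pageSize, int workerCount) {
        if (pageSize < 1 || workerCount < 1) {
            throw new IllegalArgumentException("pageSize and workerCount must be positive");
        }
        // Bounded queue: the reader blocks once 2 * workerCount items wait.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2 * workerCount);
        Queue<String> processed = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        String oid = queue.take();
                        if (STOP.equals(oid)) {
                            return;
                        }
                        processed.add(oid);        // stands in for the recompute
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }
        try {
            // Each page stands in for one short, separate repository query.
            for (int offset = 0; offset < repository.size(); offset += pageSize) {
                int end = Math.min(offset + pageSize, repository.size());
                for (String oid : repository.subList(offset, end)) {
                    queue.put(oid);                // blocks while the queue is full
                }
            }
            for (int i = 0; i < workerCount; i++) {
                queue.put(STOP);                   // one stop marker per worker
            }
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while feeding workers", e);
        }
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) {
        List<String> repository = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            repository.add("oid-" + i);
        }
        System.out.println(process(repository, 5, 2).size() + " objects processed");
    }
}
```

Because an object can sit in the current page and then in the queue before a
worker picks it up, the amount of possibly stale data is bounded by the page
size and the queue capacity rather than by the total number of objects.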
