[midPoint] Updates can get lost during a running recomputation task (SOLVED)

Wed Feb 7 15:04:55 CET 2018

Wow. This is a bit beyond my current competence. I am afraid that the 
only way to check it is to try it. :)

If you can build midPoint from sources, could you just switch the scroll 
mode to TYPE_SCROLL_SENSITIVE and try that? I could write a special test 
for that, but ... :) it's hard to find the time.
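A minimal sketch of that experiment at the plain JDBC level, assuming direct access to a java.sql.Connection. In the Hibernate call quoted further down, the equivalent change would be passing ScrollMode.SCROLL_SENSITIVE instead of ScrollMode.FORWARD_ONLY to rQuery.scroll. The class and method names here are hypothetical. The sketch asks the driver through DatabaseMetaData.supportsResultSetType and falls back to TYPE_FORWARD_ONLY; the fake metadata built with java.lang.reflect.Proxy exists only so the example runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Set;

// Hypothetical helper, not midPoint code: prefers TYPE_SCROLL_SENSITIVE when the
// driver reports support for it, otherwise stays with TYPE_FORWARD_ONLY.
final class ScrollModeProbe {

    static int chooseResultSetType(DatabaseMetaData md) {
        try {
            // "Supported" only means the driver accepts the type; how much of other
            // transactions' changes the rows then reflect is driver/DBMS specific.
            if (md.supportsResultSetType(ResultSet.TYPE_SCROLL_SENSITIVE)) {
                return ResultSet.TYPE_SCROLL_SENSITIVE;
            }
        } catch (SQLException e) {
            // The driver could not answer: keep the conservative default.
        }
        return ResultSet.TYPE_FORWARD_ONLY;
    }

    // Opens a read-only statement for the search using the chosen type.
    static Statement openSearchStatement(Connection conn) throws SQLException {
        Statement st = conn.createStatement(chooseResultSetType(conn.getMetaData()),
                ResultSet.CONCUR_READ_ONLY);
        // A driver may hand back a different type than requested; check what was granted.
        if (st.getResultSetType() != ResultSet.TYPE_SCROLL_SENSITIVE) {
            System.err.println("scroll-sensitive result sets not available; got type "
                    + st.getResultSetType());
        }
        return st;
    }

    // Demo stand-in for driver metadata; only supportsResultSetType is implemented.
    static DatabaseMetaData fakeMetaData(Set<Integer> supportedTypes) {
        return (DatabaseMetaData) Proxy.newProxyInstance(
                DatabaseMetaData.class.getClassLoader(),
                new Class<?>[] {DatabaseMetaData.class},
                (proxy, method, methodArgs) -> {
                    if (method.getName().equals("supportsResultSetType")) {
                        return supportedTypes.contains((Integer) methodArgs[0]);
                    }
                    throw new UnsupportedOperationException(method.getName());
                });
    }

    public static void main(String[] argv) {
        DatabaseMetaData forwardOnly = fakeMetaData(Set.of(ResultSet.TYPE_FORWARD_ONLY));
        DatabaseMetaData sensitive = fakeMetaData(
                Set.of(ResultSet.TYPE_FORWARD_ONLY, ResultSet.TYPE_SCROLL_SENSITIVE));
        System.out.println(chooseResultSetType(forwardOnly) == ResultSet.TYPE_FORWARD_ONLY);
        System.out.println(chooseResultSetType(sensitive) == ResultSet.TYPE_SCROLL_SENSITIVE);
    }
}
```

Even when the type is granted, the ResultSet javadoc only describes a scroll-sensitive result set as "generally sensitive" to changes in the underlying data, so the experiment still has to be run against the actual backend.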

Pavol Mederly
Software developer
evolveum.com

On 07.02.2018 14:41, Arnošt Starosta - AMI Praha a.s. wrote:
> That might be the root of the problem!
>
>     The searchObjectsIterative method starts a search operation, and
>     each object - as soon as it's returned from the repository - is
>     handled by the resultHandler. (Using ScrollableResults - as can be
>     seen here
>     <https://github.com/Evolveum/midpoint/blob/42a1a66e93347d8c8b30624a574e7dfaf3743e88/repo/repo-sql-impl/src/main/java/com/evolveum/midpoint/repo/sql/helpers/ObjectRetriever.java#L680>.)
>
> where 'here' is
>
> ScrollableResults results = rQuery.scroll(ScrollMode.FORWARD_ONLY);
>
> The JDBC spec says about TYPE_FORWARD_ONLY:
>
> "The rows contained in the result set depend on how the underlying 
> database materializes the results. That is, it contains the rows that 
> satisfy the query at either the time the query is executed or as the 
> rows are retrieved"
>
> If we wanted to see the changes, we would have to use 
> TYPE_SCROLL_SENSITIVE.
>
> What I don't understand is how this plays together with the transaction 
> isolation settings. Does specifying the ResultSet type override them, or 
> is it the other way around? No time to read the whole spec :/
>
> arnost
>
>     I do not know how this works internally in Hibernate, the JDBC driver
>     and the DBMS itself. But I suppose that if there's any
>     caching/chunking/prefetching there, it does not gather all objects
>     before processing them.
>
>     Anyway, I think we can implement the OID processing. (But it's not
>     me who decides about the budgets :))
>
>     Pavol Mederly
>     Software developer
>     evolveum.com
>
>     On 07.02.2018 13:45, Arnošt Starosta - AMI Praha a.s. wrote:
>>     Hi Pavol,
>>
>>     that unintended workaround saved my life for the moment .)
>>
>>     Not sure if "fetches objects one-after-another" makes the picture
>>     clear. As I understand it, the default read path issues a single
>>     query: all objects with full details come in a single result set,
>>     which is processed one by one by the handlers. I don't know how
>>     fetching rows from the result set works.
>>
>>     Tweaking the transaction isolation did not really help, even with
>>     the default set to 'read committed'. That's why I think the object
>>     'fetching' happens in larger chunks and may not be affected by
>>     weaker transaction isolation. Or maybe I just misconfigured it.
>>
>>     Working with oids in iterative tasks would be great! You want the
>>     worker threads to process 'that object' not 'this chunk of data'.
>>
>>     The JIRA issue is already there:
>>     https://jira.evolveum.com/browse/MID-4414
>>
>>     arnost
>>
>>     2018-02-07 12:17 GMT+01:00 Pavol Mederly <mederly at evolveum.com>:
>>
>>         Hello Arnošt,
>>
>>         this is a good observation.
>>
>>         To be honest, iterative search by paging was meant as a
>>         workaround for databases that do not support search with
>>         subsequent modify operations on the returned objects. But, as
>>         we see from your message, it can be used to avoid these
>>         problems as well :)
>>
>>         Just a slight correction:
>>
>>>         MidPoint in the default configuration recomputes objects by
>>>         first retrieving them ALL from the repository, then passing each
>>>         object to a worker thread.
>>         This is not quite true. MidPoint fetches objects
>>         one-after-another, and just after fetching each one from the
>>         repository it passes the object to a worker thread (or
>>         processes it directly if there are no worker threads
>>         defined). However, because of the quite strong transaction
>>         isolation setting (serializable), the DBMS ensures that
>>         changes that occur on objects after the transaction started
>>         (i.e. after the search was started) are not reflected in
>>         their values.
>>
>>         I can imagine an option that would handle this better,
>>         e.g. retrieving just a list of OIDs and reading each
>>         object just before processing it. If you have a second of
>>         free time, you could create a JIRA issue for this.
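The OID-first option could look roughly like this, assuming a repository reduced to a map from OID to object state; recomputeAll and the toy data are hypothetical, not midPoint API. The OID list is taken up front and each object is read again right before it is processed, so a change committed after the search started is picked up. The window is narrowed, not closed: an update landing between the read and the write of a single object can still be overwritten.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical sketch of "list of OIDs first, read each object just before
// processing"; the map stands in for the repository.
final class OidFirstRecompute {

    static void recomputeAll(Map<String, String> repo, List<String> oids,
                             UnaryOperator<String> recompute) {
        for (String oid : oids) {
            String current = repo.get(oid); // fresh read, not the state seen at search start
            if (current == null) {
                continue;                   // object deleted since the OID list was taken
            }
            repo.put(oid, recompute.apply(current));
        }
    }

    public static void main(String[] argv) {
        Map<String, String> repo = new ConcurrentHashMap<>(Map.of("u1", "alice", "u2", "bob"));
        List<String> oids = new ArrayList<>(repo.keySet()); // step 1: only OIDs
        repo.put("u2", "robert");                           // concurrent update (live sync / GUI)
        recomputeAll(repo, oids, String::toUpperCase);      // step 2: per-object reads
        System.out.println(repo.get("u2"));                 // ROBERT: the update survived
    }
}
```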
>>
>>         Moreover, in 3.8 we loosen transaction isolation a bit, from
>>         serializable to repeatable_read. But I think this will not
>>         change this behavior.
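For reference, the isolation levels mentioned in this thread map onto the standard JDBC constants as sketched here; toJdbcLevel and apply are hypothetical helpers, not midPoint's configuration code. One PostgreSQL detail is relevant: under READ COMMITTED each statement sees a snapshot taken when that statement starts, and under REPEATABLE READ the snapshot is taken at the first statement of the transaction. In both cases a single long-running SELECT that is scrolled through keeps seeing the data as of its start, which is consistent with the observation that lowering the isolation level alone did not help.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Locale;

// Hypothetical helper: maps isolation names as written in this thread
// ("read committed", "repeatable_read", ...) to standard JDBC constants.
final class IsolationLevels {

    static int toJdbcLevel(String name) {
        switch (name.trim().toLowerCase(Locale.ROOT).replace(' ', '_')) {
            case "read_uncommitted": return Connection.TRANSACTION_READ_UNCOMMITTED;
            case "read_committed":   return Connection.TRANSACTION_READ_COMMITTED;
            case "repeatable_read":  return Connection.TRANSACTION_REPEATABLE_READ;
            case "serializable":     return Connection.TRANSACTION_SERIALIZABLE;
            default: throw new IllegalArgumentException("unknown isolation level: " + name);
        }
    }

    // Call before the search transaction starts: per the JDBC javadoc, changing the
    // level in the middle of a transaction has implementation-defined results.
    static void apply(Connection conn, String name) throws SQLException {
        conn.setTransactionIsolation(toJdbcLevel(name));
    }

    public static void main(String[] argv) {
        System.out.println(toJdbcLevel("repeatable_read") == Connection.TRANSACTION_REPEATABLE_READ);
        System.out.println(toJdbcLevel("read committed") == Connection.TRANSACTION_READ_COMMITTED);
    }
}
```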
>>
>>         Pavol Mederly
>>         Software developer
>>         evolveum.com
>>
>>         On 29.01.2018 13:22, Arnošt Starosta - AMI Praha a.s. wrote:
>>>         *Problem:*
>>>
>>>         MidPoint in the default configuration recomputes objects by
>>>         first retrieving them ALL from the repository, then passing each
>>>         object to a worker thread. If an object is updated in the
>>>         meantime (e.g. by live sync or from the GUI) before the worker
>>>         thread recomputes it, the update can be overwritten by the
>>>         object version retrieved when the recompute task started. This
>>>         happened on my deployment several times.
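A toy model of the problem, assuming the repository is reduced to a map from OID to object state and that a recompute writes the whole object back; the class and data are hypothetical. Whether the stale state comes from prefetching or, as explained in the reply above, from transaction isolation, the effect is the same.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy model of the lost update, not midPoint code: the repository is a map
// OID -> value, and "recompute" writes the whole (possibly stale) object back.
final class LostUpdateDemo {

    static Map<String, String> recomputeFromSnapshot(Map<String, String> repo,
                                                     Runnable concurrentChange,
                                                     UnaryOperator<String> recompute) {
        // 1. The search returns object state as of the moment it started.
        Map<String, String> snapshot = new HashMap<>(repo);
        // 2. Something else (live sync, GUI) modifies the repository meanwhile.
        concurrentChange.run();
        // 3. Workers process the stale copies and write them back, overwriting step 2.
        snapshot.forEach((oid, value) -> repo.put(oid, recompute.apply(value)));
        return repo;
    }

    public static void main(String[] argv) {
        Map<String, String> repo = new HashMap<>(Map.of("u1", "alice", "u2", "bob"));
        recomputeFromSnapshot(repo, () -> repo.put("u2", "robert"), String::toUpperCase);
        System.out.println(repo.get("u2")); // BOB: the concurrent "robert" update was lost
    }
}
```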
>>>
>>>         *Is your deployment affected?*
>>>
>>>         Hard to say; I don't see any relevant log message to check.
>>>         I had to check by debugging the running recompute task and
>>>         verifying that
>>>         SqlRepositoryServiceImpl.searchObjectsIterative calls
>>>         ObjectRetriever.searchObjectsIterativeByPaging (OK) and not
>>>         ObjectRetriever.searchObjectsIterativeAttempt (can lose
>>>         updates).
>>>
>>>         Deployments with a MySQL or H2 backend should be OK with the
>>>         default configuration (see
>>>         SqlRepositoryConfiguration.computeDefaultIterativeSearchParameters
>>>         in the sources). I did not verify this at runtime.
>>>
>>>         *Solution:*
>>>
>>>         Configure iterativeSearchByPaging and
>>>         iterativeSearchByPagingBatchSize in the midpoint/repository
>>>         element of config.xml. I don't know if all backends support
>>>         this setting, but PostgreSQL (which I use) does.
>>>
>>>         <configuration>
>>>           <midpoint>
>>>             <repository>
>>>               …
>>>               <iterativeSearchByPaging>true</iterativeSearchByPaging>
>>>               <iterativeSearchByPagingBatchSize>17</iterativeSearchByPagingBatchSize>
>>>               …
>>>             </repository>
>>>           </midpoint>
>>>         </configuration>
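The paging idea can be sketched as a loop of short reads. This assumes keyset paging by OID (OID greater than the last one seen, ordered by OID, at most batchSize rows); midPoint's actual paging query may differ, and iterateByPaging plus the in-memory table are hypothetical. Each page reflects the repository at the time that page is read, not at the start of the task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical sketch of iterative search by paging, not midPoint's implementation.
// The TreeMap (sorted by OID) stands in for the repository table.
final class PagedSearch {

    static int iterateByPaging(NavigableMap<String, String> table, int batchSize,
                               Consumer<String> handler) {
        int pages = 0;
        String lastOid = null;
        while (true) {
            // One short "query" per page: rows after lastOid, in OID order.
            NavigableMap<String, String> rest =
                    lastOid == null ? table : table.tailMap(lastOid, false);
            List<String> pageOids = new ArrayList<>();
            List<String> pageObjects = new ArrayList<>();
            for (Map.Entry<String, String> row : rest.entrySet()) {
                if (pageOids.size() == batchSize) {
                    break;
                }
                pageOids.add(row.getKey());
                pageObjects.add(row.getValue()); // state as of this page's read
            }
            if (pageOids.isEmpty()) {
                return pages;
            }
            pages++;
            pageObjects.forEach(handler); // page read is finished, handlers may modify the table
            lastOid = pageOids.get(pageOids.size() - 1);
        }
    }

    public static void main(String[] argv) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 40; i++) {
            table.put(String.format("oid-%03d", i), "user" + i);
        }
        System.out.println(iterateByPaging(table, 17, obj -> { })); // 3 (17 + 17 + 6)
    }
}
```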
>>>
>>>
>>>         After setting these parameters, the objects to recompute are
>>>         read in 'pages' and fed to worker threads until the request
>>>         queue between the reader thread and the worker threads is
>>>         full; then the reader is blocked. The size of the queue is
>>>         hardcoded as 2 * number-of-worker-threads.
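The hand-off between the reader and the workers behaves like a bounded blocking queue. A sketch using java.util.concurrent.ArrayBlockingQueue, assuming a capacity of 2 * workers as described above; BoundedHandOff and its end marker are hypothetical, not midPoint's task classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one reader fills a queue of capacity 2 * workers. put()
// blocks while the queue is full, so the reader stays at most that far ahead.
final class BoundedHandOff {

    private static final Object END = new Object(); // end-of-input marker, one per worker

    static int process(List<?> objects, int workers) throws InterruptedException {
        if (workers < 1) {
            throw new IllegalArgumentException("need at least one worker");
        }
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(2 * workers);
        AtomicInteger processed = new AtomicInteger();
        List<Thread> pool = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread worker = new Thread(() -> {
                try {
                    for (Object obj = queue.take(); obj != END; obj = queue.take()) {
                        processed.incrementAndGet(); // recompute obj here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            pool.add(worker);
        }
        for (Object obj : objects) {
            queue.put(obj); // the reader blocks here when the queue is full
        }
        for (int w = 0; w < workers; w++) {
            queue.put(END);
        }
        for (Thread worker : pool) {
            worker.join();
        }
        return processed.get();
    }

    public static void main(String[] argv) throws InterruptedException {
        List<String> users = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            users.add("user" + i);
        }
        System.out.println(process(users, 4)); // 100
    }
}
```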
>>>
>>>         Even with iterativeSearchByPagingBatchSize set, you can
>>>         still lose updates, but the time window in which this can
>>>         happen shrinks from number-of-objects to max(page size,
>>>         2*num-of-worker-threads). Without much thought, I set the
>>>         page size to (2 * number-of-worker-threads) + 1.
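The arithmetic above as a hypothetical helper; it simply restates the message's estimate and is not derived from midPoint sources. With 8 worker threads the rule gives 2 * 8 + 1 = 17, which matches the batch size in the configuration above, although the thread count behind that value is not stated.

```java
// Hypothetical helper restating the estimate from the message above.
final class UpdateWindow {

    static int queueCapacity(int workers) {
        return 2 * workers; // request queue size between reader and workers
    }

    static int suggestedPageSize(int workers) {
        return queueCapacity(workers) + 1; // the "(2 * threads) + 1" rule of thumb
    }

    // Estimated lost-update window, in objects, as stated in the message.
    static int windowInObjects(int pageSize, int workers) {
        return Math.max(pageSize, queueCapacity(workers));
    }

    public static void main(String[] argv) {
        int workers = 8;
        int page = suggestedPageSize(workers);
        System.out.println(page + " " + windowInObjects(page, workers)); // 17 17
    }
}
```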
>>>
>>>         good luck
>>>         arnost
>>>
>>>         -- 
>>>
>>>         Arnošt Starosta
>>>         solution architect
>>>
>>>         gsm: [+420] 603 794 932
>>>         e-mail: arnost.starosta at ami.cz
>>>
>>>         AMI Praha a.s.
>>>         Pláničkova 11
>>>         162 00 Praha 6
>>>         tel.: [+420] 274 783 239
>>>         web: www.ami.cz
>>>
>>>         By the text of this e-mail, the signatory neither promises to
>>>         conclude nor concludes any contract on behalf of AMI Praha a.s.
>>>         Every contract, if concluded, must be exclusively in written
>>>         form.
>>>
>>>
>>>
>>>         _______________________________________________
>>>         midPoint mailing list
>>>         midPoint at lists.evolveum.com
>>>         http://lists.evolveum.com/mailman/listinfo/midpoint
>>
>