[midPoint] Users list is slow with large numbers of users

Pavol Mederly mederly at evolveum.com
Tue Jul 31 11:18:14 CEST 2018


Hello,

the fix is implemented on support-3.7 branch. Please do not forget to 
apply DB migration scripts.

Best regards,

Pavol Mederly
Software developer
evolveum.com

On 30.07.2018 19:24, Pavol Mederly wrote:
>
> Andrew,
>
> I think I got it. It seems to be a difference between 3.7 and 3.8, indeed.
>
> The query in 3.7 is like this:
>
> select
>   ruser0_.oid as col_0_0_,
>   ruser0_2_.fullObject as col_1_0_,
>   ruser0_2_.stringsCount as col_2_0_,
>   ruser0_2_.longsCount as col_3_0_,
>   ruser0_2_.datesCount as col_4_0_,
>   ruser0_2_.referencesCount as col_5_0_,
>   ruser0_2_.polysCount as col_6_0_,
>   ruser0_2_.booleansCount as col_7_0_
> from
>   m_user ruser0_
>   inner join m_focus ruser0_1_ on ruser0_.oid=ruser0_1_.oid
>   inner join m_object ruser0_2_ on ruser0_.oid=ruser0_2_.oid
> *order by **
> **  ruser0_2_.name_orig asc **
> *limit ?
>
> and in 3.8 it is:
>
> select
>   ruser0_.oid as col_0_0_,
>   ruser0_2_.fullObject as col_1_0_,
>   ruser0_2_.stringsCount as col_2_0_,
>   ruser0_2_.longsCount as col_3_0_,
>   ruser0_2_.datesCount as col_4_0_,
>   ruser0_2_.referencesCount as col_5_0_,
>   ruser0_2_.polysCount as col_6_0_,
>   ruser0_2_.booleansCount as col_7_0_
> from
>   m_user ruser0_
>   inner join m_focus ruser0_1_ on ruser0_.oid=ruser0_1_.oid
>   inner join m_object ruser0_2_ on ruser0_.oid=ruser0_2_.oid
> *order by**
> **  ruser0_.name_orig asc**
> *limit 20
>
> Note the difference. The 3.7 version orders on a row in doubly joined 
> table, whereas the 3.8 version orders on a row in the "main" table. 
> When executing against my test database (10K rows, each having ~100 KB 
> of XML data), the former takes 25 seconds, whereas the latter takes 46 
> milliseconds.
>
> Having seen this I remember that our colleague Viliam really 
> implemented querying over "local" name column in 3.8, instead of the 
> topmost one. :) Maybe this was the reason. Anyway, tomorrow I will try 
> to find the least painful way how to bring this change into 3.6/3.7; 
> and will let you know.
>
> Best regards,
>
> Pavol Mederly
> Software developer
> evolveum.com
> On 30.07.2018 17:58, Andrew Morgan wrote:
>> Pavol,
>>
>> I have attached the section of midpoint.log as requested.  As you can 
>> see, it takes about 20 seconds to display the results in the GUI.
>>
>> When I run the same query in the mysql client, is says, "20 rows in 
>> set (13.93 sec)".
>>
>> Here is the EXPLAIN for it again:
>>
>> +------+-------------+-----------+--------+---------------+--------------+---------+-------------------------+--------+----------------------------------------------+ 
>>
>> | id   | select_type | table     | type   | possible_keys | 
>> key          | key_len | ref                     | rows   | 
>> Extra                                        |
>> +------+-------------+-----------+--------+---------------+--------------+---------+-------------------------+--------+----------------------------------------------+ 
>>
>> |    1 | SIMPLE      | ruser0_   | index  | PRIMARY       | 
>> uc_user_name | 768     | NULL                    | 100970 | Using 
>> index; Using temporary; Using filesort |
>> |    1 | SIMPLE      | ruser0_1_ | eq_ref | PRIMARY       | 
>> PRIMARY      | 110     | midpointdev.ruser0_.oid |      1 | Using 
>> index                                  |
>> |    1 | SIMPLE      | ruser0_2_ | eq_ref | PRIMARY       | 
>> PRIMARY      | 110     | midpointdev.ruser0_.oid |      1 
>> |                                              |
>> +------+-------------+-----------+--------+---------------+--------------+---------+-------------------------+--------+----------------------------------------------+ 
>>
>>
>> Our database server is CentOS 7.5 running 
>> MariaDB-server-10.2.16-1.el7.centos.x86_64.  The VM has 4GB of RAM 
>> and 2 CPUs (Xeon Gold 5118 CPU @ 2.30GHz).  We haven't done much 
>> tuning to it yet:
>>
>> [mysqld]
>> innodb_buffer_pool_size = 2G
>> max_allowed_packet = 256M
>>
>>
>> Thanks for helping us investigate!
>>
>> Andy Morgan
>> Systems Administrator, Identity & Access Management
>> Information Services | Oregon State University
>> 541-737-8877 | is.oregonstate.edu
>>
>> On Mon, 30 Jul 2018, Pavol Mederly wrote:
>>
>>> Andrew, Petr,
>>>
>>> my observations on my developer machine running midPoint (master) 
>>> and MySQL 5.7 locally are here:
>>>
>>> I have imported 100K users (almost empty objects, only a name, 
>>> nothing more). After listing them via User List page the response is 
>>> almost immediate. Profiling says:
>>>
>>> 2018-07-30 16:56:08,973 [MODEL] [http-nio-8080-exec-9] DEBUG 
>>> (PROFILING): #### Entry: 602 
>>> ...model.impl.controller.ModelController->searchObjects
>>> 2018-07-30 16:56:08,973 [MODEL] [http-nio-8080-exec-9] TRACE 
>>> (PROFILING): ###### args: (UserType, Q{null filterPAGING: O: 0,M: 
>>> 20,ORD: [name ASCENDING], ,, 
>>> [ObjectOperationOptions(/:distinct=true)], 
>>> Task(id:1532962568973-0-1, name:null, oid:null), 
>>> R(com.evolveum.midpoint.web.component.data.SelectableBeanObjectDataProvider.searchObjects 
>>> UNKNOWN null))
>>> 2018-07-30 16:56:08,973 [REPOSITORY] [http-nio-8080-exec-9] DEBUG 
>>> (PROFILING): #### Entry: 604 
>>> ...repo.sql.SqlRepositoryServiceImpl->searchObjects
>>> 2018-07-30 16:56:08,973 [REPOSITORY] [http-nio-8080-exec-9] TRACE 
>>> (PROFILING): ###### args: (UserType, Q{null filterPAGING: O: 0,M: 
>>> 20,ORD: [name ASCENDING], ,, 
>>> [ObjectOperationOptions(/:distinct=true)], 
>>> R(com.evolveum.midpoint.model.api.ModelService.searchObjects UNKNOWN 
>>> null))
>>> 2018-07-30 16:56:08,973 [REPOSITORY] [http-nio-8080-exec-9] DEBUG 
>>> (org.hibernate.SQL): select ruser0_.oid as col_0_0_, 
>>> ruser0_2_.fullObject as col_1_0_, ruser0_2_.stringsCount as 
>>> col_2_0_, ruser0_2_.longsCount as col_3_0_, ruser0_2_.datesCount as 
>>> col_4_0_, ruser0_2_.referencesCount as col_5_0_, 
>>> ruser0_2_.polysCount as col_6_0_, ruser0_2_.booleansCount as 
>>> col_7_0_ from m_user ruser0_ inner join m_focus ruser0_1_ on 
>>> ruser0_.oid=ruser0_1_.oid inner join m_object ruser0_2_ on 
>>> ruser0_.oid=ruser0_2_.oid order by ruser0_.name_orig asc limit ?
>>> 2018-07-30 16:56:08,973 [REPOSITORY] [http-nio-8080-exec-9] DEBUG 
>>> (PROFILING): ##### Exit: 604 
>>> ...repo.sql.SqlRepositoryServiceImpl->searchObjects etime: 6.079 ms
>>> 2018-07-30 16:56:08,973 [REPOSITORY] [http-nio-8080-exec-9] TRACE 
>>> (PROFILING): ###### retval: 
>>> [user:00000000-0000-0000-0000-000000000002(administrator),user:63e9574b-6fa8-4f72-ab47-d7bc4a9b81b6(user 
>>> 100000),...]
>>> 2018-07-30 16:56:08,993 [MODEL] [http-nio-8080-exec-9] DEBUG 
>>> (PROFILING): ##### Exit: 602 
>>> ...model.impl.controller.ModelController->searchObjects etime: 
>>> 21.325 ms
>>> 2018-07-30 16:56:08,993 [MODEL] [http-nio-8080-exec-9] TRACE 
>>> (PROFILING): ###### retval: 
>>> [user:00000000-0000-0000-0000-000000000002(administrator),user:63e9574b-6fa8-4f72-ab47-d7bc4a9b81b6(user 
>>> 100000),...]
>>>
>>> The time needed to get first 20 user records was 21 milliseconds. 
>>> The repository processing itself took about 6 milliseconds.
>>>
>>> The SQL command itself was:
>>>
>>> select
>>>  ruser0_.oid as col_0_0_,
>>>  ruser0_2_.fullObject as col_1_0_,
>>>  ruser0_2_.stringsCount as col_2_0_,
>>>  ruser0_2_.longsCount as col_3_0_,
>>>  ruser0_2_.datesCount as col_4_0_,
>>>  ruser0_2_.referencesCount as col_5_0_,
>>>  ruser0_2_.polysCount as col_6_0_,
>>>  ruser0_2_.booleansCount as col_7_0_
>>> from
>>>  m_user ruser0_
>>>  inner join m_focus ruser0_1_ on ruser0_.oid=ruser0_1_.oid
>>>  inner join m_object ruser0_2_ on ruser0_.oid=ruser0_2_.oid
>>> order by
>>>  ruser0_.name_orig asc
>>> limit ?
>>>
>>> So, yes, the command is the same as in your case.
>>>
>>> Please, could you do the same profiling as I did? I.e. in your 
>>> config.xml file please set up:
>>>
>>> <configuration>
>>>    <midpoint>
>>>        ...
>>>        <profilingEnabled>true</profilingEnabled>
>>>    </midpoint>
>>> </configuration>
>>>
>>> And restart midPoint after the change. Then in the system 
>>> configuration (GUI: System -> Profiling) set the profiling:
>>>
>>> [cid:part1.EDEB0394.06F63FFB at evolveum.com]
>>>
>>> You can also set logging for org.hibernate.SQL to ALL (or TRACE) to 
>>> see SQL commands. And then please try opening the user list and send 
>>> us the relevant parts of midpoint.log file (i.e. text between Entry 
>>> and Exit of "...model.impl.controller.ModelController->searchObjects").
>>>
>>> 3.7.2 vs. 3.8 vs. 3.9-SNAPSHOT should make no difference in this 
>>> request, as far as I know. And I think that the user objects' size 
>>> should not perhaps make such a dramatic difference. I will have a 
>>> look at it more, however.
>>>
>>> Best regards,
>>>
>>> Pavol Mederly
>>> Software developer
>>> evolveum.com
>>>
>>>
>>> On 28.07.2018 16:51, Pavol Mederly wrote:
>>> Andrew,
>>>
>>> now I see I am going to have a look at it anyhow: 
>>> https://jira.evolveum.com/browse/MID-4776. I will let you know.
>>>
>>> Pavol Mederly
>>> Software developer
>>> evolveum.com
>>>
>>> On 28.07.2018 10:39, Pavol Mederly wrote:
>>> Hello Andrew,
>>>
>>> how long does it take to display the user list (in GUI) in your case?
>>>
>>> I know the query is like you say - and we know it can be optimized - 
>>> but in reality the SQL command sent to the database should limit the 
>>> results to first 20 records, so it should be faster than processing 
>>> SQL query without limits. (Anyway, we can sort this out if needed.)
>>>
>>> As for the columns, please have a look at "Custom columns 
>>> configuration" in 
>>> https://wiki.evolveum.com/display/midPoint/Admin+GUI+Configuration.
>>>
>>> Hope this helps,
>>>
>>> Pavol Mederly
>>> Software developer
>>> evolveum.com
>>>
>>> On 27.07.2018 18:03, Andrew Morgan wrote:
>>> We are running midPoint v3.7.2 (we delayed upgrading to v3.8 because 
>>> the Google connector is broken).  We recently loaded about 110,000 
>>> users in midPoint.  When I click on Users > List Users, midPoint is 
>>> waiting for the results of this database query:
>>>
>>> MariaDB [midpointdev]> explain select ruser0_.oid as col_0_0_, 
>>> ruser0_2_.fullObject as col_1_0_, ruser0_2_.stringsCount as 
>>> col_2_0_, ruser0_2_.longsCount as col_3_0_, ruser0_2_.datesCount as 
>>> col_4_0_, ruser0_2_.referencesCount as col_5_0_, 
>>> ruser0_2_.polysCount as col_6_0_, ruser0_2_.booleansCount as 
>>> col_7_0_ from m_user ruser0_ inner join m_focus ruser0_1_ on 
>>> ruser0_.oid=ruser0_1_.oid inner join m_object ruser0_2_ on 
>>> ruser0_.oid=ruser0_2_.oid order by ruser0_2_.name_orig asc;
>>> +------+-------------+-----------+--------+---------------+--------------+---------+-------------------------+--------+----------------------------------------------+ 
>>>
>>> | id   | select_type | table     | type   | possible_keys | 
>>> key          | key_len | ref                     | rows   | 
>>> Extra                                        |
>>> +------+-------------+-----------+--------+---------------+--------------+---------+-------------------------+--------+----------------------------------------------+ 
>>>
>>> |    1 | SIMPLE      | ruser0_   | index  | PRIMARY       | 
>>> uc_user_name | 768     | NULL                    | 100839 | Using 
>>> index; Using temporary; Using filesort |
>>> |    1 | SIMPLE      | ruser0_1_ | eq_ref | PRIMARY       | 
>>> PRIMARY      | 110     | midpointdev.ruser0_.oid |      1 | Using 
>>> index                                  |
>>> |    1 | SIMPLE      | ruser0_2_ | eq_ref | PRIMARY       | 
>>> PRIMARY      | 110     | midpointdev.ruser0_.oid |      1 
>>> |                                              |
>>> +------+-------------+-----------+--------+---------------+--------------+---------+-------------------------+--------+----------------------------------------------+ 
>>>
>>>
>>> Without the aliases, it is a little easier to read:
>>>
>>> MariaDB [midpointdev]> explain select m_user.oid, 
>>> m_object.fullObject, m_object.stringsCount, m_object.longsCount, 
>>> m_object.datesCount, m_object.referencesCount, m_object.polysCount, 
>>> m_object.booleansCount from m_user inner join m_focus on 
>>> m_user.oid=m_focus.oid inner join m_object on 
>>> m_user.oid=m_object.oid order by m_object.name_orig asc;
>>> +------+-------------+----------+--------+---------------+--------------+---------+------------------------+--------+----------------------------------------------+ 
>>>
>>> | id   | select_type | table    | type   | possible_keys | 
>>> key          | key_len | ref                    | rows   | 
>>> Extra                                        |
>>> +------+-------------+----------+--------+---------------+--------------+---------+------------------------+--------+----------------------------------------------+ 
>>>
>>> |    1 | SIMPLE      | m_user   | index  | PRIMARY       | 
>>> uc_user_name | 768     | NULL                   | 100839 | Using 
>>> index; Using temporary; Using filesort |
>>> |    1 | SIMPLE      | m_focus  | eq_ref | PRIMARY       | 
>>> PRIMARY      | 110     | midpointdev.m_user.oid |      1 | Using 
>>> index                                  |
>>> |    1 | SIMPLE      | m_object | eq_ref | PRIMARY       | 
>>> PRIMARY      | 110     | midpointdev.m_user.oid |      1 
>>> |                                              |
>>> +------+-------------+----------+--------+---------------+--------------+---------+------------------------+--------+----------------------------------------------+ 
>>>
>>>
>>> The result of this query is about 440MB of data.  "111690 rows in 
>>> set (22.84 sec)"  (BTW, join on m_focus isn't needed in this query, 
>>> right?)
>>>
>>> That is a LOT of data to push around anytime the Users list is 
>>> loaded.  We haven't even loaded all of our users yet!  Our total 
>>> user population is around 350,000.
>>>
>>> Is there a way to set some defaults for the User list?  For example, 
>>> could I change it to not display anything until a search is 
>>> performed?  Ideally, it wouldn't try to fetch m_object.fullObject, 
>>> at least until some filter is added.
>>>
>>> On a somewhat related note, I'd like to change which columns are 
>>> displayed in the User list.  For example, I would remove Fullname 
>>> and Email and add one of our custom attributes 
>>> (extension/username).  Is that possible?
>>>
>>> Thanks,
>>>
>>> Andy Morgan
>>> Systems Administrator, Identity & Access Management
>>> Information Services | Oregon State University
>>> 541-737-8877 | is.oregonstate.edu
>>> _______________________________________________
>>> midPoint mailing list
>>> midPoint at lists.evolveum.com<mailto:midPoint at lists.evolveum.com>
>>> http://lists.evolveum.com/mailman/listinfo/midpoint
>>>
>>> _______________________________________________
>>> midPoint mailing list
>>> midPoint at lists.evolveum.com<mailto:midPoint at lists.evolveum.com>
>>> http://lists.evolveum.com/mailman/listinfo/midpoint
>>>
>>> _______________________________________________
>>> midPoint mailing list
>>> midPoint at lists.evolveum.com<mailto:midPoint at lists.evolveum.com>
>>> http://lists.evolveum.com/mailman/listinfo/midpoint
>>>
>>>
>>
>>
>> _______________________________________________
>> midPoint mailing list
>> midPoint at lists.evolveum.com
>> http://lists.evolveum.com/mailman/listinfo/midpoint
>
>
>
> _______________________________________________
> midPoint mailing list
> midPoint at lists.evolveum.com
> http://lists.evolveum.com/mailman/listinfo/midpoint

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.evolveum.com/pipermail/midpoint/attachments/20180731/66c8e8bd/attachment.htm>


More information about the midPoint mailing list