oracle执行计划中的Semi-Join和Anti-Join(4)

本帖最后由 oraunix 于 2010-11-11 16:52 编辑

Semi-Join Example #2

I recently worked on a project where applying the concepts presented in this section brought the execution time for a key query in my client’s application down from 46 minutes to 1.1 seconds. A screen in the application allowed a user to enter a project owner and up to five participants, and a list of assignments involving all of the specified people would be displayed. The users found that the response time was tolerable if they only selected one participant, but it went downhill from there as they added more participants. The original version of the query looked like this:

        SELECT   DISTINCT A.name, A.code, A.description,
                          A.item_id, A.assignment_id, FI.string0, FI.string1
        FROM     relationships R, assignments A, format_items FI,
                 relationships R1, relationships R2, relationships R3,
                 relationships R4, relationships R5
        WHERE    R.user_id = 134546
        AND      R.account_id = 134545
        AND      R.type_code = 0
        AND      A.item_id = R.item_id
        AND      FI.item_id = A.item_id
        AND      R1.item_id = A.item_id AND R1.status = 5 AND R1.user_id = 137279
        AND      R2.item_id = A.item_id AND R2.status = 5 AND R2.user_id = 134555
        AND      R3.item_id = A.item_id AND R3.status = 5 AND R3.user_id = 134546
        AND      R4.item_id = A.item_id AND R4.status = 5 AND R4.user_id = 137355
        AND      R5.item_id = A.item_id AND R5.status = 5 AND R5.user_id = 134556
        ORDER BY A.name ASC;
(I have changed the names of tables and columns, stripped away extraneous columns and predicates, and eliminated bind variables in order to both protect my client’s confidentiality and make the example easier to understand.) The execution plan looked like this:

        Rows     Row Source Operation
        -------  ---------------------------------------------------
            642  SORT UNIQUE (cr=23520269 r=34 w=0 time=2759937104 us)
        64339104   TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=23520269 r=34 w=0 time=1838051782 us)
        95184881    NESTED LOOPS  (cr=7842642 r=23 w=0 time=907238095 us)
        2710288     NESTED LOOPS  (cr=2266544 r=23 w=0 time=103840003 us)
         317688      NESTED LOOPS  (cr=484734 r=11 w=0 time=23494451 us)
          50952       NESTED LOOPS  (cr=43280 r=10 w=0 time=2688237 us)
           4146        NESTED LOOPS  (cr=19016 r=3 w=0 time=988374 us)
           1831         NESTED LOOPS  (cr=13353 r=0 w=0 time=608296 us)
           4121          HASH JOIN  (cr=7395 r=0 w=0 time=399488 us)
           2046           TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=7211 r=0 w=0 time=299181 us)
          17788            INDEX RANGE SCAN RELATIONSHIPS_N3 (cr=71 r=0 w=0 time=81158 us)(object id 34528)
           3634           TABLE ACCESS FULL ASSIGNMENTS (cr=184 r=0 w=0 time=25536 us)
           1831          TABLE ACCESS BY INDEX ROWID FORMAT_ITEMS (cr=5958 r=0 w=0 time=163252 us)
           1831           INDEX RANGE SCAN FORMAT_ITEMS_N1 (cr=4127 r=0 w=0 time=115113 us)(object id 34316)
           4146         TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=5663 r=3 w=0 time=349554 us)
           4264          INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=3678 r=0 w=0 time=224390 us)(object id 34345)
          50952        TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=24264 r=7 w=0 time=1538051 us)
          70976         INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=8428 r=0 w=0 time=630831 us)(object id 34345)
         317688       TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=441454 r=1 w=0 time=19584539 us)
        1631032        INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=108302 r=0 w=0 time=7213989 us)(object id 34345)
        2710288      TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=1781810 r=12 w=0 time=72015925 us)
        4237104       INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=643298 r=1 w=0 time=29018764 us)(object id 34345)
        92474592     INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=5576098 r=0 w=0 time=503165776 us)(object id 34345)
It is no wonder that this query took so long when five participants were selected. With each join to the RELATIONSHIPS table, the Cartesian product gets more and more out of hand. By the final sort to eliminate the duplicates, we have over 64 million rows to reckon with.
上面的执行计划,无论是hash还是nested,都是一个糟糕的执行计划,join了太多的表,返回了太多的行,sort部分一定是处理了非常多的数据,势必效率会非常的差。如果发生mulipass的排序,那么效果将会非常的差。
返回的数据非常多,被过滤的数据也非常多,那么更可以考虑是否可以使用半连接。

This query struck me right away as a good candidate for a nested loops semi-join. We want a list of assignments that meet certain criteria. Whether a specific user has one qualifying relationship record or 10, it makes no difference. We are only concerned with the existence of at least one qualifying relationship record for each of the specified users.

Rewriting the query with EXISTS clauses to semi-join to the RELATIONSHIPS table five times is pretty easy. The challenge here is that the query does have a DISTINCT operator, and this defeats Oracle’s semi-join access paths. Not to worry—an inline view with a NO_MERGE hint saved the day. The revised query looked like:
//因为半连接返回的数据行,不能保证distinct,这个地方还是需要加上distinct,这和上面的例子不同。加上了distinct,就限制了半连接的使用,因此我们使用了hints,使得主查询和子查询不进行合并。
        SELECT   /*+ NO_MERGE (M) */   
                 DISTINCT M.name, M.code, M.description,  
                          M.item_id, M.assignment_id, M.string0, M.string1
        FROM     (
                 SELECT A.name, A.code, A.description,
                        A.item_id, A.assignment_id, FI.string0, FI.string1
                 FROM   relationships R, assignments A, format_items FI
                 WHERE  R.user_id = 134546
                 AND    R.account_id = 134545
                 AND    R.type_code = 0
                 AND    A.item_id = R.item_id
                 AND    FI.item_id = A.item_id
                 AND    EXISTS
                        (SELECT 1 FROM relationships R1
                         WHERE R1.item_id = A.item_id AND R1.status = 5
                         AND   R1.user_id = 137279)     //连接一个表,使用exists。
                 AND    EXISTS
                        (SELECT 1 FROM relationships R2
                         WHERE R2.item_id = A.item_id AND R2.status = 5
                         AND   R2.user_id = 134555)
                 AND    EXISTS
                        (SELECT 1 FROM relationships R3
                         WHERE R3.item_id = A.item_id AND R3.status = 5
                         AND   R3.user_id = 134546)
                 AND    EXISTS
                        (SELECT 1 FROM relationships R4
                         WHERE R4.item_id = A.item_id AND R4.status = 5
                         AND   R4.user_id = 137355)
                 AND    EXISTS
                        (SELECT 1 FROM relationships R5
                         WHERE R5.item_id = A.item_id AND R5.status = 5
                         AND   R5.user_id = 134556)
                 ) M
        ORDER BY M.name ASC;
The execution plan was as follows:

        Rows     Row Source Operation
        -------  ---------------------------------------------------
            642  SORT UNIQUE (cr=36315 r=89 w=0 time=1107054 us)
           1300   VIEW  (cr=36315 r=89 w=0 time=1085116 us)
           1300    NESTED LOOPS SEMI (cr=36315 r=89 w=0 time=1082232 us)
           1314     NESTED LOOPS SEMI (cr=32385 r=89 w=0 time=1002330 us)
           1314      NESTED LOOPS SEMI (cr=28261 r=89 w=0 time=904654 us)
           1314       NESTED LOOPS SEMI (cr=22822 r=89 w=0 time=737705 us)
           1322        NESTED LOOPS SEMI (cr=18730 r=89 w=0 time=651196 us)
           1831         NESTED LOOPS  (cr=13353 r=89 w=0 time=530670 us)
           4121          HASH JOIN  (cr=7395 r=89 w=0 time=347584 us)
           2046           TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=7211 r=0 w=0 time=186820 us)
          17788            INDEX RANGE SCAN RELATIONSHIPS_N3 (cr=71 r=0 w=0 time=43770 us)(object id 34528)
           3634           TABLE ACCESS FULL ASSIGNMENTS (cr=184 r=89 w=0 time=91899 us)
           1831          TABLE ACCESS BY INDEX ROWID FORMAT_ITEMS (cr=5958 r=0 w=0 time=141416 us)
           1831           INDEX RANGE SCAN FORMAT_ITEMS_N1 (cr=4127 r=0 w=0 time=100207 us)(object id 34316)
           1322         TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=5377 r=0 w=0 time=92046 us)
           2472          INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=3664 r=0 w=0 time=61077 us)(object id 34345)
           1314        TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=4092 r=0 w=0 time=63478 us)
           1582         INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=2647 r=0 w=0 time=40433 us)(object id 34345)
           1314       TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=5439 r=0 w=0 time=133620 us)
          11011        INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=2639 r=0 w=0 time=65312 us)(object id 34345)
           1314      TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=4124 r=0 w=0 time=75657 us)
           2234       INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=2630 r=0 w=0 time=49373 us)(object id 34345)
           1300     TABLE ACCESS BY INDEX ROWID RELATIONSHIPS (cr=3930 r=0 w=0 time=58651 us)
           1300      INDEX RANGE SCAN RELATIONSHIPS_N2 (cr=2630 r=0 w=0 time=38832 us)(object id 34345)
The execution plan starts out the same as before, with a hash join between RELATIONSHIPS and ASSIGNMENTS followed by a nested loops join to FORMAT_ITEMS. At this point there are 1831 rows in the candidate result set. Whereas the original query does five conventional joins to the RELATIONSHIPS table and the 1831 rows balloon to over 64 million, the revised query does semi-joins instead and whittles the candidate rows down to 1300. Joining 1300 rows is a lot faster than joining 64 million rows, so you can see how the semi-join access path really speeds up this query.

The NO_MERGE hint and the inline view (aliased as “M”) causes Oracle to separate the DISTINCT operation from the bulk of the query. This is how we get around the fact that a DISTINCT operation defeats semi-join access paths. In the second version of the query, Oracle executes the query as if the DISTINCT keyword were not there. Then, as a final step, it applies the DISTINCT operation to the interim result set that it has stored in memory (the output of the VIEW operator).

If you are wondering why the NO_MERGE hint is necessary, recall what we discussed earlier about the heuristic Oracle uses when deciding whether or not to merge a subquery into the main body of a query. Oracle assumes that merging is a good thing and will do it whenever possible. Thus without the NO_MERGE hint Oracle would merge the inline view into the DISTINCT operation (eliminating the inline view), and the semi-join access path would be defeated by the presence of the DISTINCT operation.

This example demonstrates the value of the semi-join access paths. Your gains may not be as spectacular as we have seen here, but hopefully you can see from this discussion how proper use of semi-joins can make a certain class of queries much more efficient. This example also underscores the importance of knowing how to properly use semi-joins. If you did not remember that semi-join access paths are defeated by the DISTINCT operation, you might have changed the example query to use EXISTS clauses and wondered why the query still ran slowly.


标签: 暂无标签
oraunix

写了 199 篇文章,拥有财富 1026,被 339 人关注

转播转播 分享分享 分享淘帖
回复

使用道具

P3 | 发表于 2011-2-27 22:31:51
nice job ~
回复

使用道具

您需要登录后才可以回帖 登录 | 加入社区

本版积分规则

意见
反馈