oracle执行计划中的Semi-Join和Anti-Join(3)

oraunix

Prerequisites and Hints that Impact Semi-Joins

There are a few situations in which Oracle cannot use a semi-join access path. If the query contains a DISTINCT operation (explicitly or implicitly through a set operator such as UNION) then Oracle cannot transform EXISTS or IN clauses into something that could use a semi-join access path. The same is true if the EXISTS or IN clause is part of an OR branch. (Semi-join access paths can be used in queries that contain ORs in the WHERE clause, just as long as the EXISTS or IN is not part of the OR.)
查询中包含distinct或者exists和in子句是or分支的一部分，那么Oracle不能进行semi join。

Assuming that a semi-join access path is possible, Oracle chooses the semi-join algorithm to use while evaluating execution plans. In Oracle 9i a semi-join can be performed using the nested loops, hash join, or merge join algorithms. As with a conventional join, Oracle 9i will choose the join algorithm with the lowest cost. In Oracle 8i a semi-join can be performed using the hash join or merge join algorithms, but Oracle 8i will not choose the join algorithm with the lowest cost. Instead, the always_semi_join instance parameter dictates whether or not an Oracle 8i database should perform semi-joins and, if so, which algorithm to use. The always_semi_join instance parameter is obsolete in Oracle 9i.
在Oracle 9i中，semi join可能选择nested join、hash join或者merge join，oracle 9i会选择最低cost的执行计划。

If a semi-join access path is not possible (because of a DISTINCT operation or an OR predicate), or if instance parameters specify a semi-join should not be used, then Oracle will choose an alternate data access path. This usually amounts to performing a conventional join and sorting out the duplicates, or scanning the first table and using the second table as a filter.
如果不能进行semi join，那么Oracle会采用普通join，然后再用sort去除重复行；也可能是采用filter。

Oracle provides the NL_SJ, HASH_SJ, and MERGE_SJ hints in order for you to manipulate the semi-join process if you need to. The hint is applied to the subquery of the EXISTS or IN clause, not the main body of the query itself. In my experience Oracle does not need you to provide a hint in order to consider the efficient semi-join access paths. However, in Oracle 9i or 10g, where you have a wider choice of semi-join algorithms, you might want to use hints if Oracle chooses a hash semi-join when a nested loops semi-join would in fact be better, or vice versa.
在子查询中，我们可以才用hints（NL_SJ, HASH_SJ, and MERGE_SJ），影响半连接的连接方法。
但是从Oracle 9i开始，基本上不用关心这个问题，因为Oracle会自己计算具体的执行成本。

Semi-Join Example #1

Suppose you want a list of your gold-status customers who have placed an order within the last three days. You might start with a query that looks like:

      SELECT DISTINCT C.short_name, C.customer_id
      FROM    customers C, orders O
      WHERE C.customer_type = 'Gold'
      AND    O.customer_id = C.customer_id
      AND    O.order_date > SYSDATE - 3
      ORDER BY C.short_name;
The execution plan (from a TKPROF report) for this query is:

      Rows    Row Source Operation
      -------  ---------------------------------------------------
            2  SORT UNIQUE (cr=33608 r=30076 w=0 time=6704029 us)
         20 HASH JOIN  (cr=33608 r=30076 w=0 time=6703101 us)
         10 TABLE ACCESS FULL CUSTOMERS (cr=38 r=36 w=0 time=31718 us)
         2990 TABLE ACCESS FULL ORDERS (cr=33570 r=30040 w=0 time=6646420 us)
You can see that Oracle did a (conventional) hash join between the CUSTOMERS and ORDERS tables, and then performed a sort for uniqueness to filter out the duplicate records. This sort was necessary because the query is supposed to show a customer only once, even if they have more than one order placed within the last three days. (A sort to satisfy the ORDER BY clause is also necessary, but Oracle tackles this at the same time as the sort for uniqueness and thus it does not appear as a separate step in the execution plan.)

Using what we have discussed so far, you might want to rewrite the query using a semi-join:
//为什么能将一个distinct直接转换成半连接呢，因为主查询查询的列是主键列，不可能重复。
      SELECT C.short_name, C.customer_id
      FROM    customers C
      WHERE C.customer_type = 'Gold'
      AND    EXISTS
               (
               SELECT 1
               FROM orders O
               WHERE  O.customer_id = C.customer_id
               AND O.order_date > SYSDATE - 3
               )
      ORDER BY C.short_name;
The execution plan for this query becomes:

      Rows    Row Source Operation
      -------  ---------------------------------------------------
            2  SORT ORDER BY (cr=33608 r=29921 w=0 time=6422770 us)
            2 HASH JOIN SEMI (cr=33608 r=29921 w=0 time=6422538 us)
         10 TABLE ACCESS FULL CUSTOMERS (cr=38 r=0 w=0 time=61290 us)
         2990 TABLE ACCESS FULL ORDERS (cr=33570 r=29921 w=0 time=6345754 us)
The “SEMI” that you see on the second line indicates that a semi-join access path is being used instead of a conventional join. A semi-join is more efficient here for two reasons: (1) The search through the ORDERS table for a given customer can stop as soon as the first qualifying order is found, and (2) There is no need to sort the results to remove duplicate customer records in the case where a customer had multiple qualifying orders. (You can see that the sort for uniqueness, which took 928 microseconds, has been replaced by a sort to satisfy the ORDER BY clause, which took only 232 microseconds.)
注意上面的执行计划的每一个细节，包括一些执行时间（时间是累积的）。

Rewriting the query to use a semi-join access path has perhaps been a novel experience, but, so far, it hasn’t done us a substantial amount of good. We brought the elapsed time down about 4%, but that doesn’t seem too significant, and it could just be the result of chance variation in our test environment. But suppose you are aware of the fact that very few of your customers are gold-status. Perhaps a nested loops semi-join would be more efficient. You can add an NL_SJ hint and see the results for yourself:

      SELECT C.short_name, C.customer_id
      FROM    customers C
      WHERE C.customer_type = 'Gold'
      AND    EXISTS
               (
               SELECT /*+ NL_SJ */ 1
               FROM orders O
               WHERE  O.customer_id = C.customer_id
               AND O.order_date > SYSDATE - 3
               )
      ORDER BY C.short_name;

      Rows    Row Source Operation
      -------  ---------------------------------------------------
            2  SORT ORDER BY (cr=833 r=725 w=0 time=358431 us)
            2 NESTED LOOPS SEMI (cr=833 r=725 w=0 time=358232 us)
         10 TABLE ACCESS FULL CUSTOMERS (cr=38 r=0 w=0 time=2210 us)
            2 TABLE ACCESS BY INDEX ROWID ORDERS (cr=795 r=725 w=0 time=355822 us)
         780    INDEX RANGE SCAN ORDERS_N1 (cr=15 r=13 w=0 time=5601 us)(object id 34176)
Indeed, the nested loops semi-join brought the consistent reads down to 833 from 33608 and reduced the execution time by over 90%. So it appears that Oracle chose a hash semi-join when a nested loops semi-join was more efficient. Once we get Oracle to see the light with the NL_SJ hint, we end up with a much more efficient query than we started with. We achieved this performance gain by leveraging Oracle’s nested loops semi-join access path.
改用嵌套以后，效果提升明显。具体的区别就是我们一直讲的什么时候使用嵌套、什么时候使用hash，在这个地方一样起作用。

P3 · 发表于 2011-2-27 22:00:52

讲的很透彻嘿嘿对CBO那本书理解的更进了一步

图文精华

oracle执行计划中的Semi-Join和Anti-Join(3)