oracle执行计划中的Semi-Join和Anti-Join(1)

本帖最后由 oraunix 于 2010-11-11 14:46 编辑

彻底的理解这两个知识点,对于优化非常的重要。这两个点经常出现在执行计划中,要彻底的理解。
下面的内容介绍了两个执行计划步骤的一些定义信息。

Semi-Join Defined

A “semi-join” between two tables returns rows from the first table where one or more matches are found in the second table. The difference between a semi-join and a conventional join is that rows in the first table will be returned at most once. Even if the second table contains two matches for a row in the first table, only one copy of the row will be returned. Semi-joins are written using the EXISTS or IN constructs.两个表的半连接,通常都是一个表返回数据,另外一个表只是用来进行连接
第一个表:返回数据的表(下图中的dept)
第二个表:进行连接的表(下图中的emp)
半连接的定义:第一个表只是返回一行数据(当第二个表匹配一行或者多行数据的时候)
具体的执行过程:第一个表拿出一条数据,第二个表根据第一个表的数据,进行匹配返回,如果返回一条以上数据,那么第一个表就返回一行数据,不管第二个表返回几行,第一个表都是返回一行数据。

Suppose you have the DEPT and EMP tables in the SCOTT schema and you want a list of departments with at least one employee. You could write the query with a conventional join:希望返回至少有一个员工的部门名称,下面的查询有个问题,如果一个部门有400个员工,那么关于这个部门至少返回400行数据。

        SELECT   D.deptno, D.dname
        FROM     dept D, emp E
        WHERE    E.deptno = D.deptno
        ORDER BY D.deptno;
Unfortunately, if a department has 400 employees then that department will appear in the query output 400 times. You could eliminate the duplicate rows by using the DISTINCT keyword, but you would be making Oracle do more work than necessary. Really what you want to do is specify a semi-join between the DEPT and EMP tables instead of a conventional join:
为了解决上面的问题,我们可以加上distinct,这样就增加了数据库的负载(join后进行sort),显然效果不是很好。
        SELECT   D.deptno, D.dname
        FROM     dept D
        WHERE    EXISTS
                 (
                 SELECT 1
                 FROM   emp E
                 WHERE  E.deptno = D.deptno
                 )
        ORDER BY D.deptno;
这个语句很好的完成了上面的任务,一个部门只是返回一行数据。这就给我们了一个提示,面对distinct+表连接,我们能不能才用半连接呢?
The above query will list the departments that have at least one employee. Whether a department has one employee or 100, the department will appear only once in the query output. Moreover, Oracle will move on to the next department as soon as it finds the first employee in a department, instead of finding all of the employees in each department.


Anti-Join Defined

An “anti-join” between two tables returns rows from the first table where no matches are found in the second table. An anti-join is essentially the opposite of a semi-join: While a semi-join returns one copy of each row in the first table for which at least one match is found, an anti-join returns one copy of each row in the first table for which no match is found. Anti-joins are written using the NOT EXISTS or NOT IN constructs. These two constructs differ in how they handle nulls—a subtle but very important distinction which we will discuss later.第一个表返回数据的前提是:在第二个表中没有匹配的数据行,只要有一个返回数据行,第一个表就不会返回数据行。
not exists和not in的区别在于:他们对待nulls的不同,后面会进行专门的讨论。




Suppose you want a list of empty departments—departments with no employees. You could write a query that finds all departments and subtracts off the department of each employee:希望返回没有雇员的部门信息

        SELECT   D1.deptno, D1.dname
        FROM     dept D1
        MINUS
        SELECT   D2.deptno, D2.dname
        FROM     dept D2, emp E2
        WHERE    D2.deptno = E2.deptno
        ORDER BY 1;
The above query will give the desired results, but it might be clearer to write the query using an anti-join:

        SELECT   D.deptno, D.dname
        FROM     dept D
        WHERE    NOT EXISTS
                 (
                 SELECT 1
                 FROM   emp E
                 WHERE  E.deptno = D.deptno
                 )
        ORDER BY D.deptno;
The above query is more efficient because Oracle can employ an anti-join access path. The difference in efficiency here is akin to the difference between a nested loops join and a hash join when you are joining every row in one table to another.





标签: 暂无标签
oraunix

写了 199 篇文章,拥有财富 1026,被 339 人关注

转播转播 分享分享 分享淘帖
回复

使用道具

成为第一个吐槽的人

您需要登录后才可以回帖 登录 | 加入社区

本版积分规则

意见
反馈