选择不同...内连接与选择...其中 id in (...)

发布于 2024-08-29 01:44:50 字数 4708 浏览 13 评论 0原文

我正在尝试创建表的子集(作为物化视图),定义为在另一个物化视图中具有匹配记录的记录。

例如,假设我有一个包含 user_id 和 name 列的用户表,以及一个包含entry_id、user_id、activity 和 timestamp 列的日志表。

首先,我创建日志表的物化视图,仅选择时间戳 > 的那些行。某个日期。现在我想要日志表快照中引用的用户的物化视图。我可以创建它

select * from Users where user_id in (select user_id from Log_mview)

或我可以这样做

select distinct u.* from Users u inner join Log_mview l on u.user_id = l.user_id

(需要不同的以避免来自具有多个日志条目的用户的多次点击)。

前者看起来更干净、更优雅,但需要更长的时间。我错过了什么吗?有更好的方法吗?

编辑: where contains 子句有很大帮助,除了条件使用 OR 的情况。例如,假设上面的日志表也有一个 user_name 列,将日志条目与用户记录匹配的正确方法是当任一列(用户 ID 或用户名)匹配时。我发现这

select distinct u.* from Users u
    inner join Log_mview l
        on u.user_id = l.user_id or u.name = l.user_name

快得多吗

select * from Users u where exists
    (select id from Log_mview l 
        where l.user_id = u.user_id or l.user_name = u.name)

比有帮助

? (关于解释计划......让我努力清理它,或者它们,而不是......我会在一段时间后发布它们。)

编辑:解释计划: 对于使用内连接的查询:

Plan hash value: 436698422

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                     |  4539K|   606M|       |   637K  (3)| 02:07:25 |
|   1 |  HASH UNIQUE                    |                     |  4539K|   606M|  3201M|   637K  (3)| 02:07:25 |
|   2 |   CONCATENATION                 |                     |       |       |       |            |          |
|*  3 |    HASH JOIN                    |                     |  4206K|   561M|    33M|   181K  (4)| 00:36:14 |
|   4 |     BITMAP CONVERSION TO ROWIDS |                     |   926K|    22M|       |  2279   (1)| 00:00:28 |
|   5 |      BITMAP INDEX FAST FULL SCAN| I_M_LOG_MVIEW_4     |       |       |       |            |          |
|*  6 |     TABLE ACCESS FULL           | USERS               |    15M|  1630M|       | 86638   (6)| 00:17:20 |
|*  7 |    HASH JOIN                    |                     |  7646K|  1020M|    33M|   231K  (4)| 00:46:13 |
|   8 |     BITMAP CONVERSION TO ROWIDS |                     |   926K|    22M|       |  2279   (1)| 00:00:28 |
|   9 |      BITMAP INDEX FAST FULL SCAN| I_M_LOG_MVIEW_4     |       |       |       |            |          |
|  10 |     TABLE ACCESS FULL           | USERS               |    23M|  2515M|       | 87546   (7)| 00:17:31 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("U"."NAME"="L"."USER_NAME")
   6 - filter("U"."NAME" IS NOT NULL)
   7 - access("U"."USER_ID"=TO_NUMBER("L"."USER_ID"))
       filter(LNNVL("U"."NAME"="L"."USER_NAME") OR LNNVL("U"."NAME" IS NOT NULL))

Note
-----
   - dynamic sampling used for this statement

对于使用 where isn't 的查询:

Plan hash value: 2786958565

-----------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                     |     1 |   114 |    21M  (1)| 70:12:13 |
|*  1 |  FILTER                       |                     |       |       |            |          |
|   2 |   TABLE ACCESS FULL           | USERS               |    23M|  2515M| 87681   (7)| 00:17:33 |
|   3 |   BITMAP CONVERSION TO ROWIDS |                     |  7062 |   179K|     1   (0)| 00:00:01 |
|*  4 |    BITMAP INDEX FAST FULL SCAN| I_M_LOG_MVIEW_4     |       |       |            |          |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT /*+ */ 0 FROM "MYSCHEMA"."LOG_MVIEW" 
              "LOG_MVIEW" WHERE ("USER_NAME"=:B1 OR TO_NUMBER("USER_ID")=:B2) AND 
              ("USER_NAME"=:B3 OR TO_NUMBER("USER_ID")=:B4) AND ("USER_NAME"=:B5 OR 
              TO_NUMBER("USER_ID")=:B6)))
   4 - filter("USER_NAME"=:B1 OR TO_NUMBER("USER_ID")=:B2)

Note
-----
   - dynamic sampling used for this statement

更改数据库对象名称以保护无辜者。 :p

I'm trying to create a subset of a table (as a materialized view), defined as those records which have a matching record in another materialized view.

For example, let's say I have a Users table with user_id and name columns, and a Log table, with entry_id, user_id, activity, and timestamp columns.

First I create a materialized view of the Log table, selecting only those rows with timestamp > some_date. Now I want a materliazed view of the Users referenced in my snapshot of the Log table. I can either create it as

select * from Users where user_id in (select user_id from Log_mview)

or I can do

select distinct u.* from Users u inner join Log_mview l on u.user_id = l.user_id

(need the distinct to avoid multiple hits from users with multiple log entries).

The former seems cleaner and more elegant, but takes much longer. Am I missing something? Is there a better way to do this?

Edit: The where exists clause helped a lot, except in the case where the condition uses an OR. For example, let's say the Log table above also had a user_name column, and the correct way to match a Log entry to a Users record is when either of the columns (user id or user name) match. I'm finding that

select distinct u.* from Users u
    inner join Log_mview l
        on u.user_id = l.user_id or u.name = l.user_name

is much faster than

select * from Users u where exists
    (select id from Log_mview l 
        where l.user_id = u.user_id or l.user_name = u.name)

Any help?

(Regarding the explain plan... Lemme work on sanitizing it, or them, rather... I'll post them in a while.)

Edit: explain plans:
For the query with inner join:

Plan hash value: 436698422

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                     |  4539K|   606M|       |   637K  (3)| 02:07:25 |
|   1 |  HASH UNIQUE                    |                     |  4539K|   606M|  3201M|   637K  (3)| 02:07:25 |
|   2 |   CONCATENATION                 |                     |       |       |       |            |          |
|*  3 |    HASH JOIN                    |                     |  4206K|   561M|    33M|   181K  (4)| 00:36:14 |
|   4 |     BITMAP CONVERSION TO ROWIDS |                     |   926K|    22M|       |  2279   (1)| 00:00:28 |
|   5 |      BITMAP INDEX FAST FULL SCAN| I_M_LOG_MVIEW_4     |       |       |       |            |          |
|*  6 |     TABLE ACCESS FULL           | USERS               |    15M|  1630M|       | 86638   (6)| 00:17:20 |
|*  7 |    HASH JOIN                    |                     |  7646K|  1020M|    33M|   231K  (4)| 00:46:13 |
|   8 |     BITMAP CONVERSION TO ROWIDS |                     |   926K|    22M|       |  2279   (1)| 00:00:28 |
|   9 |      BITMAP INDEX FAST FULL SCAN| I_M_LOG_MVIEW_4     |       |       |       |            |          |
|  10 |     TABLE ACCESS FULL           | USERS               |    23M|  2515M|       | 87546   (7)| 00:17:31 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("U"."NAME"="L"."USER_NAME")
   6 - filter("U"."NAME" IS NOT NULL)
   7 - access("U"."USER_ID"=TO_NUMBER("L"."USER_ID"))
       filter(LNNVL("U"."NAME"="L"."USER_NAME") OR LNNVL("U"."NAME" IS NOT NULL))

Note
-----
   - dynamic sampling used for this statement

For the one using where exists:

Plan hash value: 2786958565

-----------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                     |     1 |   114 |    21M  (1)| 70:12:13 |
|*  1 |  FILTER                       |                     |       |       |            |          |
|   2 |   TABLE ACCESS FULL           | USERS               |    23M|  2515M| 87681   (7)| 00:17:33 |
|   3 |   BITMAP CONVERSION TO ROWIDS |                     |  7062 |   179K|     1   (0)| 00:00:01 |
|*  4 |    BITMAP INDEX FAST FULL SCAN| I_M_LOG_MVIEW_4     |       |       |            |          |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT /*+ */ 0 FROM "MYSCHEMA"."LOG_MVIEW" 
              "LOG_MVIEW" WHERE ("USER_NAME"=:B1 OR TO_NUMBER("USER_ID")=:B2) AND 
              ("USER_NAME"=:B3 OR TO_NUMBER("USER_ID")=:B4) AND ("USER_NAME"=:B5 OR 
              TO_NUMBER("USER_ID")=:B6)))
   4 - filter("USER_NAME"=:B1 OR TO_NUMBER("USER_ID")=:B2)

Note
-----
   - dynamic sampling used for this statement

DB object names changed to protect the innocent. :p

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

清秋悲枫 2024-09-05 01:44:50

尝试一下

select * from Users u
where exists 
   ( select user_id 
     from Log_mview l
     where l.user_id = u.user_id )
/

如果子查询返回大量行,WHERE EXISTS 可能比 WHERE ... IN 快得多。

Try this

select * from Users u
where exists 
   ( select user_id 
     from Log_mview l
     where l.user_id = u.user_id )
/

If the sub-query returns a large number of rows WHERE EXISTS can be substantially faster than WHERE ... IN.

白芷 2024-09-05 01:44:50

这取决于您拥有的数据,但在联接中使用 Distinct 可以提高您的性能:

Select u.*
From Users u
Join ( Select Distinct user_id
       From log_mview ) l On u.user_id = l.user_id

This will depend on the data you have, but using Distinct within the join could improve your performance:

Select u.*
From Users u
Join ( Select Distinct user_id
       From log_mview ) l On u.user_id = l.user_id
御弟哥哥 2024-09-05 01:44:50

第二个查询可能比第一个查询(join+distinc)更多地使用硬盘。

第一个查询可能会翻译成这样:

for each row in table Log find corresponding row in table User (in memory).

The database is probably smart enough to create in memory structures for table User that is probably much smaller than Log table.

I believe that query one (join+distinct) will require only one pass on table Log.

The distinct is probably executed in memory.

第二个查询可能会强制数据库对表 Log 进行多次完整读取。

所以在第二个查询中你可能会得到:

对于表中的每一行,用户读取表 Log 中的所有行(从磁盘)以便匹配条件。

您还必须考虑到,由于内存可用性、负载和表增加的变化,某些查询可能会出现速度上的巨大差异。

The second query is probably working more the harddrive than the first query (join+distinc).

The first query will probably translates to something like:

for each row in table Log find corresponding row in table User (in memory).

The database is probably smart enough to create in memory structures for table User that is probably much smaller than Log table.

I believe that query one (join+distinct) will require only one pass on table Log.

The distinct is probably executed in memory.

The second query probably forces the database to do multiples fulls reads on table Log.

So in the second query you probably get:

For each row in table user read all the rows in table Log (from disk) in order to match the condition.

You have also to consider that some query may experience a dramatic diference in speed due to changes in memory availability, load and table increase.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文