Joining tables with millions of rows

Published 2024-12-07 18:25:28

I have the SQL below, which returns 35k rows and takes about 10 minutes to run. Both tables have millions of rows. How can I improve this SQL?

SELECT /*+ index(T_DIRECTORY X_DIR) */
       DIRx.dir_id,
       base.id
  FROM T_DIRECTORY DIRx, T_PERSON base
 WHERE DIRx.id = 26463
   AND DIRx.PERSONID = base.PERSONID

| Id  | Operation                    | Name        |
----------------------------------------------------
|   0 | SELECT STATEMENT             |             |
|   1 |  NESTED LOOPS                |             |
|   2 |   TABLE ACCESS BY INDEX ROWID| T_DIRECTORY |
|   3 |    INDEX RANGE SCAN          | X_DIRECTORY |
|   4 |   TABLE ACCESS BY INDEX ROWID| T_PERSON    |
|   5 |    INDEX UNIQUE SCAN         | I_PERSON    |
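For reference, a plan like the one above can be reproduced with EXPLAIN PLAN and DBMS_XPLAN (the exact output columns vary by Oracle version):

```sql
-- Generate and display the execution plan for the statement:
EXPLAIN PLAN FOR
SELECT /*+ index(T_DIRECTORY X_DIR) */
       DIRx.dir_id,
       base.id
  FROM T_DIRECTORY DIRx, T_PERSON base
 WHERE DIRx.id = 26463
   AND DIRx.PERSONID = base.PERSONID;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```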


Comments (3)

淤浪 2024-12-14 18:25:28


First, ensure you have suitable indexes on the column in the WHERE clause (DIRx.id) and on the join column (base.PERSONID), and that these indexes are analyzed so they represent the data in the table. If they are not analyzed, Oracle may do a full table scan when it could use an index instead.

SELECT INDEX_NAME, 
       NUM_ROWS, 
       LAST_ANALYZED 
  FROM DBA_INDEXES 
 WHERE TABLE_NAME IN ('T_DIRECTORY','T_PERSON');
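If LAST_ANALYZED is NULL or stale, the statistics can be refreshed with DBMS_STATS; a minimal sketch, assuming the schema owner is APP_OWNER (replace with your own):

```sql
-- Refresh optimizer statistics for both tables; CASCADE => TRUE
-- also gathers statistics on their indexes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'APP_OWNER',
    tabname => 'T_DIRECTORY',
    cascade => TRUE);
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'APP_OWNER',
    tabname => 'T_PERSON',
    cascade => TRUE);
END;
/
```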

Also, you are forcing it to use an index with a hint, but if one table is much smaller than the other, a hash join might be a better solution, so try removing the hint altogether and see if it helps.
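For example, both variants below could be timed against the original; the USE_HASH hint is only an illustration of how to force the comparison, not a guaranteed improvement:

```sql
-- 1. Let the optimizer pick the join method (no hint):
SELECT DIRx.dir_id,
       base.id
  FROM T_DIRECTORY DIRx
  JOIN T_PERSON base
    ON base.PERSONID = DIRx.PERSONID
 WHERE DIRx.id = 26463;

-- 2. Explicitly request a hash join to compare timings:
SELECT /*+ USE_HASH(base) */
       DIRx.dir_id,
       base.id
  FROM T_DIRECTORY DIRx
  JOIN T_PERSON base
    ON base.PERSONID = DIRx.PERSONID
 WHERE DIRx.id = 26463;
```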

Parallel Query

Do you have multiple CPUs, and is nothing else running when this SQL runs? That is, is it part of a batch process, or part of an online process that could be called multiple times simultaneously? If it is a batch process and you have multiple CPUs, try parallel query. Do not do this if it is an online program (e.g. a report), as parallel query will try to use all available CPU, and performance could get worse if the query is run multiple times simultaneously or if you run more than 2 parallel threads per CPU core.

In practice, parallel execution will approximately halve the execution time for every 4 threads added.
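Parallelism can be requested per-statement with hints; a sketch, where the degree of 4 is an illustrative value to be tuned against your CPU count and concurrent workload:

```sql
-- Request parallel execution on both tables for this statement only:
SELECT /*+ PARALLEL(DIRx, 4) PARALLEL(base, 4) */
       DIRx.dir_id,
       base.id
  FROM T_DIRECTORY DIRx
  JOIN T_PERSON base
    ON base.PERSONID = DIRx.PERSONID
 WHERE DIRx.id = 26463;
```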

Clustered tables/indexes

If these tables are always joined this way, you can consider a clustered table (where Oracle stores the joining rows of each table in the same blocks, so it doesn't have to spend so long retrieving the joined part). This can have a downside if you also frequently access one of the tables separately.
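A sketch of an index cluster keyed on PERSONID; the cluster and table names, column types, and SIZE value are assumptions, and existing data would have to be migrated into the clustered tables:

```sql
-- Create a cluster keyed on PERSONID; SIZE estimates the bytes
-- needed per cluster key value.
CREATE CLUSTER person_cluster (personid NUMBER)
  SIZE 1024;

-- An index cluster requires a cluster index before use.
CREATE INDEX idx_person_cluster ON CLUSTER person_cluster;

-- Recreate both tables inside the cluster so rows sharing a
-- PERSONID are stored in the same blocks.
CREATE TABLE T_PERSON_CL (
  personid NUMBER,
  id       NUMBER
) CLUSTER person_cluster (personid);

CREATE TABLE T_DIRECTORY_CL (
  id       NUMBER,
  personid NUMBER,
  dir_id   NUMBER
) CLUSTER person_cluster (personid);
```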

Context

Looking at a query in isolation doesn't always reveal the best answer. Doing something really fast when it might be the wrong thing doesn't help, so look at the context: what will you do with the 35,000 rows once they are returned? Are they only rows added today? Is there a table holding that subset which could be used instead?

岁月无声 2024-12-14 18:25:28


"i had to denormalize the table"

In an OLTP database de-normalization is always bad news. It speeds up some queries but can slow down others. At best it imposes a performance hit on DML operations, at worst it can lead to data corruption. It certainly makes our applications more complicated.

@trevorNorth makes a valid point about context. Data distribution matters a lot. How many rows in T_DIRECTORY match that ID? How many rows in that result set have matching rows in T_PERSON? Do those actual values match the cardinalities in the explain plan? If not, perhaps refreshing the database statistics will allow the database to discover a better plan (without the INDEX hint). But if your statistics are up to date, then you need a different solution.
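The actual counts in question can be checked directly and compared against the Rows column of the plan:

```sql
-- How many directory rows match the filter?
SELECT COUNT(*)
  FROM T_DIRECTORY
 WHERE id = 26463;

-- How many of those survive the join to T_PERSON?
SELECT COUNT(*)
  FROM T_DIRECTORY DIRx
  JOIN T_PERSON base
    ON base.PERSONID = DIRx.PERSONID
 WHERE DIRx.id = 26463;
```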

Such as ... what?

Tuning is a hard task because there are so many things to consider, and the specifics really matter. Some consultants make a very fine living doing nothing but fixing performance bugs in other people's code. They wouldn't be able to do that if tuning were simple.

In the absence of hard facts here are a couple of guesses. Alternative ways to solve this problem:

  1. Eliminate the table reads and perhaps get a hash join as well by building compound indexes:
    T_DIRECTORY(ID, PERSONID, DIR_ID)
    T_PERSON(PERSONID,ID)
  2. Build a fast refresh materialized view on the statement, and allow Query Rewriting to satisfy queries.
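Both guesses can be sketched as DDL; the index and view names are assumptions, and a fast-refresh-on-commit join materialized view requires materialized view logs with ROWIDs plus the ROWID of each base table in the select list:

```sql
-- 1. Covering composite indexes, so the query can be answered from
--    the indexes alone (no TABLE ACCESS BY INDEX ROWID steps):
CREATE INDEX ix_dir_id_person ON T_DIRECTORY (id, personid, dir_id);
CREATE INDEX ix_person_pid_id ON T_PERSON (personid, id);

-- 2. A fast-refresh join materialized view with query rewrite:
CREATE MATERIALIZED VIEW LOG ON T_DIRECTORY
  WITH ROWID (id, personid, dir_id) INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW LOG ON T_PERSON
  WITH ROWID (personid, id) INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW mv_dir_person
  REFRESH FAST ON COMMIT
  ENABLE QUERY REWRITE
AS
SELECT DIRx.id       AS dir_filter_id,
       DIRx.dir_id   AS dir_id,
       base.id       AS person_id,
       DIRx.ROWID    AS dir_rowid,
       base.ROWID    AS person_rowid
  FROM T_DIRECTORY DIRx,
       T_PERSON base
 WHERE DIRx.PERSONID = base.PERSONID;
```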
臻嫒无言 2024-12-14 18:25:28

I had to denormalize the table.
