Joining tables with millions of rows

Published 2024-12-07 18:25:28

I have the SQL below, which returns 35k rows and takes about 10 minutes to run. Both tables have millions of rows. How can I improve this SQL?

SELECT /*+ index(T_DIRECTORY X_DIR) */
       DIRx.dir_id,
       base.id
  FROM T_DIRECTORY DIRx, T_PERSON base
 WHERE DIRx.id = 26463
   AND DIRx.PERSONID = base.PERSONID

| Id  | Operation                    | Name        |
----------------------------------------------------
|   0 | SELECT STATEMENT             |             |
|   1 |  NESTED LOOPS                |             |
|   2 |   TABLE ACCESS BY INDEX ROWID| T_DIRECTORY |
|   3 |    INDEX RANGE SCAN          | X_DIRECTORY |
|   4 |   TABLE ACCESS BY INDEX ROWID| T_PERSON    |
|   5 |    INDEX UNIQUE SCAN         | I_PERSON    |
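For reference, a plan like the one above can be reproduced with EXPLAIN PLAN and DBMS_XPLAN (the exact output columns vary by Oracle version):

```sql
-- Generate and display the execution plan for the statement:
EXPLAIN PLAN FOR
SELECT /*+ index(T_DIRECTORY X_DIR) */
       DIRx.dir_id,
       base.id
  FROM T_DIRECTORY DIRx, T_PERSON base
 WHERE DIRx.id = 26463
   AND DIRx.PERSONID = base.PERSONID;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```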


Comments (3)

淤浪 2024-12-14 18:25:28


First, ensure you have suitable indexes on the column in the WHERE clause (DIRx.id) and on the join column (base.PERSONID), and that these indexes are analyzed so they represent the data in the table. If they are not analyzed, Oracle may do a full table scan when it could use an index instead.

SELECT INDEX_NAME, 
       NUM_ROWS, 
       LAST_ANALYZED 
  FROM DBA_INDEXES 
 WHERE TABLE_NAME IN ('T_DIRECTORY','T_PERSON');
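If LAST_ANALYZED is NULL or stale, the statistics can be refreshed with DBMS_STATS; a minimal sketch, assuming the schema owner is APP_OWNER (replace with your own):

```sql
-- Refresh optimizer statistics for both tables; CASCADE => TRUE
-- also gathers statistics on their indexes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'APP_OWNER',
    tabname => 'T_DIRECTORY',
    cascade => TRUE);
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'APP_OWNER',
    tabname => 'T_PERSON',
    cascade => TRUE);
END;
/
```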

Also, you are forcing it to use an index with a hint, but if one table is much smaller than the other, a hash join might be a better solution, so try removing the hint altogether and see if it helps.
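For example, both variants below could be timed against the original; the USE_HASH hint is only an illustration of how to force the comparison, not a guaranteed improvement:

```sql
-- 1. Let the optimizer pick the join method (no hint):
SELECT DIRx.dir_id,
       base.id
  FROM T_DIRECTORY DIRx
  JOIN T_PERSON base
    ON base.PERSONID = DIRx.PERSONID
 WHERE DIRx.id = 26463;

-- 2. Explicitly request a hash join to compare timings:
SELECT /*+ USE_HASH(base) */
       DIRx.dir_id,
       base.id
  FROM T_DIRECTORY DIRx
  JOIN T_PERSON base
    ON base.PERSONID = DIRx.PERSONID
 WHERE DIRx.id = 26463;
```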

Parallel Query

Do you have multiple CPUs, and is nothing else running when this SQL runs? That is, is it part of a batch process, or part of an online process that could be called multiple times simultaneously? If it is a batch process and you have multiple CPUs, try parallel query. Do not do this if it is an online program (e.g. a report), as parallel query will try to use all available CPU, and performance could get worse if the query is run multiple times simultaneously or if you run more than 2 parallel threads per CPU core.

In practice, parallel execution will approximately halve the execution time for every 4 threads added.
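Parallelism can be requested per-statement with hints; a sketch, where the degree of 4 is an illustrative value to be tuned against your CPU count and concurrent workload:

```sql
-- Request parallel execution on both tables for this statement only:
SELECT /*+ PARALLEL(DIRx, 4) PARALLEL(base, 4) */
       DIRx.dir_id,
       base.id
  FROM T_DIRECTORY DIRx
  JOIN T_PERSON base
    ON base.PERSONID = DIRx.PERSONID
 WHERE DIRx.id = 26463;
```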

Clustered tables/indexes

If these tables are always joined this way, you can consider a clustered table (where Oracle stores the joining rows of each table in the same blocks, so it doesn't have to spend so long retrieving the joined part). This can have a downside if you also frequently access one of the tables separately.
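A sketch of an index cluster keyed on PERSONID; the cluster and table names, column types, and SIZE value are assumptions, and existing data would have to be migrated into the clustered tables:

```sql
-- Create a cluster keyed on PERSONID; SIZE estimates the bytes
-- needed per cluster key value.
CREATE CLUSTER person_cluster (personid NUMBER)
  SIZE 1024;

-- An index cluster requires a cluster index before use.
CREATE INDEX idx_person_cluster ON CLUSTER person_cluster;

-- Recreate both tables inside the cluster so rows sharing a
-- PERSONID are stored in the same blocks.
CREATE TABLE T_PERSON_CL (
  personid NUMBER,
  id       NUMBER
) CLUSTER person_cluster (personid);

CREATE TABLE T_DIRECTORY_CL (
  id       NUMBER,
  personid NUMBER,
  dir_id   NUMBER
) CLUSTER person_cluster (personid);
```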

Context

Looking at a query in isolation doesn't always reveal the best answer. Doing something really fast when it might be the wrong thing doesn't help, so look at the context: what will you do with the 35,000 rows once they are returned? Are they only rows added today? Is there a table holding that subset which could be used instead?

岁月无声 2024-12-14 18:25:28


"i had to denormalize the table"

In an OLTP database de-normalization is always bad news. It speeds up some queries but can slow down others. At best it imposes a performance hit on DML operations, at worst it can lead to data corruption. It certainly makes our applications more complicated.

@trevorNorth makes a valid point about context. Data distribution matters a lot. How many rows in T_DIRECTORY match that ID? How many rows in that result set have matching rows in T_PERSON? Do those actual values match the cardinalities in the explain plan? If not, perhaps refreshing the database statistics will allow the database to discover a better plan (without the INDEX hint). But if your statistics are up to date, then you need a different solution.
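The actual counts in question can be checked directly and compared against the Rows column of the plan:

```sql
-- How many directory rows match the filter?
SELECT COUNT(*)
  FROM T_DIRECTORY
 WHERE id = 26463;

-- How many of those survive the join to T_PERSON?
SELECT COUNT(*)
  FROM T_DIRECTORY DIRx
  JOIN T_PERSON base
    ON base.PERSONID = DIRx.PERSONID
 WHERE DIRx.id = 26463;
```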

Such as ... what?

Tuning is a hard task because there are so many things to consider, and the specifics really matter. Some consultants make a very fine living doing nothing but fixing performance bugs in other people's code. They wouldn't be able to do that if tuning were simple.

In the absence of hard facts here are a couple of guesses. Alternative ways to solve this problem:

  1. Eliminate the table reads and perhaps get a hash join as well by building compound indexes:
    T_DIRECTORY(ID, PERSONID, DIR_ID)
    T_PERSON(PERSONID,ID)
  2. Build a fast refresh materialized view on the statement, and allow Query Rewriting to satisfy queries.
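Both guesses can be sketched as DDL; the index and view names are assumptions, and a fast-refresh-on-commit join materialized view requires materialized view logs with ROWIDs plus the ROWID of each base table in the select list:

```sql
-- 1. Covering composite indexes, so the query can be answered from
--    the indexes alone (no TABLE ACCESS BY INDEX ROWID steps):
CREATE INDEX ix_dir_id_person ON T_DIRECTORY (id, personid, dir_id);
CREATE INDEX ix_person_pid_id ON T_PERSON (personid, id);

-- 2. A fast-refresh join materialized view with query rewrite:
CREATE MATERIALIZED VIEW LOG ON T_DIRECTORY
  WITH ROWID (id, personid, dir_id) INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW LOG ON T_PERSON
  WITH ROWID (personid, id) INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW mv_dir_person
  REFRESH FAST ON COMMIT
  ENABLE QUERY REWRITE
AS
SELECT DIRx.id       AS dir_filter_id,
       DIRx.dir_id   AS dir_id,
       base.id       AS person_id,
       DIRx.ROWID    AS dir_rowid,
       base.ROWID    AS person_rowid
  FROM T_DIRECTORY DIRx,
       T_PERSON base
 WHERE DIRx.PERSONID = base.PERSONID;
```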
臻嫒无言 2024-12-14 18:25:28

I had to denormalize the table.
