为什么简单的“或”SQL 成本就会爆炸?

发布于 2024-11-08 19:54:56 字数 1028 浏览 0 评论 0原文

我有以下语句来查找数据中的明确名称(约 100 万条):

select Prename, Surname from person p1 
where Prename is not null and Surname is not null 
and not exists (
   select * from person p2 where (p1.Surname = p2.Surname OR p1.Surname = p2.Altname) 
   and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL

Oracle 显示了 1477315000 的巨大成本,并且执行在 5 分钟后没有结束。简单地将 OR 拆分为自己的存在子条款可将性能​​提高到 0.5 秒,并将成本提高到 45000:

select Prename, Surname from person p1 
where Prename is not null and Surname is not null 
and not exists (
   select * from person p2 where p1.Surname = p2.Surname and
   p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and not exists (
   select * from person p2 where p1.Surname = p2.Altname and 
   p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL

我的问题不是将其调整到最佳状态,因为它只是一个很少执行的查询,而且我知道 CONTACT 超越了任何索引,但我只是想知道这么高的成本从何而来。这两个查询在语义上对我来说似乎是等价的。

I have the following statement to find unambiguous names in my data (~1 Million entries):

select Prename, Surname from person p1 
where Prename is not null and Surname is not null 
and not exists (
   select * from person p2 where (p1.Surname = p2.Surname OR p1.Surname = p2.Altname) 
   and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL

Oracle shows a huge cost of 1477315000 and execution does not end after 5 minutes. Simply splitting the OR into an own exists subclause boosts performance to 0,5 s and costs to 45000:

select Prename, Surname from person p1 
where Prename is not null and Surname is not null 
and not exists (
   select * from person p2 where p1.Surname = p2.Surname and
   p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and not exists (
   select * from person p2 where p1.Surname = p2.Altname and 
   p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL

It's not my question to tweak this to the best, as it is only a seldomly executed query, and I know CONTACT is surpassing any index, but I just wonder where this high cost comes from. Both queries seem semantically equivalent to me.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

旧夏天 2024-11-15 19:54:56

答案就在您的疑问的解释计划中。它们在语义上可能是等效的,但查询幕后的执行计划却截然不同。

EXISTS 的操作方式与 JOIN 不同,本质上,OR 过滤语句是将表连接在一起的。

第二个查询中不会发生 JOIN,因为您仅从一张表中检索记录。

The answer is in the EXPLAIN PLAN for your queries. They may semantically be equivalent but the execution plan behind the scenes for your queries are vastly different.

EXISTS operates differently from a JOIN and essentially, your OR filter statement is what joins the tables together.

No JOIN occurs in the second query as you are only retrieving records from one table.

魂ガ小子 2024-11-15 19:54:56

两个查询的结果在语义上可能是等效的,但执行在操作上并不等效。您的第二个示例从不使用 OR 运算符来组合谓词。第二个示例中的所有谓词均使用 AND 进行组合。

性能更好,因为如果与 AND 组合的第一个谓词计算结果不为 true,则跳过第二个(或任何其他谓词)(不计算)。如果您使用 OR,则必须频繁评估两个(或所有)谓词,从而减慢查询速度。 (对 ORed 谓词进行检查,直到其计算结果为 true。)

The results of your two queries may be semantically equivalent, but the execution is not operationally equivalent. Your second example never makes use of an OR operator to combine predicates. All of your predicates in the second example are combined using an AND.

The performance is better because, if the first predicate that is combined with an AND does not evaluate to true then the second (or any other predicate) is skipped, (not evaluated). If you used an OR then both (or all) predicates would have to be evaluated frequently thus slowing down your query. (ORed predicates are checked until one evaluates to true.)

轮廓§ 2024-11-15 19:54:56

我会考虑测试重写如下的查询...根据“限定”被视为匹配的条件从一个到另一个进行直接连接...然后,在 WHERE 子句中,如果不匹配,则将其丢弃 根据您的评论,没有找到匹配项

select 
      p1.Prename, 
      p1.Surname
   from 
      person p1 
         join person p2
            on p1.ID <> p2.ID
            and (  p1.Surname = p2.Surname
                or p1.SurName = p2.AltName )
            and p2.PreName like concat( concat( '%', p1.Prename ), '%' )
   where
          p1.PreName is not null
      and p1.SurName is not null
      and p1.Inv_date is null
      and p2.id is null

,但从您正在寻找的内容来看...不,不要进行左外连接...如果您正在寻找与您想要清除的名称相似的名称(不过你会处理的),你只想通过自连接(因此是正常连接)预先限定那些确实有匹配的记录。如果您有一个没有相似名称的名称,您可能不想管它......因此它会自动被排除在结果集中。

现在,WHERE 子句开始发挥作用...左侧有一个有效的人...右侧也有一个人...这些是重复项...所以您可以匹配,现在通过输入逻辑“ p2.ID IS NULL”创建的结果与 NOT EXIST 给出的最终结果相同。

我将查询恢复为正常的“连接”。

I would consider testing the query rewritten as below... Do a direct join from one to the other on the criteria that "Qualifies" what IS considered a match... Then, in the WHERE clause, throw it out if it doesn't come up with a match

select 
      p1.Prename, 
      p1.Surname
   from 
      person p1 
         join person p2
            on p1.ID <> p2.ID
            and (  p1.Surname = p2.Surname
                or p1.SurName = p2.AltName )
            and p2.PreName like concat( concat( '%', p1.Prename ), '%' )
   where
          p1.PreName is not null
      and p1.SurName is not null
      and p1.Inv_date is null
      and p2.id is null

Per your comments, but from what it appears you were looking for... NO, do NOT do a left outer join... If you are looking for names that are ALIKE that you want to PURGE out (however you'll handle that), you only want to PREQUALIFY those records that DO HAVE A MATCH via the self-join (hence normal join). If you have a name that DOES NOT have a similar name, you probably want to leave it alone... thus it will automatically be left OUT of the result set.

Now, the WHERE clause kicks in... You have a valid person on the left... that HAS a person on the right.. These ARE the duplicates... so you have the match, now by throwing in the logical "p2.ID IS NULL" creates the same result as NOT EXIST giving the final results.

I put my query back to a normal "join".

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文