两个sql查询的性能差异?

发布于 2024-09-14 20:58:28 字数 457 浏览 1 评论 0原文

我的表中有一个具有文本数据类型的字段。

以下两个 sql 查询的性能是否存在差异:

 select * from tablename where fieldname="xyz%";
 select * from tablename where fieldname="%zyx";

如果我们要实现这些查询的执行,我认为我们需要这样做:

我们必须匹配两个正则表达式(xyz* 和 *zyx)。

我们必须从头开始检查字符串字符。

对于第一个查询,我们必须读取前三个字符来查看是否存在匹配,但对于第二个查询,我们必须读取直到获得字符串末尾以确定是否发生匹配。但是,如果我们将字符串的长度存储在某处,我们可以直接读取最后三个字符,从而提供与第一种情况类似的性能。

我的问题是像 mysql 和 oracle 这样的商业数据库在执行查询的性能上是否表现出任何差异。

I have a field in my table having text data type.

Is there a difference in performance for the following two sql queries:

 select * from tablename where fieldname="xyz%";
 select * from tablename where fieldname="%zyx";

If we were to implement the execution of these queries, this is what I think we would need to do:

We have to match the two regexes (xyz* and *zyx).

We will have to check the string chars one by starting from the beginning.

For the first query we will have to read the first three characters to see if there is a match but for the second one we will have to read till the we get the end of the string to determine if the match has occurred. But if we have the length of the string stored somewhere we can directly read the last three characters giving similar performance as the first case.

My question is whether commercial databases like mysql and oracle show any difference in the performance in the execution of the queries.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

你好,陌生人 2024-09-21 20:58:28

所有数据库的性能之间肯定存在差异。
如果列被索引,第一种情况肯定会更快。

我在我的项目中有类似的实例,其中用户也可以搜索“结尾为”(就像您的第二个查询)。

由于这是经常使用的操作并且查询速度很慢,

  1. 我们在表中添加了额外的列来存储字段名的反向。
  2. 就索引此列
  3. 每当以 was 搜索结束时 ,我们
    在此新列中进行搜索:)(通过反转原始搜索字符串),

因此您的第二个查询变为:

 select * from tablename where fieldname_rev="xyz%";

这种方法使其与开始查询一样快。

There is definitely difference between performance on all DB's.
First case will be definitely faster if column is indexed.

I had similar instance in my project where user was also allowed to search "ends with" (like your second query).

As this was frequently used operation and query was slow,

  1. We added additional column to table which stored reverse of fieldname.
  2. indexed this column
  3. whenever ends with was searched , we
    searched in this new column :) (by reversing original search string)

so your second query becomes:

 select * from tablename where fieldname_rev="xyz%";

This approach made it as fast as starts with query.

冷清清 2024-09-21 20:58:28

从您的评论中摘录:“我只想知道以匹配开始与以匹配结束是否不同”。

首先 - 请记住,我们并不是在寻找匹配字符串的最佳算法。我们正在寻找最好的算法来查找一组 N 行中的所有匹配字符串。我们希望做得比“执行算法 X、N 次”更好。

如果 fieldname 没有索引,那么两个查询之间的性能差异非常小 - SQL 引擎只会对字符串的前 3 个或最后 3 个字节进行匹配,这只是偏移到正确的内存位置。

如果字段名已建立索引,则两次搜索之间的性能将存在巨大差异,因为我们可以丢弃大部分数据,而不是检查所有 N 行。

即对于“xyz%”版本,我们可以使用二分搜索。

我们从中间的元素开始,恰好是“peter”。我们可以立即丢弃“peter”之前的所有内容,并获取剩余部分的中间元素 -“samantha”,依此类推,直到找到以“xyz”开头的条目。

对于“%xyz”版本,我们不能这样做,因为任何字符串都可能在末尾匹配,我们需要查看每个字符串。

随着表的大小扩大,这两种方法之间的差异变得很大。

为字段名的反转创建字段/索引的解决方案允许我们再次使用二分搜索技术。 (在某些数据库中,实际上可以在不创建额外字段的情况下做到这一点,而是通过使用特定的索引类型、虚拟列等)。

这已经简化了很多 - 有关数据库索引实际实现的详细信息,请查看 B-Tree 和 B*Tree 索引。

Picking up from your comment : " I just want to know if a starts with match is diff from an ends with match".

Firstly - remember that we are not looking for the best algorithm to match a string. We are looking for the best algorithm to find all matching strings in a set of N rows. We want to do better than 'Do algorithm X, N times'.

If fieldname is NOT indexed, then there will be very little difference in performance between the two queries - the SQL engine is just going to do a match on the first 3 or last 3 bytes of the string, which is simply a matter of offsetting to the right memory location.

If the fieldname IS indexed, there will be a huge difference in performance between the two searches, because rather than examining all N rows, we can discard most of the data.

i.e. for the "xyz%" version, we can use a binary search.

We start at the middle element, which happens to be 'peter'. We can immediately discard everything before 'peter' and get the middle element on the remainder - 'samantha', and so on, until we find the entries starting 'xyz'.

With the "%xyz" version, we cannot do this, as ANY string could potentially match at the end, we need to look at every string.

As the size of our table expands, the difference between these two approaches becomes large.

The solution of creating a field/index for the reverse of fieldname allows us to use the binary search technique again. (In some databases it is actual possible to do this without creating an extra field, but through using particular index types, virtual columns, etc).

This is simplified a lot - for detail on the actual implementation of database indexes, look into B-Tree and B*Tree indexes.

眼泪都笑了 2024-09-21 20:58:28

如果fieldname被索引,大多数商业数据库都可以将第一个查询转换为区间搜索

select * from tablename where fieldname>="xyz" and fieldname<"xy{"

,速度非常快。

If fieldname is indexed, most of commercial databases can transform the first query into an interval search

select * from tablename where fieldname>="xyz" and fieldname<"xy{"

which is very fast.

梦回梦里 2024-09-21 20:58:28

是的,以下两个查询之间存在差异:

select * from tablename where fieldname LIKE "xyz%";
select * from tablename where fieldname LIKE "%zyx";
  1. 等于(“=”)运算符不允许在 SQL 中使用通配符 - 您需要使用 LIKE
  2. 查询完全不同
    • “xyz%”将返回以“xyz”开头的记录
    • “%xyz”将返回以“xyz”结尾的记录
  3. 假设fieldname列上存在索引,“%xyz”可以不使用索引 - 但是“ xyz%" 可以,这意味着它会更快。

在文本中查找子字符串的最快方法是使用全文搜索 (FTS) - Oracle 和 MySQL 都有自己的本机功能,并且还有 Sphinx 和 Solr 等第三方工具。

Yes, there is a difference between the following two queries:

select * from tablename where fieldname LIKE "xyz%";
select * from tablename where fieldname LIKE "%zyx";
  1. The equals ("=") operator doesn't allow wildcards in SQL - you need to use LIKE
  2. The queries are entirely different
    • "xyz%" will return records that start with "xyz"
    • "%xyz" will return records that end with "xyz"
  3. Assuming an index exists on the fieldname column, "%xyz" can not use the index - but"xyz%" could, which means it would be faster.

The fastest means of finding substrings within text is to use Full Text Search (FTS) - both Oracle and MySQL have their own native functionality, and there are 3rd party tools like Sphinx and Solr.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文