What is a good way to optimize an Oracle query looking for a substring match?

Posted on 2024-08-06 12:30:08

I have a column in a non-partitioned Oracle table defined as VARCHAR2(50); the column has a standard b-tree index. I was wondering if there is an optimal way to query this column to determine whether it contains a given value. Here is the current query:

SELECT * FROM my_table m WHERE m.my_column LIKE '%'||v_value||'%';

I looked at Oracle Text, but that seems like overkill for such a small column. However, there are millions of records in this table so looking for substring matches is taking more time than I'd like. Is there a better way?

Comments (6)

七分※倦醒 2024-08-13 12:30:08

No.

That query is a table scan. If v_value is an actual word, then you may very well want to look at Oracle Text or a simple inverted index scheme that you roll on your own. But as is, it's horrible.

深居我梦 2024-08-13 12:30:08

Oracle Text covers a number of different approaches, not all of them heavyweight. As your column is quite small you could index it with a CTXCAT index.

SELECT * FROM my_table m 
WHERE catsearch(m.my_column, v_value, null) > 0
/

Unlike the other types of Text index, CTXCAT indexes are transactional, so they do not require synchronisation. Such indexes consume a lot of space, but that is the price you pay for improved performance.

Find out more.
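
For reference, here is a minimal sketch of creating such an index, assuming the table and column names from the question and that Oracle Text is installed. The index name is made up for illustration. Keep in mind that CATSEARCH matches on words (tokens), so it is not an exact replacement for an arbitrary '%...%' substring search.

```sql
-- Hypothetical index name; my_table and my_column are taken from the question.
CREATE INDEX my_column_ctxcat_idx ON my_table (my_column)
  INDEXTYPE IS CTXSYS.CTXCAT;
```

Once the index exists, the catsearch query above can use it.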

请止步禁区 2024-08-13 12:30:08

You have a couple of choices:

The simplest way to redefine the problem is to say the column has to start with the search term (so lose the first %), which will then use the index.

An alternative way is to say that the search starts on word boundaries (so "est" will match "estimate" but not "test"). MySQL (MyISAM) and SQL Server have functions that will do matching like this. Not sure if Oracle does. If it doesn't, you could create a lookup table of words to search instead of the column itself, and you could populate that table with a trigger.
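
Both options could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes my_table has a numeric primary key called id that never changes, and the names my_table_words, my_table_words_ix and my_table_words_trg are hypothetical. Words are split on whitespace and matching is case-sensitive.

```sql
-- Option 1: prefix search, which can use the existing B-tree index.
SELECT * FROM my_table m WHERE m.my_column LIKE v_value || '%';

-- Option 2: a word lookup table maintained by a trigger.
-- Assumption: my_table has a primary key column "id" (not in the original post).
CREATE TABLE my_table_words (
  word        VARCHAR2(50) NOT NULL,
  my_table_id NUMBER       NOT NULL
);

CREATE INDEX my_table_words_ix ON my_table_words (word, my_table_id);

CREATE OR REPLACE TRIGGER my_table_words_trg
AFTER INSERT OR DELETE OR UPDATE OF my_column ON my_table
FOR EACH ROW
DECLARE
  v_word VARCHAR2(50);
  v_n    PLS_INTEGER := 1;
BEGIN
  -- Remove the words of the old version of the row.
  IF UPDATING OR DELETING THEN
    DELETE FROM my_table_words WHERE my_table_id = :OLD.id;
  END IF;

  -- Store each whitespace-separated word of the new value.
  IF INSERTING OR UPDATING THEN
    LOOP
      v_word := REGEXP_SUBSTR(:NEW.my_column, '[^[:space:]]+', 1, v_n);
      EXIT WHEN v_word IS NULL;
      INSERT INTO my_table_words (word, my_table_id)
      VALUES (v_word, :NEW.id);
      v_n := v_n + 1;
    END LOOP;
  END IF;
END;
/

-- Word-boundary search: "est" matches "estimate" but not "test".
SELECT m.*
FROM my_table m
WHERE EXISTS (
  SELECT 1
  FROM my_table_words w
  WHERE w.my_table_id = m.id
    AND w.word LIKE v_value || '%'
);
```

The trigger adds work to every insert, update and delete, so the faster search is paid for at write time.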

左秋 2024-08-13 12:30:08

You could put a function-based index (FBI) on the column, using the REGEXP_LIKE function. You might need to create the FBI with a CASE expression that returns 1 on a match, as boolean-returning functions don't seem to be valid in an FBI.

Here is an example.

Create the index:

CREATE INDEX regexp_like_on_myCol ON my_table (
      CASE WHEN REGEXP_LIKE(my_column, '[static exp]', 'i') 
           THEN 1
           END);

And then to use it, instead of:

SELECT * FROM my_table m WHERE m.my_column LIKE '%'||v_value||'%';

you will need to perform a query like the following:

SELECT * FROM my_table m WHERE (
      CASE WHEN REGEXP_LIKE(m.my_column, '[static exp]', 'i')
           THEN 1
           END) IS NOT NULL;

A significant shortcoming of this approach is that you need to know your '[static exp]' at the time you create the index. If you are looking for a performance increase while performing ad hoc queries, this might not be the solution for you.

A bonus though, as the function name indicates, is that you have the opportunity to create this index using regex, which could be a powerful tool in the end. The evaluation hit will be taken when items are added to the table, not during the search.

血之狂魔 2024-08-13 12:30:08

You could try INSTR:

...WHERE INSTR(m.my_column, v_value) > 0

I don't have access to Oracle to test and find out whether it is faster than LIKE with wildcards.
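
One way to check this on your own system is to compare the execution plans of the two forms. This is a sketch using EXPLAIN PLAN and DBMS_XPLAN, with the literal 'abc' standing in for v_value as an arbitrary example value.

```sql
-- Plan for the INSTR form.
EXPLAIN PLAN FOR
SELECT * FROM my_table m WHERE INSTR(m.my_column, 'abc') > 0;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Plan for the original LIKE form.
EXPLAIN PLAN FOR
SELECT * FROM my_table m WHERE m.my_column LIKE '%' || 'abc' || '%';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If both plans show the same full scan, any difference in run time comes down to how expensive each function is per row, which only timing against real data will reveal.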

人│生佛魔见 2024-08-13 12:30:08

For the most generic case, where you do not know in advance the string you are searching for, the best access path you can hope for is an index fast full scan. You'd have to focus on keeping the index as small as possible, which might have its own problems of course, and you could look at a compressed index if the data is not very high cardinality.
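
A rough sketch of that idea, assuming the existing B-tree index on my_column is named my_column_idx (a hypothetical name, since the post does not give one). Whether the optimizer really chooses an index fast full scan, and whether it beats the table scan, should be verified in the execution plan and by timing.

```sql
-- Rebuild the existing index with key compression; this mainly helps
-- when my_column contains many repeated values.
ALTER INDEX my_column_idx REBUILD COMPRESS;

-- Scan only the (smaller) index to find matching rowids, then fetch
-- the full rows from the table. The INDEX_FFS hint is a request, not a guarantee.
SELECT m.*
FROM my_table m
WHERE m.rowid IN (
  SELECT /*+ INDEX_FFS(i my_column_idx) */ i.rowid
  FROM my_table i
  WHERE i.my_column LIKE '%' || v_value || '%'
);
```

The subquery touches only the indexed column, which is what allows the index alone to answer it. The outer query then visits the table just for the matching rows.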
