全文索引与pattern_ops索引

发布于 2024-11-26 15:12:18 字数 1226 浏览 0 评论 0原文

我正在使用 django,所有查询都是由 django 创建的,所以我没有手写查询...

我有一个 BillRecords 表,其中有一个字段 subscriberno 。在我的 django 过滤器中,我使用如下过滤查询:

BillRecords.objects.filter(subscriberno__icontains='123456')

由于客户所说的 subscriberno 可能是真实数字的相当缩短的版本...

该过滤器输出如下查询:

SELECT "subscriberno" FROM "BillRecords" WHERE UPPER("subscriberno"::text) LIKE UPPER(E'%123456%');

subscriberno code> 是一个字符字段,因为某些数字包含字母和一些特殊字符。

在我的数据库中,我的同事为该列创建了两个索引。

"BillRecords_subscriberno" btree (subscriberno)
"BillRecords_fsubscriberno_like" btree (subscriberno varchar_pattern_ops)

我想知道对于这样的查询使用两个索引是否合乎逻辑。由于我们所有的 django 过滤器都使用 icontains 并且应该像我上面写的那样创建查询。

Postgres对查询的分析如下:

Seq Scan on BillRecords  (cost=0.00..159782.40 rows=370 width=15) (actual time=579.637..3705.079 rows=10 loops=1)
Filter: (upper((subscriberno)::text) ~~ '%123456%'::text)
Total runtime: 3705.106 ms
(3 rows)

所以,据我所知,没有使用索引。由于索引 usega 在数据插入和更新方面有成本,因此拥有两个没有使用的索引(据我从本分析中可以看出)似乎不合逻辑。

django 是否有可能为类似的 icontanis 过滤器输出不同的查询?或者我的索引完全没用?

I am using django, and all of my queries are created by django, so i have no handwritten queries...

I have a table of BillRecords, which has a field subscriberno . In my django filters, i use a filtering query like:

BillRecords.objects.filter(subscriberno__icontains='123456')

Since the subscriberno the customer said might be quite shortened version of the real number...

That filter outputs a query like:

SELECT "subscriberno" FROM "BillRecords" WHERE UPPER("subscriberno"::text) LIKE UPPER(E'%123456%');

subscriberno is a char field because some numbers contains alphas and some special chars.

On my database, i have two indexes for that column, created by my colleagues.

"BillRecords_subscriberno" btree (subscriberno)
"BillRecords_fsubscriberno_like" btree (subscriberno varchar_pattern_ops)

I am wondering using two indexes for a such query is logical. Since all of our django filter uses icontains and that supposed to be create queries like i write above.

Postgres analyse of the query is as follows:

Seq Scan on BillRecords  (cost=0.00..159782.40 rows=370 width=15) (actual time=579.637..3705.079 rows=10 loops=1)
Filter: (upper((subscriberno)::text) ~~ '%123456%'::text)
Total runtime: 3705.106 ms
(3 rows)

So, as far as i see, no index is used. Since index usega have costs in data insertion and update, having two indexes with no usage (as far as i can see from this analyse) seemed me not logical.

Is there any channce for django to output different queries for a similar icontanis filter? Or my indexes are totally useless?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

寂寞美少年 2024-12-03 15:12:18

您不能在未锚定的 Like 语句上使用索引。

upper(foo) like 'bar%' -- index on upper(foo)
upper(foo) like '%bar' -- no index
reverse(upper(foo)) like 'rab%' -- index on reverse(upper(foo))
upper(foo) like '%bar%' -- no index

但是,如果您想减少使用,您可能会发现 trigram contrib搜索窗口。

You cannot use an index on an unanchored like statement.

upper(foo) like 'bar%' -- index on upper(foo)
upper(foo) like '%bar' -- no index
reverse(upper(foo)) like 'rab%' -- index on reverse(upper(foo))
upper(foo) like '%bar%' -- no index

But you might find the trigram contrib of use, if you want to reduce the search window.

跨年 2024-12-03 15:12:18

包含(子字符串)查询无权访问索引(除非运算符链接到全文模块)。另一方面,开头查询可以从索引中受益。如果基数不太低并且插入通常不是大批量进行而是在 OLTP 场景中进行,则索引开销可以忽略不计。

我是否正确读取统计数据:扫描 370 行几乎需要 4 秒?

PS 您可能会考虑另一种方法:使用基于函数的索引,也许在连接到订阅者名称的前三个字符的最后四个字符上,并使用开头或等于而不是带有搜索的 LIKE -术语以通配符结尾。

Contains (substring) queries do not have access to indexes (unless the operator is linked to a full-text module). Starts-with queries on the other hand can benefit from indexes. Indexing overhead is negligible if the cardinality is not too low and inserts typically are not made in large batches but in an OLTP scenario.

Do I read the stats correctly: almost 4 seconds to scan 370 rows?

P.S. You might consider an alternative approach: using a function-based index, perhaps on the last four characters of subscriberno concatenated to, say, the first three characters of the subscribername, and using starts-with or equals instead of LIKE with the search-term bookended with wildcards.

吹泡泡o 2024-12-03 15:12:18

检查索引是否被使用的一种简单方法是查看

SELECT * FROM pg_stat_user_indexes;

如果所有查询都像您显示的那样,那么它们肯定不会被使用,因为该模式没有锚定。如果你想解决这个问题,你将不得不使用全文搜索、三元组或类似的东西来重新设计你的搜索。

A simple way to check if your indexes are used at all is to look at

SELECT * FROM pg_stat_user_indexes;

If all your queries are like the one you show, then they certainly won't be used, because the pattern is not anchored. If you want to address that, you will have to re-engineer your search a bit by using full-text search, trigrams, or something like that.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文