What is a suitable index for a text field in a Postgres database?
I have a database that stores details of code check-ins from various SCRs. One of the tables in this database stores the commit comment for each check-in. I am trying to develop a search feature that uses Postgres POSIX regular expression matching to search this table, matching a regular expression against the comment field and returning all matching rows.
I already have this working, but the main problem is the performance of the search. On a fairly big database a search takes almost 15-20 minutes to complete, and since a web front end is waiting for the result, that is totally unacceptable for a medium-sized database.
I figured that creating an index on this text field might help, but I am unable to create a btree index because the data in some rows is too big for Postgres to index.
Is there any other solution to this? Are there other kinds of index I could create that are not language dependent?
Comments (5)
Check the full text search functions; regular expressions can't use indexes.
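As a rough sketch of what that could look like for the table in the question: the table and column names below (`checkins`, `id`, `commit_comment`) are assumptions, not taken from the original post. The `simple` text search configuration does no stemming and removes no stop words, so it should not tie the index to any particular natural language. Note that this matches whole words, not arbitrary regular expressions.

```sql
-- Hypothetical schema: checkins(id, commit_comment text).
-- Expression index over the tsvector built from each comment.
-- The 'simple' configuration only lowercases and splits into tokens.
CREATE INDEX checkins_comment_fts_idx
    ON checkins
    USING gin (to_tsvector('simple', commit_comment));

-- The WHERE clause has to use the same expression as the index,
-- otherwise the planner cannot consider the index.
SELECT id, commit_comment
FROM checkins
WHERE to_tsvector('simple', commit_comment)
      @@ plainto_tsquery('simple', 'memory leak');
```

Running the query under `EXPLAIN` is the simplest way to confirm whether the index is actually chosen on your data.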
Now you can use the pg_trgm extension.
Documentation:
http://www.postgresql.org/docs/9.1/static/pgtrgm.html
A good starting point:
http://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/
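A minimal sketch of a trigram index for the commit comments, assuming a hypothetical table `checkins` with a text column `commit_comment` (neither name is from the original post). In the 9.1 release linked above the index serves `LIKE` and `ILIKE`; index support for the regular expression operators `~` and `~*` arrived in PostgreSQL 9.3, so the last query assumes 9.3 or later.

```sql
-- Install the extension once per database (CREATE EXTENSION exists since 9.1).
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GIN trigram index on the comment column. Unlike a btree, it indexes
-- the trigrams of each value rather than the whole value as one key.
CREATE INDEX checkins_comment_trgm_idx
    ON checkins
    USING gin (commit_comment gin_trgm_ops);

-- Unanchored LIKE / ILIKE searches can use the index (9.1+).
SELECT id FROM checkins WHERE commit_comment ILIKE '%deadlock%';

-- Regular expression matches can use it on 9.3 or later.
EXPLAIN
SELECT id FROM checkins WHERE commit_comment ~ 'fix(es|ed) +deadlock';
```

Patterns from which no trigrams can be extracted (very short or wildcard-heavy ones) may still fall back to scanning the whole index or table, so it is worth checking the plan for typical searches.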
Yeah, full text search is your answer here. PostgreSQL has a pretty robust and fast FTS capability.
Others have mentioned full text searching. If you need regular expressions rather than full text searching, there is no way to index them in a generic way. As long as the expression is anchored at the beginning of the string (using ^ at the start), an index can usually be used, but for generic regular expressions there is no way to use an index to search them.
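To illustrate the anchored case, here is a hedged sketch. Because the question says some comments are too large for a btree entry, it indexes only a prefix of each comment; the table and column names (`checkins`, `commit_comment`) and the 200-character cutoff are assumptions made for the example. The `text_pattern_ops` operator class is what allows pattern matching to use a btree when the database does not use the C locale.

```sql
-- Btree expression index on the first 200 characters of each comment,
-- which keeps every index entry well below the btree size limit.
CREATE INDEX checkins_comment_prefix_idx
    ON checkins (left(commit_comment, 200) text_pattern_ops);

-- An anchored, case-sensitive pattern with a constant prefix can be
-- turned into a range scan on the index. The query must use the same
-- left(...) expression as the index definition.
EXPLAIN
SELECT id
FROM checkins
WHERE left(commit_comment, 200) ~ '^Merge branch';
```

This only helps when the pattern starts with `^` followed by literal text; a pattern such as `'branch$'` or `'.*Merge'` gives the planner no fixed prefix to search for.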
Use the pg_trgm extension.
You can then create a trigram index on the field name, and that index will be used for searches on that field.
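The statements this answer originally showed are not preserved here, so the following is only a guess at their general shape, using the standard pg_trgm operator classes. The table name `some_table` is hypothetical; `name` is the field mentioned in the answer.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GiST variant of the trigram index; gin_trgm_ops with USING gin is the
-- alternative, usually faster to search but slower to update.
CREATE INDEX some_table_name_trgm_idx
    ON some_table
    USING gist (name gist_trgm_ops);

-- Searches of this form can then use the index:
SELECT * FROM some_table WHERE name LIKE '%foo%';
SELECT * FROM some_table WHERE name ILIKE '%foo%';
```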