PostgreSQL 的全文索引可以索引 1 个或 2 个字符的术语吗?
当执行如下查询时:
select count(*) from myTextTable where tsv @@ plainto_tsquery('english', 'TERM');
我注意到当 TERM 长度为 1 或 2 个字符时,PostgreSQL 不使用 GIN 索引(我在 tsv 列上定义); 3 个或更多字符即可正常工作。
据我所知,通过索引 1 或 2 个字符术语,索引的大小将大大增加,但快速检索包含特定 1 或 2 个字符术语的文本对于我正在开发的应用程序至关重要。
是否有一些全文搜索配置参数可以索引 1 个或 2 个字符的术语?
When performing a query like:
select count(*) from myTextTable where tsv @@ plainto_tsquery('english', 'TERM');
I've noticed that PostgreSQL does not use the GIN index (that I defined on the tsv column) when TERM is 1 or 2 characters long; 3 or more characters work fine.
I understand that by indexing 1 or 2 character terms, the size of the index will increase vastly but retrieving texts containing specific 1 or 2 character terms in a fast way is essential for the application I'm developing.
Is there some full text search configuration parameter to index 1- or 2-character terms?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
前段时间,我编写了自己的 to_tsquery() 和 to_tsvector() 方法(用 Python),因为
我想要更多的控制权。 AFAIK 过滤发生在 plainto_tsquery() 中。如果替换此方法,您也可以索引单个字符。
Some time ago, I wrote my own to_tsquery() and to_tsvector() methods (in Python), since
I wanted more control. AFAIK the filtering happens in plainto_tsquery(). If you replace this method, you can index single characters, too.
这个问题现在已经通过 (a) 从页面中删除大量嘈杂的文本(使用语言检测)和 (b) 删除/重新创建 GIN 索引得到解决。我的猜测是,嘈杂的文本导致词位数量激增,并且索引变得不可用,或者被查询规划器分类为此类。 –
This issue has been solved now by (a) removing lots of noisy text from the pages (using language detection) and (b) dropping/re-creating the GIN index. My guess is that the noisy text caused an explosion in the number of lexemes and that the index became unusable, or was classified as such by the query planner. –