无法从 postgre 全文搜索获得正确的结果

发布于 2024-09-27 20:14:39 字数 1254 浏览 12 评论 0原文

我正在用巴西葡萄牙语开发一个简单的文章网站。搜索功能基于全文搜索，但不会返回预期结果。

我在 postgresql 上做了这个。这是简化的表格：

Artigos
-id
-title -- article title
-intro -- article introduction
-content -- article body
-publishdate -- date of launch
-artigosts -- this will work as our fts index.

创建表格后，我运行：

UPDATE artigos SET artigosts = 
setweight(to_tsvector('pg_catalog.portuguese', coalesce(title,'')), 'A') || 
setweight(to_tsvector('pg_catalog.portuguese', coalesce(intro,'')), 'B') ||
setweight(to_tsvector('pg_catalog.portuguese', coalesce(content,'')), 'C');

CREATE INDEX artigosts_idx ON artigos USING gist (artigosts);

CREATE TRIGGER artigosts_tg 
BEFORE INSERT OR UPDATE ON artigos 
FOR EACH ROW EXECUTE PROCEDURE 
  tsvector_update_trigger('artigosts', 'pg_catalog.portuguese', 'title', 'intro', 'content');

是的，我打算对搜索使用简单的加权。制作了一个索引来加速，一个触发器，这样我就可以插入和更新，而不必担心重新制作索引等。

嗯，根据我的理解，一切都很好。但结果并非如此。一个简单的例子。

假设我有“... banco de bados ... no banco ...” 作为一篇文章内容。当我这样做时：

SELECT title, intro, content FROM artigos WHERE plainto_tsquery('banco de dados') @@ artigosts;

它返回一个空集。我检查了 ts_vector 列，看到了谓词“banc”和“dad”。但我仍然不明白为什么它不返回包含上述文章的行。

有人可以阐明这个问题吗？

原文

I'm developing a simple articles website in brazilian portuguese language. The search feature is based on a full-text search, but it isn't returning expected results.

I made this on postgresql. Here is the simplified table:

Artigos
-id
-title -- article title
-intro -- article introduction
-content -- article body
-publishdate -- date of launch
-artigosts -- this will work as our fts index.

After creating the table, I ran:

UPDATE artigos SET artigosts = 
setweight(to_tsvector('pg_catalog.portuguese', coalesce(title,'')), 'A') || 
setweight(to_tsvector('pg_catalog.portuguese', coalesce(intro,'')), 'B') ||
setweight(to_tsvector('pg_catalog.portuguese', coalesce(content,'')), 'C');

CREATE INDEX artigosts_idx ON artigos USING gist (artigosts);

CREATE TRIGGER artigosts_tg 
BEFORE INSERT OR UPDATE ON artigos 
FOR EACH ROW EXECUTE PROCEDURE 
  tsvector_update_trigger('artigosts', 'pg_catalog.portuguese', 'title', 'intro', 'content');

Yes, I intend to use simple weightning on the searches. Made a index to speed up, a trigger, so I can insert and update without worry about remaking index and so on.

Well, per my understanding, it is everything fine. But results aren't. A simple example.

Let's say I have "... banco de dados ... no banco ... " as one article content. When I do:

SELECT title, intro, content FROM artigos WHERE plainto_tsquery('banco de dados') @@ artigosts;

It returns an empty set. I checked the ts_vector column and saw the predicates 'banc' and 'dad'. But I still can't understand why it does not return the row containing the mentioned article.

Can someone bring a light into this question?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

初与友歌 2024-10-04 20:14:39

出现这种情况的原因可能是因为您的默认词典设置为英语。尝试以下查询以确定情况是否确实如此。

SELECT * FROM ts_debug('banco de dados');

此查询将显示字典如何解析您的搜索短语。它应该提供词位“banco”、“de”和“dado”。因此，您实际搜索的内容不会存在于索引中，并且您将收到 0 个结果。

现在试试这个：

SELECT * FROM ts_debug('portuguese', 'banco de dados');

它应该返回索引中存在的词位“banc”和“dad”。如果是这种情况，那么您只需更改搜索查询即可获得适当的结果。

SELECT title, intro, content FROM artigos WHERE plainto_tsquery('portuguese', 'banco de dados') @@ artigosts;

The reason for this is probably because your default dictionary is set to english. Try the following queries to determine if that is actually the case.

SELECT * FROM ts_debug('banco de dados');

This query will show how the dictionary parses up your search phrase. It should provide the lexemes "banco", "de", and "dado". So what you're actually searching for will not exist in the index and you will receive 0 results.

Now try this:

SELECT * FROM ts_debug('portuguese', 'banco de dados');

It should return the lexemes that exist in the index, "banc" and "dad". If that is the case then you can simply change your search query to get the appropriate result.

SELECT title, intro, content FROM artigos WHERE plainto_tsquery('portuguese', 'banco de dados') @@ artigosts;

回复收藏 0 原文

最好是你 2024-10-04 20:14:39

对你来说可能已经太晚了。
而且我没有足够的声誉来添加简单的评论...

对于特殊字符，我在转换为 tsvector 之前取消了它们的重音符号。

SELECT to_tsvector('simple',unaccent('banco de dados áéóíúçãõ'))

所以我得到：

"'aeoiucao':4 'banco':1 'dados':3 'de':2"

你需要：

CREATE EXTENSION unaccent;

... 作为用户 postgres。
当然，你也必须取消 tsquery 的重音

It's probably to late for you.
And I don't have enough reputations to add a simple comment...

With regard to the special characters I unaccent them before transforming to tsvector.

SELECT to_tsvector('simple',unaccent('banco de dados áéóíúçãõ'))

So I get:

"'aeoiucao':4 'banco':1 'dados':3 'de':2"

You need to:

CREATE EXTENSION unaccent;

... as user postgres.
And sure, you must unaccent your tsquery also

回复收藏 0 原文

~没有更多了~

关于作者

奶茶白久

暂无简介

文章

26 人气

关注发私信

友情链接

文江博客

无法从 postgre 全文搜索获得正确的结果

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

lylex099819

yg

mb_PT8LkUS5

埋情葬爱

佚名

奢望

友情链接

无法从 postgre 全文搜索获得正确的结果

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

lylex099819

yg

mb_PT8LkUS5

埋情葬爱

佚名

奢望

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。