无法从 postgre 全文搜索获得正确的结果
我正在用巴西葡萄牙语开发一个简单的文章网站。搜索功能基于全文搜索,但不会返回预期结果。
我在 postgresql 上做了这个。这是简化的表格:
Artigos
-id
-title -- article title
-intro -- article introduction
-content -- article body
-publishdate -- date of launch
-artigosts -- this will work as our fts index.
创建表格后,我运行:
UPDATE artigos SET artigosts =
setweight(to_tsvector('pg_catalog.portuguese', coalesce(title,'')), 'A') ||
setweight(to_tsvector('pg_catalog.portuguese', coalesce(intro,'')), 'B') ||
setweight(to_tsvector('pg_catalog.portuguese', coalesce(content,'')), 'C');
CREATE INDEX artigosts_idx ON artigos USING gist (artigosts);
CREATE TRIGGER artigosts_tg
BEFORE INSERT OR UPDATE ON artigos
FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger('artigosts', 'pg_catalog.portuguese', 'title', 'intro', 'content');
是的,我打算对搜索使用简单的加权。制作了一个索引来加速,一个触发器,这样我就可以插入和更新,而不必担心重新制作索引等。
嗯,根据我的理解,一切都很好。但结果并非如此。一个简单的例子。
假设我有“... banco de bados ... no banco ...” 作为一篇文章内容。当我这样做时:
SELECT title, intro, content FROM artigos WHERE plainto_tsquery('banco de dados') @@ artigosts;
它返回一个空集。我检查了 ts_vector 列,看到了谓词“banc”和“dad”。但我仍然不明白为什么它不返回包含上述文章的行。
有人可以阐明这个问题吗?
I'm developing a simple articles website in brazilian portuguese language. The search feature is based on a full-text search, but it isn't returning expected results.
I made this on postgresql. Here is the simplified table:
Artigos
-id
-title -- article title
-intro -- article introduction
-content -- article body
-publishdate -- date of launch
-artigosts -- this will work as our fts index.
After creating the table, I ran:
UPDATE artigos SET artigosts =
setweight(to_tsvector('pg_catalog.portuguese', coalesce(title,'')), 'A') ||
setweight(to_tsvector('pg_catalog.portuguese', coalesce(intro,'')), 'B') ||
setweight(to_tsvector('pg_catalog.portuguese', coalesce(content,'')), 'C');
CREATE INDEX artigosts_idx ON artigos USING gist (artigosts);
CREATE TRIGGER artigosts_tg
BEFORE INSERT OR UPDATE ON artigos
FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger('artigosts', 'pg_catalog.portuguese', 'title', 'intro', 'content');
Yes, I intend to use simple weightning on the searches. Made a index to speed up, a trigger, so I can insert and update without worry about remaking index and so on.
Well, per my understanding, it is everything fine. But results aren't. A simple example.
Let's say I have "... banco de dados ... no banco ... " as one article content. When I do:
SELECT title, intro, content FROM artigos WHERE plainto_tsquery('banco de dados') @@ artigosts;
It returns an empty set. I checked the ts_vector column and saw the predicates 'banc' and 'dad'. But I still can't understand why it does not return the row containing the mentioned article.
Can someone bring a light into this question?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
出现这种情况的原因可能是因为您的默认词典设置为英语。尝试以下查询以确定情况是否确实如此。
此查询将显示字典如何解析您的搜索短语。它应该提供词位“banco”、“de”和“dado”。因此,您实际搜索的内容不会存在于索引中,并且您将收到 0 个结果。
现在试试这个:
它应该返回索引中存在的词位“banc”和“dad”。如果是这种情况,那么您只需更改搜索查询即可获得适当的结果。
The reason for this is probably because your default dictionary is set to english. Try the following queries to determine if that is actually the case.
This query will show how the dictionary parses up your search phrase. It should provide the lexemes "banco", "de", and "dado". So what you're actually searching for will not exist in the index and you will receive 0 results.
Now try this:
It should return the lexemes that exist in the index, "banc" and "dad". If that is the case then you can simply change your search query to get the appropriate result.
对你来说可能已经太晚了。
而且我没有足够的声誉来添加简单的评论...
对于特殊字符,我在转换为 tsvector 之前取消了它们的重音符号。
所以我得到:
你需要:
... 作为用户 postgres。
当然,你也必须取消 tsquery 的重音
It's probably to late for you.
And I don't have enough reputations to add a simple comment...
With regard to the special characters I unaccent them before transforming to tsvector.
So I get:
You need to:
... as user postgres.
And sure, you must unaccent your tsquery also