Disabling the "file" token type in the PostgreSQL 8.4 tsvector parser
I have some documents that contain sequences such as radio/tested, and I would like them to return hits for queries like
select * from doc
where to_tsvector('english', body) @@ to_tsquery('english', 'radio')
Unfortunately, the default parser takes radio/tested as a file token (despite my being in a Windows environment), so it doesn't match the above query. When I run ts_debug on it, I see that it is recognized as a file, and the lexeme ends up being radio/tested rather than the two lexemes radio and test.
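The behaviour described above can be inspected with something along these lines. This is only a sketch using the built-in ts_debug, ts_parse and ts_token_type functions on the sample string from the question; the exact rows returned depend on the server version and configuration.

```sql
-- Token type, original token and resulting lexemes under the 'english' configuration
SELECT alias, token, lexemes
FROM ts_debug('english', 'radio/tested');

-- Raw tokens from the default parser, joined to their token type names
SELECT t.alias, p.token
FROM ts_parse('default', 'radio/tested') AS p
JOIN ts_token_type('default') AS t ON t.tokid = p.tokid;
```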
Is there any way to configure the parser not to look for file tokens? I tried
ALTER TEXT SEARCH CONFIGURATION public.english
DROP MAPPING FOR file;
...but it didn't change the output of ts_debug. If there's some way of disabling file, or at least having it recognize both file and all the words that it thinks make up the directory names along the way, or if there's a way to get it to treat slashes as hyphens or spaces (without the performance hit of regexp_replace-ing them myself), that would be really helpful.
Comments (1)
I think the only way to do what you want is to create your own parser :-( Copy wparser_def.c to a new file, remove from the parse tables (actionTPS_Base and the ones following it) the entries that relate to files (TPS_InFileFirst, TPS_InFileNext, etc.), and you should be set. I think the main difficulty is making the module conform to PostgreSQL's C idiom (PG_FUNCTION_INFO_V1 and so on). Have a look at contrib/test_parser/ for an example.
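Once such a module is built, the SQL side of the registration might look roughly like the sketch below, modelled on what contrib/test_parser does. All names here (nofile_parser, the nofile_* functions, the english_nofile configuration and the '$libdir/nofile_parser' library path) are hypothetical, and the mapping assumes the copied parser keeps the default parser's token type names.

```sql
-- Hypothetical C entry points exported by the modified copy of wparser_def.c
CREATE FUNCTION nofile_start(internal, int4)
    RETURNS internal AS '$libdir/nofile_parser' LANGUAGE C STRICT;
CREATE FUNCTION nofile_nexttoken(internal, internal, internal)
    RETURNS internal AS '$libdir/nofile_parser' LANGUAGE C STRICT;
CREATE FUNCTION nofile_end(internal)
    RETURNS void AS '$libdir/nofile_parser' LANGUAGE C STRICT;
CREATE FUNCTION nofile_lextype(internal)
    RETURNS internal AS '$libdir/nofile_parser' LANGUAGE C STRICT;

-- Register the parser; the built-in headline function is reused
CREATE TEXT SEARCH PARSER nofile_parser (
    START    = nofile_start,
    GETTOKEN = nofile_nexttoken,
    END      = nofile_end,
    HEADLINE = pg_catalog.prsd_headline,
    LEXTYPES = nofile_lextype
);

-- A new configuration starts with no mappings, so map the word-like
-- token types (assumed to keep their default names) to the English stemmer
CREATE TEXT SEARCH CONFIGURATION english_nofile (PARSER = nofile_parser);
ALTER TEXT SEARCH CONFIGURATION english_nofile
    ADD MAPPING FOR asciiword, word, asciihword, hword, hword_asciipart, hword_part
    WITH english_stem;

-- Query using the new configuration on both sides
SELECT * FROM doc
WHERE to_tsvector('english_nofile', body) @@ to_tsquery('english_nofile', 'radio');
```

Creating a separate configuration rather than altering public.english keeps the stock behaviour available for comparison with ts_debug.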