Disabling the "file" token type in the PostgreSQL 8.4 tsvector parser

Posted 2024-08-16 12:36:13

I have some documents that contain sequences such as radio/tested, and I would like them to be returned as hits by queries like

select * from doc
where to_tsvector('english', body) @@ to_tsquery('english', 'radio')

Unfortunately, the default parser treats radio/tested as a file token (despite this being a Windows environment), so it doesn't match the above query. When I run ts_debug on it, I can see that it is recognized as a file token, and the lexeme ends up being radio/tested rather than the two lexemes radio and test.
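For reference, the classification described above can be inspected with queries along these lines. This is only a minimal sketch: it assumes the built-in english configuration and the default parser, and it selects the alias, token and lexemes columns that ts_debug returns.

```sql
-- Show how the parser classifies the string and which lexemes result.
-- Per the description above, radio/tested is reported with the alias
-- "file" and is kept whole as a single lexeme.
SELECT alias, token, lexemes
FROM ts_debug('english', 'radio/tested');

-- List every token type the default parser can emit, "file" included.
SELECT alias, description
FROM ts_token_type('default');
```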

Is there any way to configure the parser not to look for file tokens? I tried

ALTER TEXT SEARCH CONFIGURATION public.english
    DROP MAPPING FOR file;

...but it didn't change the output of ts_debug. It would be really helpful if there were some way to disable file tokens, or at least to have the parser emit both the file token and the individual words that it thinks make up the directory names, or to have it treat slashes like hyphens or spaces (without the performance hit of running regexp_replace on them myself).
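One way to limit the cost of rewriting slashes by hand is an expression index, so that the replacement runs when a row is written rather than on every search. The sketch below assumes the doc table and body column from the query above; the index name doc_body_fts_idx is hypothetical. It uses translate rather than regexp_replace because only a single character needs to be swapped. Note that this also splits genuine paths and URLs into separate words, which may or may not be acceptable.

```sql
-- Assumption: doc(body) is the table and column used in the question;
-- doc_body_fts_idx is a made-up index name.
-- translate() turns every slash into a space before parsing, so the
-- default parser no longer sees a file-like token.
CREATE INDEX doc_body_fts_idx ON doc
    USING gin (to_tsvector('english', translate(body, '/', ' ')));

-- The WHERE clause repeats the indexed expression exactly so that the
-- planner is able to use the index.
SELECT *
FROM doc
WHERE to_tsvector('english', translate(body, '/', ' '))
      @@ to_tsquery('english', 'radio');
```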

Comments (1)

懷念過去 2024-08-23 12:36:13

I think the only way to do what you want is to create your own parser :-( Copy wparser_def.c to a new file, remove the entries that relate to files (TPS_InFileFirst, TPS_InFileNext, etc.) from the parse tables (actionTPS_Base and the ones following it), and you should be set. I think the main difficulty is making the module conform to PostgreSQL's C idiom (PG_FUNCTION_INFO_V1 and so on). Have a look at contrib/test_parser/ for an example.
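Once such a module is built, the SQL side could look roughly like the sketch below, modelled on how contrib/test_parser registers itself. All names here are hypothetical: nofile_parser stands for the compiled copy of wparser_def.c with the file states removed, the nofileprs_* functions are the C entry points it would export, and english_nofile is a made-up configuration name. The token type names in the mapping assume the copy keeps the default parser's type list.

```sql
-- Hypothetical C entry points exported by the custom module.
CREATE FUNCTION nofileprs_start(internal, int4)
    RETURNS internal AS '$libdir/nofile_parser' LANGUAGE C STRICT;
CREATE FUNCTION nofileprs_nexttoken(internal, internal, internal)
    RETURNS internal AS '$libdir/nofile_parser' LANGUAGE C STRICT;
CREATE FUNCTION nofileprs_end(internal)
    RETURNS void AS '$libdir/nofile_parser' LANGUAGE C STRICT;
CREATE FUNCTION nofileprs_lextype(internal)
    RETURNS internal AS '$libdir/nofile_parser' LANGUAGE C STRICT;

-- Register the parser; the headline function is reused from the
-- built-in default parser.
CREATE TEXT SEARCH PARSER nofile_parser (
    START    = nofileprs_start,
    GETTOKEN = nofileprs_nexttoken,
    END      = nofileprs_end,
    LEXTYPES = nofileprs_lextype,
    HEADLINE = pg_catalog.prsd_headline
);

-- A configuration built on a new parser starts with no mappings, so
-- the token types that should be indexed are mapped explicitly.
CREATE TEXT SEARCH CONFIGURATION english_nofile (PARSER = nofile_parser);
ALTER TEXT SEARCH CONFIGURATION english_nofile
    ADD MAPPING FOR asciiword, word WITH english_stem;
```

Queries would then name the new configuration, for example to_tsvector('english_nofile', body), in place of 'english'.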
