FAST ESP character normalization

Posted 2024-08-07 07:04:09

I'm running a search application on a FAST ESP server, and I've run into a problem with character normalization.

What I want is to search for 'wurth' and get a hit on 'würth'.

I've tried configuring the following in esp/etc/tokenizer/tokenization.xml:

<normalizationlist name="German to Norwegian">
  <normalization description="German u with diaeresis, to Norwegian u">
    <input>x75</input>
    <output>xFC</output>
    <output>x75</output>
  </normalization>
</normalizationlist>

But of course, this translates every u to ü, which is useless.

How do I configure this the right way?

凉月流沐 2024-08-14 07:04:09

The solution is to normalize each "special character" to the same "plain character":

ö -> o
ø -> o
å -> a
ä -> a
æ -> a

It's a bit time-consuming, but it works!
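Writing these entries by hand is the time-consuming part, so a small script can generate them. This is a hedged sketch, not an ESP tool: it assumes the `<normalizationlist>`/`<normalization>` layout and the `xNN` hex notation from the question's snippet, and it assumes `<input>` holds the character to replace and `<output>` its replacement (the reverse of the question's attempt, which turned every u into ü). `FOLDS`, `hex_code` and `normalization_list` are hypothetical names, and `ü -> u` is added for the question's wurth/würth case.

```python
from xml.sax.saxutils import quoteattr

# Special character -> plain character, as in the list above.
# "ü" -> "u" is added here for the wurth/würth case from the question.
FOLDS = {
    "ö": "o",
    "ø": "o",
    "å": "a",
    "ä": "a",
    "æ": "a",
    "ü": "u",
}


def hex_code(ch: str) -> str:
    """Render a character in the xNN notation used in the question's snippet."""
    return f"x{ord(ch):X}"


def normalization_list(name: str, folds: dict[str, str]) -> str:
    """Build one <normalizationlist> with one <normalization> per special character.

    Assumption: <input> is the character to replace and <output> is its
    replacement, i.e. special character in, plain character out.
    """
    lines = [f"<normalizationlist name={quoteattr(name)}>"]
    for special, plain in folds.items():
        desc = quoteattr(f"{special} to {plain}")  # quoteattr adds the quotes
        lines.append(f"  <normalization description={desc}>")
        lines.append(f"    <input>{hex_code(special)}</input>")
        lines.append(f"    <output>{hex_code(plain)}</output>")
        lines.append("  </normalization>")
    lines.append("</normalizationlist>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(normalization_list("Special characters to plain", FOLDS))
```

For the ü entry this emits `<input>xFC</input>` and `<output>x75</output>`, i.e. the question's two codes with the direction swapped.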

枯寂 2024-08-14 07:04:09

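Read the Advanced Linguistics Guide. It contains a chapter on character normalization. When you follow the steps in the guide, all special characters will be treated as normal characters, so searching for über will give the same results as searching for uber.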
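Outside ESP, the "treat special characters as normal characters" behaviour can be approximated with Unicode canonical decomposition. This Python sketch is a generic technique, not the procedure the guide prescribes; `fold_diacritics` and `matches` are hypothetical names. It also shows why an explicit list like the one in the first answer is still useful: ø and æ have no canonical decomposition, so they pass through unchanged.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD).

    "ü" decomposes to "u" + U+0308 COMBINING DIAERESIS, so the mark is removed.
    Characters without a canonical decomposition, such as "ø" and "æ",
    are left unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)  # recompose whatever remains


def matches(query: str, document_term: str) -> bool:
    """Fold both sides the same way before comparing, so either spelling matches."""
    return fold_diacritics(query).casefold() == fold_diacritics(document_term).casefold()


if __name__ == "__main__":
    print(fold_diacritics("über"))       # uber
    print(matches("wurth", "Würth"))     # True
    print(fold_diacritics("Bølle Ærø"))  # Bølle Ærø (no decomposition)
```

The key design point is that the query and the indexed text go through the same folding; folding only one side would not make 'wurth' match 'würth'.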

哑 2024-08-14 07:04:09

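You can also install the custom dictionaries provided by MS support; there is a dictionary for each language. If you install German, for example, the search engine will understand what you meant to search for through the "Did you mean" feature. Once a dictionary is installed, you can enable it for search queries. Also, don't forget to set up the search schema with the correct character encoding to support multiple languages. If the documents in the collection are not indexed with the correct character encoding, anything you do on the tokenization and query side is useless.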
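The encoding point can be illustrated in a few lines: if UTF-8 bytes are decoded as Latin-1 at indexing time, "würth" becomes "wÃ¼rth", and no folding turns that back into "wurth". This is a hypothetical Python sketch; `decode_document` stands in for whatever decoding the indexer performs and is not an ESP API.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """NFD-based folding: drop combining marks, then recompose the rest."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize(
        "NFC", "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    )


def decode_document(raw: bytes, declared_encoding: str) -> str:
    """Decode document bytes with whatever encoding the indexer is configured for."""
    return raw.decode(declared_encoding)


if __name__ == "__main__":
    raw = "würth".encode("utf-8")  # b"w\xc3\xbcrth": ü is two bytes in UTF-8

    good = decode_document(raw, "utf-8")
    bad = decode_document(raw, "latin-1")  # wrong encoding: each byte becomes one char

    print(good, fold_diacritics(good))  # würth wurth
    print(bad, fold_diacritics(bad))    # wÃ¼rth wA¼rth
```

Once the text is mis-decoded, the ü no longer exists in the index, so normalization rules written for ü have nothing to match.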
