Zend_Lucene: searching for words with accented characters
I'm working on a search engine for a French website, using Zend_Search_Lucene as a standalone component. Everything works well on my local web server (WAMP) on Windows, but searches for accented words (e.g. géographie) don't work on my production server, which runs Unix.
I generated the index on Linux, and the accented words are indexed correctly.
See a screenshot of my generated index here
I tried forcing the encoding through the analyzer's parameters and converting the query string with utf8_encode, but I still can't get it to work.
I call Lucene with these parameters:
Zend_Search_Lucene_Search_QueryParser::setDefaultOperator(Zend_Search_Lucene_Search_QueryParser::B_AND);
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive());
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
$index = Zend_Search_Lucene::open($cheminIndexes);
$resultats = $index->find(Zend_Search_Lucene_Search_QueryParser::parse(utf8_encode($_POST['recherche'])));
This code returns all the non-accented words, but it doesn't return any of my accented words, even though those words are indexed.
It's frustrating because I don't understand why it works on Windows. I feel I'm missing a layer of encoding somewhere, but I can't find any information about this on Google.
Comments (1)
I have a site set up with exactly the same options as yours (case-insensitive, UTF-8, AND). However, I used to create the index object via:
and not through the proxy (Zend_Search_Lucene::open in your case), but that should not make any difference. Also, I just pass the query (after a short sanity check) directly to the index, without parsing:
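As an illustration only (this is an assumption, not the code from the site described above), a sanity check of this kind could make sure the query reaches the index as UTF-8 without being encoded twice. utf8_encode() converts ISO-8859-1 to UTF-8, so applying it to input that is already UTF-8 turns "é" into "Ã©", which would no longer match the indexed term. The helper name normalizeQueryToUtf8 is hypothetical, the sketch assumes the mbstring extension is available, and it assumes that any input that is not valid UTF-8 is ISO-8859-1.

```php
<?php
// Hypothetical helper (not part of Zend Framework): returns the query as
// UTF-8 without re-encoding input that is already valid UTF-8.
function normalizeQueryToUtf8(string $raw): string
{
    $raw = trim($raw);
    if (mb_check_encoding($raw, 'UTF-8')) {
        // Already UTF-8: converting again would turn "é" into "Ã©".
        return $raw;
    }
    // Assumption: anything that is not valid UTF-8 is ISO-8859-1.
    return mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
}

$utf8Input   = "g\xC3\xA9ographie"; // "géographie" encoded as UTF-8
$latin1Input = "g\xE9ographie";     // "géographie" encoded as ISO-8859-1

var_dump(normalizeQueryToUtf8($utf8Input) === $utf8Input);   // bool(true)
var_dump(normalizeQueryToUtf8($latin1Input) === $utf8Input); // bool(true)
```

The normalized string could then be handed to the index in place of utf8_encode($_POST['recherche']). Whether this resolves the difference between the Windows and Unix servers depends on which encoding each server actually receives from the form.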