Zend Search Lucene: wildcard problem with numbers
I am using Zend Search Lucene in a project and, like many beginners, realized straight away that numbers weren't being indexed. After some searching, I figured out how to change the default analyzer so that it includes numbers:
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive());
But although numbers are working now, they are not being seen as 'text' (that is, by definition of $this->_pattern->text). So if I try to perform a wildcard search using, say, 20234* (or even 'C13A*', or anything else in which a number appears among the first 3 characters), an exception is thrown:
'At least 3 non-wildcard characters are required at the beginning of pattern' ... in wildcard.php
The last time I checked, numbers are NOT wildcard characters!
I have seen some others using the analyzer fix above to allow numbers to be indexed, and they do not have this problem in their search. Entering 20234* actually works in their case.
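One difference between setups that may be worth ruling out, offered as an assumption rather than a diagnosis: the default analyzer is static state that lasts only for the current PHP request, and the query parser consults it as well as the indexer. A script that searches therefore has to set it too, before any query is parsed. A minimal sketch, assuming Zend Framework 1 is on the include_path; the index path is hypothetical:

```php
<?php
// Assumes Zend Framework 1 is on the include_path.
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Analysis/Analyzer/Common/TextNum/CaseInsensitive.php';

// Set the analyzer in the search script as well, before any query is parsed.
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive()
);

// Hypothetical index location; replace with the real one.
$index = Zend_Search_Lucene::open('/path/to/index');

try {
    $hits = $index->find('20234*');
    echo count($hits), " hit(s)\n";
} catch (Zend_Search_Lucene_Exception $e) {
    // The wildcard prefix error would surface here.
    echo 'Search failed: ', $e->getMessage(), "\n";
}
```

If the exception still appears with the analyzer set in both places, this assumption can be discarded.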
Unfortunately, nobody seems to know how to troubleshoot/change this behavior, and I have read through a lot of search content only to realize that I definitely need help with this one.
One other thing I tried was to simply change the requirement (in wildcard.php) to '0', which eliminates that error (albeit in a bad way), but brings up a new one:
'Terms per query limit is reached' in ... wildcard.php
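As an aside on the edit just described: the minimum prefix length is also exposed as a static setting on the wildcard query class, so it can be changed at runtime without touching the library file. A minimal sketch, assuming Zend Framework 1 on the include_path; it only relaxes the check for the current request and does not explain why a numeric prefix is rejected in the first place:

```php
<?php
// Assumes Zend Framework 1 is on the include_path.
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Search/Query/Wildcard.php';

// Show the value currently in effect.
echo 'Min prefix length: ',
    Zend_Search_Lucene_Search_Query_Wildcard::getMinPrefixLength(), "\n";

// Allow patterns with no fixed prefix. Such queries can match a very
// large number of terms, which is what the terms-per-query limit guards against.
Zend_Search_Lucene_Search_Query_Wildcard::setMinPrefixLength(0);
```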
Even if each number is being treated as a separate term, I don't see how 20234* could breach a query limit.
So of course I could change the $maxTerms variable now, but obviously this is not a solution, and likely would create operational issues / more errors.
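For completeness, the terms limit likewise has a static getter and setter on Zend_Search_Lucene, so trying a different value does not require editing $maxTerms by hand. A minimal sketch, assuming Zend Framework 1 on the include_path; the value 4096 is an arbitrary illustration, not a recommendation, and raising the limit treats the symptom rather than the cause:

```php
<?php
// Assumes Zend Framework 1 is on the include_path.
require_once 'Zend/Search/Lucene.php';

// Show the limit currently in effect.
echo 'Terms per query limit: ',
    Zend_Search_Lucene::getTermsPerQueryLimit(), "\n";

// Arbitrary illustrative value; 0 disables the limit entirely.
Zend_Search_Lucene::setTermsPerQueryLimit(4096);
```

If 20234* only works with the limit raised or disabled, the pattern is probably being expanded with a much shorter prefix than the five digits typed, which points back at how the query text is analyzed.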