Zend Search Lucene numeric wildcard problem

Posted 2024-11-03 10:44:25

I am using an implementation of Zend Search Lucene for a project and, like many beginners, realized straight away that numbers weren't being indexed. After some searching, I figured out how to change the analyzer to include numbers:

Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive());
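To illustrate what the analyzer switch changes, here is a minimal PHP sketch. The two functions are hypothetical, simplified stand-ins, not the Zend classes. They assume the default Text analyzer keeps only runs of letters, while TextNum keeps runs of letters and digits, and both lowercase their output like the CaseInsensitive variants.

```php
<?php
// Hypothetical, simplified stand-ins for two analyzer behaviours.
// These are NOT Zend's classes; they only mimic which characters survive.

// Roughly what a letters-only (Text) analyzer keeps: runs of letters.
function tokenizeText(string $input): array
{
    preg_match_all('/[a-z]+/i', $input, $m);
    return array_map('strtolower', $m[0]); // case-insensitive variant
}

// Roughly what a letters-and-digits (TextNum) analyzer keeps.
function tokenizeTextNum(string $input): array
{
    preg_match_all('/[a-z0-9]+/i', $input, $m);
    return array_map('strtolower', $m[0]);
}

$doc = 'Order 20234 shipped to C13A';
echo implode(', ', tokenizeText($doc)), "\n";    // order, shipped, to, c, a
echo implode(', ', tokenizeTextNum($doc)), "\n"; // order, 20234, shipped, to, c13a
```

With the letters-only behaviour, 20234 never becomes a term at all and C13A is split into two one-letter terms.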

But although numbers are now indexed, they are not treated as 'text' (as defined by $this->_pattern->text). So if I try a wildcard search such as 20234* (or even 'C13A*', i.e. anything with a digit among the first 3 characters), an exception is thrown:

'At least 3 non-wildcard characters are required at the beginning of pattern' ... in wildcard.php

The last time I checked, numbers are NOT wildcard characters!
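For comparison, here is a minimal sketch of a prefix rule like the one the message describes. It assumes the check takes the literal text before the first * or ? and rejects the pattern if that prefix is shorter than 3 characters; the function names are hypothetical. Applied to the raw string 20234*, such a rule sees a five-character prefix and passes, which suggests the digits had already been removed from the pattern by the time it reached wildcard.php.

```php
<?php
// Sketch of a minimum-prefix rule, assuming it works as the message implies.
const MIN_PREFIX_LENGTH = 3; // value taken from the error message

// Literal text before the first '*' or '?' (hypothetical helper).
function literalPrefix(string $pattern): string
{
    $cut = strcspn($pattern, '*?'); // length of the run before any wildcard
    return substr($pattern, 0, $cut);
}

// Throws if the pattern starts with too few non-wildcard characters.
function checkWildcardPattern(string $pattern): void
{
    if (strlen(literalPrefix($pattern)) < MIN_PREFIX_LENGTH) {
        throw new InvalidArgumentException(
            'At least ' . MIN_PREFIX_LENGTH
            . ' non-wildcard characters are required at the beginning of pattern'
        );
    }
}

checkWildcardPattern('20234*'); // prefix "20234": passes
checkWildcardPattern('c13a*');  // prefix "c13a": passes

try {
    checkWildcardPattern('*');  // prefix "": rejected
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
}
```

Under this rule, only a pattern whose leading digits have disappeared (leaving something like a bare *) can produce the reported exception.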

I have seen some others using the analyzer fix above to allow numbers to be indexed, and they do not have this problem in their search. Entering 20234* actually works in their case.
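One plausible explanation (an assumption, not something the post confirms) is that the query parser re-tokenizes each literal chunk of a wildcard term with whatever analyzer is the current default in the search script. If the indexing script calls setDefault() with TextNum but the search script does not, the digits in 20234* would be discarded and the pattern would collapse to *. The sketch below models that with hypothetical helper functions and simplified tokenizers. If this is the cause, running the same Zend_Search_Lucene_Analysis_Analyzer::setDefault(...) line in the search script before the query is parsed would be the matching fix.

```php
<?php
// Hypothesis sketch: wildcard terms are rebuilt chunk by chunk using the
// current default analyzer. Tokenizers are simplified stand-ins.

function lettersOnly(string $s): array
{
    preg_match_all('/[a-z]+/i', $s, $m);
    return array_map('strtolower', $m[0]);
}

function lettersAndDigits(string $s): array
{
    preg_match_all('/[a-z0-9]+/i', $s, $m);
    return array_map('strtolower', $m[0]);
}

// Keep the '*' and '?' wildcards; replace each literal chunk between them
// with whatever the analyzer returns for it (hypothetical helper).
function rebuildPattern(string $word, callable $tokenize): string
{
    $out = [];
    foreach (explode('*', $word) as $starChunk) {
        $inner = [];
        foreach (explode('?', $starChunk) as $chunk) {
            // Simplification: join multiple tokens; a real parser may reject them.
            $inner[] = implode('', $tokenize($chunk));
        }
        $out[] = implode('?', $inner);
    }
    return implode('*', $out);
}

echo rebuildPattern('20234*', 'lettersAndDigits'), "\n"; // 20234*
echo rebuildPattern('20234*', 'lettersOnly'), "\n";      // * (digits dropped)
```

This would also account for the same query working for others: the result depends on which analyzer is active when the query is parsed, not only when documents are indexed.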

Unfortunately, nobody seems to know how to troubleshoot or change this behavior, and after reading through a lot of search results I have concluded that I definitely need help with this one.

I also tried simply changing the minimum prefix requirement in wildcard.php to 0. That eliminates the error (albeit in a bad way), but raises a new one:

'Terms per query limit is reached' in ... wildcard.php

Even if each number is being treated as a separate term, I don't see how 20234* could breach a query limit.
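Here is a minimal sketch of how an empty prefix could trip the terms limit. It assumes the wildcard query expands the pattern against every indexed term in the field and counts the matches. The dictionary and the limit of 1024 are made-up illustrative values, and the function names are hypothetical. On its own, 20234* matches only terms that start with those digits. A bare * (what remains if the digits were dropped and the prefix requirement is set to 0) matches every term.

```php
<?php
// Sketch: expand a wildcard pattern against a term list, with a match limit.

// Convert '*' / '?' wildcards into an anchored regex (hypothetical helper).
function wildcardToRegex(string $pattern): string
{
    $quoted = preg_quote($pattern, '/');
    // preg_quote escapes '*' and '?'; turn them back into regex wildcards.
    return '/^' . str_replace(['\*', '\?'], ['.*', '.'], $quoted) . '$/';
}

// Collect matching terms; throw once more than $maxTerms match (0 = no limit).
function expandWildcard(string $pattern, array $terms, int $maxTerms): array
{
    $regex = wildcardToRegex($pattern);
    $matches = [];
    foreach ($terms as $term) {
        if (preg_match($regex, $term)) {
            $matches[] = $term;
            if ($maxTerms !== 0 && count($matches) > $maxTerms) {
                throw new RuntimeException('Terms per query limit is reached');
            }
        }
    }
    return $matches;
}

// Made-up dictionary: two numeric terms plus 2,000 filler terms.
$terms = array_merge(['20234', '202345'], array_map(fn($i) => "term$i", range(1, 2000)));

echo count(expandWildcard('20234*', $terms, 1024)), "\n"; // 2

try {
    expandWildcard('*', $terms, 1024); // empty prefix matches every term
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```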

Of course, I could now raise the $maxTerms limit, but that is obviously not a real solution and would likely cause operational issues or more errors.
