Extending/changing how Zend_Search_Lucene searches

Posted 2024-09-01 10:27:10

I am currently using Zend_Search_Lucene to index and search a set of documents, currently around 1,000. I would like to change how the engine scores hits on a document, instead of using the default.

Zend_Search_Lucene scores by how frequently a term occurs within a document, so a document with 10 occurrences of the word PHP scores higher than one with only 3. What I am trying to do instead is pass in a number of keywords and score by how many of those keywords a document matches. For example:

Say I pass 5 keywords (PHP, MySQL, Javascript, HTML and CSS) and search the index for them. If one document matches 3 of those keywords and another matches 4, the one with 4 matches should score highest. The number of times each word occurs in a document does not concern me.

Now I've had a quick look at Zend_Search_Lucene_Search_Similarity, but I have to confess that I am not sure (or bright enough to know) how to use it to achieve what I am after.

Is what I want to do possible with Lucene, or is there a better solution out there?

Comments (1)

合久必婚 2024-09-08 10:27:10

From what I understand of the Zend_Search_Lucene_Search_Similarity section of the manual, I'd start by extending the default similarity class and overriding the tf (term frequency) method so that it no longer affects the score:

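// Extend the concrete default implementation; the base Similarity class is abstract.
class MySimilarity extends Zend_Search_Lucene_Search_Similarity_Default {
    // Return a constant so repeated occurrences of a term do not raise the score.
    public function tf($freq) {
        return 1.0; // the default implementation returns sqrt($freq)
    }
}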

This way, the number of times a term occurs shouldn't be taken into account. Do you think this would be enough?

Then set it as the default similarity algorithm before indexing and searching:

Zend_Search_Lucene_Search_Similarity::setDefault(new MySimilarity());
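
To make the target ranking concrete, here is a minimal plain-PHP sketch, independent of Zend_Search_Lucene, of the scoring the question describes: each keyword counts at most once per document, however often it occurs. The function name countDistinctMatches and the sample documents are hypothetical, made up for illustration only. Note that even with tf() held constant, Lucene-style scoring still includes other factors, such as inverse document frequency and field length normalization, so real scores are unlikely to equal this simple count.

```php
<?php
// Hypothetical helper (not part of Zend_Search_Lucene): counts how many of the
// given keywords occur at least once in $text, ignoring case and repeats.
function countDistinctMatches(string $text, array $keywords): int
{
    // Lowercase, then split on anything that is not a letter or digit.
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $present = array_flip($tokens); // token => index, used as a lookup set

    $matched = 0;
    foreach ($keywords as $keyword) {
        if (isset($present[strtolower($keyword)])) {
            $matched++;
        }
    }
    return $matched;
}

$keywords = ['PHP', 'MySQL', 'Javascript', 'HTML', 'CSS'];

// Made-up sample documents.
$docs = [
    'a' => 'PHP PHP PHP PHP and more PHP with MySQL and HTML',
    'b' => 'PHP, MySQL, Javascript and CSS',
];

$scores = [];
foreach ($docs as $id => $text) {
    $scores[$id] = countDistinctMatches($text, $keywords);
}
arsort($scores); // highest number of distinct keywords first

print_r($scores);
```

In this sample, document b ranks above document a because it matches four of the keywords against three, even though a repeats PHP several times.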