Zend_Search_Lucene: problem changing term frequency scoring

Posted on 2024-09-10 10:34:55

I am trying to change how term searches are scored in my Lucene index. Currently a document's score depends on the number of times the term appears in it. I would like the score to depend only on whether the term is present, not on how often it occurs, so that a document containing the term once scores the same as a document containing it 100 times.
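To make the goal concrete, here is a minimal standalone sketch comparing a frequency-sensitive factor with a presence-only one. The helper names are hypothetical and are not part of Zend_Search_Lucene; the square-root form is assumed to match what the stock similarity class does.

```php
<?php
// Illustrative helpers only; neither function is part of the Zend API.

/** Frequency-sensitive factor (assumed stock behaviour: square root of the count). */
function frequencyTf(int $freq): float
{
    return sqrt($freq);
}

/** Presence-only factor: any positive count contributes the same amount. */
function presenceTf(int $freq): float
{
    return $freq > 0 ? 1.0 : 0.0;
}

// Compare a document containing the term once with one containing it 100 times.
foreach ([1, 100] as $freq) {
    printf(
        "freq=%3d  frequency tf=%5.1f  presence tf=%.1f\n",
        $freq,
        frequencyTf($freq),
        presenceTf($freq)
    );
}
```

Under the presence-only factor both documents get the same term-frequency contribution, which is the behaviour asked for here.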

I've tried extending Zend_Search_Lucene_Search_Similarity with my own class, but to be honest I'm not sure it is working correctly, as the scores are still quite low.

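class MySimilarity extends Zend_Search_Lucene_Search_Similarity {

    // Term-frequency factor: constant, so one occurrence scores the same as 100.
    public function tf($freq) {
        return 1.0;
    }

    // Field-length factor: fields with fewer terms score higher.
    public function lengthNorm($fieldName, $numTerms) {
        return 1.0 / sqrt($numTerms);
    }

    // Query normalisation factor.
    public function queryNorm($sumOfSquaredWeights) {
        return 1.0 / sqrt($sumOfSquaredWeights);
    }

    // Phrase-slop factor: constant, so match distance does not affect the score.
    public function sloppyFreq($distance) {
        return 1.0;
    }

    // Inverse document frequency: rarer terms score higher.
    public function idfFreq($docFreq, $numDocs) {
        return log($numDocs / (float) ($docFreq + 1)) + 1.0;
    }

    // Coordination factor: fraction of the query terms found in the document.
    public function coord($overlap, $maxOverlap) {
        return $overlap / (float) $maxOverlap;
    }
}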
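The question does not show how the class is installed, so that is worth checking first. A minimal sketch, assuming Zend Framework 1 is on the include path and MySimilarity is already defined or autoloaded; the index path and the query string are placeholders:

```php
<?php
require_once 'Zend/Search/Lucene.php';
// MySimilarity (the class above) is assumed to be loaded at this point.

// Install the custom similarity as the library-wide default before searching.
Zend_Search_Lucene_Search_Similarity::setDefault(new MySimilarity());

// '/path/to/index' and 'example' are placeholders, not values from the question.
$index = Zend_Search_Lucene::open('/path/to/index');

foreach ($index->find('example') as $hit) {
    // Print each hit's document id and score to compare before and after.
    printf("%d: %.4f\n", $hit->id, $hit->score);
}
```

If the class is never passed to setDefault(), the stock similarity stays in effect. Field norms also appear to be written when documents are indexed, so a change to lengthNorm() would probably only show up after re-indexing, whereas tf() is applied at query time.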

This is built from examples I found when searching good old Google. The only real change I've made is to the tf() function.
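With tf() fixed at 1.0, the remaining factors still shape the result. The sketch below is standalone arithmetic, not Zend code: it assumes the classic Lucene single-term formula with all boosts equal to 1, reuses the formulas from the class above as plain functions, and runs on invented index statistics.

```php
<?php
// Standalone arithmetic with made-up numbers; these are plain functions,
// not methods of any Zend class.

function idfFreq(int $docFreq, int $numDocs): float
{
    return log($numDocs / (float) ($docFreq + 1)) + 1.0;
}

function lengthNorm(int $numTerms): float
{
    return 1.0 / sqrt($numTerms);
}

function queryNorm(float $sumOfSquaredWeights): float
{
    return 1.0 / sqrt($sumOfSquaredWeights);
}

/** Assumed classic single-term score, all boosts equal to 1. */
function singleTermScore(float $tf, int $docFreq, int $numDocs, int $numTerms): float
{
    $idf = idfFreq($docFreq, $numDocs);
    // For a one-term query the normalised query weight works out to 1.0.
    $queryWeight = $idf * queryNorm($idf * $idf);
    return $tf * $queryWeight * $idf * lengthNorm($numTerms);
}

// Hypothetical term found in 9 of 1000 documents; fields of 25 and 400 terms.
foreach ([25, 400] as $numTerms) {
    printf("numTerms=%3d  score=%.4f\n", $numTerms, singleTermScore(1.0, 9, 1000, $numTerms));
}
```

Under these assumptions a long field still scores well below a short one even though the term count is ignored, which may be part of why the scores look low.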

I would be really grateful for any help with this, as at the moment it's really messing up my searches.

Thanks,

Grant
