How is the frequency of a specific term in a document field calculated?

Posted on 2024-11-02 05:21:01

I am wondering how Lucene calculates the frequency of a term within a specific field of a document. From the source code I can see that it opens and loads the segment files when a searcher is initialized with an IndexReader, but could someone explain how the term frequency itself is computed? Is there a special algorithm? I could not figure it out from the explain code for tf, for example:

    Explanation tfExplanation = new Explanation();
    int d = scorer.advance(doc);
    float phraseFreq = (d == doc) ? scorer.currentFreq() : 0.0f;
    tfExplanation.setValue(similarity.tf(phraseFreq));
    tfExplanation.setDescription("tf(phraseFreq=" + phraseFreq + ")");

The idf is greater than 0, but phraseFreq in this code is 0.0. I can see that this happens because (d == doc) is false, since d = Integer.MAX_VALUE, but I do not know why advance returns that value or what the underlying problem is.
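For context on the Integer.MAX_VALUE result: under Lucene's DocIdSetIterator contract, advance(target) moves to the first matching document whose ID is greater than or equal to target, and returns NO_MORE_DOCS (defined as Integer.MAX_VALUE) when no such document exists. The sketch below only imitates that contract with a toy in-memory postings list. ToyPostings and freqFor are hypothetical names invented for illustration and are not Lucene classes; the real scorer reads postings from the segment files.

```java
/** Toy postings cursor; a hypothetical stand-in, not Lucene's scorer. */
final class ToyPostings {
    // Same sentinel value that Lucene uses for DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docIds; // sorted IDs of the documents that match
    private final int[] freqs;  // freqs[i] = number of occurrences in docIds[i]
    private int pos = -1;       // cursor; -1 means "not positioned yet"

    ToyPostings(int[] docIds, int[] freqs) {
        this.docIds = docIds;
        this.freqs = freqs;
    }

    /** Moves to the first doc with ID >= target, or returns NO_MORE_DOCS. */
    int advance(int target) {
        do {
            pos++;
        } while (pos < docIds.length && docIds[pos] < target);
        return pos < docIds.length ? docIds[pos] : NO_MORE_DOCS;
    }

    /** Frequency at the current position; only valid after a successful advance. */
    int currentFreq() {
        return freqs[pos];
    }

    /** Mirrors the (d == doc) check from the explain snippet in the question. */
    static float freqFor(ToyPostings postings, int doc) {
        int d = postings.advance(doc);
        return (d == doc) ? postings.currentFreq() : 0.0f;
    }

    public static void main(String[] args) {
        // Assumption: one indexed document, stored under ID 0, matching twice.
        System.out.println(freqFor(new ToyPostings(new int[] {0}, new int[] {2}), 0)); // 2.0
        System.out.println(freqFor(new ToyPostings(new int[] {0}, new int[] {2}), 1)); // 0.0
    }
}
```

In this toy model, asking for a document ID beyond the last matching document exhausts the cursor, so the comparison with doc fails and the frequency falls back to 0.0, which matches the behavior described in the question.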

I have only one document with one field, which is indexed and stored, and the doc ID used in the debug code is 1, as in searcher.explain(booleanQuery, 1);
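Two points may be worth checking against this setup. First, term frequency is not derived by a special search-time algorithm: the number of times a term occurs in a field is counted while the document is indexed and is later read back from the postings. Second, Lucene assigns document IDs starting from 0, so an index that holds a single document contains doc 0 only, and a call that passes 1 refers to a document that does not exist. The sketch below illustrates both points with a toy in-memory index. ToyIndex, addDocument, and termFreq are hypothetical names, the whitespace tokenizer is an assumption, and none of this is Lucene's actual indexing code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy single-field index; a hypothetical illustration, not Lucene's indexing chain. */
final class ToyIndex {
    // docs.get(docId) maps each term to its occurrence count in that document's field.
    private final List<Map<String, Integer>> docs = new ArrayList<>();

    /** Splits on whitespace, counts each term, and returns the assigned doc ID. */
    int addDocument(String fieldText) {
        Map<String, Integer> termFreqs = new HashMap<>();
        for (String token : fieldText.toLowerCase().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                termFreqs.merge(token, 1, Integer::sum);
            }
        }
        docs.add(termFreqs);
        return docs.size() - 1; // IDs start at 0, so the first document is doc 0
    }

    /** Term frequency for (docId, term); 0 if the doc ID or the term is absent. */
    int termFreq(int docId, String term) {
        if (docId < 0 || docId >= docs.size()) {
            return 0;
        }
        return docs.get(docId).getOrDefault(term, 0);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int id = index.addDocument("lucene counts terms while lucene indexes");
        System.out.println(id);                          // 0
        System.out.println(index.termFreq(0, "lucene")); // 2
        System.out.println(index.termFreq(1, "lucene")); // 0, there is no doc 1
    }
}
```

If the same reasoning applies to the real index, passing 0 instead of 1 to the explain call should produce a non-zero phraseFreq for a matching query.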
