How is the frequency of a specific term in a document field calculated?
I am wondering how Lucene does this. From the source code I know that it opens and loads the segment files when a searcher is initialized with an IndexReader, but could someone explain how Lucene calculates the frequency of a term in a specific field of a document?
Is there a special algorithm? I could not figure it out from reading the explain code for tf, for example:
Explanation tfExplanation = new Explanation();
int d = scorer.advance(doc);
float phraseFreq = (d == doc) ? scorer.currentFreq() : 0.0f;
tfExplanation.setValue(similarity.tf(phraseFreq));
tfExplanation.setDescription("tf(phraseFreq=" + phraseFreq + ")");
The idf is greater than 0, but phraseFreq in this code is 0.0. I know this is because (d == doc) is false, since d = Integer.MAX_VALUE, but I don't know why that happens or what the problem is.
I have only one document with one field, which is indexed and stored. The doc argument used in the debugging session is 1, as in searcher.explain(booleanQuery, 1);
I finally found that the problem was how I used the explain method in Lucene. explain only works correctly with a document number taken from the search results, but I called it as explain(query, int) with an int that was not a valid document number.
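To make the symptom concrete, here is a minimal sketch in plain Java (no Lucene dependency) of a toy in-memory postings list. It is not Lucene's implementation: the class PostingsSketch and its methods addDocument, advance and freq are hypothetical names invented for illustration, and real Lucene stores postings encoded in segment files. What it mirrors are two facts about Lucene: the term frequency is counted once at index time and stored per (term, document), and document numbers start at 0, so in an index with a single document, asking for document 1 makes advance run off the end of the postings and return the sentinel Integer.MAX_VALUE (the value of DocIdSetIterator.NO_MORE_DOCS).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: hypothetical class, not part of Lucene.
public class PostingsSketch {
    // Same sentinel value as Lucene's DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // term -> (doc id -> number of occurrences), doc ids kept in ascending order
    private final Map<String, TreeMap<Integer, Integer>> postings = new HashMap<>();
    private int nextDocId = 0; // doc ids are assigned starting from 0

    // Index one document: count every occurrence of every token.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
        return docId;
    }

    // First doc id >= target that contains the term, or NO_MORE_DOCS if none.
    int advance(String term, int target) {
        TreeMap<Integer, Integer> docs = postings.get(term);
        if (docs == null) return NO_MORE_DOCS;
        Integer d = docs.ceilingKey(target);
        return d == null ? NO_MORE_DOCS : d;
    }

    // Same shape as the explain snippet: the frequency is 0 unless
    // advance lands exactly on the requested document.
    float freq(String term, int doc) {
        int d = advance(term, doc);
        return (d == doc) ? postings.get(term).get(d) : 0.0f;
    }

    public static void main(String[] args) {
        PostingsSketch index = new PostingsSketch();
        int onlyDoc = index.addDocument("lucene counts terms lucene");
        System.out.println("doc id of the only document: " + onlyDoc);
        System.out.println("advance to doc 1: " + index.advance("lucene", 1));
        System.out.println("freq in doc 0: " + index.freq("lucene", 0));
        System.out.println("freq in doc 1: " + index.freq("lucene", 1));
    }
}
```

In this sketch the only document gets id 0, so freq("lucene", 0) sees both occurrences while freq("lucene", 1) is 0.0 because advance returned the sentinel. With real Lucene, the safe pattern is to pass explain a document number obtained from the hits of a search (for example the doc field of a ScoreDoc) rather than a hand-picked integer.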