"No enclosing instance" error while getting the top term frequencies for documents from a Lucene index
I am trying to get the most frequent terms, with their frequencies, for every document in a Lucene index. I want to set a threshold on how many of the top terms I care about, maybe 20.
However, I get the error "No enclosing instance of type DisplayTermVectors is accessible" when instantiating the Comparator.
To this function I pass the term vector of each document and the maximum number of top terms I want:
protected static Collection getTopTerms(TermFreqVector tfv, int maxTerms){
    String[] terms = tfv.getTerms();
    int[] tFreqs = tfv.getTermFrequencies();
    List result = new ArrayList(terms.length);
    for (int i = 0; i < tFreqs.length; i++) {
        TermFrq tf = new TermFrq(terms[i], tFreqs[i]);
        result.add(tf);
    }
    Collections.sort(result, new FreqComparator());
    if(maxTerms < result.size()){
        result = result.subList(0, maxTerms);
    }
    return result;
}

/*Class for objects to hold the term/freq pairs*/
static class TermFrq{
    private String term;
    private int freq;

    public TermFrq(String term,int freq){
        this.term = term;
        this.freq = freq;
    }

    public String getTerm(){
        return this.term;
    }

    public int getFreq(){
        return this.freq;
    }
}

/*Comparator to compare the objects by the frequency*/
class FreqComparator implements Comparator{
    public int compare(Object pair1, Object pair2){
        int f1 = ((TermFrq)pair1).getFreq();
        int f2 = ((TermFrq)pair2).getFreq();
        if(f1 > f2) return 1;
        else if(f1 < f2) return -1;
        else return 0;
    }
}
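The compiler error can be reproduced without Lucene. In this minimal sketch the class names Outer, Inner and Nested are hypothetical stand-ins for DisplayTermVectors, FreqComparator and TermFrq. An inner (non-static) class instance always belongs to an instance of its enclosing class, so a static method has no object to attach it to; declaring the class static, or qualifying the allocation with an explicit enclosing instance, both make the allocation legal.

```java
// Minimal, Lucene-free reproduction. Outer, Inner and Nested are
// hypothetical stand-ins for DisplayTermVectors, FreqComparator and TermFrq.
public class Outer {
    // Inner (non-static) class: each instance is tied to an Outer instance.
    class Inner { }

    // Static nested class: no enclosing instance is needed.
    static class Nested { }

    // A static method, like getTopTerms in the question.
    static String demo() {
        // new Inner();  // would not compile: no enclosing instance of type Outer is accessible
        Nested a = new Nested();           // fix 1: declare the class static
        Inner b = new Outer().new Inner(); // fix 2: qualify the allocation with an Outer instance
        return a.getClass().getSimpleName() + "," + b.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Nested,Inner"
    }
}
```

For a stateless comparator, fix 1 is the usual choice: the comparator never touches fields of the outer object, so tying it to one buys nothing.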
I would very much appreciate explanations and corrections. Also, if anyone has experience with term frequency extraction and has done it a better way, I am open to all suggestions!
Please help! Thanks!
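On the request for a better way: when only the top k terms per document are needed (k = 20 here), a bounded min-heap avoids sorting every term. This is a common alternative technique, not something taken from the thread. TopKTerms and TermCount are hypothetical names, and the two arrays are assumed to be what tfv.getTerms() and tfv.getTermFrequencies() return, so the sketch does not depend on any particular Lucene version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper: keeps only the k most frequent terms using a
// min-heap of size k instead of sorting every term in the document.
public class TopKTerms {
    // Immutable term/frequency pair (hypothetical; plays the role of TermFrq).
    static final class TermCount {
        final String term;
        final int freq;
        TermCount(String term, int freq) { this.term = term; this.freq = freq; }
    }

    // terms[i] occurs freqs[i] times in the document.
    static List<TermCount> topTerms(String[] terms, int[] freqs, int k) {
        // Min-heap by frequency: the root is the weakest term kept so far.
        // PriorityQueue requires an initial capacity of at least 1.
        PriorityQueue<TermCount> heap =
            new PriorityQueue<>(Math.max(1, k), (a, b) -> Integer.compare(a.freq, b.freq));
        for (int i = 0; i < terms.length; i++) {
            heap.offer(new TermCount(terms[i], freqs[i]));
            if (heap.size() > k) {
                heap.poll(); // evict the least frequent term
            }
        }
        List<TermCount> result = new ArrayList<>(heap);
        // Highest frequency first.
        result.sort((a, b) -> Integer.compare(b.freq, a.freq));
        return result;
    }
}
```

For a document with n distinct terms this costs O(n log k) rather than O(n log n) for a full sort. Terms with equal frequency come out in no guaranteed order.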
Comments (1)
I think you'd need to make your TermFrq a public static class.
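Note that TermFrq is already declared static in the question. Assuming both helper classes are nested inside DisplayTermVectors, as the error message suggests, the class without static is FreqComparator, and it is instantiated with new FreqComparator() inside the static method getTopTerms; that is the allocation the compiler rejects. A second issue: the original comparator sorts in ascending order, so subList(0, maxTerms) keeps the least frequent terms. A minimal corrected sketch follows. To stay independent of Lucene it takes the two arrays that tfv.getTerms() and tfv.getTermFrequencies() return instead of a TermFreqVector; that signature change is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class DisplayTermVectors {

    /* Class for objects to hold the term/freq pairs */
    static class TermFrq {
        private final String term;
        private final int freq;
        public TermFrq(String term, int freq) { this.term = term; this.freq = freq; }
        public String getTerm() { return term; }
        public int getFreq() { return freq; }
    }

    /* Comparator that puts the most frequent term first.
       Declared static so the static method below can instantiate it. */
    static class FreqComparator implements Comparator<TermFrq> {
        @Override
        public int compare(TermFrq pair1, TermFrq pair2) {
            return Integer.compare(pair2.getFreq(), pair1.getFreq()); // descending
        }
    }

    // terms and tFreqs stand in for tfv.getTerms() and tfv.getTermFrequencies().
    protected static List<TermFrq> getTopTerms(String[] terms, int[] tFreqs, int maxTerms) {
        List<TermFrq> result = new ArrayList<>(terms.length);
        for (int i = 0; i < tFreqs.length; i++) {
            result.add(new TermFrq(terms[i], tFreqs[i]));
        }
        Collections.sort(result, new FreqComparator());
        if (maxTerms < result.size()) {
            // Copy, so the caller does not keep a view backed by the full list.
            result = new ArrayList<>(result.subList(0, maxTerms));
        }
        return result;
    }
}
```

Calling getTopTerms(terms, freqs, 20) on each document then returns at most 20 pairs, most frequent first.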