PhraseQuery against the title field and QueryParser against a catch-all field do not produce the results I expect
If a user enters a phrase in the search box (with or without quotes), I want the documents whose title contains the exact phrase to appear first, followed by the other matching documents. This is what I have tried, but it does not return the results in that order:
At index time I have:
    AddStringFieldToDocument(document, "keyWord", this.BuildKeywordsString(), Field.Store.NO, Field.Index.ANALYZED, false);
    AddStringFieldToDocument(document, "title", this.Title, Field.Store.NO, Field.Index.ANALYZED, false, 4f);

    private void AddStringFieldToDocument(Document document, string fieldName, string fieldValue,
        Field.Store store, Field.Index index, bool setOmitTermFreqAndPositions)
    {
        if (fieldValue == null)
        {
            return;
        }
        var field = GetFieldToAddToDocument(document, fieldName, fieldValue, store, index, setOmitTermFreqAndPositions);
        document.Add(field);
    }

    private void AddStringFieldToDocument(Document document, string fieldName, string fieldValue,
        Field.Store store, Field.Index index, bool setOmitTermFreqAndPositions, Single boost)
    {
        if (fieldValue == null)
        {
            return;
        }
        var field = GetFieldToAddToDocument(document, fieldName, fieldValue, store, index, setOmitTermFreqAndPositions);
        field.SetBoost(boost); //boosting title
        document.Add(field);
    }

    private Field GetFieldToAddToDocument(Document document, string fieldName, string fieldValue, Field.Store store,
        Field.Index index, bool setOmitTermFreqAndPositions)
    {
        Field field = new Field(fieldName, fieldValue, store, index);
        field.SetOmitTermFreqAndPositions(setOmitTermFreqAndPositions);
        return field;
    }
At search time, as part of my BooleanQuery, I have:
    if (!string.IsNullOrWhiteSpace(queryString))
    {
        QueryParser qpKeyWord = new QueryParser(myVersionUsed, "keyWord", StandardAnalyzer);
        Query qKeyWord = qpKeyWord.Parse(queryString);
        booleanQuery.Add(qKeyWord, BooleanClause.Occur.MUST);
        Term titleTerm = new Term("title", queryString);
        PhraseQuery qTitleWord = new PhraseQuery();
        qTitleWord.SetSlop(12);
        qTitleWord.Add(titleTerm);
        qTitleWord.SetBoost(5);
        booleanQuery.Add(qTitleWord, BooleanClause.Occur.SHOULD);
    }
The results I get are mixed. Furthermore, when I run IndexSearcher.Explain(query, docId), I get:
    Document Id: 92871
    0.5439626 = (MATCH) product of:
      0.8159439 = (MATCH) sum of:
        0.5884751 = (MATCH) sum of:
          0.2580064 = (MATCH) weight(KeyWord:chicken in 92871), product of:
            0.2226703 = queryWeight(KeyWord:chicken), product of:
              3.236447 = idf(docFreq=25345, maxDocs=237239)
              0.06880084 = queryNorm
            1.158692 = (MATCH) fieldWeight(KeyWord:chicken in 92871), product of:
              4.582576 = tf(termFreq(KeyWord:chicken)=21)
              3.236447 = idf(docFreq=25345, maxDocs=237239)
              0.078125 = fieldNorm(field=KeyWord, doc=92871)
          0.3304687 = (MATCH) weight(KeyWord:parmesan in 92871), product of:
            0.2962231 = queryWeight(KeyWord:parmesan), product of:
              4.305515 = idf(docFreq=8701, maxDocs=237239)
              0.06880084 = queryNorm
            1.115608 = (MATCH) fieldWeight(KeyWord:parmesan in 92871), product of:
              3.316625 = tf(termFreq(KeyWord:parmesan)=11)
              4.305515 = idf(docFreq=8701, maxDocs=237239)
              0.078125 = fieldNorm(field=KeyWord, doc=92871)
        0.2274688 = (MATCH) weight(has_photo:y in 92871), product of:
          0.1251001 = queryWeight(has_photo:y), product of:
            1.818294 = idf(docFreq=104665, maxDocs=237239)
            0.06880084 = queryNorm
          1.818294 = (MATCH) fieldWeight(has_photo:y in 92871), product of:
            1 = tf(termFreq(has_photo:y)=1)
            1.818294 = idf(docFreq=104665, maxDocs=237239)
            1 = fieldNorm(field=has_photo, doc=92871)
      0.6666667 = coord(2/3)
This output has no score associated with the PhraseQuery, only separate scores for each keyword term.
However, when I call query.ToString() at search time, I get:
    +(KeyWord:chicken KeyWord:parmesan) title:"Chicken Parmesan"~12^5.0
which suggests that the query was built correctly. Right? What am I missing?
Comments (1)
I suspect that, given the way you build the title query, you never get hits from the title clause. The Explain output is consistent with this: coord(2/3) shows that only two of the three clauses matched, and there is no entry for the title clause.
You build the PhraseQuery to look for a single term, "Chicken Parmesan", but at index time the StandardAnalyzer produced two terms: "chicken" and "parmesan". You need to build the PhraseQuery from those two terms.
You can use the QueryParser for this purpose:
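A minimal sketch of the QueryParser approach, assuming the Lucene.Net 2.9.x-style API that the question's SetSlop/SetBoost calls suggest (in Lucene.Net 3.0.3 the boost is the Boost property instead). The class and method names are hypothetical, and the analyzer and version are whatever the application already uses for indexing. Wrapping the text in quotes makes the parser analyze it and emit one phrase query over the resulting terms:

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

// Hypothetical helper; not part of the original question or of Lucene.Net.
public static class TitleQueryParserSketch
{
    // Lets QueryParser run the analyzer over the user's text so the phrase is
    // built from the analyzed terms ("chicken", "parmesan") rather than from
    // the raw string as one term.
    public static Query BuildTitleQuery(string queryString, Analyzer analyzer, Version version)
    {
        var parser = new QueryParser(version, "title", analyzer);

        // Drop any quotes the user typed, then escape the remaining
        // query-syntax characters so they are treated as plain text.
        string phraseText = QueryParser.Escape(queryString.Replace("\"", " ").Trim());

        // "..."~12 is QueryParser syntax for a phrase with a slop of 12.
        Query titleQuery = parser.Parse("\"" + phraseText + "\"~12");

        // Same boost as in the question (assumes the 2.9.x SetBoost method).
        titleQuery.SetBoost(5f);
        return titleQuery;
    }
}
```

The result would replace qTitleWord in the question, for example booleanQuery.Add(TitleQueryParserSketch.BuildTitleQuery(queryString, StandardAnalyzer, myVersionUsed), BooleanClause.Occur.SHOULD). If the text analyzes to a single token, the parser returns a plain term query rather than a PhraseQuery, which is why the return type is Query.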
If you don't want to use the QueryParser, use the TokenStream API to break your text into tokens:
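A sketch of the TokenStream approach, again assuming the Lucene.Net 2.9.x-style API (TermAttribute with a Term() method, AddAttribute taking a Type). In Lucene.Net 3.0.3 the equivalents are AddAttribute<ITermAttribute>() and the Term property, so adjust to the version in use. The class and method names are hypothetical:

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper; not part of the original question or of Lucene.Net.
public static class TitlePhraseSketch
{
    // Runs the same analyzer used at index time over the user's text and adds
    // one Term per emitted token to the PhraseQuery.
    public static PhraseQuery BuildTitlePhrase(string queryString, Analyzer analyzer)
    {
        var phrase = new PhraseQuery();
        phrase.SetSlop(12);   // same slop as in the question
        phrase.SetBoost(5f);  // same boost as in the question

        TokenStream stream = analyzer.TokenStream("title", new StringReader(queryString));
        // Assumption: 2.9.x attribute API; see the note above for 3.0.3.
        var termAttribute = (TermAttribute)stream.AddAttribute(typeof(TermAttribute));

        while (stream.IncrementToken())
        {
            // Each token becomes its own term, e.g. "chicken" then "parmesan".
            phrase.Add(new Term("title", termAttribute.Term()));
        }
        stream.Close();

        return phrase;
    }
}
```

Terms are added at consecutive positions here, so any position gaps left by removed stop words are ignored. With a slop of 12 that is unlikely to matter for short titles.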