Lucene query scoring based on multiple matching columns

Posted 2024-12-28 22:48:17

I am using Lucene to search a contacts directory that holds general contact information for a database of people, such as first name, last name, phone number, address, etc. This question pertains specifically to searching by first and last name. Here is how I am indexing the names.

document.add(new Field("firstName", contact.getFirstName(), Field.Store.NO, Field.Index.NOT_ANALYZED));
document.add(new Field("lastName", contact.getLastName(), Field.Store.NO, Field.Index.NOT_ANALYZED));

I am searching the index like this:

IndexReader indexReader = IndexReader.open(FSDirectory.open(directory));
IndexSearcher indexSearcher = new IndexSearcher(indexReader);
int hitsPerPage = indexSearcher.maxDoc();
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
String[] fields = {"id", "firstName", "lastName", "phoneNumber", "email", "address", "website"};

BooleanQuery booleanQuery = new BooleanQuery();
String[] terms = queryString.split(" ");

for(String term : terms) {
    for(String field : fields) {
        booleanQuery.add(new FuzzyQuery(new Term(field, term)), BooleanClause.Occur.SHOULD);
    }
}

TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
indexSearcher.search(booleanQuery, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;

The reason I am using a boolean query rather than a MultiFieldQueryParser is that it lets me get results when a field value is not an exact match. Basically, I split the query string on whitespace and then add a fuzzy term clause for each keyword on each field in the index. I'm new to Lucene, so I really have no idea whether this is the optimal way to do it, but so far it's been working OK for me.
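
To reason about what ordering this loop should produce, here is a minimal toy model in plain Java. It is not Lucene's scoring formula (no tf-idf, norms, or coordination factor); it only assumes that each optional clause adds to a document's score when its field value is similar enough to the term. The class name, the similarity function, and the 0.5 threshold are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only: NOT Lucene's scoring formula (no tf-idf, norms, or coord). */
final class ShouldClauseSketch {

    /** Classic Levenshtein edit distance between two strings. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]; 1.0 means identical. An assumed stand-in for fuzzy matching. */
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / longest;
    }

    /** Sums one optional clause per term/field pair; pairs below the threshold add nothing. */
    static double score(Map<String, String> doc, String queryString, double minSimilarity) {
        double total = 0.0;
        for (String term : queryString.split(" ")) {
            for (String value : doc.values()) {
                double sim = similarity(term, value);
                if (sim >= minSimilarity) total += sim;
            }
        }
        return total;
    }

    /** Builds a two-field contact; the field names mirror the ones indexed above. */
    static Map<String, String> contact(String firstName, String lastName) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("firstName", firstName);
        doc.put("lastName", lastName);
        return doc;
    }
}
```

Under this model, a contact that matches both the first-name and the last-name term collects two clause contributions, while a contact that matches only the first name collects one, so the full-name match would be expected to rank first.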

The only hiccup I'm having is that when searching by full name, it is not returning the results in the right order.

Index has 2 records, John Doe and John Smith.

When I search for John Doe my results will look like:
1) John Smith
2) John Doe

If I type John Smith it will reverse and display John Doe first. Why is it not returning the exact match as the first result?

Comments (2)

漆黑的白昼 2025-01-04 22:48:17

If you are going to search for all terms across all fields, why not index the entire text as part of another field? And then you can issue a query like

/*
\" escapes the double quotes inside the Java string literal.
Version.LUCENE_35 matches the version used in the question.
*/
String searchCriteria = "all:\"John Doe\"^3 OR all:(John Doe)";
IndexSearcher indexSearcher = new IndexSearcher(indexDirectory);
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
QueryParser parser = new QueryParser(Version.LUCENE_35, "all", analyzer);
Query query = parser.parse(searchCriteria);
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
indexSearcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
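
A minimal sketch of the two string-building steps this approach relies on, in plain Java so it runs without Lucene on the classpath. The class and method names are hypothetical. It assumes the combined text would be indexed in an analyzed field named "all" (for example with Field.Index.ANALYZED), since a phrase query needs tokenized text.

```java
/** Hypothetical helpers for the catch-all "all" field approach; not part of Lucene. */
final class CatchAllQuerySketch {

    /** Joins the non-empty field values into the text to index in the "all" field. */
    static String allFieldText(String... values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            if (v == null || v.trim().isEmpty()) continue; // skip missing fields
            if (sb.length() > 0) sb.append(' ');
            sb.append(v.trim());
        }
        return sb.toString();
    }

    /** Builds a query string of the form: all:"input"^boost OR all:(input) */
    static String boostedQuery(String userInput, int phraseBoost) {
        // Drop embedded double quotes so the input cannot close the phrase early.
        String cleaned = userInput.trim().replace("\"", "");
        return "all:\"" + cleaned + "\"^" + phraseBoost + " OR all:(" + cleaned + ")";
    }
}
```

The boosted phrase clause rewards documents that contain the words adjacent and in order, while the second clause still matches documents containing any of the words. Other query-syntax characters in the input are not escaped here; Lucene's QueryParser.escape is the usual tool for that.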

However, if you want to continue with your current design, you can use IndexSearcher.explain (http://lucene.apache.org/java/3_5_0/api/all/org/apache/lucene/search/IndexSearcher.html#explain(org.apache.lucene.search.Query, int)) to find out why one document is being scored higher than another.

萌无敌 2025-01-04 22:48:17

Using boolean queries and a for loop turned out to be a proper way of searching the index in my situation. The results were being reversed due to the way I was parsing and displaying them on the client side so it was a completely unrelated issue.
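
The answer does not say how the client reversed the results, so the following is only one defensive option, sketched in plain Java with hypothetical names: carry the score along with each hit and order explicitly by score, rather than relying on the order in which the client happens to parse the response.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical client-side holder for search results; not part of Lucene. */
final class RankedHitsSketch {

    /** One search result as the client might receive it. */
    static final class Hit {
        final String name;
        final float score;

        Hit(String name, float score) {
            this.name = name;
            this.score = score;
        }
    }

    /** Returns a copy of the hits ordered from highest score to lowest. */
    static List<Hit> bestFirst(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return sorted;
    }
}
```

The input list is left untouched, so the display code can always call bestFirst on whatever order it received.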
