Lucene synonym expansion, stemming, spell check, etc.
I am using Lucene to index my database and then perform a phrase search on a specific field (field name: keyword).
I am currently using the following code:
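import java.io.File;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.memory.AnalyzerUtil; // contrib "memory" module
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
// ... inside the request-handling method (request and LUCENE_INDEX_DIRECTORY come from the surrounding class)
String userQuery = request.getParameter("query");
// create a standard analyzer and wrap it with a Porter stemmer
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
Analyzer analyze = AnalyzerUtil.getPorterStemmerAnalyzer(analyzer);
// create a File object for our index directory
File file = new File(LUCENE_INDEX_DIRECTORY);
// open a read-only index reader
IndexReader reader = IndexReader.open(FSDirectory.open(file), true);
// create the index searcher
IndexSearcher searcher = new IndexSearcher(reader);
// collect the top 1000 hits (false = don't assume docs are scored in order)
TopScoreDocCollector collector = TopScoreDocCollector.create(1000, false);
// create a query parser for the "keyword" field
QueryParser parser = new QueryParser(Version.LUCENE_30, "keyword", analyze);
parser.setAllowLeadingWildcard(true);
// parse the query and get a reference to the Query object
Query query = parser.parse(userQuery);
//********Line 1***********************
// run the search
searcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
// check whether the search returned any results
if (hits.length > 0) {
    // code to retrieve hits
}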
This code works fine for stemming, but now I also want to expand my query to do synonym search: if I enter "Man" and my Lucene index has an entry "male", it should still return that entry as a hit.
I tried adding this at Line 1 in the code above:
query = SynExpand.expand(userQuery, searcher, analyze, "keyword", serialVersionUID);
But it doesn't give me any result.
I also want to introduce spell checking, so that if I enter "ubelievable" instead of "unbelievable", it still gives me a result.
I have no idea why synonym expansion isn't working for me, or how to do spell checking. If someone could guide me, I would be really grateful.
Thanks!
Fuzzy search can be done with a query keyword modifier, namely by appending a tilde to the term, e.g. ubelievable~
See Lucene Parser Syntax for more details and other types of queries that may be interesting to you.
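To see why ubelievable~ can match unbelievable, while fuzzy matching cannot replace a synonym dictionary, here is a minimal plain-Java sketch of the edit-distance test. It assumes, as I understand Lucene 3.x's FuzzyQuery with the default prefix length of 0, that a term matches when 1 - distance / (length of the shorter term) exceeds the minimum similarity (0.5 by default; the query syntax also accepts a value, as in ubelievable~0.7). The class and method names are hypothetical, and this is an illustration, not Lucene's own code.

```java
/**
 * Sketch of the edit-distance test behind a fuzzy query such as "ubelievable~".
 * ASSUMPTION: similarity() approximates Lucene 3.x FuzzyQuery with prefixLength = 0;
 * class and method names are hypothetical, not Lucene API.
 */
public class FuzzyMatchSketch {

    // Classic Levenshtein distance: the minimum number of single-character
    // insertions, deletions and substitutions that turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a to b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from a[0..i) to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the arrays: the current row becomes the previous one
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity: 1 - distance / length of the shorter term (0 if either is empty).
    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return 0.0f;
        }
        return 1.0f - (float) levenshtein(a, b) / shorter;
    }

    // A candidate term matches when its similarity exceeds the threshold.
    static boolean fuzzyMatches(String queryTerm, String indexTerm, float minSimilarity) {
        return similarity(queryTerm, indexTerm) > minSimilarity;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("ubelievable", "unbelievable"));        // 1
        System.out.println(similarity("ubelievable", "unbelievable"));         // about 0.909
        System.out.println(fuzzyMatches("ubelievable", "unbelievable", 0.5f)); // true
        System.out.println(fuzzyMatches("man", "male", 0.5f));                 // false
    }
}
```

Under these assumptions "ubelievable" scores 1 - 1/11 (about 0.91) against "unbelievable" and matches, whereas "man" against "male" scores 1 - 2/3 (about 0.33) and does not, which is why synonyms need a dictionary rather than fuzzy matching.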
There are two ways of dealing with synonyms. The query expansion you are trying to use relies on WordNet. As SynExpand's documentation says, you should first invoke Syns2Index to use the expansion: it builds a separate synonym index from the WordNet database, and the Searcher you pass to SynExpand.expand is meant to be opened on that synonym index rather than on your own document index. This is the easy way, but it works only with English words.
If you need to support multiple languages or add your own synonyms, you can use synonym injection during indexing. The idea is to write your own analyzer that injects synonyms from your own dictionary into the indexed documents. This may sound hard to implement, but fortunately there's an excellent example in the Lucene in Action book (the source code is available for free; see the lia.analysis.synonym package). That said, I highly recommend getting your own copy of this nice book.
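The core of the injection idea can be illustrated without Lucene. The minimal sketch below assumes a hypothetical in-memory dictionary and a hypothetical Token class standing in for Lucene's term and position-increment attributes: each synonym is emitted right after the original token with a position increment of 0, so it occupies the same position in the index. In a real analyzer this logic would live in a custom TokenFilter, as in the Lucene in Action example; the code here only simulates the resulting token stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java simulation of index-time synonym injection.
 * ASSUMPTION: Token and inject() are hypothetical names; a real implementation
 * would be a Lucene TokenFilter that sets the position increment to 0 for each synonym.
 */
public class SynonymInjectionSketch {

    // A token as the indexer sees it: the term text and its position increment.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    // Emit each original term with increment 1, followed by its synonyms
    // with increment 0 so they share the original term's position.
    static List<Token> inject(List<String> terms, Map<String, List<String>> synonyms) {
        List<Token> out = new ArrayList<>();
        for (String term : terms) {
            out.add(new Token(term, 1));
            for (String syn : synonyms.getOrDefault(term, Collections.emptyList())) {
                out.add(new Token(syn, 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical custom dictionary; in practice it would be loaded from a file.
        Map<String, List<String>> dict = new HashMap<>();
        dict.put("man", Arrays.asList("male"));
        System.out.println(inject(Arrays.asList("old", "man"), dict)); // [old(+1), man(+1), male(+0)]
    }
}
```

Because "male" is indexed at the same position as "man", a search for male finds the document, and a phrase query such as "old male" should still match, which matters here since the question uses phrase search.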