Lucene - How to index a value with special characters
I have a value I am trying to index that looks like this:
Test (Test)
Using a StandardAnalyzer, I attempted to add it to my document using:
Field.Store.YES, Field.Index.TOKENIZED
When I search for the value 'Test (Test)', my QueryParser generates the following query:
+Name:test +Name:test
This operates as I expect because I am not escaping special characters.
However, if I do QueryParser.Escape('Test (Test)') while indexing my value, it creates the terms:
[test] and [test]
Then, when I run a search with:
QueryParser.Escape('Test (Test)')
I get the same two terms (as I expect). The problem is if I have two documents indexed with the names:
Test
Test (Test)
The search matches both. If I specify a search value of 'Test (Test)', I want to get only the second document. I am curious why escaping the special characters does not preserve them in the created terms. Is there an alternate Analyzer I should look at? I looked at WhitespaceAnalyzer and KeywordAnalyzer. WhitespaceAnalyzer is case sensitive, and KeywordAnalyzer stores the value as a single term:
[Test (Test)]
This means that a search for just 'Test' would not return both documents.
Any ideas on how to implement this? It doesn't seem like it should be that difficult.
Comments (1)
If you search for 'Test (Test)' and want to retrieve only documents that contain that exact expression, you must enclose the search expression in double quotes ("...") so that Lucene knows you want a phrase search.
See the Lucene documentation for details:
http://lucene.apache.org/java/3_0_1/queryparsersyntax.html#Terms
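To illustrate why the phrase search helps, here is a minimal sketch in plain Java. It does not use Lucene at all: `analyze` is a hypothetical toy stand-in that lowercases and drops punctuation, which is only roughly what the question reports StandardAnalyzer doing, and `matchesAllTerms` / `matchesPhrase` are made-up helpers that model a "+term +term" query and a quoted phrase query. The point is that the parentheses never reach the index whether or not they are escaped, so only the order and adjacency of the terms can tell the two documents apart.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class PhraseMatchSketch {

    // Toy analyzer (assumption, not Lucene's): lowercase the text and
    // split on every run of characters that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Models a query like "+Name:test +Name:test": every query term
    // must occur somewhere in the document, in any position.
    static boolean matchesAllTerms(List<String> doc, List<String> query) {
        return doc.containsAll(query);
    }

    // Models a quoted phrase query: the query terms must occur
    // consecutively and in the same order in the document.
    static boolean matchesPhrase(List<String> doc, List<String> phrase) {
        return !phrase.isEmpty() && Collections.indexOfSubList(doc, phrase) >= 0;
    }

    public static void main(String[] args) {
        List<String> doc1 = analyze("Test");
        List<String> doc2 = analyze("Test (Test)");

        // Backslash-escaping changes nothing once the punctuation is stripped.
        List<String> query = analyze("Test \\(Test\\)");

        System.out.println("query terms:     " + query);
        System.out.println("term query doc1: " + matchesAllTerms(doc1, query));
        System.out.println("term query doc2: " + matchesAllTerms(doc2, query));
        System.out.println("phrase doc1:     " + matchesPhrase(doc1, query));
        System.out.println("phrase doc2:     " + matchesPhrase(doc2, query));
    }
}
```

In this toy model the term query matches both documents while the phrase query matches only the second, and a one-word phrase for 'Test' still matches both. One caveat under the same assumptions: a document named 'Test Test' would also match the phrase, because the punctuation is not part of any term.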