Escaping special characters in Lucene and querying with wildcards
I have an issue when I try to query using a wildcard in a term that has a special character in it.
For example, if I index "Test::Here" and then search with the wildcard ? for "TE?T\:\:Here" (NOTE: I escaped ':'), I do not get any results. I use StandardAnalyzer and QueryParser for both indexing and searching.
Has anyone encountered a similar issue?
Comments (3)
StandardAnalyzer uses StandardTokenizer, so Test::Here is seen as two tokens: Test and Here. Wildcard queries are not run through an analyzer, so you end up matching colons against terms that do not contain them. You need to use a different tokenizer, for example WhitespaceTokenizer.
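A minimal JDK-only sketch of this explanation (a toy model, not Lucene code). `standardLike` and `whitespaceLike` are hypothetical stand-ins for StandardAnalyzer and WhitespaceTokenizer: the real StandardTokenizer follows Unicode UAX#29 word-break rules, which this does not reproduce, and StandardAnalyzer also lowercases. `matchesAnyTerm` applies `?`/`*` wildcard semantics to each indexed term as-is, with no analysis. The pattern is written `Te?t::Here` rather than the question's `TE?T::Here` so that only the tokenization problem shows up; case is a separate issue.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/**
 * Toy model (not Lucene code): compares the terms a StandardAnalyzer-like and a
 * WhitespaceTokenizer-like tokenizer produce for "Test::Here", and whether an
 * unanalyzed wildcard pattern can match any of them.
 */
public class WildcardVsTokens {

    // Hypothetical stand-in for StandardAnalyzer: split on anything that is not
    // a letter or digit, then lowercase. Colons never survive into a term.
    static List<String> standardLike(String text) {
        return Arrays.stream(text.split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Hypothetical stand-in for WhitespaceTokenizer: split on whitespace only,
    // keep punctuation and case as they are.
    static List<String> whitespaceLike(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Wildcard semantics: '?' = exactly one character, '*' = any sequence,
    // everything else literal. The pattern itself is NOT analyzed.
    static boolean matchesAnyTerm(String wildcard, List<String> terms) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') regex.append('.');
            else if (c == '*') regex.append(".*");
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        Pattern p = Pattern.compile(regex.toString(), Pattern.DOTALL);
        return terms.stream().anyMatch(t -> p.matcher(t).matches());
    }

    public static void main(String[] args) {
        String text = "Test::Here";
        // Query pattern after QueryParser has removed the backslash escapes.
        String pattern = "Te?t::Here";
        List<String> std = standardLike(text);
        List<String> ws = whitespaceLike(text);
        System.out.println(std + " -> " + matchesAnyTerm(pattern, std));
        System.out.println(ws + " -> " + matchesAnyTerm(pattern, ws));
    }
}
```

Running `main` prints the terms each tokenizer produces and whether the pattern matches any of them: no match against `test` and `here`, and a match against the single whitespace-delimited term `Test::Here`.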
You can't search what you haven't indexed. Below is some code to see what you have indexed.
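A minimal JDK-only sketch of the same idea (not the code from the original answer, and not Lucene's API): a toy inverted index whose term dictionary you can print, showing that a query can only hit terms that actually exist in it. `analyze` is a hypothetical StandardAnalyzer-like function (split on non-letters/digits, lowercase). With real Lucene you would instead iterate the analyzer's `TokenStream` and read its `CharTermAttribute`, or open the index in the Luke tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Toy inverted index (not Lucene code): maps each analyzed term to the ids of
 * the documents that contain it, so the term dictionary can be inspected.
 */
public class TermDictionaryDump {

    // Hypothetical StandardAnalyzer-like analysis: split on anything that is
    // not a letter or digit, then lowercase.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // term -> sorted set of document ids containing it (the "postings")
    static Map<String, SortedSet<Integer>> buildIndex(List<String> docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, SortedSet<Integer>> index =
                buildIndex(List.of("Test::Here", "Another Test"));
        // Dump every indexed term with its postings; no term contains a colon.
        index.forEach((term, postings) -> System.out.println(term + " " + postings));
    }
}
```

The dump lists only colon-free, lowercased terms, so any wildcard pattern that contains `::` has nothing in the dictionary to match.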
Artur is right, but there is another issue to consider: wildcard terms are not analyzed at all in Lucene, so you have to make sure that the case of your query term matches the case of the indexed term (after analysis).
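A minimal JDK-only sketch of the case problem (a toy model, not Lucene code). It assumes a hypothetical analyzer that keeps `test::here` as one term and lowercases it, for example a whitespace tokenizer followed by lowercasing. Because the wildcard pattern is compared character by character with the indexed term, `TE?T::Here` cannot match until it is normalized the same way. Lowercasing the pattern before building the query is one option shown here; what QueryParser does with wildcard terms varies between Lucene versions.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Toy model (not Lucene code): an unanalyzed wildcard pattern only matches an
 * indexed term if its case matches the case produced by the index-time analyzer.
 */
public class WildcardCase {

    // '?' = exactly one character, '*' = any sequence; everything else literal.
    static boolean wildcardMatches(String wildcard, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') regex.append('.');
            else if (c == '*') regex.append(".*");
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(term).matches();
    }

    public static void main(String[] args) {
        // Assumed index-time result: colons kept, text lowercased.
        String indexedTerm = "test::here";
        // The question's pattern with the backslash escapes already removed.
        String userQuery = "TE?T::Here";

        // false: 'T' vs 't', 'E' vs 'e', 'H' vs 'h' differ
        System.out.println(wildcardMatches(userQuery, indexedTerm));
        // true: pattern normalized to the same case as the index
        System.out.println(wildcardMatches(userQuery.toLowerCase(Locale.ROOT), indexedTerm));
    }
}
```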