We recently added Lucene (2.4.1) support to our application, which works with Jackrabbit (1.6.2). We did everything as described in the Jackrabbit tutorial, and almost everything works fine. However, I noticed some strange behavior and can't find any documentation about it, so I decided to ask here.
For example, I have the following text in the jcr:data property of a node (jcr:content):
The quick brown fox jumps over the lazy dog
!@#$%^&
travmik!
tra!vmik
My XPath query is the following:
String query = "root/element(*,my:documentBody)"
    + "[jcr:contains(*/*/element(*),'*" + param + "*')]";
Then I try to search for:
"q", "qu", "qui", "quic", "quick", "k", "ck", "ick", "uick", "quick brown fox", "quick fox", "tra", "travmik", "mik" - all found OK
"tra!vmik", "travmik!", "!@#$" - nothing found
And yes, I escaped all special characters in the search term.
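For readers following along, here is a minimal sketch of what such escaping might look like. It assumes the special characters are those of the Lucene query syntax, plus the single quote that delimits the XPath string literal. The class and method names (QueryEscaper, escape, buildQuery) are hypothetical and not part of Jackrabbit or Lucene, and the original escaping code may well have differed.

```java
// Hypothetical helper, not a Jackrabbit or Lucene API.
class QueryEscaper {
    // Assumption: these are the characters that carry meaning in Lucene query syntax.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    // Backslash-escapes query-syntax characters and doubles single quotes
    // so the term can sit inside an XPath string literal.
    static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '\'') {
                sb.append("''");
            } else {
                if (SPECIAL.indexOf(c) >= 0) {
                    sb.append('\\');
                }
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the query from the question, wrapping the escaped term in wildcards.
    static String buildQuery(String param) {
        return "root/element(*,my:documentBody)"
            + "[jcr:contains(*/*/element(*),'*" + escape(param) + "*')]";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("tra!vmik"));
    }
}
```

Escaping only keeps the query parser from misreading the term. It does not change how the text was extracted and tokenized at indexing time, so a correctly escaped term can still fail to match.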
What am I doing wrong?
P.S. I have one more question: the Lucene docs say that "You cannot use a * or ? symbol as the first character of a search", but I do use one and it works. Why?
Comments (1)
I found the problem. It was a misunderstanding about the extractors that Jackrabbit uses for indexing content. I don't want to go into details, but this piece of code from one of the extractors was the cause of all my problems:
If anyone is interested, I can explain in greater detail.