Lucene and Jackrabbit

Posted 2024-10-06 08:24:16

We recently added Lucene (2.4.1) support to our application, which works with Jackrabbit (1.6.2). We did everything as described in the Jackrabbit tutorial, and almost everything works fine. But I noticed some strange behavior and can't find any documentation about it, so I decided to ask here.

For example, I have the following text in the jcr:data property of a jcr:content node:

The quick brown fox jumps over the lazy dog 
!@#$%^& 
travmik! 
tra!vmik

My XPath query is the following:

String query = "root/element(*,my:documentBody)"
        + "[jcr:contains(*/*/element(*),'*" + param + "*')]";

Then I try searching for:

"q", "qu", "qui", "quic", "quick", "k", "ck", "ick", "uick", "quick brown fox", "quick fox", "tra", "travmik", "mik" - all are found

"tra!vmik", "travmik!", "!@#$" - nothing is found

And yes, I escaped all special characters in these terms.
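
The post does not show how the escaping was done. As a rough sketch only, assuming the term is backslash-escaped for Lucene-style query syntax and single quotes are doubled for the XPath string literal, it might look like the following. The class `QueryEscapeSketch`, the method `escapeTerm`, and the exact list of special characters are assumptions made for illustration; they are not taken from the original code or from the Jackrabbit API.

```java
// Hypothetical helper, not part of Jackrabbit or Lucene.
class QueryEscapeSketch {

    // Assumed list of characters with special meaning in Lucene-style query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Backslash-escapes assumed special characters and doubles single quotes
    // so the result can be embedded in an XPath string literal.
    static String escapeTerm(String term) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            if (c == '\'') {
                sb.append('\'');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String param = escapeTerm("tra!vmik");
        String query = "root/element(*,my:documentBody)"
                + "[jcr:contains(*/*/element(*),'*" + param + "*')]";
        System.out.println(param); // prints tra\!vmik
        System.out.println(query);
    }
}
```

Escaping of this kind only affects how the query string is parsed. It cannot help if the searched characters never reach the index in the first place.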

What am I doing wrong?

P.S. I have one more question: the Lucene docs say that "You cannot use a * or ? symbol as the first character of a search", but I do use one and it works. Why?

Comments (1)

秋意浓 2024-10-13 08:24:16

I found the problem. It was a misunderstanding about the extractors that Jackrabbit uses for indexing content. I don't want to go into details, but this piece of code from one of the extractors is the cause of all my problems:

if (!Character.isLetterOrDigit(c)) {
    if (!space) {
        space = true;
        buffer.append(' ');
        continue;
    }
    continue;
}

If anyone is interested, I can explain in more detail.
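
To illustrate the effect of the snippet above, here is a minimal, self-contained sketch of a loop built around it. The class `ExtractorNormalizeSketch`, the method `normalize`, the initial value of `space`, and the handling of letters and digits are assumptions, since the post quotes only the inner `if` block; this is not the actual Jackrabbit extractor.

```java
// Sketch only: wraps the quoted logic in an assumed surrounding loop.
class ExtractorNormalizeSketch {

    // Replaces every run of non-alphanumeric characters with a single space.
    static String normalize(String text) {
        StringBuilder buffer = new StringBuilder();
        boolean space = false; // assumed initial state
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                if (!space) {
                    space = true;
                    buffer.append(' ');
                }
                continue;
            }
            // Assumed: letters and digits are copied and end the current run.
            space = false;
            buffer.append(c);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("tra!vmik") + "]"); // prints [tra vmik]
        System.out.println("[" + normalize("travmik!") + "]"); // prints [travmik ]
        System.out.println("[" + normalize("!@#$%^&") + "]");  // prints [ ]
    }
}
```

If text normalized this way is what ends up in the index, then "tra!vmik" is stored as the two words "tra" and "vmik", and no "!" is stored at all. That would be consistent with "tra", "mik", and "travmik" being found while "tra!vmik", "travmik!", and "!@#$" are not.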
