How to find the line number or page number with Lucene

Posted on 2025-01-05 05:37:46

Can anyone help me?

For my project I use Lucene to index files. It only gives me the file name and location; it doesn't tell me the line number or page number.

Is it possible to find the line number or page number with Lucene? Please help me with how to do it.


Comments (1)

凡尘雨 2025-01-12 05:37:46

This ended up being too long for a comment, so I made it an answer.

Are you thinking of the output of grep (the *nix tool), where you grep a set of documents and get back each matching line together with its line number? For example:

46: I saw the brown fox jumping over the lazy dog

If so, Lucene doesn't work like that. To simplify, grep opens each file on the machine in turn and runs your pattern against every line of its contents. Because it works directly on the files as they exist on disk, it can report the line number of each match, as in the example above. Lucene behaves differently.
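
To make the grep side of the contrast concrete, here is a minimal Python sketch of the line-by-line approach. It is an illustration, not grep's actual implementation: the function name `grep_lines` is hypothetical, and to keep the sketch self-contained the documents are passed in as a dict of name to text instead of being read from disk.

```python
import re
from typing import Dict, Iterator, Tuple


def grep_lines(pattern: str, documents: Dict[str, str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (name, line_number, line) for every line that matches pattern.

    Hypothetical helper: documents are an in-memory dict of name -> text,
    standing in for files that grep would open from disk.
    """
    regex = re.compile(pattern)
    for name, text in documents.items():
        # Walk each document serially, one line at a time, like grep does.
        # Line numbers are known only because we are reading the raw text.
        for line_number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield name, line_number, line


if __name__ == "__main__":
    docs = {"story.txt": "Once upon a time\nI saw the brown fox jumping over the lazy dog\n"}
    for name, number, line in grep_lines("brown fox", docs):
        # prints "story.txt:2: I saw the brown fox jumping over the lazy dog"
        print(f"{name}:{number}: {line}")
```

The cost of this approach is that every search rereads every document, which is exactly what an inverted index is designed to avoid.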

When you index a file with Lucene, Lucene builds an inverted index: it combines the contents of every document into a highly efficient structure that lets you quickly find the documents containing specific terms. In turn, when you run a query against that inverted index, Lucene returns its internal representation of every document that matched, along with a relevance score that indicates how useful each document is likely to be for your query. It does this by operating on its own inverted index rather than by iterating over the files in place as grep does. Lucene has no notion of line or page numbers, so no, you can't replicate grep with Lucene out of the box.
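
For contrast, here is a toy inverted index in Python. It is a hedged sketch, not Lucene's real data structures or scoring: `tokenize`, `build_index` and `search` are hypothetical names, and the score is simply the summed term frequency as a stand-in for a relevance score. Note how tokenization discards the line breaks, so a query can only return document ids and scores, never a line number.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy index: term -> {doc_id: term_frequency}.
Index = Dict[str, Dict[str, int]]


def tokenize(text: str) -> List[str]:
    # Lowercased word tokens; a crude stand-in for a Lucene analyzer.
    # Newlines are dropped here, so line structure never reaches the index.
    return re.findall(r"\w+", text.lower())


def build_index(documents: Dict[str, str]) -> Index:
    """Map each term to the documents containing it and how often."""
    index: Index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return dict(index)


def search(index: Index, query: str) -> List[Tuple[str, int]]:
    """Return (doc_id, score) for documents containing every query term.

    The score is the summed term frequency (a toy, not Lucene's scoring).
    Results are sorted best first, ties broken by doc_id.
    """
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    # Intersect the posting lists: only documents containing all terms match.
    matching = set(postings[0]).intersection(*postings[1:])
    scored = [(doc, sum(p[doc] for p in postings)) for doc in matching]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    docs = {
        "fox.txt": "the quick brown fox\njumps over the lazy dog",
        "bread.txt": "brown bread\nbrown rice",
    }
    index = build_index(docs)
    print(search(index, "brown"))  # [('bread.txt', 2), ('fox.txt', 1)]
```

The lookup never touches the original text, which is what makes it fast, but it is also why the answer comes back as "which documents" rather than "which lines".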
