Lucene JDBC Directory

Posted on 2024-12-28 05:10:13

I am using Lucene 3.5.0 to do some basic searching on my website. I want to store the index in a JDBC Directory in my MySQL database. I was going to use the Compass project for this, but after more research and actually trying the code I found that Compass is a dead project and is no longer compatible with the current version of Lucene.

Is there another option for storing my index in a JDBC Directory? Is there a reason Lucene does not offer this natively? Is storing the index on the HDD a better option for some reason?

Comments (2)

赠意 2025-01-04 05:10:13

From the FAQ:

Lucene does not support that functionality out of the box, but several people have implemented JdbcDirectory's. The reports we have seen so far indicate that performance with such implementations is not great, but it is doable.

Another approach would be to store the index in the database as a BLOB; this could be useful if you have multiple nodes running your application. If you add a timestamp to the BLOB, each node could check whether the index has been updated and recreate the index files from the DB.
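As a rough illustration of the BLOB-plus-timestamp idea, here is a minimal sketch of the node-side check. Everything named here is an assumption for illustration only: the `SnapshotStore` interface, the `IndexSync` class, and the choice to pack the index files into a zip archive are hypothetical, not something the FAQ or Lucene prescribes. In a real setup the two `SnapshotStore` methods would be backed by JDBC queries against whatever table holds the BLOB and its timestamp.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical view of the database row that holds the packed index. */
interface SnapshotStore {
    long lastModified() throws IOException; // timestamp stored next to the BLOB
    byte[] load() throws IOException;       // the BLOB itself (a zip archive here)
}

/** Re-creates the local index files whenever the stored snapshot is newer. */
class IndexSync {
    private final SnapshotStore store;
    private final Path indexDir;
    private long localTimestamp = -1; // nothing restored yet

    IndexSync(SnapshotStore store, Path indexDir) {
        this.store = store;
        this.indexDir = indexDir.toAbsolutePath().normalize();
    }

    /** Returns true if the local index files were replaced. */
    boolean refreshIfStale() throws IOException {
        long remote = store.lastModified();
        if (remote <= localTimestamp) {
            return false; // local copy is already current
        }
        unzip(store.load(), indexDir);
        localTimestamp = remote;
        return true;
    }

    /** Packs the regular files of a (flat) index directory into one byte array. */
    static byte[] zip(Path dir) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes);
             DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                if (!Files.isRegularFile(f)) continue;
                zos.putNextEntry(new ZipEntry(f.getFileName().toString()));
                Files.copy(f, zos);
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** Removes old files from target, then extracts the archive into it. */
    static void unzip(byte[] blob, Path target) throws IOException {
        Path root = target.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (DirectoryStream<Path> old = Files.newDirectoryStream(root)) {
            for (Path f : old) {
                if (Files.isRegularFile(f)) Files.delete(f);
            }
        }
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(blob))) {
            for (ZipEntry e; (e = zis.getNextEntry()) != null; ) {
                Path out = root.resolve(e.getName()).normalize();
                if (!out.startsWith(root)) { // reject entries escaping the directory
                    throw new IOException("Unexpected entry: " + e.getName());
                }
                Files.copy(zis, out, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```

Each node would call `refreshIfStale()` periodically and reopen its index reader when it returns `true`. The sketch replaces files in place, so a real deployment would probably extract into a fresh directory and switch over only after the old reader is closed.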

铁轨上的流浪者 2025-01-04 05:10:13

There are three questions here, which strains the forum's question/answer format, but since they are related I will try to answer them together:

Q: "Is there another option to store my index in a JDBC Directory [that is compatible with Lucene 4.x]?"

A: Search around, but no, nothing is in widespread use. Most developers have moved to Elasticsearch, which bundles many aspects together into a much larger package. Sadly, JDBC is fading as many K/V databases take hold. Interestingly, from file systems to K/V databases without transactional support, none of these solutions is actually viable for a distributed, updatable index, due to unreliable operating-system file locking or a lack of ACID guarantees.

Q: "Is there a reason Lucene does not offer this native?"

A: Ask the Lucene contributors, but judging from their documentation, they too have moved on to more comprehensive, commercial solutions built on top of Elasticsearch, REST APIs, and generally language-neutral implementations (versus Lucene, the underlying Java-native implementation going back decades).

Q: "Is storing on the HDD a better option for some reason?"

A: Generally not recommended for updatable indexes, because OS file locking is unreliable and a local disk offers no support for distribution (envision multiple processes and nodes trying to update the same index files concurrently). Even AWS S3 has been shown not to be viable for this purpose, because it lacks locking and S3 objects effectively have to be deleted and re-created to accomplish the task.
