What are the advantages and disadvantages of using a search engine as a key-value store?
Given a search engine like Lucene and a set of XML documents that need to be fully preserved, what are the advantages and disadvantages of using the search engine as a key-value store that returns an XML document given the unique primary key each document contains?
Comments (3)
Read Search Engine versus DBMS. IMO, your application falls in the DBMS realm and will probably be best served by a key-value database such as CouchDB. This is because you take no advantage of textual operations such as tokenization or stemming.
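To make the distinction concrete, here is a toy sketch in Python (not Lucene's API, and with no stemming) of the difference between looking a document up by its key and finding it through tokenized text. The sample documents, the `tokenize` helper, and the crude tag stripping are all assumptions made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split on non-alphanumeric runs (no stemming)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


# Hypothetical documents, keyed by their primary key.
documents = {
    "A-1": "<order><item>Blue widget</item></order>",
    "A-2": "<order><item>Red gadget, blue trim</item></order>",
}

# Key-value access: an exact match on the primary key, nothing else.
by_key = documents["A-1"]

# Full-text access: an inverted index mapping each term to the keys
# of the documents that contain it.
index = defaultdict(set)
for key, xml_text in documents.items():
    text_only = re.sub(r"<[^>]+>", " ", xml_text)  # crude tag stripping
    for term in tokenize(text_only):
        index[term].add(key)

blue_docs = sorted(index["blue"])
print(blue_docs)
```

If the only query you ever run is the `documents["A-1"]` kind, the whole indexing half of this sketch is work you never benefit from.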
If you use something like Compass and its XML-to-Lucene mapping engine, it's a great solution for storing and querying XML documents without going all the way to an XML database.
One downside is that the XML documents can only be retrieved via the Lucene API (the underlying data store is pretty impenetrable), but I can live with that.
If all you are going to do is test for key equality and retrieve a blob, Lucene has no visible advantage over, say, bdb. You also have no transactions until you layer something else on top, concurrency has certain complexities to it, and the API is a bit baroque for the simple thing you are doing.
I've implemented something like what you describe, but actual full-text search on the data was a critical requirement that justified the rest.
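For comparison, the plain "key equality plus blob" case looks roughly like this with an embedded key-value store. This is a minimal sketch that uses Python's standard-library `dbm` module as a stand-in for something like bdb. The `id` attribute on the root element, the sample documents, and the helper names are assumptions for illustration, not anything from the question.

```python
import dbm
import os
import tempfile
import xml.etree.ElementTree as ET


def primary_key(xml_bytes: bytes) -> bytes:
    """Read the (assumed) `id` attribute of the root element as the key."""
    root = ET.fromstring(xml_bytes)
    return root.attrib["id"].encode("utf-8")


def put_document(db, xml_bytes: bytes) -> bytes:
    """Store the document bytes unchanged under its primary key."""
    key = primary_key(xml_bytes)
    db[key] = xml_bytes
    return key


def get_document(db, key: bytes) -> bytes:
    """Fetch the stored bytes by exact key match."""
    return db[key]


# Hypothetical sample documents.
docs = [
    b'<order id="A-1"><item>Widget</item></order>',
    b'<order id="A-2"><!-- kept verbatim --><item>Gadget</item></order>',
]

store_path = os.path.join(tempfile.mkdtemp(), "xml_store")
with dbm.open(store_path, "c") as db:  # "c": create the store if missing
    for doc in docs:
        put_document(db, doc)
    fetched = get_document(db, b"A-2")

print(fetched == docs[1])
```

Because the value is the raw byte string, the document comes back exactly as it went in, comments and whitespace included, which is the "fully preserved" part of the question. Like the setup described above, this sketch has no transactions either.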