Is Berkeley DB XML a viable database backend?
Apparently, BDB-XML has been around since at least 2003 but I only recently stumbled upon it on Oracle's website: Berkeley DB XML. Here's the blurb:
Oracle Berkeley DB XML is an open source, embeddable XML database with XQuery-based access to documents stored in containers and indexed based on their content. Oracle Berkeley DB XML is built on top of Oracle Berkeley DB and inherits its rich features and attributes. Like Oracle Berkeley DB, it runs in process with the application with no need for human administration. Oracle Berkeley DB XML adds a document parser, XML indexer and XQuery engine on top of Oracle Berkeley DB to enable the fastest, most efficient retrieval of data.
To me it seems that the underlying ideas are technically sound and probably more mature than the newer document-based DBs like CouchDB or MongoDB. It has support for C, C++, Ruby and Perl, as far as I can determine. It even has HA-capabilities like automatic replication using a master/slave model with automatic election.
However, I can't seem to find any projects that use it. Is there something fundamentally wrong with it? Is the license too onerous? Is it too complicated?
Why is it not being used?
"Is there something fundamentally wrong with it?"
Yes. It's XML.
And unfortunately that means that those who invented it did not bother to take a look at the power of already existing concepts and technologies like, say, relational algebra and relational calculus.
Doing better than those is not a trivial task (and that's putting it politely), and everyone who has tried so far has failed.
That ought to tell you something.
I used to be the product manager for Berkeley DB products at Oracle. I've been working on these BDB databases for over eight years now, and I wrote the "blurb" you copied into your question.
Commercially we're used in (non-exhaustive list, just off the top of my head):
Berkeley DB XML has been relatively ignored in the open source world; I have no idea why. There are a few projects here and there that have used it, but nothing all that public that I know of. I did recently see a nifty blog post about how to use BDB XML from within Emacs. Once it's set up, you can run XQuery statements over XML interactively within the text editor. That said, it's very viable for both commercial and open source use.
XQilla is a project the BDB XML engineers created from a few other XML projects we knitted together over the years. We open sourced XQilla (Apache 2.0 license) because it's a great XQuery and XML parsing library. We're a database company, so the code that takes XML after it has been parsed and organizes it into our btree databases, along with the work on query optimization, indexing, statistics, and a whole ton of other code, sits below XQilla and above BDB's btree, gluing the two together into BDB XML. Feel free to use XQilla if it solves your problem; there's no database in it at all.
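To make that layering a bit more concrete, here is a toy Python sketch of what "organizing parsed XML into btree databases" plus content-based indexing can look like. It is an illustration under loudly stated assumptions, not BDB XML's storage format: the standard library's `xml.etree.ElementTree` stands in for the parser (XQilla's role), a Python list kept sorted with `bisect` stands in for a Berkeley DB btree, and the `(path, value, doc_name)` key layout and the `ToyValueIndex` class are hypothetical.

```python
import bisect
import xml.etree.ElementTree as ET


class ToyValueIndex:
    """Hypothetical path/value index; a sorted list plays the role of a btree."""

    def __init__(self):
        # Sorted list of (path, value, doc_name) keys, like keys in a btree.
        self._keys = []

    def put_document(self, doc_name, xml_text):
        # Parse the document (XQilla's job in the real product), then index it.
        root = ET.fromstring(xml_text)
        self._index_element(root, "/" + root.tag, doc_name)

    def _index_element(self, elem, path, doc_name):
        # Index the element's own leading text; attributes and tail text are ignored.
        text = (elem.text or "").strip()
        if text:
            bisect.insort(self._keys, (path, text, doc_name))
        for child in elem:
            self._index_element(child, path + "/" + child.tag, doc_name)

    def lookup(self, path, value):
        """Names of documents whose element at `path` has exactly `value`."""
        i = bisect.bisect_left(self._keys, (path, value, ""))
        hits = []
        # Equal (path, value) keys are adjacent, so a short range scan suffices.
        while i < len(self._keys) and self._keys[i][:2] == (path, value):
            hits.append(self._keys[i][2])
            i += 1
        return hits


idx = ToyValueIndex()
idx.put_document("a.xml", "<book><author>Knuth</author><year>1968</year></book>")
idx.put_document("b.xml", "<book><author>Date</author><year>1975</year></book>")
idx.put_document("c.xml", "<book><author>Knuth</author><year>1973</year></book>")
print(idx.lookup("/book/author", "Knuth"))  # ['a.xml', 'c.xml']
```

In this toy, looking up an author walks only the adjacent matching keys instead of reparsing every stored document. That is the general idea behind the "indexed based on their content" part of the blurb, whatever the real key layout is.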
Products built from the ground up for XML generally have a few transactional data structures at their core which manage information on disk. There's not much optimization that can be done there that we haven't already done in Berkeley DB and used in Berkeley DB XML. Saying that a database built from the ground up to manage XML is going to be significantly better than BDB XML amounts to saying that something is missing from Berkeley DB. I don't think there's a defensible argument here, but I'm willing to learn if someone has information on a concurrent, transactional data structure, critical for efficient XML storage, that BDB doesn't already implement.
eXist is a Java XML database. We have a Java JNI API if you'd like one, and we generally beat the pants off eXist in performance, stability and scalability tests.
Sedna is a good XML database. It's Apache 2.0, so it's not dual-licensed; it's simply FLOSS. I'd suggest you benchmark it against BDB XML; you might be surprised.
MarkLogic is a great XML/XQuery database server, they've built a very solid product. It's not a software library, it's a server. There are significant differences between BDB XML and MarkLogic, but they are both commercially available - only BDB XML is open source.
Someone mentioned Elliotte Rusty Harold's blog post on the state of XML databases. Be careful: it's from circa 2007. Hey, isn't that before any NoSQL database existed? ;-)
Take a look at Kimbro Staken's old but still relevant review (turned into a whitepaper by Oracle); it's good, but also dated: "Use a Native XML Database for Your XML Data: Deciding when an XQuery-based native XML database is better than an SQL database".
The real authority over the years has been Ron Bourret. He has a lot to say on the subject.
MongoDB and CouchDB are in a different market segment. They do distributed, partitioned, eventually consistent BASE-style (non-ACID) data management, and some think they do that very well. I think they are young, and the jury is still out. They are off to a good start and I hope that they continue to grow; data storage is a hard thing to get right, and one size doesn't fit everyone's problems and needs. BDB XML's distributed story is built on single-master, multi-replica, always-consistent (if you'd like) log-based replication, plus Paxos-based election algorithms for when the master fails. We don't partition data: every node contains the same data (the entire database). We don't allow writes everywhere, only at the master. We support more than TCP/IP for replication (heck, you could use a hardware bus custom to your server if you want). We built our HA product to solve read scalability, system availability and fault tolerance. The NoSQL distributed systems are designed for write-anywhere, partitioned data management. Choice is good, right? :)
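The replication design described above can be sketched as a toy in-memory model. This is not Berkeley DB's replication API or its actual election protocol; `Site`, `ToyReplicationGroup`, and the rule "most complete log wins, priority breaks ties" are simplifying assumptions for illustration. It shows the three properties mentioned: writes only at the master, every node holding the full data set, and a new master elected when the old one fails.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    priority: int  # hypothetical election priority; 0 means "never become master"
    log: list = field(default_factory=list)  # applied log records, in order
    up: bool = True


class ToyReplicationGroup:
    def __init__(self, sites):
        self.sites = sites
        self.master = sites[0]

    def write(self, site, record):
        # Writes are accepted only at the master, never "anywhere".
        if site is not self.master:
            raise PermissionError(f"{site.name} is a replica; write at {self.master.name}")
        site.log.append(record)
        # Ship the record to every live replica: no partitioning, full copies.
        for replica in self.sites:
            if replica is not site and replica.up:
                replica.log.append(record)

    def read(self, site):
        # Any live site can serve reads, which is what gives read scalability.
        return list(site.log)

    def fail_master(self):
        self.master.up = False
        return self.elect()

    def elect(self):
        # Raises ValueError if no live, electable site remains.
        candidates = [s for s in self.sites if s.up and s.priority > 0]
        # Prefer the most complete log, then the highest priority.
        self.master = max(candidates, key=lambda s: (len(s.log), s.priority))
        return self.master


a, b, c = Site("A", 100), Site("B", 50), Site("C", 80)
group = ToyReplicationGroup([a, b, c])
group.write(a, "put doc1")
c.up = False            # C misses the next record...
group.write(a, "put doc2")
c.up = True             # ...and comes back with a shorter log
print(group.fail_master().name)  # B: a longer log beats C's higher priority
```

A real deployment would also resynchronize the lagging replica and deal with network partitions; the toy skips both to keep the write-at-master, read-anywhere split visible.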
XML as a data schema and XQuery as a language to access and manage XML content have been, and continue to be, a very successful solution. Maybe not so much on the more public websites using NoSQL solutions these days (which is fine, and interesting to me), but more so in document management, finance, genomics, bioinformatics, data exchange, messaging, and much more. XML may be a niche database when compared to SQL/relational products, but it is certainly much more successful than object databases or any new-kid-on-the-block NoSQL database solution. Every storage solution has its place, and XML will continue to do useful things far into the future.
At the end of the day, I hope you pick a database that suits your needs.
One thing to keep in mind is Berkeley DB's license. Unless you are going to open source your project, you'll need to buy a license from Oracle, which is why I suspect you don't see more of it. All of the Berkeley DB databases are quite excellent otherwise. I tend to use them for anything I'm not going to distribute (in house projects).
From my experience, Berkeley DB XML has a lot of promise and a lot of relevant use cases. But you should be careful not to expect it to work in all cases. Note that the last release was Berkeley DB XML 2.5.16, on December 22, 2009.
The technology it is based on, Berkeley DB, is very robust and blindingly fast, if you configure it correctly for your use-case. There are many details to get right (e.g. enable transactions, logging, understanding all flags needed to get MVCC working). I believe the majority of people have issues because of this complexity.
I have run into a few other shortcomings though. The biggest one is that the query planner will not use indexes when sorting. This means that you cannot do a pretty common data access pattern which is the equivalent of:
If you do this Berkeley DB will check all values of time on disk before ordering, which makes it slow when you go beyond a few tens of thousands of nodes. Someone else reported this as well here:
https://forums.oracle.com/forums/message.jspa?messageID=9754987#9754987
You can enumerate any indexes directly as well, but then you lose the ability to do ad-hoc queries.
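The sorting problem can be illustrated with a toy comparison. This is a sketch under assumptions, not BDB XML's query planner: a list of dicts stands in for stored nodes, and `TimeIndex` (hypothetical) is a list of `(time, node_id)` pairs kept sorted with `bisect`, standing in for an ordered index on `time`. Without using the index, "the N most recent" has to examine every node; walking the ordered index from its high end touches only N entries, which is what enumerating the index directly gets you, at the cost of hand-writing the access path instead of issuing an ad-hoc query.

```python
import bisect
import heapq


def latest_by_scan(nodes, n):
    """No usable index: every node's time is examined before ordering."""
    examined = 0

    def key(node):
        nonlocal examined
        examined += 1
        return node["time"]

    # heapq.nlargest must look at every node to be sure it has the top n.
    return heapq.nlargest(n, nodes, key=key), examined


class TimeIndex:
    """Hypothetical ordered index: (time, node_id) pairs kept sorted like btree keys."""

    def __init__(self):
        self._entries = []

    def insert(self, time, node_id):
        bisect.insort(self._entries, (time, node_id))

    def latest(self, n):
        # Walk the ordered keys from the high end; only n entries are touched.
        if n <= 0:
            return []
        return [node_id for _, node_id in reversed(self._entries[-n:])]


# 5000 nodes with distinct, shuffled-looking timestamps.
nodes = [{"id": i, "time": (i * 7919) % 5000} for i in range(5000)]
index = TimeIndex()
for node in nodes:
    index.insert(node["time"], node["id"])

top, examined = latest_by_scan(nodes, 10)
print(examined)  # 5000: the scan touched every node
print([node["id"] for node in top] == index.latest(10))  # True
```

Both paths return the same ids; the difference is how much data they have to touch, which is why the unindexed sort degrades as the number of nodes grows.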
Also reported on the forums is some strange behavior related to index types and performance:
https://forums.oracle.com/forums/message.jspa?messageID=9753022#9753022
So, while key-based access is fast and reliable, be careful of its immature query planner.
It depends on what your needs are. I won't recommend one native XML DB over another, but I can tell you that the publishing industry is an example of an entire sector that has pretty much abandoned relational databases and moved in a big way to native XML databases for handling the content of its publications. The most prominent (and most expensive) is the one from MarkLogic. eXistDB is an open source one that seems to be getting some traction.
Here is an excellent article on this subject by one of the preeminent XML gurus, Elliotte Rusty Harold.
http://cafe.elharo.com/xml/the-state-of-native-xml-databases/
The best[*] XML repositories are the ones built from the ground up to support XML, like MarkLogic or eXist.
However, the storage engine for BDB-XML is the venerable Berkeley DB engine, one of the most wide-spread embedded database engines. It is small, quick and stable.
BDB-XML itself is certainly a capable product. It was formerly sold by Sleepycat Software, if that helps you find any references. It's a combination of the BDB storage engine with the XQilla XQuery engine.
Also you might find more information searching for XQilla. It's a fairly powerful engine, and still open source.
[*] "best" of course, being a subjective term.
So in conclusion, these are all reasons why BDB-XML doesn't seem widely used:
There doesn't seem to be any reason not to use it, but likewise there's not much to make it stand out from the competition. On top of that, the recent competition has more of a "Ooh, shiny!" appeal and XML databases themselves are still a niche market.
I've been looking into the same thing lately and came across the Sedna XML DBMS.