CouchDB vs. MongoDB: comparing NoSQL databases for working with XML documents
I am working on a project using Java and Spring 3, and I have a new task. There will be XML files; I will read those files and convert them into objects. After that I will put them into a database.
My main task is to examine NoSQL databases. CouchDB and MongoDB are the databases I should look into. I will search on those objects in the database (one of the indexes will be on a date, and I will run date-between selects). Performance is very important to me and I will work with a huge amount of data, which is why I should look into NoSQL databases.
What do you suggest for my scenario? What are the pros and cons of each, which one should I choose, and why?
I searched and saw that CouchDB uses a REST API while MongoDB uses drivers, which is a performance plus for Mongo according to this page: http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
However, CouchDB uses replication as a way to scale (is that a performance plus?).
I also realize that there are BaseX and eXist. Given my needs, would you suggest them? Has anyone worked with them?
PS: The XML files I get will be like logs. They will not change, and I won't manipulate the data in them.
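To make the "date between" requirement concrete, here is a minimal sketch of what a date index does for a range search. It is a plain in-memory stand-in written for illustration, assuming Java 8 or later; the class `DateIndexSketch` and its methods are hypothetical names and are not part of the MongoDB or CouchDB APIs.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a date index; not a MongoDB or CouchDB API.
public class DateIndexSketch {

    // Sorted map from creation date to the ids of the documents created on that date.
    private final NavigableMap<LocalDate, List<String>> index = new TreeMap<>();

    public void add(LocalDate created, String documentId) {
        index.computeIfAbsent(created, d -> new ArrayList<>()).add(documentId);
    }

    // Returns the ids of documents dated within [from, to], both ends inclusive.
    // Only the matching slice of the sorted map is visited, not every entry.
    // Note: subMap throws IllegalArgumentException if from is after to.
    public List<String> between(LocalDate from, LocalDate to) {
        List<String> result = new ArrayList<>();
        for (List<String> ids : index.subMap(from, true, to, true).values()) {
            result.addAll(ids);
        }
        return result;
    }

    public static void main(String[] args) {
        DateIndexSketch idx = new DateIndexSketch();
        idx.add(LocalDate.of(2012, 1, 5), "a");
        idx.add(LocalDate.of(2012, 2, 10), "b");
        idx.add(LocalDate.of(2012, 3, 1), "c");
        // prints [a, b]
        System.out.println(idx.between(LocalDate.of(2012, 1, 1), LocalDate.of(2012, 2, 28)));
    }
}
```

A sorted structure is what lets the range lookup skip documents outside the interval, which is why the date field should be indexed in whichever database is chosen.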
Comments (3)
This is a pretty big question, but I will do my best to tackle it. A company I work for was moving from developing our applications on MySQL to NoSQL, and I was the lead on the first NoSQL database, so we had to decide which one to work with. I was torn between MongoDB, CouchDB and Cassandra. One important factor was how easy it would be to write baseline functions for working with the database, so that you don't have to understand what is going on underneath but can still execute queries and so on. The issue with Cassandra was that its API was very low level; writing a solid high-level interface would have taken time we did not have. The issue with CouchDB was the REST service. Since we were already connecting to our in-house API using REST, it would have been a double REST service. REST generally goes over HTTP, and the things that make HTTP easy to work with come with a fair amount of overhead, which adds time to loading information. So we chose MongoDB for that reason and many others. Also, since MongoDB uses a driver, it is developed to work with the programming language, which is great if your language is supported and bad if it is not. Since Java is supported by MongoDB, that is fine.
I would recommend converting the XML files into objects and then storing the objects in Mongo, so each XML file would become a MongoDB document with embedded documents. The great thing about Mongo is that you can search embedded documents and index them. So enjoy that.
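As a rough illustration of the conversion step, the sketch below turns an XML string into nested maps, the shape a document store would hold as a document with embedded documents. It uses only the JDK's DOM parser; the class name `XmlToDocumentSketch` and the sample element names are made up for this example, and the actual insert into MongoDB is deliberately left out. It assumes Java 7 or later and ignores attributes and mixed text content, so it is not a complete mapping.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: converts XML into nested maps (a document-like structure).
public class XmlToDocumentSketch {

    public static Map<String, Object> parse(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put(root.getTagName(), convert(root));
        return doc;
    }

    // Leaf elements become strings; elements with child elements become nested maps.
    // Repeated child names are collected into a list. Attributes are ignored here.
    @SuppressWarnings("unchecked")
    private static Object convert(Element element) {
        Map<String, Object> children = new LinkedHashMap<>();
        NodeList nodes = element.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip text, comments and whitespace between elements
            }
            Element child = (Element) node;
            String name = child.getTagName();
            Object value = convert(child);
            Object existing = children.get(name);
            if (existing == null) {
                children.put(name, value);
            } else if (existing instanceof List) {
                ((List<Object>) existing).add(value);
            } else {
                List<Object> list = new ArrayList<>();
                list.add(existing);
                list.add(value);
                children.put(name, list);
            }
        }
        return children.isEmpty() ? element.getTextContent().trim() : children;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<log><date>2012-01-05</date><source><host>web1</host></source></log>";
        // prints {log={date=2012-01-05, source={host=web1}}}
        System.out.println(parse(xml));
    }
}
```

The nested `source` map corresponds to an embedded document, and a field such as `log.date` is the kind of path an index would be created on.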
I have only used MongoDB in a high-data-volume, low-load internal application, so I cannot really offer first hand advice for your choice.
The MongoDB people, however, have a comparison with CouchDB here. There are also quite a few more independent opinions (1, 2).
You should also consider the quality of the available database drivers for your environment. The Java MongoDB driver is quite stable, in my experience, but it seems to me that it still incurs more processing overhead than it should. I have no idea about any of the CouchDB drivers.
Do you have any other requirements apart from the ability to store large amounts of data? Do you need replication or sharding?
PS: How are you storing the XML files anyway? XML files do not map into JSON (which is what e.g. MongoDB uses) perfectly - unless you store the whole XML text in a single field.
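One way to sidestep the imperfect XML-to-JSON mapping, sketched here under assumptions, is to keep the whole XML text in a single field and extract only the values you intend to search on. The example uses the JDK's `javax.xml.xpath` API; the class `RawXmlDocumentSketch`, the field names `created` and `rawXml`, and the sample XML are hypothetical, and the resulting map would still have to be handed to whatever database driver you choose.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: stores the untouched XML plus one extracted, searchable field.
public class RawXmlDocumentSketch {

    public static Map<String, Object> toDocument(String xml, String dateXPath) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Evaluates the expression against the XML and returns the result as a string.
        String date = xpath.evaluate(dateXPath, new InputSource(new StringReader(xml)));

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("created", date.trim()); // the field to index and query on
        doc.put("rawXml", xml);          // attributes and mixed content survive as-is
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<log date=\"2012-01-05\"><msg>started <b>ok</b></msg></log>";
        Map<String, Object> doc = toDocument(xml, "/log/@date");
        System.out.println(doc.get("created")); // prints 2012-01-05
    }
}
```

The trade-off is that anything not extracted up front cannot be queried by the database without re-parsing the stored XML.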
PS2: Are you sure that you need a document-based database? If you are only going to perform searches on a few fields that are known beforehand, a relational DB might be easier to handle. Document-based DBs start making sense only when you don't have a predefined schema for your data or when you need to store more complex object hierarchies.
PS3: May I ask why huge data implies NoSQL to you? You can store insane amounts of data on any modern relational database (as long as you have the hardware, of course).
EDIT:
A couple of related SO questions:
(...and about a thousand more)
Maybe also these:
I'd like to add that Couchbase is a faster and more scalable option than CouchDB, and the 2.0 version introduces Views. At a high level it is a distributed memcached (Membase Server) merged with CouchDB, though of course it is more sophisticated than just mashing the two together. The founders of both CouchDB and Membase Server created Couchbase.
The best way to handle your data is also likely to convert XML to JSON for storage and JSON back to XML on retrieval. If you are doing XPath queries in the database, the View creation would need to be a bit more sophisticated.
www.couchbase.com