Hypertable vs. HBase, and BigTable vs. SQL
As Hypertable and HBase seem to be the two major open source BigTable implementations, what are the major pros and cons between these two databases?
In addition, what are the major pros and cons between BigTable and SQL RDBMSes, and what significant differences can I expect between writing a project with a traditional RDBMS like Postgres and writing one with Hypertable?
Comments (1)
At the risk of broadening your second question more than I should (I've never played with BigTable, but I've toyed with MongoDB and CouchDB)...
The most important difference, in so far as I've understood it anyway, is that RDBMSes typically use a row-based store, whereas BigTable-style NoSQL engines use a column-based store. The pros and cons mostly derive from this point.
http://en.wikipedia.org/wiki/Column-oriented_DBMS
The major consideration that I tend to keep in mind is ACID compliance: many NoSQL engines are eventually consistent, rather than always consistent. Think of it as storage that behaves like a website's cache: the latter is normally valid and consistent, but occasionally slightly outdated/inconsistent.
There's no right or wrong here: for some use-cases (e.g. a search engine, a blog), slightly inconsistent is a very acceptable option; for others (e.g. a bank, a billing system) it is not. (I tend to work on stuff that needs atomicity.)
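To make the cache analogy a bit more concrete, here is a minimal sketch in Python of a store whose replica lags behind its primary. The class and method names (`EventuallyConsistentStore`, `read_from_replica`, `sync`) are made up for illustration and do not correspond to any real engine's API; real systems replicate asynchronously in the background rather than on an explicit call.

```python
class EventuallyConsistentStore:
    """Toy model: writes land on a primary and reach a replica only on sync()."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, key):
        # May return stale data until sync() has run.
        return self.replica.get(key)

    def sync(self):
        # Apply queued writes in order; afterwards both copies agree.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("balance", 100)
store.sync()
store.write("balance", 40)
stale = store.read_from_replica("balance")  # replica has not seen the second write yet
store.sync()
fresh = store.read_from_replica("balance")  # replica has caught up
```

For a blog's view counter, the stale read is harmless. For an account balance, it is exactly the kind of thing you can't afford.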
Then, there are plenty of performance considerations that break down to implementation details.
An immediate consequence of striving for eventual consistency is that integrity checks and so forth are typically done in the app rather than the data store (i.e. there are no triggers or stored procedures to speak of). Your data store ends up with less work to do, which results in obvious performance benefits.
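As a rough illustration of what "done in the app" means, the sketch below uses plain Python dicts as a stand-in for a schemaless store. The `customers` and `orders` collections and the `add_order` function are hypothetical; the point is only that the check a foreign key or CHECK constraint would perform in an RDBMS has to live in application code.

```python
customers = {1: {"name": "Ada"}}
orders = {}


def add_order(order_id, customer_id, total):
    """Application-level checks standing in for a foreign key and a CHECK constraint."""
    if customer_id not in customers:
        raise ValueError(f"unknown customer {customer_id}")
    if total < 0:
        raise ValueError("total must be non-negative")
    orders[order_id] = {"customer_id": customer_id, "total": total}


add_order(10, 1, 25.0)  # valid: customer 1 exists

try:
    add_order(11, 99, 5.0)  # invalid: no customer 99
    rejected = False
except ValueError:
    rejected = True
```

Nothing stops another code path from writing to `orders` directly and skipping the check, which is the trade-off being described.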
A column-based store means that if you update a single column from your document, you only invalidate that column. A row-based store, by contrast, invalidates the entire row. Depending on how you typically update your data (i.e. just a few columns vs most of them), the cost of either approach can add up.
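A toy comparison of the two layouts, in Python, might look like the following. It is a deliberately simplified model: the data and the function names are invented, and real engines deal in pages, versions, and compaction rather than Python lists. It only counts how many stored values get rewritten by a single-column update.

```python
rows = [
    {"id": 1, "name": "Ada", "city": "London", "visits": 3},
    {"id": 2, "name": "Linus", "city": "Helsinki", "visits": 7},
]

# Column layout: one list per column, values kept at matching positions.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def update_row_store(rows, index, column, value):
    """Rewrite the whole row; return how many values were rewritten."""
    new_row = dict(rows[index])
    new_row[column] = value
    rows[index] = new_row
    return len(new_row)


def update_column_store(columns, index, column, value):
    """Rewrite a single cell in one column; return how many values were rewritten."""
    columns[column][index] = value
    return 1


row_cost = update_row_store(rows, 1, "visits", 8)
column_cost = update_column_store(columns, 1, "visits", 8)
```

If you instead replaced every field of a record, the column layout would have to touch one list per column, so the advantage flips.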
A flip side of a column-based store is that it makes joins trickier (from an implementation standpoint). In overly simplistic terms, think of it as having an EAV table per column; this works fine for a few tables. It's a different story if you need a big report that requires a dozen joins on sales or stocks (which a good RDBMS will handle just fine).
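The "EAV table per column" picture can be sketched in Python as one key-to-value mapping per column. The column names and the `assemble_row` helper are hypothetical; the sketch just shows that getting a full record back means one lookup per column, all matched on the same key, which is effectively a join.

```python
# One mapping per column, each keyed by the row key.
name_col = {1: "Ada", 2: "Linus"}
city_col = {1: "London", 2: "Helsinki"}
visits_col = {1: 3, 2: 7}


def assemble_row(key, *column_tables):
    """Rebuild a record by looking the key up in every column table."""
    return tuple(table.get(key) for table in column_tables)


row = assemble_row(2, name_col, city_col, visits_col)
missing = assemble_row(3, name_col, city_col, visits_col)  # key absent everywhere
```

With three columns this is trivial. A report that joins a dozen such tables, each split into its own columns, multiplies the number of lookups accordingly.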
A more experienced user will hopefully chime in on NoSQL sharding and replication. On this I'd only feel comfortable enough to point out that Postgres has had built-in replication features since 9.0 and is quite good at dealing with queries that span multiple partitions.
Anyway... To cut a very long story short: unless you already know that you'll need to instantly scale to petabytes and gazillions of requests in multitudes of data centers in your next project, I think the only consideration that you should have in mind when picking an SQL or NoSQL implementation is whether you absolutely need ACID compliance or not.
Lastly, if your main interest lies in trying a new toy, consider trying a graph-oriented database instead. These potentially combine the benefits of row- and column-based stores.