Google's Bigtable vs. a relational database
I don't know much about Google's Bigtable but am wondering what the difference between Google's Bigtable and relational databases like MySQL is. What are the limitations of both?
Comments (2)
Bigtable is Google's invention for handling the massive amounts of information the company regularly deals with. A Bigtable dataset can grow to an immense size (many petabytes), with storage distributed across a large number of servers. Systems that use Bigtable include Google's web index and Google Earth.
According to Google's whitepaper on the subject:
The internal mechanics of Bigtable and of, say, MySQL are so dissimilar that comparison is difficult, and their intended goals don't overlap much either. But you can think of Bigtable as a bit like a single-table database. Imagine, for example, the difficulties you would run into if you tried to implement Google's entire web search system on a MySQL database; Bigtable was built to solve those problems.
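To make the "single-table" picture a little more concrete, here is a toy in-memory sketch. It assumes the data model described in the Bigtable paper, where a cell is addressed by a row key, a column key, and a timestamp, and rows are kept in sorted order. The class `TinySortedTable`, its method names, and the sample row and column keys are all made up for illustration; this is not Bigtable's actual API.

```python
from bisect import bisect_left


class TinySortedTable:
    """Toy stand-in for one big sorted table; purely illustrative."""

    def __init__(self):
        # Keys are (row, column, -timestamp) so that, within one cell,
        # the newest version sorts first.
        self._keys = []
        self._cells = {}

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)
        i = bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)
        self._cells[key] = value

    def get_latest(self, row, column):
        # -inf sorts before every real -timestamp, so this lands on the
        # newest version of the cell if the cell exists.
        i = bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan_rows(self, start_row, end_row):
        # Range scan over start_row <= row < end_row, in sorted key order.
        i = bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._cells[self._keys[i]]
            i += 1


table = TinySortedTable()
table.put("com.example/index", "contents:html", 1, b"<html>v1</html>")
table.put("com.example/index", "contents:html", 2, b"<html>v2</html>")
table.put("com.example/about", "contents:html", 1, b"<html>about</html>")
table.put("org.sample/home", "contents:html", 1, b"<html>home</html>")

latest = table.get_latest("com.example/index", "contents:html")
example_rows = sorted(
    {row for row, _, _, _ in table.scan_rows("com.example/", "com.example0")}
)
```

Because everything lives in one sorted key space, the only cheap access patterns are a lookup by key and a scan over a contiguous key range, which is a large part of why the comparison with a relational engine is awkward.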
Data stored in Bigtable can be queried from services like App Engine using a language called GQL ("gee-kwal"), which is based on a subset of SQL. Conspicuously missing from GQL is any sort of JOIN command. Because of the distributed nature of a Bigtable database, performing a join between two tables would be terribly inefficient. Instead, programmers have to implement such logic in the application, or design the application so that it doesn't need it.
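As a rough idea of what "implementing the join in the application" can look like, here is a minimal sketch. Plain Python dicts and lists stand in for the datastore, and every name in it (`authors_by_id`, `posts`, `fetch_authors`, `posts_with_author_names`) is hypothetical rather than part of GQL or any App Engine API.

```python
# Two hypothetical "kinds" of records. The store can look records up by key,
# but offers no JOIN between them.
authors_by_id = {
    "a1": {"name": "Ada"},
    "a2": {"name": "Linus"},
}
posts = [
    {"title": "Scaling out", "author_id": "a1"},
    {"title": "Row keys", "author_id": "a2"},
    {"title": "Orphaned post", "author_id": "a9"},
]


def fetch_authors(author_ids):
    """Stand-in for one batched key lookup against the store."""
    return {aid: authors_by_id[aid] for aid in author_ids if aid in authors_by_id}


def posts_with_author_names(posts):
    # Step 1: collect the distinct keys we need from the first result set.
    wanted = {post["author_id"] for post in posts}
    # Step 2: fetch the related records in one batch instead of one per post.
    found = fetch_authors(wanted)
    # Step 3: stitch the two result sets together in application code.
    joined = []
    for post in posts:
        author = found.get(post["author_id"])
        joined.append(
            {"title": post["title"], "author": author["name"] if author else None}
        )
    return joined


joined_posts = posts_with_author_names(posts)
```

The alternative mentioned above, designing the application so the join isn't needed, would amount to storing the author's name directly on each post record.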
Google's Bigtable and other similar projects (e.g., CouchDB, HBase) are database systems designed so that data is mostly denormalized (i.e., duplicated and grouped).
The main advantages are:
- Join operations are less costly because of the denormalization: related data is already stored together, so most joins are not needed at all.
- Replication and distribution of data are less costly because of data independence (i.e., if you distribute data across two nodes, you are unlikely to end up with an entity on one node and a related entity on another, because related data is grouped together).
This kind of system is suited to applications that need to scale out well (i.e., you add more nodes to the system and performance increases roughly proportionally). In an RDBMS like MySQL or Oracle, once you start adding nodes, joining two tables that are not on the same node becomes more expensive. This becomes important when you are dealing with high volumes.
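The point about related data landing on different nodes can be illustrated with a small sketch. It assumes a made-up cluster of four nodes and simple hash placement by key; `NUM_NODES`, `node_for`, `nodes_touched`, and the sample keys are all hypothetical, and real systems use their own placement schemes.

```python
import zlib

NUM_NODES = 4  # assumed cluster size, for illustration only


def node_for(key: str) -> int:
    """Hypothetical placement rule: hash the key, take it modulo the node count."""
    return zlib.crc32(key.encode("utf-8")) % NUM_NODES


def nodes_touched(keys):
    """Set of nodes that must be contacted to read all the given keys."""
    return {node_for(key) for key in keys}


# Normalized layout: the order, its customer, and its items are separate
# records, so reading the whole order may mean visiting several nodes.
normalized_keys = ["order:42", "customer:7", "item:1001", "item:1002"]

# Denormalized layout: customer and item details are duplicated into the
# order record, so everything lives under a single key.
denormalized_record = {
    "order:42": {
        "customer": {"id": 7, "name": "Ada"},
        "items": [{"id": 1001, "qty": 2}, {"id": 1002, "qty": 1}],
    }
}

normalized_nodes = nodes_touched(normalized_keys)
denormalized_nodes = nodes_touched(denormalized_record.keys())

print("normalized read touches nodes:", sorted(normalized_nodes))
print("denormalized read touches nodes:", sorted(denormalized_nodes))
```

The denormalized read always touches exactly one node, while the normalized one can touch up to one node per key. The cost is that the duplicated customer and item data has to be kept up to date in every record that embeds it.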
RDBMSs are nice because of the richness of their storage model (tables, joins, foreign keys). Distributed databases are nice because they scale easily.