How do new database models achieve better scalability and performance than traditional RDBMS implementations?

Posted 2024-09-14 14:27:32

We have

all aiming towards one common goal - making data management as scalable as possible.

By scalability I mean that the cost of use should not rise drastically as the size of the data increases.

RDBMSs are slow when the amount of data is large, because the number of indirections invariably increases, leading to more I/O.

[Figure: image not available]

How do these custom, scalability-friendly data management systems solve the problem?

This is a figure from this document explaining Google BigTable:

[Figure: image from the Google BigTable document, not available]

Looks the same to me. How is the ultra-scalability achieved?

Comments (4)

三生殊途 2024-09-21 14:27:32

The "traditional" SQL DBMS market really means a very small number of products, which have traditionally targeted business applications in a corporate setting. Massive shared-nothing scalability has not historically been a priority for those products or their customers. So it is natural that alternative products have emerged to support internet-scale database applications.

This has nothing to do with the fact that these new products are not "relational" DBMSs. The relational model can scale just as well as any other model. Arguably the relational model suits these types of massively scalable applications better than, say, network (graph-based) models. It's just that the SQL language has a lot of disadvantages and no one has yet come up with suitable relational NoSQL (non-SQL) alternatives.

や三分注定 2024-09-21 14:27:32

Speaking specifically to your question about Bigtable, the difference is that the hierarchy in the diagram above is all there is. Each Bigtable tabletserver is responsible for a set of tablets (contiguous row ranges from a table); the mapping from row range to tablet is maintained in the metadata table, while the mapping from tablet to tabletserver is maintained in the memory of the Bigtable master. Looking up a row, or a range of rows, requires looking up the metadata entry (which will almost certainly be in memory on the server that hosts it), then using that to look up the actual row on the server responsible for it - resulting in only one or a few disk seeks.

In a nutshell, this scales well because it's possible to throw more hardware at it: given enough resources, the metadata is always in memory, and thus there's no need to go to disk for it, only for the data (and not always for that, either!).
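
The two-step lookup described above can be sketched roughly as follows. This is a toy illustration, not Bigtable's actual code or API: the `MetadataTable` class, the `locate` function, and the tablet and server names are all made up for the example, and both mappings are assumed to fit in memory.

```python
import bisect


class MetadataTable:
    """Toy stand-in for the metadata table: row range -> tablet."""

    def __init__(self, tablets):
        # Each entry is (last_row_key_in_tablet, tablet_id); sorting by the
        # end key lets a binary search find the tablet covering a row.
        self._tablets = sorted(tablets)
        self._end_keys = [end_key for end_key, _ in self._tablets]

    def find_tablet(self, row_key):
        # The first tablet whose end key is >= row_key owns the row.
        index = bisect.bisect_left(self._end_keys, row_key)
        if index == len(self._end_keys):
            raise KeyError(f"no tablet covers row {row_key!r}")
        return self._tablets[index][1]


def locate(row_key, metadata, assignment):
    """Return (tablet, server) for a row using two in-memory lookups."""
    tablet = metadata.find_tablet(row_key)
    return tablet, assignment[tablet]


# Hypothetical layout: three tablets spread over two servers.
metadata = MetadataTable(
    [("g", "tablet-1"), ("p", "tablet-2"), ("\uffff", "tablet-3")]
)
assignment = {
    "tablet-1": "server-a",
    "tablet-2": "server-b",
    "tablet-3": "server-a",
}

print(locate("kiwi", metadata, assignment))
```

Neither step touches disk in this sketch; only the final read of the row on the chosen server would, which is the point being made about where the seeks go.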

诗笺 2024-09-21 14:27:32

It's about using cheap commodity hardware to build a network/grid/cloud and spread the data and load (for example using map/reduce).

RDBMS databases seem to me like software that was (originally) designed to run on one supercomputer. You can use various hard drive arrays, DB clusters, but still...

The amount of data has increased, so there's one more reason to design new data stores with this in mind - scalability, high availability, terabytes of data.

Another thing - if you build a grid/cloud from cheap servers, it's fault tolerant because you store all data at three (?) different locations and at the same time it's cheap.

Back to your pictures - the first one is from one computer (typically), the second one from a network of computers.
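
As a rough illustration of spreading data over cheap machines with several copies of each item, here is a minimal sketch. It is not taken from any particular system: the `replicas_for` function and the node names are hypothetical, and the default of three copies just follows the guess above.

```python
import hashlib


def replicas_for(key, nodes, copies=3):
    """Pick `copies` distinct nodes for `key` by hashing it onto the node list."""
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested number of copies")
    # A stable hash keeps the placement the same across runs and machines.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    # Take the starting node and the next ones in order, wrapping around.
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]


# Hypothetical cluster of five cheap servers.
nodes = [f"node-{i}" for i in range(5)]
print(replicas_for("user:42", nodes))
```

Plain modulo placement like this moves most keys whenever a node is added or removed, so treat it only as a picture of the idea; real systems tend to use something like consistent hashing or explicit range assignment.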

或十年 2024-09-21 14:27:32

One theoretical answer on scalability is at http://queue.acm.org/detail.cfm?id=1394128 - the ACID guarantees are expensive. See http://database.cs.brown.edu/papers/stonebraker-cacm2010.pdf for a counter-argument.

In fact, just surviving power failures is expensive. Years ago I compared MySQL against Oracle. MySQL was almost unbelievably faster than Oracle, but we couldn't use it. MySQL in those days was built on top of Berkeley DB, which was miles faster than Oracle's full-blown log-based database, but if the power went off while the Berkeley DB-based MySQL was running, getting the database consistent again once the power came back was a manual process, and you'd probably lose recent updates for good.
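
To make the cost of surviving a power failure concrete, here is a minimal sketch of the general trade-off. It is not how Oracle or MySQL actually implement logging: `append_records` is a hypothetical helper, and the only thing it varies is whether each record is forced to stable storage before the next one is written.

```python
import os


def append_records(path, records, durable):
    """Append byte records to a log file, optionally forcing each one to disk."""
    with open(path, "ab") as log:
        for record in records:
            log.write(record + b"\n")
            if durable:
                # Push Python's buffer to the OS, then ask the OS to push
                # its cache to the device before accepting the next record.
                log.flush()
                os.fsync(log.fileno())
```

With `durable=False` the writes sit in memory buffers and are fast, but a power cut can lose them; with `durable=True` every record pays for a round trip to the storage device, which is typically much slower.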
