Distributed database computing: is it really possible within the RDBMS paradigm?

Posted on 2024-09-30 04:51:38

I am asking this in the context of NoSQL, which achieves scalability and performance without being expensive.

So, if I needed to achieve massively parallel distributed computing across databases:
What methodologies are available today (within the RDBMS paradigm) to achieve highly scalable distributed computing?

Do database clustering and mirroring contribute in any way towards distributed computing?

Comments (3)

£烟消云散 2024-10-07 04:51:39

I guess you are asking about the scalability of RDBMS databases. NoSQL databases based on Amazon Dynamo or BigTable, such as HBase and Cassandra, are a whole other topic. There are also commercial products like Oracle Coherence that are, to put it crudely, more like a distributed cache and key-value store.

Going back to RDBMS:

Sharding
To scale an RDBMS, you can do custom sharding. Sharding is a technique where you have multiple tables, possibly on multiple hosts, and you decide by some rule which rows go to which table. For example, you can say that rows 1-1M go to table1, rows 1M-2M go to table2, and so on. But this is difficult from an administration point of view. A lot of large-scale websites scale by relying on sharding. Other techniques worth mentioning are partitioning, MySQL federation, and MySQL Cluster.
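The row-range rule described above can be sketched as a small routing function. This is only an illustration: the shard names, the boundaries (1M rows per shard, with each boundary row belonging to the lower shard), and the function name `shard_for_row` are all assumptions, not something the answer specifies.

```python
from bisect import bisect_right

# Hypothetical shard map: SHARD_STARTS[i] is the first row id owned by
# SHARD_NAMES[i]. The 1M-row boundaries mirror the example in the text.
SHARD_STARTS = [1, 1_000_001, 2_000_001]
SHARD_NAMES = ["host1.table1", "host2.table2", "host3.table3"]


def shard_for_row(row_id: int) -> str:
    """Return the shard that owns row_id under range-based sharding."""
    if row_id < SHARD_STARTS[0]:
        raise ValueError(f"row id {row_id} is below the first shard range")
    # bisect_right counts how many range starts are <= row_id, so
    # subtracting one gives the index of the owning range.
    index = bisect_right(SHARD_STARTS, row_id) - 1
    return SHARD_NAMES[index]
```

In this sketch the last shard is open-ended, so adding capacity means appending a new start and name. Moving existing rows between shards is the administrative burden the answer mentions, and the sketch does not attempt it.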

MPP databases
Then there are databases that are very much RDBMSs and that do the distribution and scaling for you. Teradata is the most successful of these. I believe they used Postgres core code at some point. A significant number of Fortune 500 companies and a lot of the airlines use Teradata. But it's ridiculously expensive. There are also newer companies like Greenplum, Vertica, and Netezza.

送你一个梦 2024-10-07 04:51:39

Unless you're a very big company with extreme scalability requirements, you can scale your DB horizontally while keeping ACID guarantees by building a cluster of identical RDBMS instances and synchronizing them with JTA transactions.

Take a look at this Java/JDBC-based article; it uses the JEPLayer framework, but you can use plain JDBC and JTA code.
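The idea of keeping identical instances in step can be roughed out as follows. This is a simplified stand-in, not JTA: it uses Python's standard `sqlite3` module with in-memory databases as the "replicas", and the helper names `make_replica` and `write_to_all` and the `accounts` table are made up for illustration. A real JTA transaction manager coordinates a two-phase commit with a prepare step, which this sketch does not implement.

```python
import sqlite3


def make_replica() -> sqlite3.Connection:
    """Create one in-memory database standing in for an RDBMS instance."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def write_to_all(replicas, sql, params) -> bool:
    """Run one statement on every replica; commit everywhere or nowhere."""
    try:
        for conn in replicas:
            # sqlite3 implicitly opens a transaction before INSERT/UPDATE/DELETE.
            conn.execute(sql, params)
    except sqlite3.Error:
        # Any failure undoes the uncommitted work on all replicas.
        for conn in replicas:
            conn.rollback()
        return False
    # Simplification: a crash between these commits could still leave the
    # replicas out of step, which is what a prepare phase guards against.
    for conn in replicas:
        conn.commit()
    return True
```

Every write goes to every instance, so this arrangement mainly scales reads; write throughput stays bounded by a single instance.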

素罗衫 2024-10-07 04:51:39

Within the RDBMS paradigm: Sharding.
Outside the RDBMS paradigm: Key-value stores.

My pick: (I come from an RDBMS background) Key-value stores of the tabular type - HBase.

Within the RDBMS paradigm, sharding will not get you far.
Use the RDBMS paradigm to design your model, to get your project up and running.
Use tabular key-value stores to SCALE OUT.

Sharding:

A good way to think about sharding is to see it as user-account-oriented
DB design.

All the schema entities touched by a user account are kept on one host.

The assignment of user to host happens when the user creates an account.
The least loaded host gets that user.

When that user signs on after account creation, he gets connected
to the host that has his data.

Each host has a set of user accounts.
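The account-to-host assignment described above might look like the following sketch. The class name `ShardDirectory`, its methods, and the use of account count as the measure of "load" are assumptions made for illustration; the answer does not say how load is measured or where the mapping is stored.

```python
class ShardDirectory:
    """Maps each user account to the single host that holds its data."""

    def __init__(self, hosts):
        # Assumption: load is approximated by the number of accounts per host.
        self._users_per_host = {host: 0 for host in hosts}
        self._host_of_user = {}

    def create_account(self, user_id):
        """Assign a new user to the least loaded host and remember the choice."""
        if user_id in self._host_of_user:
            raise ValueError(f"account {user_id!r} already exists")
        # min() returns the first host with the lowest count, so ties go to
        # the host listed earliest.
        host = min(self._users_per_host, key=self._users_per_host.get)
        self._users_per_host[host] += 1
        self._host_of_user[user_id] = host
        return host

    def host_for(self, user_id):
        """Look up the host to connect to when the user signs on."""
        return self._host_of_user[user_id]
```

For example, with hosts `["db1", "db2"]`, creating accounts for alice, bob, and carol in that order places them on db1, db2, and db1, and every later sign-on resolves through `host_for`.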

The problem with this approach is that if the host gets hosed,
a fraction of users will be blacked out.

The solution to this is to have a replicated standby host that
becomes the primary when the primary host encounters problems.
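A minimal sketch of that standby promotion, assuming a caller-supplied health check. The `Shard` class, `connect_host`, and the `is_healthy` callback are hypothetical names; replication itself and the health probe are outside the sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Shard:
    primary: str
    standby: str

    def fail_over(self) -> None:
        """Promote the standby; the old primary becomes the new standby."""
        self.primary, self.standby = self.standby, self.primary


def connect_host(shard: Shard, is_healthy: Callable[[str], bool]) -> str:
    """Return the host to use for this shard, failing over if needed."""
    if not is_healthy(shard.primary):
        if not is_healthy(shard.standby):
            raise RuntimeError("both primary and standby are unavailable")
        shard.fail_over()
    return shard.primary
```

Users on the affected shard are then served by the promoted host instead of being blacked out, provided the standby was kept up to date.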

Also, it's a fairly rigid setup, best suited to processes whose design does
not change dramatically.

From the user standpoint, I've noticed that web sites
with a sharded DB backend are not as quick to "turn on a dime"
to create different business models on their platform.

Contrast this with web sites that have truly distributed
key-value stores. These businesses can host any range of
services. Their platform is just that - a platform.
It's not relational and it does have an API interface,
but it just seems to work.
