Database sharding and Rails

Posted on 2024-07-04 22:39:47

What's the best way to deal with a sharded database in Rails? Should the sharding be handled at the application layer, the active record layer, the database driver layer, a proxy layer, or something else altogether? What are the pros and cons of each?

Comments (9)

聽兲甴掵 2024-07-11 22:39:49

I assume that by shards we mean horizontal partitioning and not vertical partitioning (Wikipedia describes the differences).

First off, stretch vertical partitioning as far as you can take it before you consider horizontal partitioning. It's easy in Rails to have different models point to different machines and for most Rails sites, this will bring you far enough.

For horizontal partitioning, in an ideal world, this would be handled at the application layer in Rails. But while it's not hard, it's not trivial in Rails, and by the time you need it, usually your application has grown beyond the point where this is feasible since you have ActiveRecord calls sprinkled all over the place. And no one, developers or management, likes working on it before you need it since everyone would rather work on features users will use now rather than on partitioning which may not come into play for years after your traffic has exploded.
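
As a rough illustration of what application-layer routing involves, here is a minimal sketch that maps a sharding key to a shard name. The shard names and the `shard_for` helper are hypothetical and not part of Rails; the only library call is `Zlib.crc32` from Ruby's standard library.

```ruby
require "zlib"

# Hypothetical shard names; a real application would list its own databases.
SHARDS = %i[shard_zero shard_one shard_two shard_three].freeze

# Map a sharding key (for example a user id) to one shard.
# CRC32 is used instead of String#hash because String#hash is seeded
# per process, so it would route the same key differently after a restart.
def shard_for(key, shards = SHARDS)
  shards[Zlib.crc32(key.to_s) % shards.size]
end

puts shard_for(42) # the same key always prints the same shard name
```

Every ActiveRecord call that touches sharded data has to go through a lookup like this, which is why retrofitting it into a large codebase is painful. Note also that plain modulo routing remaps most keys when the number of shards changes.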

ActiveRecord layer... not easy from what I can see. Would require lots of monkey patching into Rails internals.

At Spock we ended up handling this with a custom MySQL proxy, which we open sourced on SourceForge as Spock Proxy. ActiveRecord thinks it's talking to one MySQL database machine when in reality it's talking to the proxy, which then talks to one or more MySQL databases, merges/sorts the results, and returns them to ActiveRecord. It requires only a few changes to your Rails code. Take a look at the Spock Proxy SourceForge page for more details and for our reasons for going this route.
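
To make the merge/sort step concrete, here is a small sketch of the idea in plain Ruby. It is not Spock Proxy's implementation; `merge_shard_results` and the sample rows are hypothetical, and the per-shard results are stand-ins for what each database would return for the same ordered, limited query.

```ruby
# Combine rows returned by several shards into one globally ordered result.
# Each shard has already applied ORDER BY and LIMIT locally, so the combined
# rows must be re-sorted and truncated once more.
def merge_shard_results(per_shard_rows, order_by:, limit:)
  per_shard_rows
    .flatten(1)                         # one flat list of row hashes
    .sort_by { |row| row[order_by] }    # global ordering across shards
    .first(limit)                       # global LIMIT
end

shard_a = [{ id: 1, name: "ann" }, { id: 4, name: "dan" }]
shard_b = [{ id: 2, name: "bob" }, { id: 3, name: "cat" }]

top = merge_shard_results([shard_a, shard_b], order_by: :id, limit: 3)
puts top.map { |row| row[:id] }.inspect
```

Because the client only sees the merged rows, ActiveRecord needs no knowledge of how many databases sit behind the proxy.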

无言温柔 2024-07-11 22:39:49

Rails 6.1 provides the ability to switch connections per database, which makes horizontal partitioning possible.

  • Shards are declared in the three-tier config like this:
production:
  primary:
    database: my_primary_database
    adapter: mysql2
  primary_replica:
    database: my_primary_database
    adapter: mysql2
    replica: true
  primary_shard_one:
    database: my_primary_shard_one
    adapter: mysql2
  primary_shard_one_replica:
    database: my_primary_shard_one
    adapter: mysql2
    replica: true
  • Models are then connected with the connects_to API via the shards key
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to shards: {
    default: { writing: :primary, reading: :primary_replica },
    shard_one: { writing: :primary_shard_one, reading: :primary_shard_one_replica }
  }
end
  • Then models can swap connections manually via the connected_to API. If using sharding, both a role and a shard must be passed:
ActiveRecord::Base.connected_to(role: :writing, shard: :default) do
  @id = Person.create!.id # Creates a record in the default shard
end

ActiveRecord::Base.connected_to(role: :writing, shard: :shard_one) do
  Person.find(@id) # Can't find record, doesn't exist because it was created
                   # in the default shard
end
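
In practice the shard is usually chosen from some attribute of the request rather than hard-coded. A minimal sketch of such a wrapper follows; `TENANT_SHARDS` and `with_tenant_shard` are hypothetical names, and the only Rails API assumed is `connected_to(role:, shard:)` as used above.

```ruby
# Hypothetical mapping from a tenant name to a shard declared in connects_to.
TENANT_SHARDS = { "acme" => :default, "globex" => :shard_one }.freeze

# Runs the block against the shard that owns the tenant.
# `connector` defaults to ActiveRecord::Base inside a Rails app. The default
# is only evaluated when the argument is omitted, so the helper can also be
# exercised with a stand-in object outside Rails.
def with_tenant_shard(tenant, role: :writing, connector: ActiveRecord::Base, &block)
  shard = TENANT_SHARDS.fetch(tenant) # raises KeyError for unknown tenants
  connector.connected_to(role: role, shard: shard, &block)
end
```

Inside a Rails 6.1 application this would be called as `with_tenant_shard("globex") { Person.create! }`, typically from an around filter in the controller.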

reference:

阳光的暖冬 2024-07-11 22:39:49

It depends on the Rails version. Newer Rails versions support sharding, as @Oshan said. If you can't update to a newer version, you can use the octopus gem.
Gem link:
https://github.com/thiagopradi/octopus

千秋岁 2024-07-11 22:39:49

For Rails to work with a replicated environment, I would suggest the my_replication plugin, which helps switch the database connection to one of the slaves at run-time.

https://github.com/minhnghivn/my_replication
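
The general idea of switching connections at run-time can be sketched in plain Ruby. This is not my_replication's API; `ReplicaPicker` and the connection names are hypothetical, and the sketch only shows the routing decision (writes to the primary, reads rotated across replicas).

```ruby
# Chooses which connection an operation should use.
# Not thread-safe: a real implementation would guard the counter.
class ReplicaPicker
  def initialize(primary:, replicas:)
    @primary = primary
    @replicas = replicas
    @next = 0
  end

  # Writes always go to the primary; reads rotate through the replicas.
  # With no replicas configured, reads fall back to the primary.
  def connection_for(operation)
    return @primary if operation == :write || @replicas.empty?

    replica = @replicas[@next % @replicas.size]
    @next += 1
    replica
  end
end

picker = ReplicaPicker.new(primary: :primary, replicas: %i[replica_a replica_b])
puts picker.connection_for(:write)
puts picker.connection_for(:read)
```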

小镇女孩 2024-07-11 22:39:49

To my mind, the simplest way is to maintain a 1:1 mapping between Rails instances and DB shards.

梦在深巷 2024-07-11 22:39:49

The proxy layer is better, because it can support all programming languages.

For example: Apache ShardingSphere's proxy.

Apache ShardingSphere has two different products: ShardingSphere-JDBC, which works at the application layer and is for Java only, and ShardingSphere-Proxy, which works at the proxy layer and is usable from any programming language.

FYI: https://shardingsphere.apache.org/document/current/en/user-manual/shardingsphere-proxy/
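
From the Rails side, a proxy setup usually means pointing database.yml at the proxy instead of at a database server. The fragment below is only a sketch: the host, port, and database name are assumptions and must match whatever the proxy is actually configured to expose.

```yaml
# config/database.yml (sketch; all values below are hypothetical)
production:
  adapter: mysql2
  host: sharding-proxy.internal   # the proxy, not a MySQL server
  port: 3307                      # assumed; use the port the proxy listens on
  database: sharding_db           # logical database defined in the proxy
```

With this in place the application code stays unchanged, and the sharding rules live entirely in the proxy's own configuration.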

肥爪爪 2024-07-11 22:39:49

Connecting Rails to multiple databases is not a big deal: you simply have an ActiveRecord subclass for each shard that overrides the connection property. That keeps things pretty simple even if you need to make cross-shard calls, since you only have to write a little code where calls cross between the shards.
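
The subclass-per-shard pattern can be illustrated without Rails. In the sketch below `Record` is a hypothetical stand-in for `ActiveRecord::Base`, and `connection_name` stands in for the connection it resolves; in a real application each shard base class would be an abstract ActiveRecord class that establishes its own connection.

```ruby
# Stand-in for ActiveRecord::Base so the pattern runs without Rails.
class Record
  class << self
    attr_writer :connection_name

    # A class uses its own connection if set, otherwise the nearest
    # ancestor's, falling back to :primary at the top of the hierarchy.
    def connection_name
      return @connection_name if @connection_name

      superclass.respond_to?(:connection_name) ? superclass.connection_name : :primary
    end
  end
end

# One base class per shard overrides the connection.
class ShardOneRecord < Record
  self.connection_name = :shard_one
end

class ShardTwoRecord < Record
  self.connection_name = :shard_two
end

# Models pick a shard simply by choosing which base class to inherit from.
class ShardOneOrder < ShardOneRecord; end
class ShardTwoOrder < ShardTwoRecord; end

puts ShardOneOrder.connection_name
puts ShardTwoOrder.connection_name
```

A cross-shard operation then amounts to calling both model classes and combining the results in application code.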

I don't like Hank's idea of splitting the rails instances, because it seems challenging to call the code between the instances unless you have a big shared library.

Also you should look at doing something like Masochism before you start sharding.

不知所踪 2024-07-11 22:39:47

FiveRuns have a gem named DataFabric that does application-level sharding and master/slave replication. It might be worth checking out.
