Database sharding and Rails
What's the best way to deal with a sharded database in Rails? Should the sharding be handled at the application layer, the active record layer, the database driver layer, a proxy layer, or something else altogether? What are the pros and cons of each?
Comments (9)
I assume that by shards we mean horizontal partitioning rather than vertical partitioning (Wikipedia describes the differences).
First off, stretch vertical partitioning as far as you can before you consider horizontal partitioning. It's easy in Rails to have different models point to different machines, and for most Rails sites this will take you far enough.
For horizontal partitioning, in an ideal world, this would be handled at the application layer in Rails. It's not hard, but it's not trivial either, and by the time you need it your application has usually grown beyond the point where this is feasible, since you have ActiveRecord calls sprinkled all over the place. And no one, developers or management, likes working on it before it's needed: everyone would rather work on features users will use now than on partitioning that may not come into play until years later, after your traffic has exploded.
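To make the application-layer option concrete, here is a minimal sketch of the routing step: given a shard key, pick which shard holds the row. It is plain Ruby, not Rails code, and the names `shard_for`, `group_by_shard` and the shard symbols are hypothetical. It assumes simple hash-modulo placement, which is only one of several possible schemes.

```ruby
require "zlib"

# Pick a shard for a key by hashing it and taking the remainder.
# CRC32 is used because it is stable across processes and restarts,
# unlike Object#hash, which is seeded per process.
def shard_for(key, shards)
  shards[Zlib.crc32(key.to_s) % shards.size]
end

# Group a batch of keys by shard so each shard is visited only once.
def group_by_shard(keys, shards)
  keys.group_by { |key| shard_for(key, shards) }
end

# Hypothetical shard names; a real app would map these to database configs.
shards = %i[shard_0 shard_1 shard_2 shard_3]
puts shard_for("user:42", shards)
puts group_by_shard([1, 2, 3, 4, 5], shards).inspect
```

Note that with plain modulo placement, changing the number of shards moves most keys, which is part of why retrofitting this onto an existing application is painful.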
Handling it at the ActiveRecord layer is not easy from what I can see. It would require a lot of monkey patching of Rails internals.
At Spock we ended up handling this with a custom MySQL proxy, which we open sourced on SourceForge as Spock Proxy. ActiveRecord thinks it's talking to one MySQL database machine when in reality it's talking to the proxy, which then talks to one or more MySQL databases, merges and sorts the results, and returns them to ActiveRecord. It requires only a few changes to your Rails code. Take a look at the Spock Proxy SourceForge page for more details and for our reasons for going this route.
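The merge/sort step a proxy has to perform can be sketched in a few lines. This is not Spock Proxy's code, only an illustration of the scatter-gather idea under simplifying assumptions: each "shard" is an in-memory array of row hashes standing in for a MySQL server, and `query_all_shards` is a hypothetical name.

```ruby
# Emulates "SELECT ... ORDER BY <order_by> LIMIT <limit>" across shards.
def query_all_shards(shards, order_by:, limit:)
  # Scatter: each shard returns its own top `limit` rows.
  partials = shards.map do |rows|
    rows.sort_by { |row| row[order_by] }.first(limit)
  end

  # Gather: merge the partial results, re-sort, and apply the limit again.
  partials.flatten(1).sort_by { |row| row[order_by] }.first(limit)
end

shards = [
  [{ id: 5, name: "eve" }, { id: 1, name: "ann" }],
  [{ id: 3, name: "cat" }, { id: 2, name: "bob" }],
  []
]
puts query_all_shards(shards, order_by: :id, limit: 3).inspect
```

The limit has to be applied twice: once per shard to bound the data sent back, and once after merging, because no single shard knows the global order.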
For those of you like me who hadn't heard of sharding:
http://highscalability.com/unorthodox-approach-database-design-coming-shard
Rails 6.1 provides the ability to switch connections per database, so we can do horizontal partitioning.
reference:
It depends on the Rails version. Newer Rails versions support sharding, as @Oshan said. If you can't update to a newer version, you can use the octopus gem.
Gem Link
https://github.com/thiagopradi/octopus
For Rails to work in a replicated environment, I would suggest the my_replication plugin, which helps switch the database connection to one of the slaves at run time.
https://github.com/minhnghivn/my_replication
To my mind, the simplest way is to maintain a 1:1 mapping between Rails instances and DB shards.
A proxy layer is better, since it can support all programming languages.
For example: Apache ShardingSphere's proxy.
Apache ShardingSphere has two different products: ShardingSphere-JDBC, which works at the application layer and is for Java only, and ShardingSphere-Proxy, which works at the proxy layer and supports all programming languages.
FYI: https://shardingsphere.apache.org/document/current/en/user-manual/shardingsphere-proxy/
Connecting Rails to multiple databases is not a big deal: you simply have an ActiveRecord subclass for each shard that overrides the connection property. That keeps cross-shard calls pretty simple too; you just have to write a little code when you need to make calls between the shards.
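The shape of that pattern, stripped of ActiveRecord so it runs on its own, looks roughly like the sketch below. Everything here is a stand-in: `FakeConnection`, `ShardedModel`, the `UserShardA`/`UserShardB` classes and the host names are hypothetical. In a real app each subclass would inherit from ActiveRecord and point at its database with `establish_connection` instead.

```ruby
# Stand-in for a database connection; only remembers which host it targets.
FakeConnection = Struct.new(:host)

class ShardedModel
  # Subclasses override this to point at their own shard.
  def self.connection
    raise NotImplementedError, "#{name} must define its connection"
  end
end

class UserShardA < ShardedModel
  def self.connection
    @connection ||= FakeConnection.new("db-a.example.internal")
  end
end

class UserShardB < ShardedModel
  def self.connection
    @connection ||= FakeConnection.new("db-b.example.internal")
  end
end

# The "little code" for a cross-shard call: visit each shard class in turn
# and collect something from its connection (here, just the host name).
def shard_hosts(shard_classes)
  shard_classes.map { |klass| klass.connection.host }
end

puts shard_hosts([UserShardA, UserShardB]).inspect
```

One class per shard is easy to reason about with a handful of shards, but the class list grows with the shard count, so it suits a small, fixed number of shards.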
I don't like Hank's idea of splitting the Rails instances, because it seems challenging to call code across instances unless you have a big shared library.
Also, you should look at doing something like Masochism before you start sharding.
FiveRuns has a gem named DataFabric that does application-level sharding and master/slave replication. It might be worth checking out.