A gem that allows data access on sharded MySQL databases while keeping ActiveRecord

Posted 2024-09-16 20:34:49

This is a relatively complex problem I am thinking about, so please suggest edits or comment on any parts that are unclear. I will update and iterate based on your comments.

I am thinking of developing a Rails gem that simplifies the use of sharded tables, even when most of your data is stored in a relational database. I believe this is similar to the approach Quora and FriendFeed used when they hit a wall scaling with traditional MySQL, where most of the potential solutions either required a massive migration (NoSQL) or were just really painful (sticking with a fully relational design).

Essentially, how can we keep using MySQL for the many things it is really good at, while still allowing parts of the system to scale? This would let someone who started out with MySQL/ActiveRecord, and then hit a scaling roadblock, easily scale just the parts of the database where that makes sense.

In our case, we run Ruby on Rails on a sharded database and store JSON blobs in it. Since we cannot do joins across shards, we create separate tables for the relationships between entities.
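
A minimal sketch of this layout in plain Ruby, assuming hash-based routing on the entity id (the post does not say how ids are assigned to shards). `SHARD_COUNT`, `shard_for`, `put_entity`, `put_edge`, and `edges_from` are hypothetical names, and the in-memory hashes stand in for MySQL shards. The point is only that relationship rows are co-located with the entity they start from, so lookups that would have been joins become single-shard reads.

```ruby
require "json"
require "zlib"

# Hypothetical shard count; a real deployment would read this from config.
SHARD_COUNT = 4

# In-memory stand-ins for the physical MySQL shards. Each shard maps a key
# tuple ([id, type] or [id1, id2, type]) to a JSON blob string.
SHARDS = Array.new(SHARD_COUNT) { {} }

# Stable hash (CRC32) of the id, so the same id always maps to the same shard.
def shard_for(id)
  Zlib.crc32(id.to_s) % SHARD_COUNT
end

def put_entity(id, type, attrs)
  SHARDS[shard_for(id)][[id, type]] = JSON.generate(attrs)
end

def get_entity(id, type)
  blob = SHARDS[shard_for(id)][[id, type]]
  blob && JSON.parse(blob)
end

# Relationship rows live on the shard of id1, so "all edges of this type out
# of id1" is a single-shard lookup rather than a cross-shard join.
def put_edge(id1, id2, type, attrs = {})
  SHARDS[shard_for(id1)][[id1, id2, type]] = JSON.generate(attrs)
end

# Scans one shard's hash; in MySQL this would be an index range scan on id1.
def edges_from(id1, type)
  SHARDS[shard_for(id1)].filter_map do |key, blob|
    next unless key.size == 3 && key[0] == id1 && key[2] == type

    [key[1], JSON.parse(blob)]
  end
end
```

With this routing, `edges_from(1, "follows")` only ever touches `shard_for(1)`. The trade-off is that the reverse question (which entities link *to* id 2) needs either a mirrored edge row stored on id 2's shard or a scatter-gather query across every shard.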

For example, we have 10 different types of entities. Any entity can be linked to any other through big (sharded) relationship tables.

The tables are extremely simple. The index is (Id1, Id2, ..., type), and the data is stored in the JSON blob.

  • Id, type, {json data}
  • Id1, Id2, type, {json data}
  • Id1, Id2, Id3, type, {json data}
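
To make the three table shapes concrete, here is a hypothetical Ruby helper that builds MySQL DDL for a relationship table of a given arity. The column types (`BIGINT UNSIGNED` ids, a numeric `type`, and a MySQL `JSON` column, which requires MySQL 5.7.8 or later) are assumptions; the post only says the data lives in a JSON blob, which could just as well be a `TEXT` or `BLOB` column. Arity 1 corresponds to the single-entity `Id, type` table, here named `id1` for uniformity.

```ruby
# Hypothetical helper: DDL for a relationship table whose composite primary
# key is (id1, ..., idN, type), with all other data kept in one JSON column.
def relationship_ddl(table, arity)
  raise ArgumentError, "arity must be at least 1" if arity < 1

  ids = (1..arity).map { |i| "id#{i}" }
  columns = ids.map { |col| "  #{col} BIGINT UNSIGNED NOT NULL" }
  columns << "  type SMALLINT UNSIGNED NOT NULL"
  columns << "  data JSON NOT NULL"
  columns << "  PRIMARY KEY (#{(ids + ['type']).join(', ')})"
  "CREATE TABLE #{table} (\n#{columns.join(",\n")}\n) ENGINE=InnoDB;"
end
```

Calling it with arity 1, 2, and 3 gives the three shapes listed above. Because InnoDB clusters rows by primary key, all rows sharing the same `id1` prefix are stored together, so "everything attached to id1" is a range scan.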

We have put a lot of work into higher-level interfaces for storing different kinds of relational data sets.

For any given type, you can define a storage kind: value, unweighted list, weighted list, or weighted list with GUIDs.
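
A minimal sketch of the four storage kinds as plain Ruby classes, each serializing to the JSON blob column. The class names, method names, and JSON layouts are hypothetical. In particular, reading "weighted list with GUIDs" as "each entry carries its own GUID, so one id can appear more than once and a single entry can be removed by its GUID" is an assumption about what the post means.

```ruby
require "json"
require "securerandom"

# value: a single JSON value, wrapped in an object so scalars round-trip.
class ValueStore
  attr_reader :value
  def initialize(value = nil)
    @value = value
  end
  def dump
    JSON.generate({ "value" => @value })
  end
  def self.load(blob)
    new(JSON.parse(blob)["value"])
  end
end

# unweighted list: insertion-ordered ids without duplicates.
class UnweightedList
  def initialize(ids = [])
    @ids = ids.uniq
  end
  def add(id)
    @ids << id unless @ids.include?(id)
    self
  end
  def ids
    @ids.dup
  end
  def dump
    JSON.generate(@ids)
  end
  def self.load(blob)
    new(JSON.parse(blob))
  end
end

# weighted list: one weight per id; reads return the highest weight first.
# Stored as [[id, weight], ...] so integer ids survive the JSON round-trip.
class WeightedList
  def initialize(pairs = [])
    @weights = pairs.to_h
  end
  def add(id, weight)
    @weights[id] = weight
    self
  end
  def ids
    @weights.sort_by { |id, weight| [-weight, id] }.map(&:first)
  end
  def dump
    JSON.generate(@weights.to_a)
  end
  def self.load(blob)
    new(JSON.parse(blob))
  end
end

# weighted list with GUIDs: entries are [guid, id, weight], so the same id may
# appear several times and one entry can be removed by its guid.
class GuidWeightedList
  def initialize(entries = [])
    @entries = entries
  end
  def add(id, weight, guid: SecureRandom.uuid)
    @entries << [guid, id, weight]
    guid
  end
  def remove(guid)
    @entries.reject! { |g, _id, _weight| g == guid }
    self
  end
  def ids
    @entries.sort_by { |_g, id, weight| [-weight, id] }.map { |_g, id, _weight| id }
  end
  def dump
    JSON.generate(@entries)
  end
  def self.load(blob)
    new(JSON.parse(blob))
  end
end
```

Each class only knows how to turn its contents into the `data` blob and back, which keeps the table schema identical for every storage kind.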

We have higher-level interfaces for each of them: querying, sorting, timestamp comparison, intersections, etc.
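
Three of these operations can be sketched over an in-memory list of edges. `Edge`, `sorted`, `newer_than`, and `intersect_targets` are hypothetical names. The assumption is that each list for a given id has already been read from its own shard, so intersecting two lists happens in application code rather than as a SQL join.

```ruby
require "set"

# Hypothetical edge record as read back from a weighted list.
Edge = Struct.new(:target, :weight, :created_at)

# Sorting: highest weight first; ties broken by the newest created_at.
def sorted(edges)
  edges.sort_by { |e| [-e.weight, -e.created_at.to_i] }
end

# Timestamp comparison: keep only edges created strictly after `since`.
def newer_than(edges, since)
  edges.select { |e| e.created_at > since }
end

# Intersection: targets present in both lists, in the order of the first list.
def intersect_targets(a, b)
  in_b = b.map(&:target).to_set
  a.map(&:target).select { |t| in_b.include?(t) }
end
```

For example, "mutual connections" of two entities would be `intersect_targets` over their two edge lists, each fetched from a single shard.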

That way, if someone realizes they need to scale a specific part of the database, they can keep most of their infrastructure and move only the tables they need into this sharded database.
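
A minimal sketch of the "move only the tables you need" idea: a hypothetical per-table router that leaves most tables on the primary MySQL database and sends only the listed tables to a shard chosen from an owner id. The table names, shard names, and `connection_for` are illustrative assumptions, not part of any existing gem.

```ruby
require "zlib"

# Hypothetical: only these tables have been moved to the sharded store;
# every other table stays on the primary MySQL database.
SHARDED_TABLES = %w[follows likes].freeze
SHARD_NAMES = %w[shard_0 shard_1 shard_2].freeze

# Pick the logical connection for a query. Sharded tables need the owning
# entity's id (the id1 column) to choose a shard deterministically.
def connection_for(table, owner_id = nil)
  return "primary" unless SHARDED_TABLES.include?(table)
  raise ArgumentError, "#{table} is sharded; pass the owner id" if owner_id.nil?

  SHARD_NAMES[Zlib.crc32(owner_id.to_s) % SHARD_NAMES.size]
end
```

For reference, Rails 6.1 and later ship built-in horizontal sharding (`connects_to shards: ...` on an abstract model class, and `ActiveRecord::Base.connected_to(shard: ...)` to switch shards for a block), which a gem like this could build on for the routing step instead of hand-rolled connection names.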

What are your thoughts? As mentioned above, I would love to hear what you all think.

Comments (1)

枕花眠 2024-09-23 20:34:49

Scalability is a tough nut to crack. My background includes two years as a sales engineer for BEA Systems, back when all they sold was the TUXEDO middleware (TUXEDO == Transactions for Unix Extended for Distributed Operations). TUXEDO is still the king of the TPC-C benchmark on Unix platforms.

Scaling with respect to a database is not so much about the database itself as about how you access it. For example, if you open a connection to a database and want that single connection to scale, have that connection always access the same table. The problem with today's infrastructures (RoR included) is that they open generic connections, and those connections access many tables in the database.

So if you want to make a database CONNECTION scale, make that connection focus the database engine on as few database resources as possible. For example, if you can create a 'focused' connection that ONLY accesses one table and one table index, it will scale much better than a connection that accesses EVERY table in the database and every index defined on those tables.
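
One way to read the "focused connection" advice, sketched in Ruby: a hypothetical pool that keeps a separate, fixed-size set of connections per table and hands each caller a connection reserved for the one table it asked for. `FakeConnection` stands in for a real MySQL client and refuses queries against any other table; `FocusedPool` is illustrative and not a RoR or TUXEDO API.

```ruby
# Stand-in for a real MySQL client connection, bound to a single table.
class FakeConnection
  attr_reader :table, :queries

  def initialize(table)
    @table = table
    @queries = []
  end

  # Record the query, refusing anything outside this connection's table.
  def query(table, sql)
    raise ArgumentError, "connection is focused on #{@table}" unless table == @table

    @queries << sql
    sql
  end
end

# Hypothetical pool: each table gets its own `size_per_table` connections,
# created lazily by the factory block the first time that table is used.
class FocusedPool
  def initialize(size_per_table, &factory)
    @size = size_per_table
    @factory = factory
    @pools = {}
    @lock = Mutex.new
  end

  # Check out a connection reserved for `table`, yield it, and return it to
  # that table's queue afterwards (pop blocks while all of them are in use).
  def with(table)
    queue = @lock.synchronize do
      @pools[table] ||= Queue.new.tap { |q| @size.times { q << @factory.call(table) } }
    end
    conn = queue.pop
    begin
      yield conn
    ensure
      queue << conn
    end
  end
end
```

The sketch only shows the routing shape the answer describes: every connection ever opened is tied to one table, so the engine work behind it stays concentrated on that table and its index.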
