如何创建具有共享表结构的多租户数据库？

发布于 2024-08-20 20:55:56 字数 512 浏览 8 评论 0原文

我们的软件目前运行在 MySQL 上。所有租户的数据都存储在同一个schema中。由于我们使用 Ruby on Rails，我们可以轻松确定哪些数据属于哪个租户。然而，当然也有一些公司担心他们的数据可能会受到损害，因此我们正在评估其他解决方案。

到目前为止，我已经看到了三个选项：

多数据库（每个租户都有自己的模式 - 几乎与每个客户 1 个服务器相同）
多模式（在 MySQL 中不可用，每个租户在共享数据库中都有自己的模式）
共享模式（我们当前的方法，也许每列上都有额外的识别记录）

多模式是我最喜欢的（考虑到成本）。然而，创建新帐户并进行迁移似乎非常痛苦，因为我必须迭代所有模式并更改它们的表/列/定义。

问：多架构似乎被设计为每个租户都有稍微不同的表 - 我不想要这样。是否有任何 RDBMS 允许我使用多模式多租户解决方案，其中表结构在所有租户之间共享？

PS 我所说的“多”是指超多（10.000+ 租户）之类的东西。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

枫以 2024-08-27 20:55:56

但是有一些公司
当然，他们担心他们的数据可能会
受到损害，所以我们正在评估
其他解决方案。

这是不幸的，因为客户有时会误以为只有物理隔离才能提供足够的安全性。

有一篇有趣的 MSDN 文章，标题为多租户数据架构，其中你可能想检查一下。这就是作者如何解决对共享方法的误解：

一个常见的误解是
只有物理隔离才能提供
适当的安全级别。在
事实上，数据存储使用共享
方法还可以提供强有力的数据
安全，但需要使用更多
复杂的设计模式。

至于技术和业务方面的考虑，本文对某种方法可能比另一种方法更合适的情况进行了简要分析：

的数量、性质和需求
您期望服务的租户都会受到影响
您的数据架构决策
不同的方式。以下一些
问题可能会让你更倾向于
孤立的方法，而其他人可能
使你倾向于更加共享的
方法。
您预计有多少潜在租户？你可能无处可去
接近能够估计
预期使用权威，但是
从数量级的角度思考：
您正在构建一个应用程序吗
数百名租户？数千？十
数千个？更多的？你越大
期望您的租户群是
您更有可能想要考虑
更加共享的方法。
您预计租户的数据平均占用多少存储空间？
如果您希望部分或全部租户
存储非常大量的数据，
单独的数据库方法可能是
最好的。（事实上，数据存储
要求可能会迫使您采用
无论如何，单独的数据库模型。如果是这样，
设计起来会容易得多
应用程序从这样的方式
开始而不是移动到
稍后采用单独数据库方法。）
您预计租户平均支持多少并发最终用户？
数字越大越多
采取更加孤立的方法
将满足最终用户的要求。
您是否希望提供任何针对租户的增值服务，例如
按租户备份和恢复
能力？这样的服务比较方便
通过更孤立的方式提供
方法。

更新：进一步更新预期租户数量。

对于大多数（如果不是全部）场景，预期的租户数量 (10k) 应排除多数据库方法。我认为您不会喜欢维护 10,000 个数据库实例并且每天必须创建数百个新实例的想法。

仅从该参数来看，共享数据库、单模式方法似乎是最合适的。事实上，每个租户只需存储大约 50Mb，并且不会有每个租户的附加组件，这使得这种方法更加合适。

上面引用的 MSDN 文章提到了解决共享数据库方法的安全注意事项的三种安全模式：

当您对应用程序的数据安全措施充满信心时，您将能够为您的客户提供服务级别协议，提供强有力的数据安全保证。在 SLA 中，除了保证之外，您还可以描述为确保数据不受到损害而将采取的措施。

更新2：显然，微软人员移动/制作了一篇关于此主题的新文章，原始链接已消失，这是新链接：多租户 SaaS 数据库租赁模式（感谢 Shai Kerer）

However there are some companies of
course who fear that their data might
be compromised, so we are evaluating
other solutions.

This is unfortunate, as customers sometimes suffer from a misconception that only physical isolation can offer enough security.

There is an interesting MSDN article, titled Multi-Tenant Data Architecture, which you may want to check. This is how the authors addressed the misconception towards the shared approach:

A common misconception holds that
only physical isolation can provide an
appropriate level of security. In
fact, data stored using a shared
approach can also provide strong data
safety, but requires the use of more
sophisticated design patterns.

As for technical and business considerations, the article makes a brief analysis on where a certain approach might be more appropriate than another:

The number, nature, and needs of the
tenants you expect to serve all affect
your data architecture decision in
different ways. Some of the following
questions may bias you toward a more
isolated approach, while others may
bias you toward a more shared
approach.
How many prospective tenants do you expect to target? You may be nowhere
near being able to estimate
prospective use with authority, but
think in terms of orders of magnitude:
are you building an application for
hundreds of tenants? Thousands? Tens
of thousands? More? The larger you
expect your tenant base to be, the
more likely you will want to consider
a more shared approach.
How much storage space do you expect the average tenant's data to occupy?
If you expect some or all tenants to
store very large amounts of data, the
separate-database approach is probably
best. (Indeed, data storage
requirements may force you to adopt a
separate-database model anyway. If so,
it will be much easier to design the
application that way from the
beginning than to move to a
separate-database approach later on.)
How many concurrent end users do you expect the average tenant to support?
The larger the number, the more
appropriate a more isolated approach
will be to meet end-user requirements.
Do you expect to offer any per-tenant value-added services, such
as per-tenant backup and restore
capability? Such services are easier
to offer through a more isolated
approach.

UPDATE: Further to update about the expected number of tenants.

That expected number of tenants (10k) should exclude the multi-database approach, for most, if not all scenarios. I don't think you'll fancy the idea of maintaining 10,000 database instances, and having to create hundreds of new ones every day.

From that parameter alone, it looks like the shared-database, single-schema approach is the most suitable. The fact that you'll be storing just about 50Mb per tenant, and that there will be no per-tenant add-ons, makes this approach even more appropriate.

The MSDN article cited above mentions three security patterns that tackle security considerations for the shared-database approach:

When you are confident with your application's data safety measures, you would be able to offer your clients a Service Level Agrement that provides strong data safety guarantees. In your SLA, apart from the guarantees, you could also describe the measures that you would be taking to ensure that data is not compromised.

UPDATE 2: Apparently the Microsoft guys moved / made a new article regarding this subject, the original link is gone and this is the new one: Multi-tenant SaaS database tenancy patterns (kudos to Shai Kerer)

回复收藏 0 原文

口干舌燥 2024-08-27 20:55:56

以下是 Salesforce.com 上有关如何实施多租户的白皮书的链接：

http://www.developerforce.com/media/ForcedotcomBookLibrary/Force.com_Multitenancy_WP_101508.pdf

他们有 1 个巨大的表，包含 500 个字符串列（Value0、Value1、... Value500）。日期和数字以某种格式存储为字符串，以便可以在数据库级别将它们转换为其本机类型。有一些元数据表定义数据模型的形状，每个租户可以是唯一的。还有用于索引、关系、唯一值等的附加表。

为什么这么麻烦呢？

每个租户都可以在运行时自定义自己的数据模式，而无需在数据库级别进行更改（更改表等）。这绝对是做这样的事情的困难方法，但非常灵活。

回复收藏 0 原文

蒲公英的约定 2024-08-27 20:55:56

我的经验（尽管是 SQL Server）是多数据库是正确的选择，每个客户端都有自己的数据库。因此，尽管我没有 mySQL 或 Ruby On Rails 经验，但我希望我的输入可以增加一些价值。

原因包括：

数据安全/灾难恢复。每家公司的数据都与其他公司完全分开存储，从而降低了数据被泄露的风险（想一想，如果您引入代码错误，这意味着某些东西在不应该的情况下错误地查看了其他客户数据），最大限度地减少了一个客户的潜在损失，如果一个客户特定的数据库被损坏等。对客户端来说，感知到的安全优势甚至更大（额外的副作用！）
可扩展性。本质上，您将对数据进行分区以实现更大的可扩展性 - 例如，数据库可以放置在不同的磁盘上，您可以使多个数据库服务器联机并移动数据库以更轻松地分散负载。
性能调整。假设您有一位非常大的客户和一位非常小的客户。使用模式、数据量等可能存在很大差异。如果需要，您可以更轻松地为每个客户进行调整/优化。

我希望这确实提供了一些有用的输入！原因还有很多，但我的脑子一片空白。如果它重新出现，我会更新:)

编辑：
自从我发布这个答案以来，现在很明显我们正在谈论 10,000 多个租户。我的经验是在数百个大型数据库中 - 我认为 10,000 个独立的数据库对于您的场景来说不太容易管理，因此我现在不赞成针对您的场景使用多数据库方法。特别是现在很明显，您正在谈论每个租户的小数据量！

无论如何，将我的答案保留在这里，因为它可能对类似船上的其他人（租户较少）有一些用处

回复收藏 0 原文

失而复得 2024-08-27 20:55:56

正如您提到的，每个租户一个数据库是一种选择，并且确实需要一些更大的权衡。它可以在较小的规模（例如个位数或十个以下的租户）下很好地工作，但超出范围就变得更难以管理。不仅是迁移，而且只是保持数据库的正常运行。

每个模式模型不仅对于每个模式的唯一模式有用，尽管在所有租户之间运行迁移仍然变得很困难，并且在 1000 个模式下 Postgres 可能会开始遇到麻烦。

更具可扩展性的方法绝对是让租户随机分布，存储在同一个数据库中，但跨不同的逻辑分片（或表格）。根据您的语言，有许多库可以帮助您完成此任务。如果您使用 Rails，有一个库可以确保租户acts_as_tenant，它有助于确保您的租户查询只会拉回该数据。还有一个 gem apartment - 尽管它使用架构模型，但它确实有助于跨所有架构的迁移。如果您使用 Django，则有很多，但最流行的一种似乎是跨 schemas 。所有这些在应用程序层面上都有更多帮助。如果您直接在数据库级别寻找更多内容，Citus 专注于为多租户使用 Postgres 可以更加开箱即用。