如何创建具有共享表结构的多租户数据库?
我们的软件目前运行在 MySQL 上。所有租户的数据都存储在同一个schema中。由于我们使用 Ruby on Rails,我们可以轻松确定哪些数据属于哪个租户。然而,当然也有一些公司担心他们的数据可能会受到损害,因此我们正在评估其他解决方案。
到目前为止,我已经看到了三个选项:
- 多数据库(每个租户都有自己的模式 - 几乎与每个客户 1 个服务器相同)
- 多模式(在 MySQL 中不可用,每个租户在共享数据库中都有自己的模式)
- 共享模式(我们当前的方法,也许每列上都有额外的识别记录)
多模式是我最喜欢的(考虑到成本)。然而,创建新帐户并进行迁移似乎非常痛苦,因为我必须迭代所有模式并更改它们的表/列/定义。
问:多架构似乎被设计为每个租户都有稍微不同的表 - 我不想要这样。是否有任何 RDBMS 允许我使用多模式多租户解决方案,其中表结构在所有租户之间共享?
PS 我所说的“多”是指超多(10.000+ 租户)之类的东西。
Our software currently runs on MySQL. The data of all tenants is stored in the same schema. Since we are using Ruby on Rails we can easily determine which data belongs to which tenant. However there are some companies of course who fear that their data might be compromised, so we are evaluating other solutions.
So far I have seen three options:
- Multi-Database (each tenant gets its own - nearly the same as 1 server per customer)
- Multi-Schema (not available in MySQL, each tenant gets its own schema in a shared database)
- Shared Schema (our current approach, maybe with additional identifying record on each column)
Multi-Schema is my favourite (considering costs). However creating a new account and doing migrations seems to be quite painful, because I would have to iterate over all schemas and change their tables/columns/definitions.
Q: Multi-Schema seems to be designed to have slightly different tables for each tenant - I don't want this. Is there any RDBMS which allows me to use a multi-schema multi-tenant solution, where the table structure is shared between all tenants?
P.S. By multi I mean something like ultra-multi (10.000+ tenants).
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
这是不幸的,因为客户有时会误以为只有物理隔离才能提供足够的安全性。
有一篇有趣的 MSDN 文章,标题为多租户数据架构,其中你可能想检查一下。这就是作者如何解决对共享方法的误解:
至于技术和业务方面的考虑,本文对某种方法可能比另一种方法更合适的情况进行了简要分析:
更新:进一步更新预期租户数量。
对于大多数(如果不是全部)场景,预期的租户数量 (10k) 应排除多数据库方法。我认为您不会喜欢维护 10,000 个数据库实例并且每天必须创建数百个新实例的想法。
仅从该参数来看,共享数据库、单模式方法似乎是最合适的。事实上,每个租户只需存储大约 50Mb,并且不会有每个租户的附加组件,这使得这种方法更加合适。
上面引用的 MSDN 文章提到了解决共享数据库方法的安全注意事项的三种安全模式:
当您对应用程序的数据安全措施充满信心时,您将能够为您的客户提供服务级别协议,提供强有力的数据安全保证。在 SLA 中,除了保证之外,您还可以描述为确保数据不受到损害而将采取的措施。
更新2:显然,微软人员移动/制作了一篇关于此主题的新文章,原始链接已消失,这是新链接:多租户 SaaS 数据库租赁模式(感谢 Shai Kerer)
This is unfortunate, as customers sometimes suffer from a misconception that only physical isolation can offer enough security.
There is an interesting MSDN article, titled Multi-Tenant Data Architecture, which you may want to check. This is how the authors addressed the misconception towards the shared approach:
As for technical and business considerations, the article makes a brief analysis on where a certain approach might be more appropriate than another:
UPDATE: Further to update about the expected number of tenants.
That expected number of tenants (10k) should exclude the multi-database approach, for most, if not all scenarios. I don't think you'll fancy the idea of maintaining 10,000 database instances, and having to create hundreds of new ones every day.
From that parameter alone, it looks like the shared-database, single-schema approach is the most suitable. The fact that you'll be storing just about 50Mb per tenant, and that there will be no per-tenant add-ons, makes this approach even more appropriate.
The MSDN article cited above mentions three security patterns that tackle security considerations for the shared-database approach:
When you are confident with your application's data safety measures, you would be able to offer your clients a Service Level Agrement that provides strong data safety guarantees. In your SLA, apart from the guarantees, you could also describe the measures that you would be taking to ensure that data is not compromised.
UPDATE 2: Apparently the Microsoft guys moved / made a new article regarding this subject, the original link is gone and this is the new one: Multi-tenant SaaS database tenancy patterns (kudos to Shai Kerer)
以下是 Salesforce.com 上有关如何实施多租户的白皮书的链接:
http://www.developerforce.com/media/ForcedotcomBookLibrary/Force.com_Multitenancy_WP_101508.pdf
他们有 1 个巨大的表,包含 500 个字符串列(Value0、Value1、... Value500)。日期和数字以某种格式存储为字符串,以便可以在数据库级别将它们转换为其本机类型。有一些元数据表定义数据模型的形状,每个租户可以是唯一的。还有用于索引、关系、唯一值等的附加表。
为什么这么麻烦呢?
每个租户都可以在运行时自定义自己的数据模式,而无需在数据库级别进行更改(更改表等)。这绝对是做这样的事情的困难方法,但非常灵活。
Below is a link to a white-paper on Salesforce.com about how they implement multi-tenancy:
http://www.developerforce.com/media/ForcedotcomBookLibrary/Force.com_Multitenancy_WP_101508.pdf
They have 1 huge table w/ 500 string columns (Value0, Value1, ... Value500). Dates and Numbers are stored as strings in a format such that they can be converted to their native types at the database level. There are meta data tables that define the shape of the data model which can be unique per tenant. There are additional tables for indexing, relationships, unique values etc.
Why the hassle?
Each tenant can customize their own data schema at run-time without having to make changes at the database level (alter table etc). This is definitely the hard way to do something like this but is very flexible.
我的经验(尽管是 SQL Server)是多数据库是正确的选择,每个客户端都有自己的数据库。因此,尽管我没有 mySQL 或 Ruby On Rails 经验,但我希望我的输入可以增加一些价值。
原因包括:
我希望这确实提供了一些有用的输入!原因还有很多,但我的脑子一片空白。如果它重新出现,我会更新:)
编辑:
自从我发布这个答案以来,现在很明显我们正在谈论 10,000 多个租户。我的经验是在数百个大型数据库中 - 我认为 10,000 个独立的数据库对于您的场景来说不太容易管理,因此我现在不赞成针对您的场景使用多数据库方法。特别是现在很明显,您正在谈论每个租户的小数据量!
无论如何,将我的答案保留在这里,因为它可能对类似船上的其他人(租户较少)有一些用处
My experience (albeit SQL Server) is that multi-database is the way to go, where each client has their own database. So although I have no mySQL or Ruby On Rails experience, I'm hoping my input might add some value.
The reasons why include :
I hope this does offer some useful input! There are more reasons, but my mind went blank. If it kicks back in, I'll update :)
EDIT:
Since I posted this answer, it's now clear that we're talking 10,000+ tenants. My experience is in hundreds of large scale databases - I don't think 10,000 separate databases is going to be too manageable for your scenario, so I'm now not favouring the multi-db approach for your scenario. Especially as it's now clear you're talking small data volumes for each tenant!
Keeping my answer here as anyway as it may have some use for other people in a similar boat (with fewer tenants)
正如您提到的,每个租户一个数据库是一种选择,并且确实需要一些更大的权衡。它可以在较小的规模(例如个位数或十个以下的租户)下很好地工作,但超出范围就变得更难以管理。不仅是迁移,而且只是保持数据库的正常运行。
每个模式模型不仅对于每个模式的唯一模式有用,尽管在所有租户之间运行迁移仍然变得很困难,并且在 1000 个模式下 Postgres 可能会开始遇到麻烦。
更具可扩展性的方法绝对是让租户随机分布,存储在同一个数据库中,但跨不同的逻辑分片(或 表格)。根据您的语言,有许多库可以帮助您完成此任务。如果您使用 Rails,有一个库可以确保租户
acts_as_tenant
,它有助于确保您的租户查询只会拉回该数据。还有一个 gemapartment
- 尽管它使用架构模型,但它确实有助于跨所有架构的迁移。如果您使用 Django,则有很多,但最流行的一种似乎是跨 schemas 。所有这些在应用程序层面上都有更多帮助。如果您直接在数据库级别寻找更多内容,Citus 专注于为 多租户使用 Postgres 可以更加开箱即用。As you mention the one database per tenant is an option and does have some larger trade-offs with it. It can work well at smaller scale such as a single digit or low 10's of tenants, but beyond that it becomes harder to manage. Both just the migrations but also just in keeping the databases up and running.
The per schema model isn't only useful for unique schemas for each, though still running migrations across all tenants becomes difficult and at 1000's of schemas Postgres can start to have troubles.
A more scalable approach is absolutely having tenants randomly distributed, stored in the same database, but across different logical shards (or tables). Depending on your language there are a number of libraries that can help with this. If you're using Rails there is a library to enfore the tenancy
acts_as_tenant
, it helps ensure your tenant queries only pull back that data. There's also a gemapartment
- though it uses the schema model it does help with the migrations across all schemas. If you're using Django there's a number but one of the more popular ones seems to be across schemas. All of these help more at the application level. If you're looking for something more at the database level directly, Citus focuses on making this type of sharding for multi-tenancy work more out of the box with Postgres.