多租户数据库:为什么在每个表中放置一个 TenantID 列?
我看过的关于多租户数据库模型的每个教程都告诉您将 TenantID 放入每个表中:
zoos
-------
id
zoo_name
tenant_id
animals
-------
id
zoo_id
animal_name
tenant_id
但是,这对我来说似乎多余。为什么不将 tenant_id
列添加到 zoos
表中并利用 zoos
和 animals
之间的外键关系?
您是否将 tenant_id
添加到每个表只是为了防止连接变得过于疯狂?它是针对错误的防范措施吗?性能考虑?
Every tutorial I've seen about Multitenant database models tells you to put the TenantID in every single table:
zoos
-------
id
zoo_name
tenant_id
animals
-------
id
zoo_id
animal_name
tenant_id
However, this seems redundant to me. Why not add the tenant_id
column to just the zoos
table and exploit the foreign key relationship between zoos
and animals
?
Do you add tenant_id
to every table just to keep the joins from getting too crazy? Is it a safeguard against bugs? A performance consideration?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
如果您的关键设计考虑因素之一是安全性(具体来说,一个客户端在访问另一客户端的数据时无法进行任何操作),那么,根据您实现此安全性的方式,可能有必要在每个表中粘贴该限定列。 此处描述的一种此类策略需要在每个表上构建一个视图;假设每个表都包含一个tenantId列,那么如果配置正确,每个视图都可以包含“WHEREtenantId = SUSER_SID()”子句(当然,您配置数据库以便客户端只能访问视图)。
另一个因素(如我当前的工作)是加载仓库数据 (ETL)。表根据tenantId进行分区(我们使用表分区,但分区视图也可以工作),并且可以轻松地为客户端加载或卸载数据,而不会严重影响任何其他客户端。
但与以往一样,这涉及到很多“这取决于情况”。如果当前没有明确的需求,并且未来需求的可能性非常低,则将该列标准化。只是要意识到,它更多的是物理实现的设计,而不是概念或逻辑数据库设计。
If one of your key design considerations is security--specifically, one client can no way no how no when access another client's data--then, depending on how you implement this security, sticking that qualifying column in every table may be necessary. One such tactic described here requires building a view on every table; assuming each table contains a tenantId column, then if properly configured each view could contain a "WHERE tenantId = SUSER_SID()" clause (and of course you configure the database so that clients can only access the views).
Another factor (as in my current job) is loading warehouse data (ETL). Tables are partitioned on tenantId (we use table partitioning, but partitioned views would also work), and data can be easily loaded or unloaded for a client without seriously impacting any other client.
But as ever, there's a lot of "it depends" involved. If there is no clear and present need, and a very low likelihood of future need, then normalize that column out. Just realize that it's more a devise of physical implementation than of conceptual or logical database design.
它的存在是为了方便和性能 - 在标准化方面你是绝对正确的,它只需要进入顶部。那么问题就变成了,要获取一些数据(例如动物园 -> 动物 -> 食品 -> 供应商),您必须对理论上非常简单的查询进行极其复杂的连接。
因此,在现实世界中,人们必须妥协——问题就变成了妥协的地点和程度。
请参阅这篇文章也许标准化并不正常 - 及其结论:
开始探索该主题的地方
Its there for convenience and performance - in terms of normalisation you're absolutely right, it only needs to go in at the top. The problem then becomes that to get to some data (say zoo -> animal -> food -> supplier) you have to have horribly complex joins on what are notionally very simple queries.
So in the real world one has to compromise - question then becomes where and to what extent.
See this article Maybe Normalizing Isn't Normal - and its conclusion:
as a place to start exploring the subject
如果我的tenantID位于层次结构的顶部(即动物园级别),那么您需要考虑几个问题。
If I had tenantID at the top of the hierarchy (i.e. at the zoo level) you have several issues to consider.
我首先想到的是查找
animals > 的速度较慢。动物园>租户不仅仅是
动物>租户。这很可能是您经常执行的查找(例如,“获取某个租户的所有动物,无论动物园如何”)。
对于中小型应用程序,您可以采用更标准化的结构,但为了提高效率,您应该使用无关的数据(一般来说,多租户应用程序并不小)。只要确保它不会“不同步”,这是冗余数据带来的风险。
回答你的最后一段,原因是性能,纯粹而简单。连接并不是坏事;它们可以帮助您将一份数据保存在一个地方而不是三个地方。这绝对不是为了防止错误。将
tenant_id
字段添加到更多表中会增加出现错误的风险(尽管对于永不更改的 id,这不会是一个大问题)。The first thing that springs to mind is that it's slower to look up
animals > zoos > tenants
than simplyanimals > tenants
. And most likely this is a lookup you will do often (for example, "get all animals for a certain tenant, regardless of zoo").For small to mid-sized applications you can get away with a more normalized structure, but for the sake of efficiency, you should go with extraneous data (and generally speaking, multitenancy applications are not small). Just make sure it doesn't go "out of sync", which is a risk that comes with having redundant data.
To answer your last paragraph, the reason is performance, pure and simple. Joins are no bad thing; they help you keep a piece of data in one place rather than three. It's definitely not to prevent bugs. Adding a
tenant_id
field to more tables will increase the risk of bugs (although for an id that never changes, it wouldn't be as much of an issue).那么,鲍勃可能在第一动物园拥有一只长颈鹿,而乔可能在同一动物园拥有一只狮子。
他们不应该查看彼此的数据。
Well, Bob may own a giraffe in zoo No1, while Joe may own a lion in the same zoo.
They are not supposed to look at each others' data.
N1 的原因是为了安全性。
安全性需要成为多租户应用程序中的一个强有力的概念。
假设您授予用户修改动物的能力。
您创建一个带有选择的表单,显示当前租户的动物园。
如果用户破解表单并传递另一个租户的动物园 ID,会发生什么?
动物将被转移到另一个租户的另一个动物园!!
这对于多租户应用程序来说是真正的痛苦!
The reason N1 is for security.
Security need to be a strong concept in multi-tenant application.
Suppose that you give a user the ability to modify a Animal.
You create a form whith a select that show the zoo fo the current tenant.
What happen if the user hack the form and pass a zoo id of another tenant?
The animal will be moved to another zoo of another tenant!!
This is real pain in a multi tenant app!