I have this situation that is as simple as it is annoying.
The requirements are:
- Every item must have an associated category.
- Every item MAY be included in a set.
- Sets must be composed of items of the same category.
- There may be several sets of the same category.
The desired logical procedure to insert new data is as follows:
- Categories are inserted.
- Items are inserted. For each new item, a category is assigned.
- Sets of items of the same category are created.
I'd like to get a design where data integrity between tables is ensured.
I have come up with the following design, but I can't figure out how to maintain data integrity.

If the relationship highlighted in yellow is not taken into account, everything is very simple and data integrity is enforced by design: an item acquires a category only when it is assigned to a set, and the category is given by the set itself. However, it would then not be possible to have items that are linked to a category but not associated with a set, and this is annoying.
I want to avoid using special "bridging sets" to assign a category to an item since it would feel hacky and there is no way to distinguish between real sets and special ones.
So I introduced the relationship in yellow. But now you can create sets of objects of different categories!
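To make the anomaly concrete, here is a minimal sketch of the problem. The original diagram is not reproduced here, so the table and column names (`categories`, `sets`, `items`, `category_id`, `set_id`) are assumptions, and SQLite (through Python's `sqlite3` module) stands in for MySQL only so that the example can be run as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when this is on

# Assumed schema: items carry two independent single-column foreign keys.
conn.executescript("""
CREATE TABLE categories (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE sets (
    id          INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES categories(id)
);
CREATE TABLE items (
    id          INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES categories(id),
    set_id      INTEGER REFERENCES sets(id)  -- NULL means "not in any set"
);
INSERT INTO categories VALUES (1, 'books'), (2, 'tools');
INSERT INTO sets  VALUES (10, 1);       -- a set of books
INSERT INTO items VALUES (100, 2, 10);  -- a tool placed in the set of books
""")

# Both FKs are satisfied individually, yet the categories disagree.
mismatched = conn.execute("""
    SELECT COUNT(*)
    FROM items i
    JOIN sets s ON s.id = i.set_id
    WHERE i.category_id <> s.category_id
""").fetchone()[0]
print("items whose category differs from their set's category:", mismatched)
```

Each foreign key is valid on its own, so nothing stops the last insert; the constraints never compare the two category values.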
How can I avoid this integrity problem using only plain constraints (indexes, unique constraints, FKs) in MySQL?
I would also like to avoid triggers, as they seem a fragile and not very reliable way to solve this problem...
I've read similar questions, such as "How to preserve data integrity in circular reference database structure?", but I cannot understand how to apply the solution to my case...
Comments (2)
Interesting scenario. I don't see a slam-dunk 'best' approach. One consideration here is: what proportion of items are in sets vs attached only to categories?
What you don't want is two fields on items. Because, as you say, there are going to be data anomalies: an item's direct category being different from the category it inherits via its set.
Ideally you'd make a single field on items that is an Algebraic Data Type, aka Tagged Union, with a tag saying whether its payload is a category or a set. But SQL doesn't support ADTs, so any SQL approach would have to be a bit hacky.
Then I suggest the compromise is to make every item a member of a set, from which it inherits its category. Then data access is consistent: always JOIN items-sets-categories.
To support that, create dummy sets whose only purpose is to link to a category.
To address "there is no way to distinguish between real sets and special ones": put an extra field/indicator on sets: this is a 'real' set vs this is a link-to-category set. (Or a hack: make the set-description "Category: <category-name>".)
Addit: BTW your "desired logic procedure to insert new data" is just wrong: you must insert sets (Step 3) before items (Step 2).
I think I might have found a solution by looking at the answer from Roger Wolf to a similar situation here:
Ensuring relationship integrity in a database modelling sets and subsets
Essentially, in the items table, I've changed the set_id FK to a composite FK that references both sets.id and sets.category_id from, respectively, the items.set_id and items.category_id columns.
In this way the two FKs on the items table overlap.
So for each row in the items table, once a category_id is chosen, the FK referring to the sets table is forced to point to a set of the same category. If this condition is not respected, an exception is thrown.
Now, the original answer came with advice against using this approach.
I am uncertain whether this is a good idea or not.
It certainly works, and I think it is a fairly elegant solution compared to one that uses triggers for such a simple piece of a more complex design.
Maybe the same solution becomes more difficult to understand and maintain if it is applied heavily to a large set of tables.
Edit:
As AntC pointed out in the comments below, this technique, although it works, can cause insidious problems, e.g. if you want to change the category_id for a set. In that case you would have to update the category_id of each item linked to that set. That needs a transaction (BEGIN ... COMMIT) wrapped around the updates.
So ultimately it's probably not worth it and it's better to investigate the requirements further in order to find a better schema.