How should I enforce a unique set in a Django ManyToMany model, or in MySQL?
My Django model looks like this:

    from django.db import models

    class Entity(models.Model):
        name = models.CharField(max_length=40)
        examples = models.ManyToManyField(Example, blank=True)
        tokens = models.ManyToManyField(Token, blank=True, null=True)
I want to enforce uniqueness of the token set, i.e. if there is already an entity with tokens ['a', 'b', 'c'], I do not want to add another one with ['a', 'b', 'c']. However, an entity with tokens ['a', 'b', 'c', 'd'] or ['a', 'b'] is a different set and should be added.
In the case where an entity with that exact set already exists, I want to add the newly discovered examples to its ManyToMany. The old name can stay.
Currently, I run a query to get an existing Entity with the exact token set in question (this is itself a challenge in Django), then if it is found I update it with the new example. The problem is that this runs in multiple processes on multiple servers, so there is a race condition between checking whether a matching entity exists and creating a new one if none was found. This could result in entities with duplicate token sets being created.
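For reference, the exact-set lookup I currently do is roughly the following (just a sketch; I'm assuming here that Token has a text field holding the token string):

    def find_entity_with_exact_tokens(wanted_texts):
        wanted = set(wanted_texts)
        # Narrow to entities that share at least one of the tokens, then
        # compare the full sets in Python; doing the exact match purely in
        # the ORM is the awkward part.
        candidates = (Entity.objects
                      .filter(tokens__text__in=wanted)
                      .distinct()
                      .prefetch_related('tokens'))
        for entity in candidates:
            if {t.text for t in entity.tokens.all()} == wanted:
                return entity
        return None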
One solution I came up with is to use an explicit through model for the ManyToMany and override the through model's save method to compute a hash of the token set, storing it on the through model itself in a column with a unique constraint. I think this would work, but it doesn't seem like a particularly elegant solution.
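For concreteness, the rough shape I'm picturing is below, although I've sketched the hash column on Entity itself rather than on the through model (token_set_hash is just a name I made up here):

    class Entity(models.Model):
        name = models.CharField(max_length=40)
        examples = models.ManyToManyField(Example, blank=True)
        tokens = models.ManyToManyField(Token, blank=True, null=True)
        # Hypothetical extra column: a hex digest of the canonicalized token
        # set; the unique constraint is what actually guards against two
        # entities ending up with the same set.
        token_set_hash = models.CharField(max_length=32, unique=True)

The wrinkle is that the hash has to be computed from the intended token texts up front, since the M2M rows can only be attached after the entity has a primary key.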
I would also imagine that this problem is somewhat common in SQL land with mapping tables where you want a unique set - perhaps there is a well-known solution here? I am certainly open to using raw SQL.
Thanks in advance for your help!
Comments (1)
I'm not entirely sure this method addresses your problem, but here goes.
A friend who works with big insurance company databases was looking for a way of rapidly detecting exact dupes across multiple databases based on a subset of columns. I suggested taking the set of columns he was checking against, running them through some sort of canonicalifier™, computing an MD5 digest, and saving that as an additional, indexed column on the table. Sucker ran like greased lightning.
So maybe you can do something similar. It doesn't address your race condition (nothing I can think of does), but it could significantly simplify the dupe-check.
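Concretely, something along these lines (just a sketch; I'm assuming your token values are plain strings and that none of them can contain the separator byte):

    import hashlib

    def token_set_digest(token_texts):
        # Canonicalize: de-duplicate, sort, and join with a separator that
        # is assumed never to appear inside a token, then hash.
        canonical = '\x00'.join(sorted(set(token_texts)))
        return hashlib.md5(canonical.encode('utf-8')).hexdigest()

Store that digest in the indexed column and checking for an existing set becomes a single equality lookup.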