Arrays and normalization in database tables
Is it smart to keep arrays in table columns? More precisely, I am thinking of the following schema, which to my understanding violates normalization:
create table Permissions(
    GroupID int not null default(-1),
    CategoryID int not null default(-1),
    Permissions varchar(max) not null default(''),
    constraint PK_GroupCategory primary key clustered(GroupID,CategoryID)
);
as opposed to this:
create table Permissions(
    GroupID int not null default(-1),
    CategoryID int not null default(-1),
    PermissionID int not null default(-1),
    constraint PK_GroupCategory primary key clustered(GroupID,CategoryID)
);
UPD3: I envision Permissions as a comma-delimited string, since MSSQL is our primary deployment target.
UPD: Forgot to mention: within the scope of this question, assume that "fetch rows that have permission X" will never be performed. All lookups will be made by GroupID and CategoryID only.
UPD2: I envision the typical usage scenario as follows:
int category_id=42;
int[] array_of_groups=new int[]{40,2,42};
if(!Permissions.Check(category_id, array_of_groups, Permission.EatAndDrink)) {
    throw new StarveToDeathException();
}
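To make the trade-off concrete, here is a minimal sketch of what the hypothetical `Permissions.Check` above might do under the comma-delimited design from UPD3. A few assumptions apply:

- It is written in Python against an in-memory SQLite database as a stand-in for MSSQL.
- `check`, `make_db`, the sample rows and `EAT_AND_DRINK = 2` are illustrative names only.

The lookup stays on the primary key, as UPD says. Every matching row still has to be fetched and split in the application, though.

```python
import sqlite3

# Hypothetical numeric ID standing in for Permission.EatAndDrink.
EAT_AND_DRINK = 2


def make_db() -> sqlite3.Connection:
    """Build the first (denormalized) schema with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE Permissions (
               GroupID     INTEGER NOT NULL DEFAULT -1,
               CategoryID  INTEGER NOT NULL DEFAULT -1,
               Permissions TEXT    NOT NULL DEFAULT '',
               PRIMARY KEY (GroupID, CategoryID)
           )"""
    )
    conn.executemany(
        "INSERT INTO Permissions VALUES (?, ?, ?)",
        [(40, 42, "1,3"), (2, 42, "2,5"), (7, 42, "2")],
    )
    return conn


def check(conn: sqlite3.Connection, category_id: int,
          group_ids: list[int], permission_id: int) -> bool:
    """True if any of group_ids holds permission_id in category_id."""
    if not group_ids:
        return False
    placeholders = ",".join("?" * len(group_ids))
    rows = conn.execute(
        f"SELECT Permissions FROM Permissions "
        f"WHERE CategoryID = ? AND GroupID IN ({placeholders})",
        [category_id, *group_ids],
    )
    # The database only narrows by key; parsing happens here, row by row.
    for (csv,) in rows:
        ids = {int(p) for p in csv.split(",") if p.strip()}
        if permission_id in ids:
            return True
    return False


conn = make_db()
print(check(conn, 42, [40, 2, 42], EAT_AND_DRINK))  # True: group 2 has "2,5"
print(check(conn, 42, [40], EAT_AND_DRINK))         # False: group 40 has "1,3"
```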
Thoughts?
Thanks in advance!
I'd suggest taking the normalized road for the following reasons:
Your second example should probably be:
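The corrected schema itself is missing from this copy of the answer. A likely reading (an assumption, not the answerer's text) is that `PermissionID` has to join the primary key. With a key of only `(GroupID, CategoryID)`, each group/category pair can hold just one permission row.

The sketch below uses Python with SQLite as a stand-in for MSSQL to show the difference. `can_store_two_permissions` is a hypothetical helper.

```python
import sqlite3


def can_store_two_permissions(primary_key: str) -> bool:
    """Try to give group 2 two permissions in category 42 under a given key."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            f"""CREATE TABLE Permissions (
                    GroupID      INTEGER NOT NULL DEFAULT -1,
                    CategoryID   INTEGER NOT NULL DEFAULT -1,
                    PermissionID INTEGER NOT NULL DEFAULT -1,
                    PRIMARY KEY ({primary_key})
                )"""
        )
        conn.executemany(
            "INSERT INTO Permissions VALUES (?, ?, ?)",
            [(2, 42, 1), (2, 42, 2)],
        )
        return True
    except sqlite3.IntegrityError:
        # The second row collides with the first on the primary key.
        return False
    finally:
        conn.close()


# The question's key: one permission per (group, category) pair.
print(can_store_two_permissions("GroupID, CategoryID"))                # False
# Assumed correction: the permission is part of the key.
print(can_store_two_permissions("GroupID, CategoryID, PermissionID"))  # True
```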
Your first example would violate normal form (and string parsing might not be a good use of your processing time), but that doesn't mean it's necessarily wrong for your application. It really depends on how you use the data.
Occasionally; it depends. I'd say it depends on how narrowly you define the things being normalised.
If you can see no way in which a table with one row for each item would ever be useful, then I'd suggest that encapsulating the items in a string might be considered.
In the example given, I'd want to be sure that executing a query to find all group/category combinations for a specified permission would not cause me a problem if I had to write a WHERE clause that used string pattern matching. Of course, if I never have to perform such a query then it's a moot point.
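As an illustration of the pattern-matching concern, here is a sketch of finding every group with permission 2. It uses Python with SQLite standing in for MSSQL, and the table contents are made up.

- A naive `LIKE '%2%'` also matches `12`.
- Wrapping both the column and the search value in commas fixes that. The leading wildcard still generally rules out an index seek, though.

In T-SQL the string concatenation operator is `+` rather than `||`.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """First schema again, with made-up comma-delimited permission lists."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Permissions (GroupID INTEGER, CategoryID INTEGER, "
        "Permissions TEXT NOT NULL DEFAULT '', PRIMARY KEY (GroupID, CategoryID))"
    )
    # Assumes items are stored without spaces around the commas.
    conn.executemany(
        "INSERT INTO Permissions VALUES (?, ?, ?)",
        [(1, 42, "12,5"), (2, 42, "2,5"), (3, 42, "5,2")],
    )
    return conn


def groups_naive(conn: sqlite3.Connection, permission_id: int) -> list[int]:
    # Plain substring match: '%2%' also hits the '2' inside '12'.
    sql = ("SELECT GroupID FROM Permissions "
           "WHERE Permissions LIKE ? ORDER BY GroupID")
    return [g for (g,) in conn.execute(sql, (f"%{permission_id}%",))]


def groups_delimited(conn: sqlite3.Connection, permission_id: int) -> list[int]:
    # Pad the stored list with commas so only whole items can match ',2,'.
    sql = ("SELECT GroupID FROM Permissions "
           "WHERE ',' || Permissions || ',' LIKE ? ORDER BY GroupID")
    return [g for (g,) in conn.execute(sql, (f"%,{permission_id},%",))]


conn = make_db()
print(groups_naive(conn, 2))      # [1, 2, 3]  (group 1 is a false positive)
print(groups_delimited(conn, 2))  # [2, 3]
```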
In general I'm happiest with this approach when the data being assembled thus has no significance in isolation: the data only makes sense when considered as a complete set. If there's a little more structure, say a list of data/value pairs, then formatting with XML or JSON can be useful.
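A minimal sketch of the JSON variant follows. It assumes a hypothetical `Settings` table and made-up keys. The application serializes the whole set with Python's `json` module and only ever reads it back as a unit, by primary key. SQLite again stands in for MSSQL.

```python
import json
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one JSON document per (group, category) pair.
    conn.execute(
        "CREATE TABLE Settings (GroupID INTEGER NOT NULL, "
        "CategoryID INTEGER NOT NULL, Data TEXT NOT NULL, "
        "PRIMARY KEY (GroupID, CategoryID))"
    )
    return conn


def save(conn: sqlite3.Connection, group_id: int, category_id: int,
         pairs: dict) -> None:
    # The whole set is written as one value; nothing inside it is indexed.
    conn.execute(
        "INSERT OR REPLACE INTO Settings VALUES (?, ?, ?)",
        (group_id, category_id, json.dumps(pairs, sort_keys=True)),
    )


def load(conn: sqlite3.Connection, group_id: int, category_id: int) -> dict:
    # Read back by primary key only, then decode in the application.
    row = conn.execute(
        "SELECT Data FROM Settings WHERE GroupID = ? AND CategoryID = ?",
        (group_id, category_id),
    ).fetchone()
    return json.loads(row[0]) if row else {}


conn = make_db()
save(conn, 2, 42, {"EatAndDrink": True, "MaxMealsPerDay": 3})
print(load(conn, 2, 42))  # {'EatAndDrink': True, 'MaxMealsPerDay': 3}
```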
If you're only querying by GroupID and/or CategoryID then there's nothing wrong with it. Normalizing would mean more tables, rows, and joins. So for large databases this can have a negative performance impact.
If you're absolutely certain you'll never need a query which processes Permissions, and it's only parsed by your application, there's nothing improper about this solution. It could also be preferable if you always want the complete set of permissions (i.e. you're not querying just to get part of the string, but always want all of its values).
The problem with the first implementation is that it doesn't actually use an array but a concatenated string.
This means that you won't easily be able to use the value stored in that string to perform set based queries such as finding all people with a specific permission or specific set of permissions.
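With the normalized layout, both kinds of set-based query mentioned above are plain SQL. The sketch below uses Python with SQLite as a stand-in; the sample rows and helper names are hypothetical. It shows two queries:

- every group holding one permission;
- every group holding all permissions in a given set within a category, using `GROUP BY ... HAVING COUNT(DISTINCT ...)`.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Normalized schema: one row per (group, category, permission)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE Permissions (
               GroupID      INTEGER NOT NULL,
               CategoryID   INTEGER NOT NULL,
               PermissionID INTEGER NOT NULL,
               PRIMARY KEY (GroupID, CategoryID, PermissionID)
           )"""
    )
    conn.executemany(
        "INSERT INTO Permissions VALUES (?, ?, ?)",
        [(2, 42, 1), (2, 42, 2), (40, 42, 2), (40, 7, 1), (12, 42, 1)],
    )
    return conn


def groups_with(conn: sqlite3.Connection, permission_id: int) -> list[int]:
    """All groups holding permission_id in any category."""
    sql = ("SELECT DISTINCT GroupID FROM Permissions "
           "WHERE PermissionID = ? ORDER BY GroupID")
    return [g for (g,) in conn.execute(sql, (permission_id,))]


def groups_with_all(conn: sqlite3.Connection, category_id: int,
                    permission_ids: list[int]) -> list[int]:
    """Groups holding every permission in permission_ids within category_id."""
    ids = sorted(set(permission_ids))
    placeholders = ",".join("?" * len(ids))
    # A group qualifies when it matches as many distinct IDs as were requested.
    sql = (f"SELECT GroupID FROM Permissions "
           f"WHERE CategoryID = ? AND PermissionID IN ({placeholders}) "
           f"GROUP BY GroupID "
           f"HAVING COUNT(DISTINCT PermissionID) = ? ORDER BY GroupID")
    return [g for (g,) in conn.execute(sql, [category_id, *ids, len(ids)])]


conn = make_db()
print(groups_with(conn, 1))               # [2, 12, 40]
print(groups_with_all(conn, 42, [1, 2]))  # [2]
```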
If you were using a database that natively supported arrays as an atomic value, such as PostgreSQL, then the argument would be different.
Based upon the second requirement (the proposed query), I'd have to suggest the second one is best, as you can simply query
SELECT count(*) FROM Permissions WHERE CategoryID = 42 AND GroupID IN (40, 2, 42) AND PermissionID = 2
(assuming EatAndDrink has an ID of 2). The first version however would require retrieving all the permissions for each group and parsing the string before you can test if it includes the requested permission.
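Here is the answer's query wrapped as the `Check` from the question, sketched in Python with SQLite standing in for MSSQL. Some details to note:

- The `IN` list is built from parameter placeholders rather than from pasted-in values.
- `check`, `make_db` and the sample rows are hypothetical.
- The table's primary key includes `PermissionID`. The question's second schema would otherwise allow only one permission per group/category pair.

```python
import sqlite3

# The answer assumes EatAndDrink has an ID of 2.
EAT_AND_DRINK = 2


def make_db() -> sqlite3.Connection:
    """Normalized schema with PermissionID in the key; rows are made up."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE Permissions (
               GroupID      INTEGER NOT NULL,
               CategoryID   INTEGER NOT NULL,
               PermissionID INTEGER NOT NULL,
               PRIMARY KEY (GroupID, CategoryID, PermissionID)
           )"""
    )
    conn.executemany(
        "INSERT INTO Permissions VALUES (?, ?, ?)",
        [(40, 42, 1), (2, 42, 2), (2, 42, 5)],
    )
    return conn


def check(conn: sqlite3.Connection, category_id: int,
          group_ids: list[int], permission_id: int) -> bool:
    """The answer's count(*) query, parameterized; True if any row matches."""
    if not group_ids:
        return False
    placeholders = ",".join("?" * len(group_ids))
    sql = (f"SELECT count(*) FROM Permissions WHERE CategoryID = ? "
           f"AND GroupID IN ({placeholders}) AND PermissionID = ?")
    (count,) = conn.execute(
        sql, [category_id, *group_ids, permission_id]
    ).fetchone()
    return count > 0


conn = make_db()
print(check(conn, 42, [40, 2, 42], EAT_AND_DRINK))  # True
print(check(conn, 42, [40, 42], EAT_AND_DRINK))     # False
```

Only a single count comes back, rather than every row's permission string. All three filtered columns are in the primary key, so the database can answer from that index. On SQL Server, a table-valued parameter is one common alternative to building the `IN` list dynamically.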