Multi-valued attributes in a relational database?

Posted on 2024-12-06 06:53:55


How good an idea are multi-valued attributes in a relational database when they are to be referred to extensively?

Let me give you an example to show what I mean. Suppose I have the following table:

UserID          Attribute1

User1           a,b,c
User2           x,y,z
User3           a,x,y
User4           c,b,z
[a,b,c,x,y,z are to be strings]

There is another user, User5, to whom I have to suggest other users based on whether his Attribute1 matches that of any of the other 4 users.

[In a graph database, the task would have been much easier, as I could have created multiple nodes from the respective users using the same relationship.]

Now, this table is just a micro-level abstraction of what the actual database will look like. The number of rows in the table may run into hundreds of thousands, if not millions. Also, each user may have far more than 3 values. On top of that, the database may be under heavy load, and in that situation there may be performance issues.

So, are multi-valued attributes helpful in such cases? Or is there any better way of doing the same? One obvious way I can think of is to store it as:

UserID          Attribute1

User1           a
User1           b
User1           c
User2           x
User2           y
User2           z
User3           a
User3           x
User3           y
User4           c
User4           b
User4           z
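A minimal sketch of the User5 lookup against the one-row-per-value layout above, using Python's built-in sqlite3 module so it runs self-contained. User5's values (a, x, y), the table and index names, and the rule of ranking candidates by how many values they share are illustrative assumptions, not part of the question.

```python
import sqlite3

# One row per (user, value), as in the question; User5's values are hypothetical.
ROWS = [
    ("User1", "a"), ("User1", "b"), ("User1", "c"),
    ("User2", "x"), ("User2", "y"), ("User2", "z"),
    ("User3", "a"), ("User3", "x"), ("User3", "y"),
    ("User4", "c"), ("User4", "b"), ("User4", "z"),
    ("User5", "a"), ("User5", "x"), ("User5", "y"),
]


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_attribute ("
        " user_id TEXT NOT NULL,"
        " attribute TEXT NOT NULL,"
        " PRIMARY KEY (user_id, attribute))"  # also serves lookups by user_id
    )
    # Index on the value so "who else has value v?" is an index lookup.
    conn.execute("CREATE INDEX idx_attribute ON user_attribute (attribute)")
    conn.executemany("INSERT INTO user_attribute VALUES (?, ?)", ROWS)
    return conn


def similar_users(conn: sqlite3.Connection, user: str) -> list[tuple[str, int]]:
    """(other_user, shared_count) for every user sharing at least one value."""
    return conn.execute(
        """
        SELECT other.user_id, COUNT(*) AS shared
        FROM user_attribute AS mine
        JOIN user_attribute AS other
          ON other.attribute = mine.attribute
         AND other.user_id <> mine.user_id
        WHERE mine.user_id = ?
        GROUP BY other.user_id
        ORDER BY shared DESC, other.user_id
        """,
        (user,),
    ).fetchall()


if __name__ == "__main__":
    # [('User3', 3), ('User2', 2), ('User1', 1)]
    print(similar_users(build_db(), "User5"))
```

The self-join only touches rows whose value matches one of User5's values, so with the index the work grows with the number of matches rather than with the size of the table.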

Is there any faster way of dealing with such situations in databases? Or are there any built-in features of modern-day databases to exploit?


还给你自由 2024-12-13 06:53:55

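Having multiple values in one field is only useful if the data is dead weight in the database, i.e. if you only read the field out of the database and process it afterwards.

As soon as you want to use the values in the field in a query, you take a huge performance hit from having to parse the value to compare it. If you put the values in separate records, as in your second example, so that you can add an index on them, it is not unrealistic for the query to be 10,000 times faster.

Having a million records in a table is not a problem. We have some tables with over 100 million records.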
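To see where the speed difference comes from, the sketch below (Python's built-in sqlite3; the table and index names are hypothetical) asks SQLite's query planner how it would find every user holding the value x under each layout. It does not measure or reproduce the 10,000× figure, which depends on data size and hardware; it only shows that the packed field has to be string-matched row by row, while the separate-records layout can be answered from an index.

```python
import sqlite3

# The question's sample data, in the packed (comma-separated) form.
DATA = {"User1": "a,b,c", "User2": "x,y,z", "User3": "a,x,y", "User4": "c,b,z"}

conn = sqlite3.connect(":memory:")

# Layout 1: several values packed into one text field.
conn.execute("CREATE TABLE user_csv (user_id TEXT PRIMARY KEY, attributes TEXT)")
conn.executemany("INSERT INTO user_csv VALUES (?, ?)", DATA.items())

# Layout 2: one row per value, with an index on the value column.
conn.execute("CREATE TABLE user_attribute (user_id TEXT, attribute TEXT)")
conn.execute("CREATE INDEX idx_attribute ON user_attribute (attribute)")
conn.executemany(
    "INSERT INTO user_attribute VALUES (?, ?)",
    [(user, value) for user, packed in DATA.items() for value in packed.split(",")],
)

# Packed layout: the field must be string-matched on every row. Padding with
# commas avoids false hits such as "x" matching inside a value like "xy".
CSV_QUERY = "SELECT user_id FROM user_csv WHERE ',' || attributes || ',' LIKE ?"
CSV_PARAMS = ("%,x,%",)

# Separate records: a plain equality test that the index can answer.
NORM_QUERY = "SELECT user_id FROM user_attribute WHERE attribute = ?"
NORM_PARAMS = ("x",)


def plan(sql: str, params: tuple) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)  # last column is the detail


print(plan(CSV_QUERY, CSV_PARAMS))    # a full SCAN of user_csv
print(plan(NORM_QUERY, NORM_PARAMS))  # a SEARCH using idx_attribute
```

The exact wording of the plan differs between SQLite versions (for example `SCAN user_csv` versus `SCAN TABLE user_csv`), but the scan-versus-index-search distinction is the point.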


北城挽邺 2024-12-13 06:53:55


Apart from what the others have said regarding normalization, I'd like to answer the "Or are there any built-in features of modern-day databases to exploit?" part of your question:

PostgreSQL has a pretty nifty extension called hstore which does exactly that and in a highly optimized manner.

The hstore data type essentially stores a set of key/value pairs, where both keys and values are text strings, so you can store almost anything. In your example, something like this:

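-- Requires the extension: CREATE EXTENSION IF NOT EXISTS hstore;
-- Assumes a table user_attributes (user_id int, attributes hstore)
INSERT INTO user_attributes
(user_id, attributes)
VALUES
(1, 'att1 => x, att2 => y');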

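This inserts the keys att1 and att2 (with the values x and y) into the attributes column. The column can be indexed (hstore supports GIN and GiST indexes for operators such as @> and ?) to make lookups fast.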

You can query the data using this syntax:

SELECT *
FROM user_attributes
WHERE attributes @> ('att1 => "Some Value"')

This returns all rows that have a key named att1 mapped to the value "Some Value". The statement can use an existing index on the column, so the lookup is nearly as fast as with a "real" column: on my laptop it takes about 2 ms to find a row in a table with 100,000 rows.

You can also query for rows that have a specific attribute defined regardless of the value:

SELECT user_id,
       (attributes -> 'att1')
FROM user_attributes
WHERE attributes ? 'att1'

This finds all rows where att1 is defined and outputs its value for each of them.
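The three hstore operators used above (@>, ? and ->) can be modelled with plain Python dicts to make their semantics concrete. This is an illustrative model only: the helper names are made up, it treats every value as a non-NULL string, and it says nothing about how PostgreSQL evaluates or indexes these operators.

```python
from typing import Dict, Optional

# One row's hstore value modelled as a dict of text keys to text values
# (this model ignores hstore's NULL values).
Hstore = Dict[str, str]


def contains(left: Hstore, right: Hstore) -> bool:
    """Like `left @> right`: every key/value pair of `right` is present in `left`."""
    return all(key in left and left[key] == value for key, value in right.items())


def has_key(store: Hstore, key: str) -> bool:
    """Like `store ? key`: the key is defined, whatever its value."""
    return key in store


def get(store: Hstore, key: str) -> Optional[str]:
    """Like `store -> key`: the value, or None (SQL NULL) when the key is absent."""
    return store.get(key)


# Hypothetical rows of user_attributes: user_id -> attributes.
rows: Dict[int, Hstore] = {
    1: {"att1": "x", "att2": "y"},
    2: {"att1": "Some Value"},
    3: {"att2": "y"},
}

# WHERE attributes @> 'att1 => "Some Value"'
print([uid for uid, attrs in rows.items() if contains(attrs, {"att1": "Some Value"})])  # [2]
# SELECT user_id, attributes -> 'att1' ... WHERE attributes ? 'att1'
print([(uid, get(attrs, "att1")) for uid, attrs in rows.items() if has_key(attrs, "att1")])  # [(1, 'x'), (2, 'Some Value')]
```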


暮光沉寂 2024-12-13 06:53:55


For an n:n (many-to-many) relationship you could normalize it into 3 tables (in a transactional model): users, user_attribute and attributes, where the user_attribute table consists of the primary keys of users and attributes. Keys are usually indexed and are therefore quite fast for read operations.

EDIT AFTER QUESTION

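CREATE TABLE Users (
    Id   INT PRIMARY KEY,
    Name VARCHAR(255)
);

CREATE TABLE Attributes (
    Id    INT PRIMARY KEY,
    Value VARCHAR(255)
);

-- Composite primary key: one row per (user, attribute) pair
CREATE TABLE User_Attribute (
    UserId      INT NOT NULL,
    AttributeId INT NOT NULL,
    PRIMARY KEY (UserId, AttributeId),
    FOREIGN KEY (UserId) REFERENCES Users (Id),
    FOREIGN KEY (AttributeId) REFERENCES Attributes (Id)
);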

This would result in a table holding only the users, a table holding only the attributes, and a table recording which user holds which attribute.

For instance:

Users
Id  Name
1   User1
2   User2

User_Attribute
UserId  AttributeId
1       1
1       2
2       1
2       3

Attributes
Id  Value
1   Att1
2   Att2
3   Att3
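A sketch of the three-table design using Python's built-in sqlite3 module, assuming SQLite column types and the hypothetical helpers add_user and users_with_value. It shows each distinct value being stored once in Attributes (get-or-create on insert), the composite primary key on User_Attribute, and a lookup of every user holding a given value through the link table. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Users (
    Id   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL UNIQUE
);
CREATE TABLE Attributes (
    Id    INTEGER PRIMARY KEY,
    Value TEXT NOT NULL UNIQUE           -- each distinct value is stored once
);
CREATE TABLE User_Attribute (
    UserId      INTEGER NOT NULL REFERENCES Users (Id),
    AttributeId INTEGER NOT NULL REFERENCES Attributes (Id),
    PRIMARY KEY (UserId, AttributeId)    -- also indexes lookups by UserId
);
-- Second index so "who holds attribute N?" is an index lookup as well.
CREATE INDEX idx_user_attribute_attr ON User_Attribute (AttributeId);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    return conn


def add_user(conn: sqlite3.Connection, name: str, values: list[str]) -> None:
    """Insert a user and link it to each value, creating Attributes rows as needed."""
    user_id = conn.execute("INSERT INTO Users (Name) VALUES (?)", (name,)).lastrowid
    for value in values:
        # Get-or-create: the UNIQUE constraint turns a repeated value into a no-op.
        conn.execute("INSERT OR IGNORE INTO Attributes (Value) VALUES (?)", (value,))
        (attr_id,) = conn.execute(
            "SELECT Id FROM Attributes WHERE Value = ?", (value,)
        ).fetchone()
        # The composite key makes a repeated (user, attribute) pair a no-op.
        conn.execute(
            "INSERT OR IGNORE INTO User_Attribute VALUES (?, ?)", (user_id, attr_id)
        )


def users_with_value(conn: sqlite3.Connection, value: str) -> list[str]:
    """Names of all users linked to the given attribute value."""
    rows = conn.execute(
        """
        SELECT u.Name
        FROM Attributes a
        JOIN User_Attribute ua ON ua.AttributeId = a.Id
        JOIN Users u ON u.Id = ua.UserId
        WHERE a.Value = ?
        ORDER BY u.Name
        """,
        (value,),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = connect()
    for name, packed in [("User1", "a,b,c"), ("User2", "x,y,z"),
                         ("User3", "a,x,y"), ("User4", "c,b,z")]:
        add_user(conn, name, packed.split(","))
    print(users_with_value(conn, "x"))  # ['User2', 'User3']
```

Storing each value once in Attributes keeps the link table narrow (two integers per row), which is what keeps both of its indexes small.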
