当前位置：文江博客话题详情

如何在数据库中存储具有动态数量属性的数据

发布于 2024-08-04 16:36:14 字数 384 浏览 2 评论 0原文

我有许多不同的对象，它们具有不同数量的属性。到目前为止，我已将数据保存在 XML 文件中，这可以轻松地允许不断变化的属性数量。但我正在尝试将其移至数据库。

您存储这些数据的首选方式是什么？

到目前为止，我已经确定了一些策略：

在对象的表中有一个名为“attributes”的字段，并将序列化或 json 格式的数据存储在其中。
将数据存储在两个表（对象、属性）中，并使用第三个表来保存关系，使其成为真正的 n:m 关系。非常干净的解决方案，但获取整个对象及其所有属性可能非常昂贵
识别所有对象共有的属性并为这些对象的表创建字段。将其余属性作为序列化数据存储在另一个字段中。这比第一种策略有一个优势，使搜索更容易。

有什么想法吗？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

猫九 2024-08-11 16:36:14

如果您曾经计划搜索特定属性，那么将它们序列化到单个列中不是一个好主意，因为您必须使用每行函数来获取信息 - 这很少能很好地扩展。

我会选择你的第二个选择。在属性表中包含属性列表，在其自己的表中包含对象，以及称为对象属性的多对多关系表。

例如：

objects:
    object_id    integer
    object_name  varchar(20)
    primary key  (object_id)
attributes:
    attr_id      integer
    attr_name    varchar(20)
    primary key  (attr_id)
object_attributes:
    object_id    integer  references (objects.object_id)
    attr_id      integer  references (attributes.attr_id)
    oa_value     varchar(20)
    primary key (object_id,attr_id)

您对性能的担忧已被注意到，但根据我的经验，拆分一列总是比合并多列成本更高。如果事实证明存在性能问题，出于性能原因破坏 3NF 是完全可以接受的。

在这种情况下，我会以相同的方式存储它，但还有一个包含原始序列化数据的列。如果您使用插入/更新触发器来保持列式数据和组合数据同步，那么您不会遇到任何问题。但在实际问题出现之前，您不应该担心这一点。

通过使用这些触发器，您可以最大限度地减少仅数据更改时所需的工作。通过尝试提取子列信息，您对每个选择都做了不必要的工作。

If you ever plan on searching for specific attributes, it's a bad idea to serialize them into a single column, since you'll have to use per-row functions to get the information out - this rarely scales well.

I would opt for your second choice. Have a list of attributes in an attribute table, the objects in their own table, and a many-to-many relationship table called object attributes.

For example:

objects:
    object_id    integer
    object_name  varchar(20)
    primary key  (object_id)
attributes:
    attr_id      integer
    attr_name    varchar(20)
    primary key  (attr_id)
object_attributes:
    object_id    integer  references (objects.object_id)
    attr_id      integer  references (attributes.attr_id)
    oa_value     varchar(20)
    primary key (object_id,attr_id)

Your concern about performance is noted but, in my experience, it's always more costly to split a column than to combine multiple columns. If it turns out that there are performance problems, it's perfectly acceptable to break 3NF for performance reasons.

In that case I would store it the same way but also have a column with the raw serialized data. Provided you use insert/update triggers to keep the columnar and combined data in sync, you won't have any problems. But you shouldn't worry about that until an actual problem surfaces.

By using those triggers, you minimize the work required to only when the data changes. By trying to extract sub-column information, you do unnecessary work on every select.

回复收藏 0 原文

書生途 2024-08-11 16:36:14

二维解决方案的一个变体只是两个表（假设所有属性都是单一类型）：

T1: |对象数据列|Object_id|

T2：|对象 ID|属性名称|属性值| （前 2 列上的唯一索引）

与第三个解决方案结合使用时，这甚至更有效，例如，所有公共字段都进入 T1。

不建议将 >1 个属性填充到同一个 blob 中 - 您无法按属性进行过滤，也无法有效地更新它们

回复收藏 0 原文

〃温暖了心ぐ 2024-08-11 16:36:14

让我具体说明一下 DVK 所说的内容。

假设值的类型相同，表格看起来像这样（祝你好运，我觉得你会需要它）：

dynamic_attribute_table
------------------------
id         NUMBER
key        VARCHAR
value      SOMETYPE?

示例（汽车）：

|id|    key   |   value   |
---------------------------
| 1|'Make'    |'Ford'     |
| 1|'Model'   |'Edge'     |
| 1|'Color'   |'Blue'     |
| 2|'Make'    |'Chevrolet'|
| 2|'Model'   |'Malibu'   |
| 2|'MaxSpeed'|'110mph'   |

因此，
实体 1 = { ('制造', '福特'), ('模型', '边缘'), ('颜色', '蓝色') }
并且，
实体 2 = { ('品牌', '雪佛兰'), ('车型', '迈锐宝'), ('MaxSpeed', '110mph') }.

Let me give some concreteness to what DVK was saying.

Assuming values are of same type the table would look like (good luck, I feel you're going to need it):

dynamic_attribute_table
------------------------
id         NUMBER
key        VARCHAR
value      SOMETYPE?

example (cars):

|id|    key   |   value   |
---------------------------
| 1|'Make'    |'Ford'     |
| 1|'Model'   |'Edge'     |
| 1|'Color'   |'Blue'     |
| 2|'Make'    |'Chevrolet'|
| 2|'Model'   |'Malibu'   |
| 2|'MaxSpeed'|'110mph'   |

Thus,
entity 1 = { ('Make', 'Ford'), ('Model', 'Edge'), ('Color', 'Blue') }
and,
entity 2 = { ('Make', 'Chevrolet'), ('Model', 'Malibu'), ('MaxSpeed', '110mph') }.

回复收藏 0 原文

当爱已成负担 2024-08-11 16:36:14

如果您使用的是关系数据库，那么我认为您列出的选项做得很好。他们各有各的优点和缺点。您最有能力决定什么最适合您的情况。

序列化方法可能是最快的（取决于您的反序列化代码），但这意味着您将无法使用 SQL 查询数据。如果你说你不需要用SQL查询数据，那么我同意@longneck，也许你应该使用键/值样式数据库而不是关系数据库。

编辑-阅读更多您的评论，如果速度是您主要关心的问题，为什么要切换到数据库。您当前的 XML 实现有什么不好的地方？

回复收藏 0 原文

梦言归人 2024-08-11 16:36:14

我曾经实现 此方案：

t_class (id RAW(16), parent RAW(16)) -- holds class hierachy.
t_property (class RAW(16), property VARCHAR) -- holds class members.
t_declaration (id RAW(16), class RAW(16)) -- hold GUIDs and types of all class instances
t_instance (id RAW(16), class RAW(16), property VARCHAR2(100), textvalue VARCHAR2(200), intvalue INT, doublevalue DOUBLE, datevalue DATE) -- holds 'common' properties

t_class1 (id RAW(16), amount DOUBLE, source RAW(16), destination RAW(16)) -- holds 'fast' properties for class1.
t_class2 (id RAW(16), comment VARCHAR2(200)) -- holds 'fast' properties for class2
--- etc.

RAW(16)是Oracle保存GUID的地方

如果你想选择所有属性对于一个对象，您可以发出：

SELECT  i.*
FROM    (
        SELECT  id 
        FROM    t_class
        START WITH
                id = (SELECT class FROM t_declaration WHERE id = :object_id)
        CONNECT BY
                parent = PRIOR id
        ) c
JOIN    property p
ON      p.class = c.id
LEFT JOIN
        t_instance i
ON      i.id = :object_id
        AND i.class = p.class
        AND i.property = p.property

t_property 保存您通常不搜索的内容（例如文本描述等）。

快速属性实际上是数据库中的普通表，以提高查询效率。它们仅保存某个类或其后代的实例的值。这是为了避免额外的连接。

您不必使用快速表并将所有数据限制在这四个表中。

I used to implement this scheme:

t_class (id RAW(16), parent RAW(16)) -- holds class hierachy.
t_property (class RAW(16), property VARCHAR) -- holds class members.
t_declaration (id RAW(16), class RAW(16)) -- hold GUIDs and types of all class instances
t_instance (id RAW(16), class RAW(16), property VARCHAR2(100), textvalue VARCHAR2(200), intvalue INT, doublevalue DOUBLE, datevalue DATE) -- holds 'common' properties

t_class1 (id RAW(16), amount DOUBLE, source RAW(16), destination RAW(16)) -- holds 'fast' properties for class1.
t_class2 (id RAW(16), comment VARCHAR2(200)) -- holds 'fast' properties for class2
--- etc.

RAW(16) is where Oracle holds GUIDs

If you want to select all properties for an object, you issue:

SELECT  i.*
FROM    (
        SELECT  id 
        FROM    t_class
        START WITH
                id = (SELECT class FROM t_declaration WHERE id = :object_id)
        CONNECT BY
                parent = PRIOR id
        ) c
JOIN    property p
ON      p.class = c.id
LEFT JOIN
        t_instance i
ON      i.id = :object_id
        AND i.class = p.class
        AND i.property = p.property

t_property hold stuff you normally don't search on (like, text descriptions etc.)

Fast properties are in fact normal tables you have in the database, to make the queries efficient. They hold values only for the instances of a certain class or its descendants. This is to avoid extra joins.

You don't have to use fast tables and limit all your data to these four tables.

回复收藏 0 原文

醉态萌生 2024-08-11 16:36:14

听起来你需要一些东西 couchdb，而不是 RDBMS。

回复收藏 0 原文

请恋爱 2024-08-11 16:36:14

如果您要在稍后编辑/操作/删除属性，那么制作一个真正的 n:m （第二个选项）将是我所追求的。（或者尝试将其设为 2 个表，其中相同的属性重复。但数据大小会很大）

如果您不处理属性（只是捕获和显示数据），那么您可以继续并使用一些分隔符存储在一个字段中（确保属性值中不会出现分隔符）

回复收藏 0 原文

何时共饮酒 2024-08-11 16:36:14

我假设您没有数字属性汤，但您的数据有一定的顺序。

否则，RDBMS 可能不是最合适的。 NO SQL 的一些东西可能会工作得更好。

如果您的对象属于不同类型，则通常每种类型应该有一个表。

特别是如果您想使用主键连接它们。如果您有产品、订单、客户等表，而不仅仅是对象和属性表，它还有助于带来秩序和理智。

然后看你的属性。任何存在超过该类型类别中 50% 的对象的内容，都将其设为对象表中的一列，并在不使用时使用 null。

当然，任何强制的内容都应该定义为 NOT NULL 列。

其余的，您可以拥有一个或多个“额外属性”表。

您可以将属性名称与值一起放入表中，或者将它们规范化到单独的表中并仅使用值表中的主键。

您可能还会发现您有数据组合。例如，对象类型的变体总是具有特定的属性集，而同一对象类型的另一个变体具有另一组属性。

在这种情况下，你可能想做这样的事情：

MainObjectTable:
  mainObjectId: PRIMARY KEY
  columns...
MainObjectVariant1Table:
  mainObjectId: FOREIGN KEY TO MainObjectTable
  variant1Columns...
MainObjectVariant2Table:
  mainObjectId: FOREIGN KEY TO MainObjectTable
  variant2Columns...

我认为从长远来看，艰苦的工作会得到回报，就是分析数据，找到对象和常用属性，并将其变成一个好的“对象” /ERD/DB”模型。

I am assuming you do not have digital attribute soup, but that there is some order to your data.

Otherwise, an RDBMS might not be the best fit. Something along NO SQL might work better.

If your objects are of different types, you should generally have one table per type.

Especially if you want to connect them using primary keys. It also helps to bring order and sanity if you have Products, Orders, Customers, etc tables, instead of just an Object and Attribute table.

Then look at your attributes. Anything that exists more than, say for 50% of the objects in that type category, make it a column in the object's table and use null when it's not being used.

Anything that is mandatory, should, of course, be defined as a NOT NULL column.

The rest, you can either have one or several "extra attributes" tables for.

You could put the attribute names into the table with the values, or normalize them out in a separate table and only use the primary key in the value table.

You may also find that you have combinations of data. For instance, a variant of an object type always has a certain set of attributes while another variant of the same object type has another set of attributes.

In that case, you might want to do something like:

MainObjectTable:
  mainObjectId: PRIMARY KEY
  columns...
MainObjectVariant1Table:
  mainObjectId: FOREIGN KEY TO MainObjectTable
  variant1Columns...
MainObjectVariant2Table:
  mainObjectId: FOREIGN KEY TO MainObjectTable
  variant2Columns...

I think the hard work, that will pay off, in the long run, is to analyze the data, find the objects and the commonly used attributes and make it into a good "object/ERD/DB" model.

回复收藏 0 原文

~没有更多了~