设计数据库来保存不同的元数据信息

发布于 2024-09-09 16:34:54 字数 174 浏览 5 评论 0原文

因此,我正在尝试设计一个数据库,该数据库允许我将一种产品与多个类别连接起来。这部分我已经想好了。但我无法解决的是保存不同类型的产品详细信息的问题。

例如,产品可能是一本书(在这种情况下,我需要引用该书的元数据,如 isbn、作者等),也可能是一个企业列表(具有不同的元数据)。

我应该如何解决这个问题?

So I am trying to design a database that will allow me to connect one product with multiple categories. This part I have figured. But what I am not able to resolve is the issue of holding different type of product details.

For example, the product could be a book (in which case i would need metadata that refers to that book like isbn, author etc) or it could be a business listing (which has different metadata) ..

How should I tackle that?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

懒的傷心 2024-09-16 16:34:54

这称为观察模式。

输入图像描述这里

三个对象,例如

Book
Title = 'Gone with the Wind' 
Author = 'Margaret Mitchell'
ISBN   = '978-1416548898'

Cat
Name = 'Phoebe'
Color = 'Gray'
TailLength = 9 'inch'

Beer Bottle
Volume = 500 'ml'
Color = 'Green'

表格的样子:

Entity
EntityID    Name            Description
   1        'Book'            'To read'
   2        'Cat'             'Fury cat' 
   3        'Beer Bottle'     'To ship beer in'

PropertyType
PropertyTypeID   Name        IsTrait         Description
   1            'Height'     'NO'       'For anything that has height' 
   2            'Width'      'NO'       'For anything that has width' 
   3            'Volume'     'NO'       'For things that can have volume'
   4            'Title'      'YES'      'Some stuff has title' 
   5            'Author'     'YES'      'Things can be authored' 
   6            'Color'      'YES'      'Color of things' 
   7            'ISBN'       'YES'      'Books would need this'
   8            'TailLength' 'NO'       'For stuff that has long tails'
   9            'Name'       'YES'      'Name of things'

Property
PropertyID   EntityID  PropertyTypeID      
    1           1              4     -- book, title
    2           1              5     -- book, author
    3           1              7     -- book, isbn
    4           2              9     -- cat, name
    5           2              6     -- cat, color
    6           2              8     -- cat, tail length
    7           3              3     -- beer bottle, volume
    8           3              6     -- beer bottle, color

Measurement
PropertyID     Unit       Value 
    6          'inch'       9          -- cat, tail length
    7          'ml'        500         -- beer bottle, volume

Trait
PropertyID         Value 
    1         'Gone with the Wind'     -- book, title
    2         'Margaret Mitchell'      -- book, author
    3         '978-1416548898'         -- book, isbn
    4         'Phoebe'                 -- cat, name
    5         'Gray'                   -- cat, color
    8         'Green'                  -- beer bottle, color

编辑:

杰弗里提出了一个有效的观点(请参阅评论),所以我将扩展答案。

该模型允许动态(即时)创建任意数量的实体
具有任何类型的属性,无需更改架构。然而,这种灵活性是有代价的——存储和搜索比通常的表设计更慢、更复杂。

是时候举个例子了,但首先,为了让事情变得更容易,我将把模型展平为视图。

create view vModel as 
select 
      e.EntityId
    , x.Name  as PropertyName
    , m.Value as MeasurementValue
    , m.Unit
    , t.Value as TraitValue
from Entity           as e
join Property         as p on p.EntityID       = p.EntityID
join PropertyType     as x on x.PropertyTypeId = p.PropertyTypeId
left join Measurement as m on m.PropertyId     = p.PropertyId
left join Trait       as t on t.PropertyId     = p.PropertyId
;

使用评论中 Jefferey 的示例

with 
q_00 as ( -- all books
    select EntityID
    from vModel
    where PropertyName = 'object type'
      and TraitValue   = 'book' 
),
q_01 as ( -- all US books
    select EntityID
    from vModel as a
    join q_00   as b on b.EntityID = a.EntityID
    where PropertyName = 'publisher country'
      and TraitValue   = 'US' 
),
q_02 as ( -- all US books published in 2008
    select EntityID
    from vModel as a
    join q_01   as b on b.EntityID = a.EntityID
    where PropertyName     = 'year published'
      and MeasurementValue = 2008 
),
q_03 as ( -- all US books published in 2008 not discontinued
    select EntityID
    from vModel as a
    join q_02   as b on b.EntityID = a.EntityID
    where PropertyName = 'is discontinued'
      and TraitValue   = 'no' 
),
q_04 as ( -- all US books published in 2008 not discontinued that cost less than $50
    select EntityID
    from vModel as a
    join q_03   as b on b.EntityID = a.EntityID
    where PropertyName     = 'price'
      and MeasurementValue < 50 
      and MeasurementUnit  = 'USD'
)
select
      EntityID
    , max(case PropertyName when 'title' than TraitValue else null end) as Title
    , max(case PropertyName when 'ISBN'  than TraitValue else null end) as ISBN
from vModel as a
join q_04   as b on b.EntityID = a.EntityID
group by EntityID ;

这看起来写起来很复杂,但仔细检查后,您可能会注意到 CTE 中的模式。

现在假设我们有一个标准的固定模式设计,其中每个对象属性都有自己的列。
查询看起来像这样:

select EntityID, Title, ISBN
from vModel
WHERE ObjectType       = 'book'
  and PublisherCountry = 'US'
  and YearPublished    = 2008
  and IsDiscontinued   = 'no'
  and Price            < 50
  and Currency         = 'USD'
;

This is called the Observation Pattern.

enter image description here

Three objects, for the example

Book
Title = 'Gone with the Wind' 
Author = 'Margaret Mitchell'
ISBN   = '978-1416548898'

Cat
Name = 'Phoebe'
Color = 'Gray'
TailLength = 9 'inch'

Beer Bottle
Volume = 500 'ml'
Color = 'Green'

This is how tables may look like:

Entity
EntityID    Name            Description
   1        'Book'            'To read'
   2        'Cat'             'Fury cat' 
   3        'Beer Bottle'     'To ship beer in'

.

PropertyType
PropertyTypeID   Name        IsTrait         Description
   1            'Height'     'NO'       'For anything that has height' 
   2            'Width'      'NO'       'For anything that has width' 
   3            'Volume'     'NO'       'For things that can have volume'
   4            'Title'      'YES'      'Some stuff has title' 
   5            'Author'     'YES'      'Things can be authored' 
   6            'Color'      'YES'      'Color of things' 
   7            'ISBN'       'YES'      'Books would need this'
   8            'TailLength' 'NO'       'For stuff that has long tails'
   9            'Name'       'YES'      'Name of things'

.

Property
PropertyID   EntityID  PropertyTypeID      
    1           1              4     -- book, title
    2           1              5     -- book, author
    3           1              7     -- book, isbn
    4           2              9     -- cat, name
    5           2              6     -- cat, color
    6           2              8     -- cat, tail length
    7           3              3     -- beer bottle, volume
    8           3              6     -- beer bottle, color

.

Measurement
PropertyID     Unit       Value 
    6          'inch'       9          -- cat, tail length
    7          'ml'        500         -- beer bottle, volume

.

Trait
PropertyID         Value 
    1         'Gone with the Wind'     -- book, title
    2         'Margaret Mitchell'      -- book, author
    3         '978-1416548898'         -- book, isbn
    4         'Phoebe'                 -- cat, name
    5         'Gray'                   -- cat, color
    8         'Green'                  -- beer bottle, color

EDIT:

Jefferey raised a valid point (see comment), so I'll expand the answer.

The model allows for dynamic (on-fly) creation of any number of entites
with any type of properties without schema changes. Hovewer, this flexibility has a price -- storing and searching is slower and more complex than in a usual table design.

Time for an example, but first, to make things easier, I'll flatten the model into a view.

create view vModel as 
select 
      e.EntityId
    , x.Name  as PropertyName
    , m.Value as MeasurementValue
    , m.Unit
    , t.Value as TraitValue
from Entity           as e
join Property         as p on p.EntityID       = p.EntityID
join PropertyType     as x on x.PropertyTypeId = p.PropertyTypeId
left join Measurement as m on m.PropertyId     = p.PropertyId
left join Trait       as t on t.PropertyId     = p.PropertyId
;

To use Jefferey's example from the comment

with 
q_00 as ( -- all books
    select EntityID
    from vModel
    where PropertyName = 'object type'
      and TraitValue   = 'book' 
),
q_01 as ( -- all US books
    select EntityID
    from vModel as a
    join q_00   as b on b.EntityID = a.EntityID
    where PropertyName = 'publisher country'
      and TraitValue   = 'US' 
),
q_02 as ( -- all US books published in 2008
    select EntityID
    from vModel as a
    join q_01   as b on b.EntityID = a.EntityID
    where PropertyName     = 'year published'
      and MeasurementValue = 2008 
),
q_03 as ( -- all US books published in 2008 not discontinued
    select EntityID
    from vModel as a
    join q_02   as b on b.EntityID = a.EntityID
    where PropertyName = 'is discontinued'
      and TraitValue   = 'no' 
),
q_04 as ( -- all US books published in 2008 not discontinued that cost less than $50
    select EntityID
    from vModel as a
    join q_03   as b on b.EntityID = a.EntityID
    where PropertyName     = 'price'
      and MeasurementValue < 50 
      and MeasurementUnit  = 'USD'
)
select
      EntityID
    , max(case PropertyName when 'title' than TraitValue else null end) as Title
    , max(case PropertyName when 'ISBN'  than TraitValue else null end) as ISBN
from vModel as a
join q_04   as b on b.EntityID = a.EntityID
group by EntityID ;

This looks complicated to write, but on a closer inspection you may notice a pattern in CTEs.

Now suppose we have a standard fixed schema design where each object property has its own column.
The query would look something like:

select EntityID, Title, ISBN
from vModel
WHERE ObjectType       = 'book'
  and PublisherCountry = 'US'
  and YearPublished    = 2008
  and IsDiscontinued   = 'no'
  and Price            < 50
  and Currency         = 'USD'
;
旧竹 2024-09-16 16:34:54

我本来不打算回答,但现在接受的答案有一个非常糟糕的主意。关系数据库决不应该用于存储简单的属性值对。这会在未来造成很多问题。

处理这个问题的最佳方法是为每种类型创建一个单独的表。

Product
-------
ProductId
Description
Price
(other attributes common to all products)

Book
----
ProductId (foreign key to Product.ProductId)
ISBN
Author
(other attributes related to books)

Electronics
-----------
ProductId (foreign key to Product.ProductId)
BatteriesRequired
etc.

每个表的每一行都应该代表一个关于现实世界的命题,表的结构及其约束应该反映所代表的现实。越接近这个理想,数据就越干净,报告和以其他方式扩展系统就越容易。它还将更有效地运行。

I wasn't going to answer, but right now the accepted answer has a very bad idea. A relational database should never be used to store simple attribute-value pairs. That will cause a lot of problems down the road.

The best way to deal with this is to create a separate table for each type.

Product
-------
ProductId
Description
Price
(other attributes common to all products)

Book
----
ProductId (foreign key to Product.ProductId)
ISBN
Author
(other attributes related to books)

Electronics
-----------
ProductId (foreign key to Product.ProductId)
BatteriesRequired
etc.

Each row of each table should represent a proposition about the real world, and the structure of the tables and their constraints should reflect the realities that are being represented. The closer you can get to this ideal, the cleaner the data will be, and the easier it will be to do reporting and to extend the system in other ways. It will also run more effeciently.

滥情哥ㄟ 2024-09-16 16:34:54

您可以采用无模式方法:

将元数据作为 JSON 对象(或其他序列化,但 JSON 更好,原因很快就会解释)保存在 TEXT 列中。

这种技术的优点:

  1. 更少的查询:您可以在一次查询中获取所有信息,无需“定向”查询(以获取元元数据)和联接。

  2. 随时添加/删除任何您想要的属性,无需更改表(这在某些数据库中是有问题的,例如Mysql锁定表,对于大表需要很长时间)

  3. 因为它是JSON,所以您不需要需要在后端进行额外处理。您的网页(我假设它是一个 Web 应用程序)只是从您的 Web 服务中读取 JSON,仅此而已,您可以根据需要将 JSON 对象与 javascript 一起使用。

问题:

  1. 可能会浪费空间,如果您有 100 本同一作者的书籍,则所有书籍都只有author_id 的作者表会更节省空间。

  2. 需要实现索引。由于您的元数据是 JSON 对象,因此您不会立即拥有索引。但为您需要的特定元数据实现特定索引相当容易。例如,您想按作者建立索引,因此您创建一个包含author_id和item_id的author_idx表,当有人搜索作者时,您可以查找该表和项目本身。

根据规模,这可能有点矫枉过正。在较小规模的连接上效果很好。

You could go with the schema-less approach:

Hold the metadata in a TEXT column as a JSON object (or other serialization, but JSON is better for reasons soon explained).

Advantages to this technique:

  1. Less queries: you get all the information in one query, no need for "in directional" queries (to get meta-meta-data) and joins.

  2. You can add/remove any attributes you want at any time, no need to alter table (which is problematic in some databases, e.g Mysql locks the table, and it takes long time with huge tables)

  3. Since it's JSON, you don't need extra processing on your backend. Your webpage (I assume it's a web application) just reads the JSON as is from your web service and that's it, you can use the JSON object with javascript however you like.

Problems:

  1. Potentially wasted space, if you have 100 books with the same author, an author table with all the books having just the author_id is more economical space wise.

  2. Need to implement indexes. since your metadata is a JSON object you don't have indexes straight away. But it's fairly easy to implement specific index for the specific metadata you need. e.g you want to index by author, so you create a author_idx table with author_id and item_id, when someone searches for author, you can look up this table and the items themselves.

Depending on the scale, this might be an overkill. on smaller scale joins would work just fine.

奈何桥上唱咆哮 2024-09-16 16:34:54

产品应打字。例如,在产品表中包含 type_id,它指向您将支持的产品类别,并让您知道要查询哪些其他表以获取适当的相关属性。

The product should be typed. e.g. include type_id in the product table, that points to the categories of products you will support, and lets you know which other tables to query against for the appropriate related attributes.

风吹短裙飘 2024-09-16 16:34:54

在此类问题中,您有三种选择:

  1. 创建一个包含“通用”列的表。例如,如果您同时销售书籍和烤面包机,则您的烤面包机可能没有 ISBN 和标题,但仍具有某种产品标识符和说明。因此,为字段指定通用名称,例如“product_id”和“description”,对于书籍,product_id 是 ISBN,对于烤面包机,它是制造商的零件号等。

当现实世界的实体都以相同的方式处理时,这才有效。 ,至少在大多数情况下,因此即使不是“相同”的数据,也必须至少具有类似的数据。当存在真正的功能差异时,这种情况就会崩溃。就像对于烤面包机我们计算瓦特=伏特*安培一样,书籍很可能没有相应的计算。当您开始创建包含书籍页数和烤面包机电压的 page_volts 字段时,事情已经失去控制。

  1. 使用达米尔建议的属性/价值方案。请参阅我对他的帖子的评论,了解其中的优点和缺点。

  2. 我通常建议的是类型/子类型方案。为“产品”创建一个表,其中包含类型代码和通用字段。然后,对于每种真实类型(书籍、烤面包机、猫等),创建一个连接到产品表的单独表。然后当需要进行特定于书籍的处理时,处理书籍表。当您需要进行通用处理时,请处理产品表。

In this kind of problem, you have three choices:

  1. Create a table with "generic" columns. For example, if you sell both books and toasters, it is likely that your toasters don't have an ISBN and a title, but they still have some kind of product identifier and description. So give the fields generic names like "product_id" and "description", and for books the product_id is an ISBN, for toasters its the manufacturer's part number, etc.

This works when the real-world entities are all being processed in the same way, at least for the most part, and so must have, if not the "same" data, at least analagous data. This breaks down when there are real functional differences. Like if for toasters we are calcualting watts = volts * amps, it is likely that there is no corresponding calculation for books. When you start creating a pages_volts fields that contains the page count for books and the voltage for toasters, things have gotten out of control.

  1. Use a property/value scheme like Damir suggests. See my comment on his post for the pros and cons there.

  2. What I'd usually suggest is a type/subtype scheme. Create a table for "product" that contains a type code and the generic fields. Then for each of the true types -- books, toasters, cats, whatever -- create a separate table that is connected to the product table. Then when you need to do book-specific processing, process the book table. When you need to do generic processing, process the product table.

冰火雁神 2024-09-16 16:34:54

我知道这可能不是您正在寻找的答案,但不幸的是,关系数据库(SQL)是建立在结构化预定义模式的想法之上的。您正在尝试将非结构化无模式数据存储在不是为其构建的模型中。是的,您可以捏造它,以便从技术上讲您可以存储无限量的元数据,但这很快就会导致很多问题并很快失控。只要看看 WordPress 以及他们使用这种方法遇到的问题数量,您就可以很容易地明白为什么这不是一个好主意。

幸运的是,这一直是关系数据库的一个长期存在的问题,这就是为什么使用文档方法的 NoSQL 无模式数据库被开发出来,并且在过去十年中流行度如此之高。所有财富 500 强科技公司都使用它来存储不断变化的用户数据,因为它允许单个记录具有任意数量的字段(列),同时保留在同一集合(表)中。

因此,我建议研究 NoSQL 数据库(例如 MongoDB)并尝试转换为它们,或者将它们与关系数据库结合使用。您知道需要具有相同数量的列来表示它们的任何类型的数据都应该存储在 SQL 中,并且您知道记录之间存在差异的任何类型的数据都应该存储在 NoSQL 数据库中。

I understand this may not be the sort of answer you are looking for however unfortunately a relational database ( SQL ) is built upon the idea of a structured predefined schema. You are trying to store non structured schemaless data in a model that was not built for it. Yes you can fudge it so that you can technically store infinite amounts of meta data however this will soon cause lots of issues and quickly get out of hand. Just look at Wordpress and the amount of issues they have had with this approach and you can easily see why it is not a good idea.

Luckily this has been a long standing issue with relational databases which is why NoSQL schemaless databases that use a document approach were developed and have seen such a massive rise in popularity in the last decade. It's what all of the fortune 500 tech companies use to store ever changing user data as it allows for individual records to have as many or as little fields ( columns ) as they wish whilst remaining in the same collection ( table ).

Therefore I would suggest looking into NoSQL databases such as MongoDB and try to either convert over to them, or use them in conjunction with your relational database. Any types of data you know need to have the same amount of columns representing them should be stored in SQL and any types of data you know will differ between records should be stored in the NoSQL database.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文