设计数据库来保存不同的元数据信息
因此,我正在尝试设计一个数据库,该数据库允许我将一种产品与多个类别连接起来。这部分我已经想好了。但我无法解决的是保存不同类型的产品详细信息的问题。
例如,产品可能是一本书(在这种情况下,我需要引用该书的元数据,如 isbn、作者等),也可能是一个企业列表(具有不同的元数据)。
我应该如何解决这个问题?
So I am trying to design a database that will allow me to connect one product with multiple categories. This part I have figured. But what I am not able to resolve is the issue of holding different type of product details.
For example, the product could be a book (in which case i would need metadata that refers to that book like isbn, author etc) or it could be a business listing (which has different metadata) ..
How should I tackle that?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
这称为观察模式。
三个对象,例如
表格的样子:
。
。
。
。
编辑:
杰弗里提出了一个有效的观点(请参阅评论),所以我将扩展答案。
该模型允许动态(即时)创建任意数量的实体
具有任何类型的属性,无需更改架构。然而,这种灵活性是有代价的——存储和搜索比通常的表设计更慢、更复杂。
是时候举个例子了,但首先,为了让事情变得更容易,我将把模型展平为视图。
使用评论中 Jefferey 的示例
这看起来写起来很复杂,但仔细检查后,您可能会注意到 CTE 中的模式。
现在假设我们有一个标准的固定模式设计,其中每个对象属性都有自己的列。
查询看起来像这样:
This is called the Observation Pattern.
Three objects, for the example
This is how tables may look like:
.
.
.
.
EDIT:
Jefferey raised a valid point (see comment), so I'll expand the answer.
The model allows for dynamic (on-fly) creation of any number of entites
with any type of properties without schema changes. Hovewer, this flexibility has a price -- storing and searching is slower and more complex than in a usual table design.
Time for an example, but first, to make things easier, I'll flatten the model into a view.
To use Jefferey's example from the comment
This looks complicated to write, but on a closer inspection you may notice a pattern in CTEs.
Now suppose we have a standard fixed schema design where each object property has its own column.
The query would look something like:
我本来不打算回答,但现在接受的答案有一个非常糟糕的主意。关系数据库决不应该用于存储简单的属性值对。这会在未来造成很多问题。
处理这个问题的最佳方法是为每种类型创建一个单独的表。
每个表的每一行都应该代表一个关于现实世界的命题,表的结构及其约束应该反映所代表的现实。越接近这个理想,数据就越干净,报告和以其他方式扩展系统就越容易。它还将更有效地运行。
I wasn't going to answer, but right now the accepted answer has a very bad idea. A relational database should never be used to store simple attribute-value pairs. That will cause a lot of problems down the road.
The best way to deal with this is to create a separate table for each type.
Each row of each table should represent a proposition about the real world, and the structure of the tables and their constraints should reflect the realities that are being represented. The closer you can get to this ideal, the cleaner the data will be, and the easier it will be to do reporting and to extend the system in other ways. It will also run more effeciently.
您可以采用无模式方法:
将元数据作为 JSON 对象(或其他序列化,但 JSON 更好,原因很快就会解释)保存在 TEXT 列中。
这种技术的优点:
更少的查询:您可以在一次查询中获取所有信息,无需“定向”查询(以获取元元数据)和联接。
因为它是JSON,所以您不需要需要在后端进行额外处理。您的网页(我假设它是一个 Web 应用程序)只是从您的 Web 服务中读取 JSON,仅此而已,您可以根据需要将 JSON 对象与 javascript 一起使用。
问题:
可能会浪费空间,如果您有 100 本同一作者的书籍,则所有书籍都只有author_id 的作者表会更节省空间。
需要实现索引。由于您的元数据是 JSON 对象,因此您不会立即拥有索引。但为您需要的特定元数据实现特定索引相当容易。例如,您想按作者建立索引,因此您创建一个包含author_id和item_id的author_idx表,当有人搜索作者时,您可以查找该表和项目本身。
根据规模,这可能有点矫枉过正。在较小规模的连接上效果很好。
You could go with the schema-less approach:
Hold the metadata in a TEXT column as a JSON object (or other serialization, but JSON is better for reasons soon explained).
Advantages to this technique:
Less queries: you get all the information in one query, no need for "in directional" queries (to get meta-meta-data) and joins.
You can add/remove any attributes you want at any time, no need to alter table (which is problematic in some databases, e.g Mysql locks the table, and it takes long time with huge tables)
Since it's JSON, you don't need extra processing on your backend. Your webpage (I assume it's a web application) just reads the JSON as is from your web service and that's it, you can use the JSON object with javascript however you like.
Problems:
Potentially wasted space, if you have 100 books with the same author, an author table with all the books having just the author_id is more economical space wise.
Need to implement indexes. since your metadata is a JSON object you don't have indexes straight away. But it's fairly easy to implement specific index for the specific metadata you need. e.g you want to index by author, so you create a author_idx table with author_id and item_id, when someone searches for author, you can look up this table and the items themselves.
Depending on the scale, this might be an overkill. on smaller scale joins would work just fine.
产品应打字。例如,在产品表中包含 type_id,它指向您将支持的产品类别,并让您知道要查询哪些其他表以获取适当的相关属性。
The product should be typed. e.g. include type_id in the product table, that points to the categories of products you will support, and lets you know which other tables to query against for the appropriate related attributes.
在此类问题中,您有三种选择:
当现实世界的实体都以相同的方式处理时,这才有效。 ,至少在大多数情况下,因此即使不是“相同”的数据,也必须至少具有类似的数据。当存在真正的功能差异时,这种情况就会崩溃。就像对于烤面包机我们计算瓦特=伏特*安培一样,书籍很可能没有相应的计算。当您开始创建包含书籍页数和烤面包机电压的 page_volts 字段时,事情已经失去控制。
使用达米尔建议的属性/价值方案。请参阅我对他的帖子的评论,了解其中的优点和缺点。
我通常建议的是类型/子类型方案。为“产品”创建一个表,其中包含类型代码和通用字段。然后,对于每种真实类型(书籍、烤面包机、猫等),创建一个连接到产品表的单独表。然后当需要进行特定于书籍的处理时,处理书籍表。当您需要进行通用处理时,请处理产品表。
In this kind of problem, you have three choices:
This works when the real-world entities are all being processed in the same way, at least for the most part, and so must have, if not the "same" data, at least analagous data. This breaks down when there are real functional differences. Like if for toasters we are calcualting watts = volts * amps, it is likely that there is no corresponding calculation for books. When you start creating a pages_volts fields that contains the page count for books and the voltage for toasters, things have gotten out of control.
Use a property/value scheme like Damir suggests. See my comment on his post for the pros and cons there.
What I'd usually suggest is a type/subtype scheme. Create a table for "product" that contains a type code and the generic fields. Then for each of the true types -- books, toasters, cats, whatever -- create a separate table that is connected to the product table. Then when you need to do book-specific processing, process the book table. When you need to do generic processing, process the product table.
我知道这可能不是您正在寻找的答案,但不幸的是,关系数据库(SQL)是建立在结构化预定义模式的想法之上的。您正在尝试将非结构化无模式数据存储在不是为其构建的模型中。是的,您可以捏造它,以便从技术上讲您可以存储无限量的元数据,但这很快就会导致很多问题并很快失控。只要看看 WordPress 以及他们使用这种方法遇到的问题数量,您就可以很容易地明白为什么这不是一个好主意。
幸运的是,这一直是关系数据库的一个长期存在的问题,这就是为什么使用文档方法的 NoSQL 无模式数据库被开发出来,并且在过去十年中流行度如此之高。所有财富 500 强科技公司都使用它来存储不断变化的用户数据,因为它允许单个记录具有任意数量的字段(列),同时保留在同一集合(表)中。
因此,我建议研究 NoSQL 数据库(例如 MongoDB)并尝试转换为它们,或者将它们与关系数据库结合使用。您知道需要具有相同数量的列来表示它们的任何类型的数据都应该存储在 SQL 中,并且您知道记录之间存在差异的任何类型的数据都应该存储在 NoSQL 数据库中。
I understand this may not be the sort of answer you are looking for however unfortunately a relational database ( SQL ) is built upon the idea of a structured predefined schema. You are trying to store non structured schemaless data in a model that was not built for it. Yes you can fudge it so that you can technically store infinite amounts of meta data however this will soon cause lots of issues and quickly get out of hand. Just look at Wordpress and the amount of issues they have had with this approach and you can easily see why it is not a good idea.
Luckily this has been a long standing issue with relational databases which is why NoSQL schemaless databases that use a document approach were developed and have seen such a massive rise in popularity in the last decade. It's what all of the fortune 500 tech companies use to store ever changing user data as it allows for individual records to have as many or as little fields ( columns ) as they wish whilst remaining in the same collection ( table ).
Therefore I would suggest looking into NoSQL databases such as MongoDB and try to either convert over to them, or use them in conjunction with your relational database. Any types of data you know need to have the same amount of columns representing them should be stored in SQL and any types of data you know will differ between records should be stored in the NoSQL database.