MySQL: many joins and relationships on the same table (theory question)
This is a more theoretical question than a specific scenario.
Let's assume we have a simplified table scheme like this: items contains some basic data, item_data holds additional properties for each item, and rel_items sets a tree relationship between the different items. There are different types of items (represented by the field items.item_type) which have different fields stored in item_data, for example: dog, cat, mouse.
If we have some bigger queries with several joins and conditions (such as getting items whose parent items satisfy some condition involving other items), could this become a performance issue compared to splitting the different types of items into separate tables (dog, cat, mouse) instead of merging them into a single one?
If we keep it all in one basic item table, does creating views (dog, cat, mouse) impact performance somehow?
Edit (as commented below): I thought of "species", "house-pets" and so on as item_types. Each type has different properties. The intention of using a basic item table and the item_data table is to have a basic "object" and to attach as many properties to it as necessary without having to modify the database scheme. For example, I don't know how many animals there will be in the application or what properties they will have, so I thought of a database scheme that doesn't need to be altered each time the user creates a new animal.
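For concreteness, the scheme described above might look roughly like the sketch below. It is an illustration under assumptions, not the asker's actual schema: the question only names the three tables and items.item_type, so every other column name (id, item_id, attr, value, parent_id, child_id) is hypothetical. It also runs on SQLite through Python's sqlite3 module instead of MySQL, purely so that it is self-contained. The point it shows is that every attribute used in a condition or in the output costs one more join against item_data.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all column names except
# items.item_type are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id        INTEGER PRIMARY KEY,
    item_type TEXT NOT NULL
);
CREATE TABLE item_data (
    item_id INTEGER NOT NULL REFERENCES items(id),
    attr    TEXT NOT NULL,
    value   TEXT,
    PRIMARY KEY (item_id, attr)
);
CREATE TABLE rel_items (
    parent_id INTEGER NOT NULL REFERENCES items(id),
    child_id  INTEGER NOT NULL REFERENCES items(id),
    PRIMARY KEY (parent_id, child_id)
);
""")

conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(1, "dog"), (2, "dog"), (3, "cat"), (4, "mouse")])
conn.executemany("INSERT INTO item_data VALUES (?, ?, ?)", [
    (1, "name", "Rex"),   (1, "color", "brown"),
    (2, "name", "Fido"),  (2, "color", "black"),
    (3, "name", "Tom"),   (3, "color", "grey"),
    (4, "name", "Jerry"), (4, "color", "brown"),
])
# Item 1 is the parent of item 2; item 3 is the parent of item 4.
conn.executemany("INSERT INTO rel_items VALUES (?, ?)", [(1, 2), (3, 4)])

# Names of all items whose parent is a brown dog.
# Each attribute involved needs its own join against item_data.
query = """
SELECT child_name.value
FROM items AS child
JOIN rel_items AS r       ON r.child_id = child.id
JOIN items AS parent      ON parent.id = r.parent_id
JOIN item_data AS parent_color
     ON parent_color.item_id = parent.id AND parent_color.attr = 'color'
JOIN item_data AS child_name
     ON child_name.item_id = child.id AND child_name.attr = 'name'
WHERE parent.item_type = 'dog'
  AND parent_color.value = 'brown'
ORDER BY child_name.value
"""
names = [row[0] for row in conn.execute(query)]
print(names)
```

With separate dog, cat and mouse tables, the same question would need no item_data joins, but the parent lookup would have to know which table each parent lives in.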
Comments (3)
No.
No.
Separate tables mean they're fundamentally different things: different attributes or different operations (or both).
Same table means they're fundamentally the same thing: same attributes and same operations.
Performance is not the first consideration.
Meaning is the first consideration.
After you sort out what these things mean, and what the real functional dependencies among the items are, then you can consider join performance.
"Dog, cat, mouse" are all mammals. One table.
"Dog, cat, mouse" are two carnivores and one omnivore. Two tables.
"Dog, cat, mouse" are two conventional house-pets and one conventional pest. Two tables.
"Dog, cat, mouse" are one cool animal and two nasty animals. Two tables.
"Dog, cat, mouse" are three separate species. Three tables.
It's about the meaning.
The attempt to build a schema that can accommodate new objects, ones not analyzed and included when the database was designed, is an idea that comes up over and over again in discussions of relational databases.
In classical relational data modeling, relations can be devised in the light of certain propositions that are to be asserted about the universe of discussion. These propositions are the facts that users of the data can obtain by retrieving data from the database. Base relations are asserted by storing something in the database. Derived relations can be obtained by operations on the base relations. When an SQL database is built using a relational data model as a guide, base relations become tables and derived relations become views.
But all of this presupposes that the attributes are discovered during data analysis, before database design begins.
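As a rough illustration of base relations becoming tables and derived relations becoming views, here is a minimal sketch built on the question's generic scheme. The column names (id, item_id, attr, value) and the attributes name and breed are hypothetical, and SQLite via Python's sqlite3 module stands in for MySQL so the example can run on its own. Note that the view has to list its attributes explicitly, which is exactly the presupposition described above.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; column and attribute names
# are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base relations: asserted by storing rows.
CREATE TABLE items (
    id        INTEGER PRIMARY KEY,
    item_type TEXT NOT NULL
);
CREATE TABLE item_data (
    item_id INTEGER NOT NULL REFERENCES items(id),
    attr    TEXT NOT NULL,
    value   TEXT,
    PRIMARY KEY (item_id, attr)
);

-- Derived relation: one row per dog, one column per known attribute.
CREATE VIEW dog AS
SELECT i.id,
       MAX(CASE WHEN d.attr = 'name'  THEN d.value END) AS name,
       MAX(CASE WHEN d.attr = 'breed' THEN d.value END) AS breed
FROM items AS i
LEFT JOIN item_data AS d ON d.item_id = i.id
WHERE i.item_type = 'dog'
GROUP BY i.id;
""")

conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(1, "dog"), (2, "cat"), (3, "dog")])
conn.executemany("INSERT INTO item_data VALUES (?, ?, ?)", [
    (1, "name", "Rex"), (1, "breed", "collie"),
    (2, "name", "Tom"),
    (3, "name", "Fido"),  # no breed stored, so the view yields NULL
])

dogs = conn.execute("SELECT id, name, breed FROM dog ORDER BY id").fetchall()
print(dogs)
```

A view like this is only a stored query, so by itself it does not make the underlying joins cheaper. In MySQL specifically, a view that uses GROUP BY or aggregate functions cannot be processed with the MERGE algorithm and is materialized into a temporary table instead, which can matter when the view is filtered or joined further.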
In practice, over the last 25 years, most databases have been built on the basis of analysis later revealed to have been incomplete or incorrect. Databases then get revised in the light of new and improved analysis, and the revised database sometimes requires application code maintenance. To be sure, the relational model and the SQL databases created fewer application dependencies than the pre-relational databases did.
But it's natural to try to come up with a generic data schema like yours, one that can accommodate any subject matter whatsoever with no schema changes. There are consequences to this approach, and they involve far greater costs than mere performance issues. For small projects, these costs are quite manageable, and the completely generic schema may work well in those cases.
But in the very big cases, where there are dozens of entity types and hundreds of relevant propositions based on those entities and their relationships, the attempt to build a schema that is "subject matter agnostic" has often resulted in disaster. These development disasters are well documented, and the larger disasters involve millions of dollars of wasted effort.
I can't prove to you that such an approach has to lead to disaster. But learning from other people's mistakes is often much more worthwhile than taking the risk of repeating them.
Surely, accessing data in a joined table will always be slower.
But with proper indexes the slowdown might be acceptable (something like 2x).
I would move the common fields you use in queries into the items table, and leave in item_data only the values you need to display, which are not used in WHERE and JOIN conditions.
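The suggestion above could be sketched as follows. This is one possible reading, not a definitive layout: the promoted column name, the index idx_items_type_name and the display-only attributes are all hypothetical, and SQLite via Python's sqlite3 module is used in place of MySQL so the example is self-contained. Filtering touches only the indexed items table, and item_data is joined solely to fetch values for display.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the column, index and
# attribute names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id        INTEGER PRIMARY KEY,
    item_type TEXT NOT NULL,
    name      TEXT NOT NULL  -- promoted from item_data: queries filter on it
);
-- Index covering the columns used in WHERE conditions.
CREATE INDEX idx_items_type_name ON items (item_type, name);

CREATE TABLE item_data (
    item_id INTEGER NOT NULL REFERENCES items(id),
    attr    TEXT NOT NULL,
    value   TEXT,
    PRIMARY KEY (item_id, attr)
);
""")

conn.executemany("INSERT INTO items VALUES (?, ?, ?)",
                 [(1, "dog", "Rex"), (2, "cat", "Tom"), (3, "dog", "Fido")])
# item_data now holds display-only values.
conn.executemany("INSERT INTO item_data VALUES (?, ?, ?)", [
    (1, "favourite_toy", "ball"),
    (1, "notes", "likes walks"),
    (3, "favourite_toy", "rope"),
])

# The WHERE clause uses only columns of items; item_data is joined
# just to collect the attributes to show.
rows = conn.execute("""
    SELECT d.attr, d.value
    FROM items AS i
    JOIN item_data AS d ON d.item_id = i.id
    WHERE i.item_type = 'dog' AND i.name = 'Rex'
    ORDER BY d.attr
""").fetchall()
display = dict(rows)
print(display)
```

The trade-off is that promoting a field to items is a schema change, which is what the generic design was meant to avoid, so it probably only pays off for the few fields that queries filter or join on.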