DDL 定义非规范化规则?
我从事关系数据库工作已有多年,但最近进入了 Cassandra/Redis 领域。 NoSQL 对于我们正在做的事情来说是有意义的,所以这很好。
今天,当我定义 Cassandra 列族时,我想到了一个问题:在关系数据库中,为什么 DDL 不让我们定义非规范化规则,以便数据库引擎本身可以本地管理由此产生的一致性问题。换句话说,当关系数据库程序员通过非规范化来实现性能目标时……为什么他/她还要通过专门编写的 SQL 来保持一致性?
也许我缺少一些明显的东西?是否有某种原因导致这样的建议很愚蠢,因为在我看来,拥有这种功能可能非常有用。
编辑:
感谢迄今为止的反馈。我仍然觉得我手上有一个未回答的问题(也许是因为它没有明确表达)。据我所知,物化视图试图为非规范化数据提供引擎管理的一致性。但是,我的理解是,它们不会立即随着基础表的更改而更新。如果这是真的,则意味着引擎实际上没有管理因非规范化而导致的一致性问题......至少在写入时没有。我的意思是,当需要针对复杂的关系模型扩展具有大量读取负载的系统时,没有真正的、功能丰富的、引擎管理的非规范化的规范化数据结构会限制关系数据库引擎。我认为调整物化视图刷新率确实等同于 Cassandra 等 NoSQL 引擎提供的可调“最终一致性”。我需要了解引擎如何有效地同步其物化视图。为了被认为相对于 NoSQL 选项可行,同步视图所需的时间需要随着添加/更新的行数线性增加。
无论如何,我会再考虑一下并重新编辑。希望能提供一些想象中的 DDL 的代表性示例。
I've been doing the relational database thing for years now, but lately have moved into Cassandra/Redis territory. NoSQL makes sense for what we're doing, so that's fine.
As I was working through defining Cassandra column families today a question occurred to me: In relational databases, why doesn't DDL let us define denormalization rules in such a way that the database engine itself could manage the resulting consistency issues natively. In other words, when a relational database programmer denormalizes to achieve performance goals... why is he/she then left to maintain consistency via purpose-written SQL?
Maybe there's something obvious that I'm missing? Is there some reason why such a suggestion is silly, because it seems to me like having this capability might be awfully useful.
EDIT:
Appreciate the feedback so far. I still feel like I have an unanswered (perhaps because it's been poorly articulated) question on my hands. I understand that materialized views attempt to offer engine-managed consistency for denormalized data. However, my understanding is that they aren't updated immediately with changes to the underlying tables. If this is true, it means the engine really isn't managing the consistency issues resulting from the denormalization... at least not at write-time. What I'm getting at is that a normalized data structure without true, feature-rich, engine-managed denormalization hamstrings relational database engines when it comes time to scale a system with heavy read load against complex relational models. I suppose it's true that adjusting materialized view refresh rates equates to tunable "eventual consistency" offered by NoSQL engines like Cassandra. I need to read up on how efficiently engines are able to sync their materialized views. In order to be considered viable relative to NoSQL options, the time it takes to sync a view would need to increase linearly with the number of added/updated rows.
Anyway, I'll think about this some more and re-edit. Hopefully with some representative examples of imagined DDL.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
一些关系数据库系统能够在某种程度上保持非规范化数据的一致性(如果我理解正确的话)。
在
Oracle
中,这称为物化视图
,在SQL Server
中称为索引视图
。基本上,这意味着您可以创建一个自我维护的非规范化表作为
SQL
查询的结果并为其建立索引:生成的视图
a_b
,如果它是一个真实的表,会违反2NF
,因为property
在功能上依赖于a_id
,而a_id
不是候选键。但是,数据库系统保留了这种功能依赖性,您可以在(value, property)
上创建复合索引。即使本身不支持物化视图的
MySQL
和PostgreSQL
也能够维护某种非规范化表。例如,当您在 MySQL 中的一列或一组列上创建 FULLTEXT 索引时,您会同时获得两个索引:第一个索引包含每个不同单词的一个条目在每条记录中(引用原始记录
id
),第二条记录包含整个表中每个单词的一个记录,以及总单词数。这允许快速搜索单词并按相关性排序。总字数表当然依赖于各个单词表,因此违反了
5NF
,但是系统再次保持了这种依赖性。PostgreSQL
中的GIN
和GIST
索引也做了类似的事情。当然,并非所有可能的非规范化都可以维护,这意味着您无法实时具体化和索引任何查询:有些查询维护成本太高,有些理论上可行,但在实际系统中未实现,等等。
但是,您可以维护它们在触发器、存储过程或其他什么中使用您自己的逻辑,这正是它们的用途。
Some relational database systems are able to maintain consistency of denormalized data to some extent (if I understand right what you mean).
In
Oracle
, this is calledmaterialized views
, inSQL Server
—indexed views
.Basically, this means that you can create a self-maintaned denormalized table as a result of an
SQL
query and index it:The resulting view,
a_b
, were it a real table, would violate2NF
sinceproperty
is functionally dependent ona_id
which is not a candidate key. However, the database system maintains this functional dependency and you can create a composite index on, say,(value, property)
.Even
MySQL
andPostgreSQL
which don't support materialized views natively are capable of maintaining some kind of denormalized tables.For instance, when you create a
FULLTEXT
index on a column or a set of columns inMySQL
, you get two indexes at once: first one contains one entry for each distinct word in each record (with a reference to the original recordid
), the second one contains one record per each word in the whole table, with the total word count. This allows searching for the words fast and ordering by relevance.The total word count table is of course dependent on the individual words table and hence violates
5NF
, but, again, the systems maintains this dependency.Similar things are done for
GIN
andGIST
indexes inPostgreSQL
.Of course not all possible denormalizations can be maintained, that means that you cannot materialize and index just any query in real time: some are too expensive to maintain, some are theoretically possible but not implemented in actual systems, etc.
However, you may maintain them using your own logic in triggers, stored procedures or whatever, that's exactly what they are there for.
RDBMS 中的非规范化是一种特殊情况:不是标准。只有当你有一个经过证实的案例时,人们才会这样做。如果您预先使用非规范化数据进行设计,那么您就已经失败了。
鉴于每种情况在定义上都是“特殊的”,那么如何才能有标准的 SQL 结构来维护非规范化的数据。
RDBMS 与 NoSQL 的不同之处在于它旨在与规范化设计一起使用。恕我直言,你不能这样比较 RDBMS 和 NoSQL
Denormalisation in an RDBMS is a special case: not the standard. One only does this when you have a proven case. If you design in denormalised data up front, you've already lost.
Given each case is by definition "special", then how can there be standard SQL constructs to maintain the denormalised data.
An RDBMS differs from NoSQL in that it is designed to work with normalised designs. IMHO, you can't compare RDBMS and NoSQL like this