DDL 定义非规范化规则？

发布于 2024-11-04 11:01:19 字数 708 浏览 10 评论 0原文

我从事关系数据库工作已有多年，但最近进入了 Cassandra/Redis 领域。 NoSQL 对于我们正在做的事情来说是有意义的，所以这很好。

今天，当我定义 Cassandra 列族时，我想到了一个问题：在关系数据库中，为什么 DDL 不让我们定义非规范化规则，以便数据库引擎本身可以本地管理由此产生的一致性问题。换句话说，当关系数据库程序员通过非规范化来实现性能目标时……为什么他/她还要通过专门编写的 SQL 来保持一致性？

也许我缺少一些明显的东西？是否有某种原因导致这样的建议很愚蠢，因为在我看来，拥有这种功能可能非常有用。

编辑：

感谢迄今为止的反馈。我仍然觉得我手上有一个未回答的问题（也许是因为它没有明确表达）。据我所知，物化视图试图为非规范化数据提供引擎管理的一致性。但是，我的理解是，它们不会立即随着基础表的更改而更新。如果这是真的，则意味着引擎实际上没有管理因非规范化而导致的一致性问题......至少在写入时没有。我的意思是，当需要针对复杂的关系模型扩展具有大量读取负载的系统时，没有真正的、功能丰富的、引擎管理的非规范化的规范化数据结构会限制关系数据库引擎。我认为调整物化视图刷新率确实等同于 Cassandra 等 NoSQL 引擎提供的可调“最终一致性”。我需要了解引擎如何有效地同步其物化视图。为了被认为相对于 NoSQL 选项可行，同步视图所需的时间需要随着添加/更新的行数线性增加。

无论如何，我会再考虑一下并重新编辑。希望能提供一些想象中的 DDL 的代表性示例。

原文

I've been doing the relational database thing for years now, but lately have moved into Cassandra/Redis territory. NoSQL makes sense for what we're doing, so that's fine.

As I was working through defining Cassandra column families today a question occurred to me: In relational databases, why doesn't DDL let us define denormalization rules in such a way that the database engine itself could manage the resulting consistency issues natively. In other words, when a relational database programmer denormalizes to achieve performance goals... why is he/she then left to maintain consistency via purpose-written SQL?

Maybe there's something obvious that I'm missing? Is there some reason why such a suggestion is silly, because it seems to me like having this capability might be awfully useful.

EDIT:

Appreciate the feedback so far. I still feel like I have an unanswered (perhaps because it's been poorly articulated) question on my hands. I understand that materialized views attempt to offer engine-managed consistency for denormalized data. However, my understanding is that they aren't updated immediately with changes to the underlying tables. If this is true, it means the engine really isn't managing the consistency issues resulting from the denormalization... at least not at write-time. What I'm getting at is that a normalized data structure without true, feature-rich, engine-managed denormalization hamstrings relational database engines when it comes time to scale a system with heavy read load against complex relational models. I suppose it's true that adjusting materialized view refresh rates equates to tunable "eventual consistency" offered by NoSQL engines like Cassandra. I need to read up on how efficiently engines are able to sync their materialized views. In order to be considered viable relative to NoSQL options, the time it takes to sync a view would need to increase linearly with the number of added/updated rows.

Anyway, I'll think about this some more and re-edit. Hopefully with some representative examples of imagined DDL.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

时常饿 2024-11-11 11:01:19

一些关系数据库系统能够在某种程度上保持非规范化数据的一致性（如果我理解正确的话）。

在Oracle中，这称为物化视图，在SQL Server中称为索引视图。

基本上，这意味着您可以创建一个自我维护的非规范化表作为 SQL 查询的结果并为其建立索引：

CREATE VIEW a_b
WITH SCHEMABINDING
AS
SELECT  b.id AS id, b.value, b.a_id, a.property
FROM    dbo.b b
JOIN    dbo.a a
ON      a.id = b.a_id

生成的视图 a_b，如果它是一个真实的表，会违反2NF，因为property在功能上依赖于a_id，而a_id不是候选键。但是，数据库系统保留了这种功能依赖性，您可以在 (value, property) 上创建复合索引。

即使本身不支持物化视图的 MySQL 和 PostgreSQL 也能够维护某种非规范化表。

例如，当您在 MySQL 中的一列或一组列上创建 FULLTEXT 索引时，您会同时获得两个索引：第一个索引包含每个不同单词的一个条目在每条记录中（引用原始记录id），第二条记录包含整个表中每个单词的一个记录，以及总单词数。这允许快速搜索单词并按相关性排序。

总字数表当然依赖于各个单词表，因此违反了5NF，但是系统再次保持了这种依赖性。

PostgreSQL 中的 GIN 和 GIST 索引也做了类似的事情。

当然，并非所有可能的非规范化都可以维护，这意味着您无法实时具体化和索引任何查询：有些查询维护成本太高，有些理论上可行，但在实际系统中未实现，等等。

但是，您可以维护它们在触发器、存储过程或其他什么中使用您自己的逻辑，这正是它们的用途。

Some relational database systems are able to maintain consistency of denormalized data to some extent (if I understand right what you mean).

In Oracle, this is called materialized views, in SQL Server — indexed views.

Basically, this means that you can create a self-maintaned denormalized table as a result of an SQL query and index it:

CREATE VIEW a_b
WITH SCHEMABINDING
AS
SELECT  b.id AS id, b.value, b.a_id, a.property
FROM    dbo.b b
JOIN    dbo.a a
ON      a.id = b.a_id

The resulting view, a_b, were it a real table, would violate 2NF since property is functionally dependent on a_id which is not a candidate key. However, the database system maintains this functional dependency and you can create a composite index on, say, (value, property).

Even MySQL and PostgreSQL which don't support materialized views natively are capable of maintaining some kind of denormalized tables.

For instance, when you create a FULLTEXT index on a column or a set of columns in MySQL, you get two indexes at once: first one contains one entry for each distinct word in each record (with a reference to the original record id), the second one contains one record per each word in the whole table, with the total word count. This allows searching for the words fast and ordering by relevance.

The total word count table is of course dependent on the individual words table and hence violates 5NF, but, again, the systems maintains this dependency.

Similar things are done for GIN and GIST indexes in PostgreSQL.

Of course not all possible denormalizations can be maintained, that means that you cannot materialize and index just any query in real time: some are too expensive to maintain, some are theoretically possible but not implemented in actual systems, etc.

However, you may maintain them using your own logic in triggers, stored procedures or whatever, that's exactly what they are there for.

回复收藏 0 原文