合并相同的表但保持单独的引用完整性

发布于 2024-08-04 11:20:38 字数 1213 浏览 5 评论 0原文

考虑一个具有诸如 (fk_dim1value, fk_dim2value, ..., value) 之类的事实表的维度模型,其中 fk_X 列是对应的普通维度表 dim1value 的外键( id,value),dim2value(id,value),等。

这些事实和维度表是从不同的来源自动收集的,所以它们有很多......而且它们是多余的:所有维度值表在结构上是相同的,(id, value),表示简单的文本值集合,没有进一步的语义(唯一的区别是在不同的事实表中引用它们的外键不同)。稍后可能会出现不太重要的维度类型,但不同类型维度的集合仍然很小。

因此,我想将维度表合并到一个表 dimvalue (fk_dim, dimvalue_id, value) 中,其中 fk_dim 引用表 dimension (dim_id, name) >,并且 dimvalue_id 仅在每个维度内是唯一的。然后,自然主键是复合的:(fk_dim, dimvalue_id)

事实表外键列现在都引用同一个表,dimvalue (fk_dim, dimvalue_id, value) ...但是当然,每个列都与特定维度相关联,因此仍应限制为专门引用该维度的值(统一表dimvalue的水平分区)。

有没有(明智的)方法来做到这一点?

我的意思是类似“半复合”外键,即对复合 PK 的“切片”的单列引用,其他列具有固定值。 “完全复合”FK 将是 FOREIGN KEY (col1, col2) REFERENCES dimvalue (fk_dim, dimvalue_id) 但这里 fk_dim 是固定的,因此“home”一侧键只有一列,引用 dimvalue 主键的第二列;类似于FOREIGN KEY (fk_dim7value) REFERENCES dimvalue (fk_dim=7, dimvalue_id)

这样的事情可能吗?或者我在最后一段中迷失了方向?我是否应该放弃整个 dimvalue 表的外键,然后添加检查约束以按维度进行限制?或者引用完整性是否要求我放弃更多并只接受所有单独的相同表?

(限制对写入性能的影响并不重要;读取性能是一个设计目标。)

Consider a dimensional model with fact tables like (fk_dim1value, fk_dim2value, ..., value) where the fk_X columns are foreign keys into corresponding trivial dimension tables dim1value (id, value), dim2value (id, value), etc.

These fact-and-dimension tables are collected automatically from disparate sources, so there's a lot of them ... and they are redundant: all the dimension value tables are structurally identical, (id, value), representing simple collections of textual values with no further semantics (the only difference being the different foreign keys referencing them in the various fact tables). Less trivial dimension types will probably come up later, but the set of different types of dimensions will remain small.

So I want to merge the dimension tables into one table dimvalue (fk_dim, dimvalue_id, value) where fk_dim references a table dimension (dim_id, name), and dimvalue_id is unique only within each dimension. The natural primary key is then composite: (fk_dim, dimvalue_id).

The fact table foreign-key columns now all reference the same table, dimvalue (fk_dim, dimvalue_id, value) ... but of course each column is associated with a particular dimension and thus should still be limited to referencing the values of that dimension specifically (a horizontal partition of the unified table dimvalue).

Is there a (sensible) way to do this?

I mean something like a “half-composite” foreign key, i.e. a single-column reference to a “slice” of a composite PK, with a fixed value for the other column(s). A “fully-composite” FK would be FOREIGN KEY (col1, col2) REFERENCES dimvalue (fk_dim, dimvalue_id) but here fk_dim is fixed and so the “home” side of the key is just one column, referencing the second column of the dimvalue primary key; something like FOREIGN KEY (fk_dim7value) REFERENCES dimvalue (fk_dim=7, dimvalue_id).

Is something like that possible? Or am I losing my way in this last paragraph? Should I give up and just foreign-key to the whole dimvalue table and then add check constraints to limit by dimension? Or does referential integrity require me to give up even more and just accept all the separate identical tables?

(Impact of constraints on write performance is not important; read performance is a design goal.)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

依 靠 2024-08-11 11:20:38

您已经阐述了这些关键考虑因素

  • 数据是从不同的系统收集的,因此我得出结论这是一个“报告”表,而不是一个“操作”或“事务”类型系统
  • 每个事实表每行包含 1 条业务数据,即 您的事实表的“值”列
  • 似乎只包含一个“度量”或“事实”,
  • 写入性能无关,只有读取性能才是目标。这证实了我的结论,即这是一个“报告”表

考虑到您追求快速读取性能,我会选择“大表”设计。诚然,大表设计对于事务系统来说是可怕的,但这不是一个。我所说的大桌子是指
TABLE(DIM1VALUE, DIM2VALUE, DIM3VALUE, DIM4VALUE ....DIMNVALUE, FACTVALUE)

无论如何,您的维度表只有 1 列业务数据,因此请跳过查找。对每一列(事实值除外)建立索引,然后测试查询的性能。

当您加载大表时,您可以检查数据质量值并标记/解决那些超出预期范围的值。

现在,如果维度表的数量过多,您可以将大表分成几组,其中分组基于逻辑用法,例如,如果维度中的 10 个属性始终一起使用,则将它们放在 BIGTABLE1 中,依此类推。

You've stated these key considerations

  • Data is collected from disparate systems therefore I conclude this is a "reporting" table and not an "operations" or "transactional" type system
  • each fact table contains 1 piece of business data per row, i.e., the "value" column
  • your fact table seems to contain only one "measure" or "fact"
  • write-performance is irrelevant, only read-performance is the goal. This affirms my conclusion that this is a "reporting" table

Considering you're after fast read performance I'd go for the "big table" design. Granted the big table design is HORRIBLE for transaction systems but this isn't one. By big table I mean
TABLE(DIM1VALUE, DIM2VALUE, DIM3VALUE, DIM4VALUE....DIMNVALUE, FACTVALUE)

Your dimension tables were only 1 column of business data anyway so skip the lookups. Index every column (except the fact value) and then test your queries for performance.

While you load the big table you can check the values for data quality and flag/address those outside the expected ranges.

Now if your number of dimension tables was excessive you could split the big table into groups where the grouping is based on logical usage, e.g., if 10 of the attributes from the dimensions are always used together then keep them together in BIGTABLE1, and so on.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文