一个有两组几乎重复的列名的表应该如何设计?

发布于 2024-10-07 11:07:50 字数 241 浏览 0 评论 0原文

我有一个大约有 40 列的表。列名称的唯一区别是最后 20 个列名称前均以“B”开头。这张表是用来比较的。换句话说,将前 20 列中的数据与后 20 列中的数据进行比较。

我知道这是一个非常糟糕的设计,那么这个表应该如何重新设计,使得只有20列,但我们仍然可以比较数据呢?

编辑:如果有帮助,我们还使用这些数据来查找匹配的队列

还要注意,性能是这里的主要关注点。通过复制列,数据的获取速度非常快。

谢谢!

I have a table that has around 40 columns. The only difference in the columns names is that the last 20 all start with "B" before the column name. This table is used for comparing. In other words, compare the data in the first 20 columns to the data in the last 20 columns.

I know this is very bad design, so how should this table be redesigned, so that there are only 20 columns, yet we can still compare the data?

EDIT: if it helps, we also use this data to find a matched cohort

Also note that performance is of main concern here. By duplicating the columns the getting of data is extremely fast.

Thanks!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

洒一地阳光 2024-10-14 11:07:50

两种可能的架构和查询提示。

1) 使用“类型”列构建表格,并使用它来标记“主要”与“备用”。就您的情况而言,“A”与“B”可能比较合适。

2) 构建一个垂直分区,两个相同的表(用于主数据和备用数据),共享一个公共主键。 (如果 Id = 42 在一个表中,则它必须在另一个表中 - 除非“备用”数据是可选的,在这种情况下,不要填充第二个表。)另外,还可以选择使用第三个表来跟踪所有可能的主表键,以及已知对于两个表始终通用的任何数据。

提示:阅读 SELECT...EXCEPT 和 SELECT...INTERSECT。它们运行速度快得令人不安,并且非常适合比较两个数据集之间的所有列和行的差异(例外)和匹配(相交)。您可以相当轻松地使用这两种结构中的任何一个,并且它也可以与您现有的代码一起使用(尽管编写查询可能会更麻烦)。

Two possible architectures and a query tip.

1) Build your table with a "Type" column, and use that to flag "primary" vs. "alternate". In your case, "A" vs. "B" might be appropriate.

2) Build a vertical partition, two identical tables (for primary and alternate data), that share a common primary key. (If Id = 42 is in one table, it must be in the other--unless "alternate" data is optional, in which case don't populate the second table.) Also optionally, have a third table that tracks all possible primary keys, along with any data that is known to always be common to both tables.

Tip: Read up on SELECT...EXCEPT and SELECT...INTERSECT. They run disturbingly quickly, and are idea for comparing all columns and rows between two datasets for differences (except) and matches (intersect). You can use this fairly easily with either of the two structures, and it would work with your existing code as well (though it might be fussier to write the query).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文