Pros and cons of a massive table that controls all data flows with stored procs
Our DBA (with only 2 years of Google for training) has created a massive data management table (108 columns and growing) containing all necessary attributes for any data flow in the system. We'll call this table BFT for short.
Of these columns:
10 are for meta-data references.
15 are for data source and temporal tracking.
1 instance of new/curr columns for textual data.
10 instances of new/current/delta/ratio/range columns for multi-value numeric updates, totaling 50 columns.
Multi-valued numeric updates usually only need 2-5 of the update groups.
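To make the shape concrete, here is a minimal sketch of the layout as I understand it. Every name is invented for illustration and most of the 108 columns are elided:

-- Hypothetical sketch of the BFT; all names are made up here
CREATE TABLE dbo.BFT (
    BFTID        bigint IDENTITY(1,1) PRIMARY KEY,
    -- ~10 meta-data reference columns (BatchID/IsValid are my guesses)
    BatchID      int NULL,
    IsValid      bit NULL,
    MetaRef1     int NULL,
    -- ...
    -- ~15 data-source / temporal-tracking columns
    SourceSystem varchar(50) NULL,
    LoadedAtUtc  datetime NULL,
    -- ...
    -- one new/curr pair for textual data
    TextNew      nvarchar(4000) NULL,
    TextCurr     nvarchar(4000) NULL,
    -- 10 repeated groups of new/current/delta/ratio/range columns;
    -- only group 1 is shown, groups 2-10 repeat the same five columns
    NumNew1      decimal(18,4) NULL,
    NumCurr1     decimal(18,4) NULL,
    NumDelta1    decimal(18,4) NULL,
    NumRatio1    decimal(18,4) NULL,
    NumRange1    decimal(18,4) NULL
);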
Batches of 15K-1500K records are loaded into the BFT and processed by stored procs with logic to validate those records and shuffle them off to permanent storage in about 30 other tables.
In most of the record loads, 50-70 of the columns are empty throughout the entire process.
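The processing step amounts to something like the following rough outline (again hypothetical; the real procs carry far more validation logic, and the destination table name is invented):

-- Hypothetical outline of the validate-and-shuffle step
CREATE PROCEDURE dbo.ProcessBFTBatch
    @BatchID int
AS
BEGIN
    SET NOCOUNT ON;

    -- Validate: flag rows failing a basic check (rule invented here)
    UPDATE dbo.BFT
    SET    IsValid = 0
    WHERE  BatchID = @BatchID
      AND  SourceSystem IS NULL;

    -- Shuffle: route each slice of the valid rows to one of the ~30
    -- permanent tables (one of many similar INSERTs shown)
    INSERT INTO dbo.NumericFacts (BFTID, NumNew1, NumCurr1, NumDelta1)
    SELECT BFTID, NumNew1, NumCurr1, NumDelta1
    FROM   dbo.BFT
    WHERE  BatchID = @BatchID
      AND  IsValid = 1
      AND  NumNew1 IS NOT NULL;
END;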
I am no database expert, but this model and process seem to smell a little. I don't know enough to say why, though, and I don't want to complain without being able to offer an alternative.
Given this very small insight into the data processing model, does anyone have thoughts or suggestions? Can the database (SQL Server) be trusted to handle records with mostly empty columns efficiently, or does processing in this manner waste lots of cycles, memory, etc.?
Comments (4)
Sounds like he reinvented BizTalk.
I typically have multiple staging tables corresponding to the input loads. These may or may not correspond to the destination tables, but we don't do what you're talking about. If he doesn't like to have a lot of what are basically temporary work tables, they could be put into their own schema or even a separate database.
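If the objection to lots of staging tables is clutter, a dedicated schema keeps them corralled. A hypothetical example (all names invented):

-- Hypothetical: one narrow staging table per input feed, kept in its
-- own schema rather than one giant shared table
CREATE SCHEMA Staging;
GO

CREATE TABLE Staging.PriceFeed (
    RowID        bigint IDENTITY(1,1) PRIMARY KEY,
    SourceSystem varchar(50)   NOT NULL,
    LoadedAtUtc  datetime      NOT NULL DEFAULT (GETUTCDATE()),
    ItemCode     varchar(30)   NOT NULL,
    NewPrice     decimal(18,4) NULL,
    CurrPrice    decimal(18,4) NULL
);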
As far as the columns which are empty: if they aren't referenced in the particular query that is processing the BFT, it doesn't matter. HOWEVER, indexing becomes much more crucial, specifically that the index chosen is a non-clustered covering index. When your BFT is used and a table scan or clustered index scan is chosen, the unused columns have to be read and ignored or skipped, and in my experience this definitely seems to affect processing. Whereas with a non-clustered index scan or seek, fewer columns are read, and hopefully those don't include (m)any of the unused columns.
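For instance (using the hypothetical column names sketched in the question), a non-clustered covering index lets the processing query be answered without ever touching the other ~100 columns:

-- Hypothetical covering index: key on the filtering column, with the
-- few columns the processing query selects carried in INCLUDE
CREATE NONCLUSTERED INDEX IX_BFT_Batch_Covering
ON dbo.BFT (BatchID)
INCLUDE (SourceSystem, NumNew1, NumCurr1, NumDelta1);

-- This query can be satisfied entirely from the index above, so the
-- 50-70 empty columns are never read:
SELECT SourceSystem, NumNew1, NumCurr1, NumDelta1
FROM   dbo.BFT
WHERE  BatchID = 42;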
Normalization is the keyword here. If you have so many NULL values, chances are high that you're wasting a lot of space. Normalizing the table should also make data integrity in this table easier to enforce.
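As a sketch of what normalizing might look like (names hypothetical), the ten repeated update groups could move into a child table with one row per group actually used:

-- Hypothetical normalization of the repeated column groups
CREATE TABLE dbo.BFTRecord (
    BFTID        bigint IDENTITY(1,1) PRIMARY KEY,
    SourceSystem varchar(50) NOT NULL,
    LoadedAtUtc  datetime    NOT NULL
    -- ... meta-data and tracking columns ...
);

CREATE TABLE dbo.BFTNumericUpdate (
    BFTID      bigint  NOT NULL REFERENCES dbo.BFTRecord (BFTID),
    GroupNo    tinyint NOT NULL,   -- 1-10 in the original wide layout
    NewValue   decimal(18,4) NULL,
    CurrValue  decimal(18,4) NULL,
    Delta      decimal(18,4) NULL,
    Ratio      decimal(18,4) NULL,
    RangeValue decimal(18,4) NULL,
    PRIMARY KEY (BFTID, GroupNo)
);
-- A load that uses only 2-5 update groups now stores 2-5 rows here
-- instead of 50 mostly-NULL columns.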
One thing that might make things a little more flexible (other than normalizing) could be to create one or more views or table functions to present the data. Particularly if the table is outside your control, these would enable you to filter the spurious crap out and grab only what you need from the table.
However, if you're going to be one of the people who will be working with (and frowning every time you have to crack open) that massive table, you might want to trump the DBA's "design" and normalize that beast, and maybe give the DBA the task of creating some views and/or table functions to help you out.
I currently work with a similar but not-so-huge table which has been around on our system for years and has had new fields, indices, and constraints rather hastily tacked on, Frankenstein-style. Unfortunately some other workgroups rely on the structure as gospel, so we've created such views and functions to enable us to "shape" the data the way we need it.
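For example (hypothetical names, assuming the BFT columns sketched earlier in the question), a view or an inline table-valued function can present just the slice a given consumer needs:

-- Hypothetical view exposing only the columns one workgroup cares about
CREATE VIEW dbo.vwBFT_NumericUpdates
AS
SELECT BFTID, SourceSystem, LoadedAtUtc, NumNew1, NumCurr1, NumDelta1
FROM   dbo.BFT
WHERE  NumNew1 IS NOT NULL;
GO

-- Or an inline table-valued function, which can take parameters
CREATE FUNCTION dbo.fnBFT_ForBatch (@BatchID int)
RETURNS TABLE
AS RETURN
(
    SELECT BFTID, SourceSystem, NumNew1, NumCurr1
    FROM   dbo.BFT
    WHERE  BatchID = @BatchID
);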