Is it reasonable to split data into different tables based on a column value?
If I have a large table with a column which has a rather limited range of values (e.g. < 100), is it reasonable to divide this table into several tables with names tied to that column value?
For example, a table with the columns:
table "TimeStamps": [Id] [DeviceId] [MessageCounter] [SomeData]
where [DeviceId] is the "limited range" column, would be split into several different tables:
table "TimeStamps1": [Id] [MessageCounter] [SomeData]
table "TimeStamps2": [Id] [MessageCounter] [SomeData]
...
table "TimeStampsN": [Id] [MessageCounter] [SomeData]
The problem I am having with my original table is that finding the largest MessageCounter value for a given DeviceId takes a very long time to execute (see this post).
If the tables were separated, finding the maximum MessageCounter value should be an O(1) operation.
[Edit]
Just stumbled upon this again and thought I would update it. The problem that originally brought me here was poor performance when querying the original database. However, after adding additional indexes and scheduling index reorganization jobs, I was able to get great performance with the normalized form. The SSMS Database Engine Tuning Advisor was a great help in identifying bottlenecks and suggesting the missing indexes.
Comments (4)
While you could do it as a last-ditch performance optimization, I would advise against it, mainly because it makes it very difficult to accommodate new DeviceIDs.
At any rate, doing this should not be necessary. If there's an index for DeviceID, the DBMS should be able to filter on it very quickly. That's what a DBMS is for, after all...
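To make the point about indexing concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for SQL Server (an assumption: the question is about SQL Server, but the principle carries over). It builds the `TimeStamps` table with synthetic data, adds a composite index on `(DeviceId, MessageCounter)` and checks that the `MAX` query is answered from that index rather than by scanning the table. The index name `IX_TimeStamps_Device_Counter` and the demo data are hypothetical.

```python
import sqlite3

# Hypothetical index name, used only for this illustration.
INDEX_NAME = "IX_TimeStamps_Device_Counter"
MAX_QUERY = "SELECT MAX(MessageCounter) FROM TimeStamps WHERE DeviceId = ?"


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory TimeStamps table filled with synthetic rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TimeStamps ("
        " Id INTEGER PRIMARY KEY,"
        " DeviceId INTEGER NOT NULL,"
        " MessageCounter INTEGER NOT NULL,"
        " SomeData TEXT)"
    )
    # Synthetic data: 50 devices with 200 messages each (illustrative only).
    rows = [
        (device, counter, f"payload-{device}-{counter}")
        for device in range(1, 51)
        for counter in range(1, 201)
    ]
    conn.executemany(
        "INSERT INTO TimeStamps (DeviceId, MessageCounter, SomeData) VALUES (?, ?, ?)",
        rows,
    )
    # Equality column first, then the column we take MAX of, so the engine
    # can seek straight to the last entry for one device.
    conn.execute(f"CREATE INDEX {INDEX_NAME} ON TimeStamps (DeviceId, MessageCounter)")
    return conn


def max_counter(conn: sqlite3.Connection, device_id: int):
    """Largest MessageCounter for one device, or None if it has no rows."""
    (value,) = conn.execute(MAX_QUERY, (device_id,)).fetchone()
    return value


def query_plan(conn: sqlite3.Connection, device_id: int) -> list:
    """Human-readable plan lines (the last column of EXPLAIN QUERY PLAN)."""
    cursor = conn.execute("EXPLAIN QUERY PLAN " + MAX_QUERY, (device_id,))
    return [row[-1] for row in cursor]


if __name__ == "__main__":
    conn = build_demo_db()
    print(max_counter(conn, 7))  # 200
    print(query_plan(conn, 7))
```

With `DeviceId` leading and `MessageCounter` second, the lookup becomes a single index seek. That is logarithmic rather than strictly O(1), but it avoids the full scan that made the original query slow, without splitting the table. In SQL Server the equivalent would be a nonclustered index on the same two columns.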
I fear this approach would add a great deal to the complexity of any application that needs to access this data. An alternative approach, which gives you whatever benefits you might get from putting each device in a separate table while still keeping all devices in one table, would be to partition the table on DeviceID. I suggest that you investigate table partitioning to see if it fits your needs.
Share and enjoy.
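Table partitioning keeps one logical table but stores its rows in separate partitions chosen by a partition function. The sketch below is a simplified, in-memory Python model of that idea, not SQL Server's implementation: the class names are hypothetical, and the boundary handling is meant to mirror the `RANGE LEFT` / `RANGE RIGHT` options of a SQL Server partition function, with 1-based partition numbers.

```python
from bisect import bisect_left, bisect_right


class RangePartitionFunction:
    """Hypothetical model: maps a DeviceId to a 1-based partition number."""

    def __init__(self, boundaries, range_right=True):
        boundaries = list(boundaries)
        if boundaries != sorted(set(boundaries)):
            raise ValueError("boundary values must be strictly increasing")
        self.boundaries = boundaries
        self.range_right = range_right

    @property
    def partition_count(self) -> int:
        # N boundary values split the key space into N + 1 partitions.
        return len(self.boundaries) + 1

    def partition_for(self, value) -> int:
        # RANGE RIGHT: a boundary value belongs to the partition on its right.
        # RANGE LEFT:  a boundary value belongs to the partition on its left.
        find = bisect_right if self.range_right else bisect_left
        return find(self.boundaries, value) + 1


class PartitionedTimeStamps:
    """Hypothetical model: one logical TimeStamps table split into partitions."""

    def __init__(self, partition_function: RangePartitionFunction):
        self.pf = partition_function
        self.partitions = {n: [] for n in range(1, self.pf.partition_count + 1)}

    def insert(self, device_id, message_counter, some_data):
        # The partition function, not a table name, decides where the row goes.
        self.partitions[self.pf.partition_for(device_id)].append(
            (device_id, message_counter, some_data)
        )

    def max_counter(self, device_id):
        # Partition elimination: only the partition that can hold this
        # DeviceId is examined.
        rows = self.partitions[self.pf.partition_for(device_id)]
        return max((counter for dev, counter, _ in rows if dev == device_id), default=None)
```

Because routing lives in the partition function, a previously unseen DeviceId needs no new table and no change to the queries, which is the maintainability concern raised about the one-table-per-device design.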
This is what a distributed database is for. Several servers share one logical table, with its rows distributed among them based on some column. You tell the servers how to distribute the table based on ranges of column values. Once this is set up, you just query the table and don't need to care which server the data actually resides on.
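A minimal sketch, assuming an in-memory stand-in for each server, of how range-based distribution stays invisible to the caller: the shard map, the server names `db-a` and `db-b`, and all class names are hypothetical. A real distributed database typically performs this routing inside the engine rather than in client code.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Stand-in for a remote database node; rows live in a local list here."""
    name: str
    rows: list = field(default_factory=list)


@dataclass(frozen=True)
class ShardRange:
    low: int     # inclusive lower bound on DeviceId
    high: int    # exclusive upper bound on DeviceId
    server: str  # name of the server holding this range


class DistributedTimeStamps:
    """Hypothetical client-side view of one logical TimeStamps table."""

    def __init__(self, ranges, servers):
        self.ranges = sorted(ranges, key=lambda r: r.low)
        self.servers = {s.name: s for s in servers}

    def _server_for(self, device_id) -> Server:
        # Configured once: which server owns which DeviceId range.
        for r in self.ranges:
            if r.low <= device_id < r.high:
                return self.servers[r.server]
        raise KeyError(f"no server configured for DeviceId {device_id}")

    def insert(self, device_id, message_counter, some_data):
        self._server_for(device_id).rows.append((device_id, message_counter, some_data))

    def max_counter(self, device_id):
        # The caller queries the logical table; routing picks the server.
        rows = self._server_for(device_id).rows
        return max((counter for dev, counter, _ in rows if dev == device_id), default=None)
```

For example, with `ShardRange(1, 50, "db-a")` and `ShardRange(50, 100, "db-b")`, calling `max_counter(60)` reads only from `db-b`, yet the caller never names a server.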
Have you considered database partitioning? This is the built-in solution for the type of problem you've described. See: Partitioned Tables and Indexes in SQL Server 2005