将数据从平面表投影到邻接表
我正在设计一个系统来记录和报告日常测量数据。数据由类别标识符、日期/时间和测量数据(其中最多可以有 500 条浮点或整数)组成。
这些类别被可视化为树结构,其中数据与节点以及叶子相关联。
原始数据以 CSV 形式提供,格式如下:
1/6/2001 15:55, /Node1/Node2/Node3, 121, 34, 452, 651, 167
1/6/2001 15:55, /Node1/Node2/Node3/LeafA, 12, 34, 45, 65, 67
1/6/2001 15:55, /Node1/Node4/Node5/LeafB, 21, 32, 43, 54, 65
我计划使用邻接列表(请参阅 树形数据结构的数据库结构)用于树形结构。我还计划有第二个表仅用于测量数据和日期/时间。这样,一旦树形结构第一次生成,就可以被测量数据表反复引用。此外,拥有一个小的邻接列表表可以使系统更具可读性:)。在下面的类别表中,Name 是节点或叶名称(例如 Node1 或 LeafA),FullName 是整个分支路径(例如 Node1/Node2/Node3/LeafA)。不确定我是否需要两者,但我认为它们会派上用场,因此我不必在需要时重新创建 FullName。
CREATE TABLE [dbo].[Category](
[CatId] [int] IDENTITY(1,1) NOT NULL,
[ParentCatId] [int] NULL,
[Name] [nvarchar](30) NOT NULL,
[FullName] [nvarchar](MAX) NOT NULL
CONSTRAINT [PK_Category] PRIMARY KEY CLUSTERED
(
[CatId] ASC
) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[MeasurementData](
[CatId] [int] NOT NULL,
[DateCollected] [datetime] NOT NULL,
[foo] [int] NOT NULL,
[bar] [float] NOT NULL,
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[MeasurementData] WITH CHECK ADD CONSTRAINT [FK_ MeasurementData _Category] FOREIGN KEY([CatId])
REFERENCES [dbo].[Category] ([CatId])
GO
ALTER TABLE [dbo].[MeasurementData] CHECK CONSTRAINT [FK_ MeasurementData _Category]
GO
为了将数据加载到系统中,我正在考虑使用 BCP 将 CSV 加载到平面表(SQL Server 2008 中),然后将平面表投影到分层表结构。
问题1:我应该使用 T-SQL 还是 C#(SQL Server 之外的 C# 应用程序)尝试此投影?
Q2:任何人都有现有的算法可以根据上面的类别标识符快速找到(或创建并返回)正确的叶子吗?
仅供参考,我还在使用WITH关键字后跟公共表表达式来理解递归查询语法——当我需要进行一些递归编程时。
https://stackoverflow.com/questions/tagged/common-table-expression
http://media.pragprog.com/titles/bksqla/trees.pdf
提前致谢
I am designing a system to record and report on daily measurement data. The data consists of a category identifier, the date/time, and the measurement data (of which there can be up to 500 pieces either as float or int).
The categories are visualized as a tree structure where data is associated with a node as well as with a leaf.
The raw data comes in as a CSV in the following format:
1/6/2001 15:55, /Node1/Node2/Node3, 121, 34, 452, 651, 167
1/6/2001 15:55, /Node1/Node2/Node3/LeafA, 12, 34, 45, 65, 67
1/6/2001 15:55, /Node1/Node4/Node5/LeafB, 21, 32, 43, 54, 65
I am planning on using Adjacency List (see Database Structure for Tree Data Structure) for the tree structure. I am also planning to have a second table just for the measurement data and the date/time. This way, once the tree structure is generated the first time, it can be referenced over and over again by the measurement data table. Also, having a small Adjacency List table makes the system much more readable :). In the Category table below, Name would be a node or leaf name (e.g. Node1 or LeafA) and FullName would be the entire branch path (e.g. Node1/Node2/Node3/LeafA). Not sure I need both, but I think they will come in handy so I don’t have to recreate the FullName when needed.
CREATE TABLE [dbo].[Category](
[CatId] [int] IDENTITY(1,1) NOT NULL,
[ParentCatId] [int] NULL,
[Name] [nvarchar](30) NOT NULL,
[FullName] [nvarchar](MAX) NOT NULL
CONSTRAINT [PK_Category] PRIMARY KEY CLUSTERED
(
[CatId] ASC
) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[MeasurementData](
[CatId] [int] NOT NULL,
[DateCollected] [datetime] NOT NULL,
[foo] [int] NOT NULL,
[bar] [float] NOT NULL,
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[MeasurementData] WITH CHECK ADD CONSTRAINT [FK_ MeasurementData _Category] FOREIGN KEY([CatId])
REFERENCES [dbo].[Category] ([CatId])
GO
ALTER TABLE [dbo].[MeasurementData] CHECK CONSTRAINT [FK_ MeasurementData _Category]
GO
To load the data into the system, I was thinking of using BCP to load the CSV into a flat table (into SQL Server 2008) and then project the flat table to the hierarchical table structure.
Q1: Should I attempt this projection using T-SQL or C# (C# app outside of SQL Server)?
Q2: Anyone have an existing algorithm to quickly find (or create and return) the correct leaf given the category identifier above?
FYI, I’m also in the process of wrapping my head around the recursive query syntax using the WITH keyword followed by a common table expression - for when i need to do some recursive programming.
https://stackoverflow.com/questions/tagged/common-table-expression
http://media.pragprog.com/titles/bksqla/trees.pdf
Thanks in advance
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您的表结构可能有点不稳定。
您提供的示例输入数据表明整套度量适用于整个节点列表。如果这是真的,那么您最好对节点列表字符串进行哈希处理,得到如下内容:
来自MeasurementData 的外键现在位于HashId 上。
这回答了您的问题 1:在传递数据时在 C# 中生成哈希,生成两个为类别表和测量数据表准备好 BCP 的输出文件。
由于这是某种数据仓库,因此不要害怕生成针对其他方法检索而优化的其他数据副本,因此请务必在 CategoryDetails 表中制作类别的第二个表示形式,如下所示:
至于如何使用通用表表达式,我也遇到了一些麻烦,但一旦我弄清楚了,我就写了一篇博客文章: http://database-programmer.blogspot.com/2010/11/recursive-queries-with-common-table.html
Your table structure may be a little iffy.
The example input data you provide suggests that the entire set of measures applies to the entire node list. If this is true, then you are better off hashing the node list string, getting something like this:
The foreign key from MeasurementData is now on the HashId.
This answers your Q1: generate the hash in C# while passing the data, generating two output files that are BCP ready for the Category table and the MeasurementData tables.
Since this is some kind of data warehouse, don't be afraid to generate other copies of the data that are optimized for retrieval by other methods, so by all means make a second representation of the categories, in a CategoryDetails table something like this:
As for how to use Common Table Expressions, I also had some trouble wrapping my head around them, but once I figure it out I wrote a blog entry: http://database-programmer.blogspot.com/2010/11/recursive-queries-with-common-table.html