我有一些相当大的数据集,并且正在使用分层数据模块 pytables。在设计数据库时,我想知道创建具有多行且只有 1 个值列的表或采用更常见的“矩阵”格式是否会更快。
例如:
from_cty、to_cty、值
奥斯汀, 纽约, 9000
达拉斯, 纽约, 8000
纽约, 达拉斯, 8400
达拉斯奥斯汀, 5030
达拉斯, 奥斯汀, 4022
纽约, 奥斯汀, 8002
等等...
--或者--
城市'''''''',奥斯汀,达拉斯,纽约
奥斯汀''''''', 0000, 5030, 9000
达拉斯''''''', 4022, 0000, 8000
new york, 8002, 8400, 0000
第一种类型的好处可能包括能够一次拉出单个列,由城市选择或到达城市。但它添加了一个通常是列名称的列。
I have some rather large datasets and am using the hierarchical data module pytables. In desiging the databases, I'm wondering if it would be faster to create tables with many rows and only 1 value column, or in more common 'matrix' format.
For example:
from_cty, to_cty, value
austin, new york, 9000
dallas, new york, 8000
new york, dallas, 8400
austin, dallas, 5030
dallas, austin, 4022
new york, austin, 8002
etc...
--or--
Cities'''''''', austin, dallas, new york
austin''''''', 0000, 5030, 9000
dallas''''''', 4022, 0000, 8000
new york, 8002, 8400, 0000
Benefits of the first type could include being able to pull a single column as once, selected by from or to cities. But it adds a column that would normally be column names.
发布评论
评论(2)
始终从规范化模型开始,并考虑到表的主要用途。
从您显示的示例数据来看,似乎可能会添加更多城市。如果您使用第二个示例,则需要修改架构,这通常是一件坏事。
如果需要查询第二种格式的数据,您始终可以提供一个视图,甚至是一个定期刷新的永久表。或者,如果您的主要使用模式要求您访问该格式的数据,您当然应该对其进行相应的建模。
尝试、测量并做出适当的调整。
Always start off with a normalized model having your primary usage of the table in mind.
Judging from the example data you have shown, it seems likely that additional cities will be added. If you go with your second example, that would require schema modifications, which is typically a bad thing.
If the need arises to query the data in the second format, you can always provide a view or even a permanent table that you periodically refresh. Or if your primary usage pattern require you to access the data in that format, you should of course model it accordingly.
Try it, measure it and make appropriate adjustments.
假设在此模型中“Austin-Dallas”与“Dallas-Austin”相同,您可以防止反向-使用 then 在
Distance
表中进行重复查询,您可以使用如下内容:
EDIT:
只是注意到,在您的示例中,“Austin-Dallas”与“Dallas-Austin”不同,因此不需要检查约束和
WHERE
子句中的第二行。Providing that in this model "Austin-Dallas" is the same as "Dallas-Austin" you can prevent reverse-duplicates in the
Distance
table by usingThen when querying you can use something like:
EDIT:
Just noticed that in your example "Austin-Dallas" is not the same as "Dallas-Austin", so no need for the check constraint and the second row in the
WHERE
clause.