对高效数据库形状的总体感觉是什么?

发布于 2024-10-06 07:43:56 字数 464 浏览 2 评论 0 原文

我有一些相当大的数据集,并且正在使用分层数据模块 pytables。在设计数据库时,我想知道创建具有多行且只有 1 个值列的表或采用更常见的“矩阵”格式是否会更快。

例如:

from_cty、to_cty、值
奥斯汀, 纽约, 9000
达拉斯, 纽约, 8000
纽约, 达拉斯, 8400
达拉斯奥斯汀, 5030
达拉斯, 奥斯汀, 4022
纽约, 奥斯汀, 8002
等等...

--或者--

城市'''''''',奥斯汀,达拉斯,纽约
奥斯汀''''''', 0000, 5030, 9000
达拉斯''''''', 4022, 0000, 8000
new york, 8002, 8400, 0000

第一种类型的好处可能包括能够一次拉出单个列,由城市选择或到达城市。但它添加了一个通常是列名称的列。

I have some rather large datasets and am using the hierarchical data module pytables. In desiging the databases, I'm wondering if it would be faster to create tables with many rows and only 1 value column, or in more common 'matrix' format.

For example:

from_cty, to_cty, value
austin, new york, 9000
dallas, new york, 8000
new york, dallas, 8400
austin, dallas, 5030
dallas, austin, 4022
new york, austin, 8002
etc...

--or--

Cities'''''''', austin, dallas, new york
austin''''''', 0000, 5030, 9000
dallas''''''', 4022, 0000, 8000
new york, 8002, 8400, 0000

Benefits of the first type could include being able to pull a single column as once, selected by from or to cities. But it adds a column that would normally be column names.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

書生途 2024-10-13 07:43:56

始终从规范化模型开始,并考虑到表的主要用途。

从您显示的示例数据来看,似乎可能会添加更多城市。如果您使用第二个示例,则需要修改架构,这通常是一件坏事。

如果需要查询第二种格式的数据,您始终可以提供一个视图,甚至是一个定期刷新的永久表。或者,如果您的主要使用模式要求您访问该格式的数据,您当然应该对其进行相应的建模。

尝试、测量并做出适当的调整。

Always start off with a normalized model having your primary usage of the table in mind.

Judging from the example data you have shown, it seems likely that additional cities will be added. If you go with your second example, that would require schema modifications, which is typically a bad thing.

If the need arises to query the data in the second format, you can always provide a view or even a permanent table that you periodically refresh. Or if your primary usage pattern require you to access the data in that format, you should of course model it accordingly.

Try it, measure it and make appropriate adjustments.

梦情居士 2024-10-13 07:43:56

alt text

假设在此模型中“Austin-Dallas”与“Dallas-Austin”相同,您可以防止反向-使用 then 在 Distance 表中进行重复

ALTER TABLE Distance ADD CONSTRAINT chk_id CHECK (FromCityId < ToCityId);

查询,您可以使用如下内容:

select
      a.CityName as FromCity
    , b.CityName as ToCity
    , Value      as Travel
from Distance as d
join City     as a on a.CityId = d.FromCityId
join City     as b on b.CityId = d.ToCityId
where (a.CityName = 'Austin' and b.CityName = 'Dallas')
   or (a.CityName = 'Dallas' and b.CityName = 'Austin') ;

EDIT
只是注意到,在您的示例中,“Austin-Dallas”与“Dallas-Austin”不同,因此不需要检查约束和 WHERE 子句中的第二行。

alt text

Providing that in this model "Austin-Dallas" is the same as "Dallas-Austin" you can prevent reverse-duplicates in the Distance table by using

ALTER TABLE Distance ADD CONSTRAINT chk_id CHECK (FromCityId < ToCityId);

Then when querying you can use something like:

select
      a.CityName as FromCity
    , b.CityName as ToCity
    , Value      as Travel
from Distance as d
join City     as a on a.CityId = d.FromCityId
join City     as b on b.CityId = d.ToCityId
where (a.CityName = 'Austin' and b.CityName = 'Dallas')
   or (a.CityName = 'Dallas' and b.CityName = 'Austin') ;

EDIT:
Just noticed that in your example "Austin-Dallas" is not the same as "Dallas-Austin", so no need for the check constraint and the second row in the WHERE clause.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文