Database design: composite key vs. single-column primary key

Posted on 2024-07-30 17:13:54

The database of our web application includes two tables:

States (idStates, State, Lat, Long)

idStates is an auto-incrementing primary key.

Cities (idAreaCode, idStates, City, Lat, Long)

idAreaCode is a primary key consisting of country code + area code (e.g. 91422, where 91 is the country code for India and 422 is the area code of a city in India). idStates is a foreign key referencing States, associating each city in Cities with its corresponding state.

We figured that country code + area code would be unique for each city, and thus could safely be used as a primary key. Everything was working, but one location in India exposed a flaw in the DB design. India, like the US, is a federal democracy and is geographically divided into many states and union territories. Data for both states and union territories is stored in States. There is, however, one location, Chandigarh, which belongs to TWO states (Haryana and Punjab) and is also a union territory in its own right.

The current db design doesn't allow us to store more than one record for the city Chandigarh.

One solution suggested is to create a primary key combining columns idAreaCode and idStates.
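
To make the suggested composite key concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL/InnoDB; the state ids (1, 2, 3) and the area code value 91172 are illustrative assumptions, and the Lat/Long columns are left out for brevity.

```python
import sqlite3

# In-memory SQLite stands in for MySQL/InnoDB; the key definitions carry over.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE States (
    idStates INTEGER PRIMARY KEY,
    State    TEXT NOT NULL
);
CREATE TABLE Cities (
    idAreaCode INTEGER NOT NULL,
    idStates   INTEGER NOT NULL REFERENCES States(idStates),
    City       TEXT NOT NULL,
    PRIMARY KEY (idAreaCode, idStates)   -- composite key instead of idAreaCode alone
);
""")

# Hypothetical ids: 1 = Haryana, 2 = Punjab, 3 = Chandigarh (union territory).
conn.executemany(
    "INSERT INTO States VALUES (?, ?)",
    [(1, "Haryana"), (2, "Punjab"), (3, "Chandigarh")],
)
# The same area code can now appear once per state.
conn.executemany(
    "INSERT INTO Cities VALUES (91172, ?, 'Chandigarh')",
    [(1,), (2,), (3,)],
)
chandigarh_rows = conn.execute(
    "SELECT COUNT(*) FROM Cities WHERE idAreaCode = 91172"
).fetchone()[0]

# A second row for the same (area code, state) pair is still rejected.
try:
    conn.execute("INSERT INTO Cities VALUES (91172, 1, 'Chandigarh')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The cost of this design is that the city name (and its coordinates, in the full table) is stored once per state, so the same city's attributes have to be kept in sync across several rows.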

What would be a good solution?

We are using MySQL with the InnoDB engine.

The database stores meteorological information for each city. Thus, the state and city are the starting point of each query.

Database normalization is important to us.

The database is updated daily / hourly using a CSV file (which is generated by another app). Each record in the CSV file is identified by the idStates and idAreaCode columns.
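
A rough sketch of such a refresh follows, assuming a hypothetical CSV layout (idStates, idAreaCode, temperature) and a hypothetical Weather table; neither is given in the question. SQLite's `INSERT OR REPLACE` is used here so the example runs with the Python standard library; in MySQL, `INSERT ... ON DUPLICATE KEY UPDATE` plays a similar role.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Weather (
    idStates    INTEGER NOT NULL,
    idAreaCode  INTEGER NOT NULL,
    temperature REAL,
    PRIMARY KEY (idStates, idAreaCode)   -- the pair that identifies a CSV record
)""")

def load_csv(text):
    """Insert new rows and overwrite rows whose key pair already exists."""
    rows = [
        (int(r["idStates"]), int(r["idAreaCode"]), float(r["temperature"]))
        for r in csv.DictReader(io.StringIO(text))
    ]
    conn.executemany("INSERT OR REPLACE INTO Weather VALUES (?, ?, ?)", rows)
    conn.commit()

# Two hypothetical hourly files; the second updates one existing record.
load_csv("idStates,idAreaCode,temperature\n1,91422,31.5\n2,91172,28.0\n")
load_csv("idStates,idAreaCode,temperature\n1,91422,33.0\n")

row_count = conn.execute("SELECT COUNT(*) FROM Weather").fetchone()[0]
latest = conn.execute(
    "SELECT temperature FROM Weather WHERE idStates = 1 AND idAreaCode = 91422"
).fetchone()[0]
```

Because the key comes from the file rather than from an auto-increment counter, reloading the same files into an empty table gives the same rows with the same keys.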

Hence we would prefer that the primary key used in Cities not be auto-incremented: each city should keep the same key even if the table is deleted and reloaded. Zip codes (or PIN codes) and area codes (or STD codes) meet the criteria of being unique and static (they don't change often), and ready-made lists of them are easily available. (We decided on area codes for now because India is in the process of updating its PIN codes to a new format.)

PS: We decided to handle this at the application level instead of making changes to the database design. In the database we will only be storing one record for Chandigarh. In the application we created a flag for any search for Chandigarh Punjab or Chandigarh Haryana to redirect to this record. It's an acceptable compromise since this is the ONLY exception we've come across.

我的鱼塘能养鲲 2024-08-06 17:13:55

Add another table, Countries.

Your problem is an example of why database normalization is important. You can't just mix and match different keys in one column.

Countries

+------------+--------------+
| country_id | country_name |
+------------+--------------+

States

+------------+----------+------------+
| country_id | state_id | state_name |
+------------+----------+------------+

Cities

+------------+----------+---------+-----------+
| country_id | state_id | city_id | city_name |
+------------+----------+---------+-----------+

Data

+------------+----------+---------+---------+----------+
| country_id | state_id | city_id | data_id | your_CSV |
+------------+----------+---------+---------+----------+

The bold fields are primary keys. Enter a standard country_id like 1 for US, 91 for India, and so on. city_id should likewise use a standard ID for each city.

You can then find things belonging to each other fast with minimal overhead. All data can be entered directly into Data, which thus serves as a single entry point, storing all the data in one spot. If your database supports partitioning, you can partition tables by country_id or country_id+state_id across a couple of server arrays, which will also speed up your database considerably. Countries, States and Cities won't add much to the server load and will only serve as reference tables. You will mainly be working on Data. You can add as much data as you wish, without any duplicates.

If you only have one datum per city, you can omit Data and move CSV_data to Cities:

Cities

+------------+----------+---------+-----------+----------+
| country_id | state_id | city_id | city_name | CSV_data |
+------------+----------+---------+-----------+----------+
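
The hierarchy above could be declared roughly as follows. The bold markers did not survive in the tables, so this sketch assumes each table's primary key is its own id combined with the ids of its parents; it again uses Python's sqlite3 in place of MySQL, and the state and city ids are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Countries (
    country_id   INTEGER PRIMARY KEY,
    country_name TEXT NOT NULL
);
CREATE TABLE States (
    country_id INTEGER NOT NULL REFERENCES Countries(country_id),
    state_id   INTEGER NOT NULL,
    state_name TEXT NOT NULL,
    PRIMARY KEY (country_id, state_id)
);
CREATE TABLE Cities (
    country_id INTEGER NOT NULL,
    state_id   INTEGER NOT NULL,
    city_id    INTEGER NOT NULL,
    city_name  TEXT NOT NULL,
    PRIMARY KEY (country_id, state_id, city_id),
    FOREIGN KEY (country_id, state_id) REFERENCES States(country_id, state_id)
);
CREATE TABLE Data (
    country_id INTEGER NOT NULL,
    state_id   INTEGER NOT NULL,
    city_id    INTEGER NOT NULL,
    data_id    INTEGER NOT NULL,
    your_CSV   TEXT,
    PRIMARY KEY (country_id, state_id, city_id, data_id),
    FOREIGN KEY (country_id, state_id, city_id)
        REFERENCES Cities(country_id, state_id, city_id)
);
""")

conn.execute("INSERT INTO Countries VALUES (91, 'India')")
conn.executemany("INSERT INTO States VALUES (91, ?, ?)", [(1, "Haryana"), (2, "Punjab")])
# The same city_id can sit under two different states.
conn.executemany("INSERT INTO Cities VALUES (91, ?, 172, 'Chandigarh')", [(1,), (2,)])
conn.execute("INSERT INTO Data VALUES (91, 1, 172, 1, 'reading for Chandigarh')")

states_for_city = [row[0] for row in conn.execute("""
    SELECT s.state_name
    FROM Cities c
    JOIN States s ON s.country_id = c.country_id AND s.state_id = c.state_id
    WHERE c.country_id = 91 AND c.city_id = 172
    ORDER BY s.state_name
""")]

# A city under a state that does not exist is refused by the foreign key.
try:
    conn.execute("INSERT INTO Cities VALUES (91, 99, 1, 'Nowhere')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Under these assumed keys, a city that belongs to several states is simply entered once per state, at the price of repeating its name in each row.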

梦醒时光 2024-08-06 17:13:55

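"We figured that country code + area code would be unique for each city, and thus could safely be used as a primary key"

After reading this, I simply stopped reading anything further in this topic.
How could anyone think that?
Area code, by definition (the first one I found on the internet):
- "An area code is the prefix number used to identify a geographic area under the North American Numbering Plan. This 3-digit number can be assigned to any number in North America, including Canada, the United States, Mexico, Latin America and the Caribbean" [1]

Setting aside that they are changeable and defined only in North America, area codes are not 3 digits long in some other countries (in countries with hundreds of thousands of localities, 3 digits are simply not enough; my mother's area code, by the way, has 5 digits), and they are not strictly tied to a fixed geographic location.

There are area codes for locations that move, such as arctic camps drifting on ice, nomadic tribes, migrating military units, or even large ocean-going vessels.

And then, what about several cities merging into one (or vice versa)?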

[1]
http://www.successfuloffice.com/articles/answering-service-glossary-area-code.htm

束缚ｍ 2024-08-06 17:13:55

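If you add a column to the key so that you can add an additional record for a given city, then you are not properly normalizing your data. Given that a city can be a member of multiple states, remove any reference to a state from Cities and add a StateCity table that relates states to cities (creating an m:m relationship).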
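
A minimal sketch of that junction table, using Python's sqlite3 as a stand-in for MySQL. StateCity is the table name from this answer; the column layout, the state ids and the area code 91172 are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE States (
    idStates INTEGER PRIMARY KEY,
    State    TEXT NOT NULL
);
CREATE TABLE Cities (
    idAreaCode INTEGER PRIMARY KEY,   -- no state reference here any more
    City       TEXT NOT NULL
);
CREATE TABLE StateCity (
    idStates   INTEGER NOT NULL REFERENCES States(idStates),
    idAreaCode INTEGER NOT NULL REFERENCES Cities(idAreaCode),
    PRIMARY KEY (idStates, idAreaCode)
);
""")

conn.executemany(
    "INSERT INTO States VALUES (?, ?)",
    [(1, "Haryana"), (2, "Punjab"), (3, "Chandigarh")],
)
# One row for the city itself ...
conn.execute("INSERT INTO Cities VALUES (91172, 'Chandigarh')")
# ... and one link row per state it belongs to.
conn.executemany("INSERT INTO StateCity VALUES (?, 91172)", [(1,), (2,), (3,)])

city_count = conn.execute("SELECT COUNT(*) FROM Cities").fetchone()[0]
linked_states = [row[0] for row in conn.execute("""
    SELECT s.State
    FROM StateCity sc
    JOIN States s ON s.idStates = sc.idStates
    WHERE sc.idAreaCode = 91172
    ORDER BY s.State
""")]
```

The city's own attributes are stored once, and a search by either state reaches the same Cities row through StateCity.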

痴情 2024-08-06 17:13:55

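Introduce a surrogate key.

What are you going to do when area codes change numbers or get split? Using business keys as primary keys is almost always a mistake.

Your summary is another example of why.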
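
As a rough illustration of the point, the sketch below gives Cities a surrogate key and keeps the area code as an ordinary unique column. It uses Python's sqlite3 in place of MySQL; the Readings table, the city name and the renumbered code 91423 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Cities (
    idCity     INTEGER PRIMARY KEY,      -- surrogate key (AUTO_INCREMENT in MySQL)
    idAreaCode INTEGER NOT NULL UNIQUE,  -- business key kept as a plain column
    City       TEXT NOT NULL
);
CREATE TABLE Readings (
    idReading   INTEGER PRIMARY KEY,
    idCity      INTEGER NOT NULL REFERENCES Cities(idCity),
    temperature REAL
);
""")

cur = conn.execute("INSERT INTO Cities (idAreaCode, City) VALUES (91422, 'Example City')")
city_id = cur.lastrowid
conn.execute("INSERT INTO Readings (idCity, temperature) VALUES (?, 31.5)", (city_id,))

# Hypothetical renumbering: only one column in one row has to change.
conn.execute("UPDATE Cities SET idAreaCode = 91423 WHERE idCity = ?", (city_id,))

readings_after_change = conn.execute("""
    SELECT COUNT(*)
    FROM Readings r
    JOIN Cities c ON c.idCity = r.idCity
    WHERE c.idAreaCode = 91423
""").fetchone()[0]
```

The readings stay attached to the city because they reference idCity, not the area code. The question's concern still applies, though: an auto-generated idCity is not guaranteed to be the same after the table is dropped and reloaded, so an import would have to look cities up by a stable column such as idAreaCode.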

寄意 2024-08-06 17:13:55

The database is not Normalised. It may be partly Normalised. As a result, you will find many more bugs and limitations in extensibility.
A hierarchy of Country, then State, then City is fine. You do not need an additional many-to-many table, as some suggest. The city in question (like many in America) simply exists in multiple States, three in this case.
By placing CountryCode and AreaCode, concatenated, in a single column, you have broken basic database rules, not to mention added code on every access. Additionally, CountryCode is not Normalised.
The problem is that CountryCode+AreaCode is a poor choice of key for a City. In real terms, it has very little to do with a city; it applies to huge swaths of land. If the meaning of City were changed to town (as in, your company starts collecting data for large towns), the db would break completely.
Magician has the only answer that is close to being correct, and that would save you from your current limitations due to lack of Normalisation. It is not accurate to say that Magician's answer is Normalised; it is a correct choice of Identifiers, which form a hierarchy in this case. But I would remove the "id" columns because they are unnecessary: 100% redundant columns, 100% redundant indices. The char() columns are fine as they are, and fine for the PK (compound keys). Remember that you need an Index on the char() column anyway, to ensure it is unique.
- If you had this, the Relational structure with Relational Identifiers, your problem would not exist.
- And your poor users would not have to figure silly things out or keep track of meaningless identifiers. They would just state, naturally: State.Name, City.Name, ReadingType, Data ...
When you get to the lower end of the hierarchy (City), the compound PK has become onerous (3 x CHAR(20)), and I wouldn't want to carry it into the Data table (especially if there are daily CSV imports and many readings or rows per city). Therefore, for City only, I would add a surrogate key as the PK.
But for the posted DDL, even as it is, without Normalising the db and using Relational Identifiers, yes, the PK of City is incorrect. It should be (idStates, idAreaCode), not the other way around. That will fix your problem.

Very bad naming by the way.

吃→可爱长大的 2024-08-06 17:13:55

Add a new primary key field to the Cities table that simply auto-increments. The KISS methodology (keep it simple).

Any other solution is cumbersome and confusing.

ˇ宁静的妩媚 2024-08-06 17:13:54

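It sounds like you are collecting data for a telephone directory. Are you? Why are states important to you? The answer to this question will probably determine which database design works best for you.

You may think that it is obvious what a city is. It is not. It depends on what you are going to do with the data. In the US, there is a unit called an MSA (Metropolitan Statistical Area). The Kansas City MSA spans both Kansas City, Kansas and Kansas City, Missouri. Whether the MSA unit makes sense depends on the intended use of the data.
If you used area codes in the US to determine cities, you would end up with a very different grouping than MSAs. Again, it depends on what you are going to do with the data.

In general, whenever the hierarchical pattern of political subdivisions breaks down, the most general solution is to treat the relationship as many-to-many. You solve this the same way you solve any other many-to-many problem: by creating a new table with two foreign keys. In this case, the foreign keys are IdAreacode and IdStates.

Now you can have one area code in many states and many area codes in one state. It seems a shame to accept this extra overhead to cover just one exception. Do you know whether the exception you have uncovered is only the tip of the iceberg, with many more such exceptions to come?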
