Normalizing an address
I am trying to normalize an address.
I believe the diagram below shows the tables relevant to this question. I want to know how ZipCodes should be integrated into the model. This is for international addresses, so I know that a Zip/PostalCode is not used everywhere. I think City::ZipCode is 1::0-n (I have read others saying this is not always the case, but they never provided evidence). If they are correct, then I guess this would be a many-to-many relationship. Since each Address can have at most one ZipCode, while a ZipCode can contain many addresses, I am lost as to how to normalize this model.
Since an Address may or may not contain a ZipCode, I want to avoid having it as a nullable FK in the Address table.
EDIT: Just want to emphasize that the entities and attributes provided are drastically scaled back from the actual DB. It is only used as a reference and to address my concern of where to include zipcodes into the model.
Comments (6)
Depending on what country you are in, the rules for postal codes can get pretty dicey. You are pretty safe to assume that a postal code has one official city name, but both the U.S. and Canada allow alternative city names for a postal code. I know this for a fact because I develop postal address validation software for North America. The non-official names are often recognized by the postal authorities, and you typically have to permit their use.
Therefore, if you want to be able to use non-official names you need m:n between city and postal code. I would question why you want a code table of postal codes in any case. It is better for address storage to treat these as independent attributes rather than trying to normalize them.
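If you do decide to model the m:n relationship, it could be sketched roughly as follows, using Python's built-in `sqlite3` module. The table names, the `is_official` flag, and the sample rows are all made up for illustration; they are not taken from the question's diagram or from any postal dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (
    city_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE postal_code (
    postal_code_id INTEGER PRIMARY KEY,
    code           TEXT NOT NULL UNIQUE
);
-- Junction table: one row per (city name, postal code) pairing.
-- is_official is an assumed flag marking the postal authority's preferred name.
CREATE TABLE city_postal_code (
    city_id        INTEGER NOT NULL REFERENCES city(city_id),
    postal_code_id INTEGER NOT NULL REFERENCES postal_code(postal_code_id),
    is_official    INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (city_id, postal_code_id)
);

-- Hypothetical sample data.
INSERT INTO city VALUES (1, 'Exampleville'), (2, 'Example Heights');
INSERT INTO postal_code VALUES (1, '11111'), (2, '22222');
INSERT INTO city_postal_code VALUES
    (1, 1, 1),  -- Exampleville is the official name for 11111
    (2, 1, 0),  -- Example Heights is an accepted alternative for 11111
    (1, 2, 1);  -- Exampleville also covers 22222
""")


def city_names_for(code):
    """Return all city names accepted for a postal code, official name first."""
    rows = conn.execute(
        """
        SELECT c.name
        FROM city c
        JOIN city_postal_code cp ON cp.city_id = c.city_id
        JOIN postal_code p ON p.postal_code_id = cp.postal_code_id
        WHERE p.code = ?
        ORDER BY cp.is_official DESC, c.name
        """,
        (code,),
    ).fetchall()
    return [r[0] for r in rows]


print(city_names_for("11111"))
```

The junction table only records which pairings exist; it does not make the data correct or complete, which is the harder part.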
If you somehow think that you will be able to use some data in your database to go backwards from postal code to city name or forwards from city name to postal code, then you are setting yourself up for disappointment! There are USPS and Canada Post recognized software solutions for doing address validation and if you spend any time actually looking into it, you'll find that the problem domain of address validation is much more complicated than you think it is. If address accuracy is important to your app (and it should be in most cases) then go buy 3rd party tools to do your address validation and store your addresses in a single table with as many columns as makes sense to you.
Normalizing, or standardizing addresses, is a huge problem for most entities that need accurate, regularly-formatted addresses. (I work in the address validation industry - for SmartyStreets - so I've dealt with this a lot.) Because of the complexity of different delivery endpoints, address changes, updates to the components of an address, and many other things, it's best to recruit a certified service to take care of that for you.
Assuming you're working with US addresses, there are APIs or list processing services you can hook into pretty easily to get the data you need. For example, if you're having issues with a NULLable ZipCode FK, then you might as well have the service append the ZIP code to every address (if it can't find one, there is little reason to keep the address, since it is a bad address anyway).
One such service is SmartyStreets' Address Validation API, which handles API requests, or you could process an existing list/table of addresses with our Bulk Address Validation Tool.
zip_code has_many addresses / address belongs_to zip_code. Do you need to normalize? For most apps it's best to just have a zip_code column in the addresses table. Maintaining all the zip codes for international addresses is an uphill battle.
Also, you are duplicating region_id in address and city. You might need to explain what a region is in your app, but it looks like it only needs to be in city.
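A rough sketch of that layout with Python's built-in `sqlite3` module might look like the following. The table and column names other than `zip_code` and `region_id` are assumptions, and the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE regions (
    region_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- region_id lives only on the city, not on the address.
CREATE TABLE cities (
    city_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    region_id INTEGER NOT NULL REFERENCES regions(region_id)
);
-- zip_code is a plain nullable text column, not a foreign key.
CREATE TABLE addresses (
    address_id INTEGER PRIMARY KEY,
    street     TEXT NOT NULL,
    city_id    INTEGER NOT NULL REFERENCES cities(city_id),
    zip_code   TEXT
);

-- Hypothetical sample data.
INSERT INTO regions VALUES (1, 'Example Region');
INSERT INTO cities VALUES (1, 'Exampleville', 1);
INSERT INTO addresses VALUES
    (1, '1 Example St', 1, '11111'),
    (2, '2 Sample Rd', 1, NULL);   -- an address with no postal code
""")

# The region is reached through the city, so it is never stored twice.
rows = conn.execute("""
    SELECT a.street, a.zip_code, c.name, r.name
    FROM addresses a
    JOIN cities c ON c.city_id = a.city_id
    JOIN regions r ON r.region_id = c.region_id
    ORDER BY a.address_id
""").fetchall()

for row in rows:
    print(row)
```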
Worldwide, 119 of 190 countries use postal codes. Notable countries which do not use them include Ireland and Panama.[1]
Besides needing to support that fact, a system that insisted on a postal code would be very annoying to use. It should also allow the postal code to be unknown.
In the U.S., each "city" has at least one zip code, so the relationship is correct. I know this from developing a zip code database for about a year.
To normalise the schema you have, add an Address-ZipCode table with foreign keys Address ID and Zip Code, and with primary key Address ID, identical to that in the Address table. Then include the zip codes by using a left join between Address and the new table. The new table will only be populated when an address has a zip code.
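As a minimal sketch of that arrangement, using Python's built-in `sqlite3` module (the exact table and column names and the sample rows are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    street     TEXT NOT NULL
);
CREATE TABLE zip_code (
    zip_code TEXT PRIMARY KEY
);
-- Primary key is address_id, so each address has at most one zip code.
-- zip_code is NOT NULL here: a row exists only when the address has one.
CREATE TABLE address_zip_code (
    address_id INTEGER PRIMARY KEY REFERENCES address(address_id),
    zip_code   TEXT NOT NULL REFERENCES zip_code(zip_code)
);

-- Hypothetical sample data.
INSERT INTO address VALUES (1, '1 Example St'), (2, '2 Sample Rd');
INSERT INTO zip_code VALUES ('11111');
INSERT INTO address_zip_code VALUES (1, '11111');
""")

# The LEFT JOIN keeps addresses without a zip code; their zip_code is NULL
# in the result only, never in a stored column.
rows = conn.execute("""
    SELECT a.address_id, a.street, az.zip_code
    FROM address a
    LEFT JOIN address_zip_code az ON az.address_id = a.address_id
    ORDER BY a.address_id
""").fetchall()

for row in rows:
    print(row)
```

This avoids the nullable foreign key in the Address table at the cost of an extra join on every read.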
However, I would suggest that if you are trying to accommodate international addresses, the schema you have is likely to be inadequate - you will need multiple address lines and more levels of category than shown in your diagram. Categories missed include country, sub-region, town, and possibly others.
My answer here (which is extremely long) shows what is needed to deal with international addresses (and other things) comprehensively. This is massive overkill unless you are dealing with millions of addresses in each of multiple countries.
I have struggled with this for several different apps over the years. How you set this up depends upon your needs. I work in affordable housing, and one of the things we need to do is relate different geographical components (city, county, state, etc.) to various regions as defined by HUD (the U.S. Department of Housing and Urban Development).
What I ended up with looks a little like THIS:
In my case, HUD regions are defined at the county level (one HUD region includes one or more counties, or "County-Towns" in some cases). Each HUD region actually has a unique identifier defined by HUD (the HUD CBSA_Sub), which I use as the "HUD-region_code". Also important to note is that HUD regions can include counties in one or more states. Therefore, the HUD region identifier is related to county, but only indirectly to state, through each county. For example, the HUD "Portland/Vancouver/Beaverton" HUD MSA includes counties (and cities) in both Oregon and Washington states.
In your case, you will need to define one more top layer, tblCountry. Further, you may need to adapt the concepts of "county" and "state" a bit to accommodate other countries ("province", and whatever they use for subdivisions greater than a city but smaller than a state). "Region" may work in this case as well; I believe many European countries use "regions".
A country has one or more states (or equivalent). A state has one or more counties (or equivalent). A county has one or more cities. And cities tend to have at least one postal code.
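That hierarchy could be sketched along these lines with Python's built-in `sqlite3` module. Only `tblCountry` is named above; the other table names, all column names, and the sample rows are assumptions. Hanging each postal code off a single city is a simplification that follows this answer's description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCountry (
    country_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE tblState (      -- or province / equivalent
    state_id   INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    country_id INTEGER NOT NULL REFERENCES tblCountry(country_id)
);
CREATE TABLE tblCounty (     -- or equivalent subdivision
    county_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    state_id  INTEGER NOT NULL REFERENCES tblState(state_id)
);
CREATE TABLE tblCity (
    city_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    county_id INTEGER NOT NULL REFERENCES tblCounty(county_id)
);
CREATE TABLE tblPostalCode (
    code    TEXT PRIMARY KEY,
    city_id INTEGER NOT NULL REFERENCES tblCity(city_id)
);

-- Hypothetical sample data.
INSERT INTO tblCountry VALUES (1, 'Exampleland');
INSERT INTO tblState VALUES (1, 'North Example', 1);
INSERT INTO tblCounty VALUES (1, 'Sample County', 1);
INSERT INTO tblCity VALUES (1, 'Exampleville', 1);
INSERT INTO tblPostalCode VALUES ('11111', 1), ('22222', 1);
""")


def locate(code):
    """Walk up from a postal code to (city, county, state, country), or None."""
    return conn.execute(
        """
        SELECT ci.name, co.name, st.name, cn.name
        FROM tblPostalCode pc
        JOIN tblCity ci ON ci.city_id = pc.city_id
        JOIN tblCounty co ON co.county_id = ci.county_id
        JOIN tblState st ON st.state_id = co.state_id
        JOIN tblCountry cn ON cn.country_id = st.country_id
        WHERE pc.code = ?
        """,
        (code,),
    ).fetchone()


print(locate("11111"))
```

A region such as a HUD region would then be one more table referenced from whichever level it aggregates, the county in my case.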
Regions such as the HUD region, in my case, tend to be defined as aggregations at one of these levels.
In many cases outside of this HUD-driven model I had to develop, one needs to ascertain which HUD MSA one is working in by ZIP or by county. In all cases, it is not safe to assume that HUD regions are contained within a specific state.
Also important to note is that the USPS periodically changes the ZIP codes for certain areas.