Best practices for storing postal addresses in a database (RDBMS)?
Are there any good references for best practices for storing postal addresses in an RDBMS? It seems there are lots of tradeoffs that can be made, with lots of pros and cons to evaluate for each. Surely this has been done time and time again? Maybe someone has at least written down some lessons learned somewhere?
Examples of the tradeoffs I am talking about: storing the zip code as an integer vs. a char field; whether the house number should be stored as a separate field or as part of address line 1; whether suite/apartment/etc. numbers should be normalized or just stored as a chunk of text in address line 2; and how to handle ZIP+4 (separate fields or one big field, integer vs. text).
I'm primarily concerned with U.S. addresses at this point, but I imagine there are some best practices for preparing yourself for the eventuality of going global as well (e.g. naming fields appropriately, like region instead of state or postal code instead of zip code).
Comments (15)
I would just put all the fields together in a large NVARCHAR(1000) field, with a textarea element for the user to enter the value (unless you want to perform analysis on e.g. zip codes). All those address line 1, address line 2, etc. inputs are just so annoying if you have an address that doesn't fit well with that format (and, you know, there are other countries than the US).
For more international use, one schema to consider is the one used by Drupal Address Field. It's based on the xNAL standard, and seems to cover most international cases. A bit of digging into that module will reveal some nice pearls for interpreting and validating addresses internationally. It also has a nice set of administrative areas (province, state, oblast, etc.) with ISO codes.
Here's the gist of the schema, copied from the module page:
Some lessons I've learned:
[...] locality & thoroughfare.
As an 'international' user, there is nothing more frustrating than dealing with a website that is oriented around only US-format addresses. It's a little rude at first, but becomes a serious problem when the validation is also over-zealous.
If you are concerned with going global, the only advice I have is to keep things free-form. Different countries have different conventions - in some, the house number comes before the street name, in some it comes after. Some have states, some regions, some counties, some combinations of those. Here in the UK, the zipcode is not a zipcode, it's a postcode containing both letters and numbers.
I'd advise simply ~10 lines of variable-length strings, together with a separate field for a postcode (and be careful how you describe that to cope with national sensibilities). Let the user/customer decide how to write their addresses.
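As a rough illustration of that suggestion, here is a minimal Python sketch. The class name `FreeFormAddress`, the `MAX_LINES` limit of 10, and the sample values are all assumptions made for the example; appending the postcode as the last line of the label is just one possible rendering, not a rule.

```python
from dataclasses import dataclass, field

MAX_LINES = 10  # "about ten lines", as suggested above (assumed limit)


@dataclass
class FreeFormAddress:
    """Free-form address lines plus a separately stored postcode (hypothetical model)."""

    lines: list[str] = field(default_factory=list)
    postcode: str = ""  # kept as text, never as a number

    def __post_init__(self) -> None:
        # Drop blank lines and surrounding whitespace, but do not interpret the content.
        self.lines = [ln.strip() for ln in self.lines if ln.strip()]
        if len(self.lines) > MAX_LINES:
            raise ValueError(f"at most {MAX_LINES} address lines are supported")

    def label(self) -> str:
        """Render the address as the user wrote it, with the postcode appended."""
        parts = self.lines + ([self.postcode] if self.postcode else [])
        return "\n".join(parts)


# Made-up sample data.
addr = FreeFormAddress(["  1 Sample Street ", "", "Exampletown"], "AB1 2CD")
print(addr.label())
```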
If you need comprehensive information about how other countries use postal addresses, here's a very good reference link (Columbia University):
Frank's Compulsive Guide to Postal Addresses
Effective Addressing for International Mail
You should definitely consider storing house number as a character field rather than a number, because of special cases such as "half-numbers", or my current address, which is something like "129A" — but the A is not considered as an apartment number for delivery services.
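A small Python sketch of why a numeric column falls short here. The sample house numbers and the helper name `fits_in_integer_column` are hypothetical, chosen only to illustrate the point:

```python
# Hypothetical sample house numbers; none of these come from a real dataset.
house_numbers = ["129A", "12 1/2", "7", "221B"]


def fits_in_integer_column(value: str) -> bool:
    """Return True only if the value is a plain run of ASCII digits."""
    return value.isascii() and value.isdigit()


# Values that an integer column could not hold without losing information.
non_numeric = [h for h in house_numbers if not fits_in_integer_column(h)]
print(non_numeric)
```

Only "7" in this sample would survive storage as an integer; the others need a character field.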
I've done this (rigorously model address structures in a database), and I would never do it again. You can't imagine how crazy the exceptions are that you'll have to take into account as a rule.
I vaguely recall some issue with Norwegian postal codes (I think), which were all 4 positions, except Oslo, which had 18 or so.
I'm quite sure that from the moment we started using the geographically correct ZIP codes for all of our own national addresses, quite a few people started complaining that their mail arrived too late. It turned out those people were living near a border between postal areas. Someone might really live in postal area 1600, say, but his mail should be addressed to postal area 1610, because that neighbouring area was the one that actually served him. Mail sent to his geographically correct postal area took a couple of days longer to arrive, because the post office there first had to forward it to the area that actually delivered it ...
(We ended up registering those people with an address abroad in the country with ISO-code 'ZZ'.)
I've found that listing all possible fields, from the smallest discrete unit to the largest, is the easiest way. Users will fill in the fields they see fit. My address table looks like this:
Unless you are going to do maths on the street numbers or zip / postal codes, you are just inviting future pain by storing them as numerics.
You might save a few bytes here and there, and maybe get a faster index, but what do you do when the US postal service, or that of whatever other country you are dealing with, decides to introduce alphas into the codes?
The cost of disk space is going to be a lot cheaper than the cost of fixing it later on... y2k anybody?
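To make the pain concrete, here is a minimal Python sketch checking whether a postal code survives a round trip through an integer. The sample codes are illustrative only (a US ZIP with a leading zero, a ZIP+4 form, a Canadian postal code, a UK postcode), and the helper name is made up for this example:

```python
# Illustrative sample codes, not taken from the original discussion.
codes = ["02134", "02134-1000", "K1A 0B1", "SW1A 1AA"]


def round_trips_as_integer(code: str) -> bool:
    """True if storing the code as an integer and formatting it back is lossless."""
    try:
        return str(int(code)) == code
    except ValueError:
        # Letters, spaces, or hyphens cannot be stored in an integer at all.
        return False


for code in codes:
    print(code, round_trips_as_integer(code))
```

None of these samples round-trips: the leading zero is silently dropped from "02134", and the other three cannot be converted at all. A plain five-digit code without a leading zero, such as "90210", is the only kind that survives.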
Adding to what @Jonathan Leffler and @Paul Fisher have said:
If you ever anticipate having postal addresses for Canada or Mexico added to your requirements, storing postal-code as a string is a must. Canada has alpha-numeric postal codes and I don't remember what Mexico's look like off the top of my head.
You should certainly consult "Is this a good way to model address information in a relational database", but your question is not a direct duplicate of that.
There are surely a lot of pre-existing answers (check out the example data models at DatabaseAnswers, for example). Many of the pre-existing answers are defective under some circumstances (not picking on DB Answers at all).
One major issue to consider is the scope of the addresses. If your database must deal with international addresses, you have to be more flexible than if you only have to deal with addresses in one country.
In my view, it is often (which does not mean always) sensible to both record the 'address label image' of the address and separately analyze the content. This allows you to deal with differences between the placement of postal codes, for example, between different countries. Sure, you can write an analyzer and a formatter that handle the eccentricities of different countries (for instance, US addresses have 2 or 3 lines; by contrast, British addresses can have considerably more; one address I write to periodically has 9 lines). But it can be easier to have the humans do the analysis and formatting and let the DBMS just store the data.
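One way to picture the "label image plus analyzed content" idea is the sketch below, using Python's built-in sqlite3 module. The table and column names (`address`, `label_image`, `country_code`, `postal_code`) and the sample address are assumptions for illustration, not a recommended schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE address (
        id           INTEGER PRIMARY KEY,
        label_image  TEXT NOT NULL,  -- the address exactly as it should be printed
        country_code TEXT,           -- optional analyzed fields, filled in when known
        postal_code  TEXT
    )
    """
)

# Made-up address: the human decides the line layout, the DBMS just stores it.
label = "Example Ltd\n1 Sample Street\nExampletown\nAB1 2CD\nUNITED KINGDOM"
conn.execute(
    "INSERT INTO address (label_image, country_code, postal_code) VALUES (?, ?, ?)",
    (label, "GB", "AB1 2CD"),
)

row = conn.execute("SELECT label_image, postal_code FROM address").fetchone()
print(row[0])
```

The label comes back line for line as entered, while the analyzed columns remain available for searching or validation.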
Where's the "trade off" in storing the ZIP as a NUMBER or VARCHAR? That's just a choice -- it's not a trade off unless there are benefits to both and you have to give up some benefits to get others.
Unless the sum of zips has any meaning at all, Zips as number is not useful.
This might be overkill, but if you need a solution that works with multiple countries and you need to programmatically process parts of the address:
You could handle country-specific addresses using two tables: one generic table with 10 VARCHAR2 columns and 10 NUMBER columns, and another table that maps these fields to prompts and has a country column tying an address structure to a country.
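A hedged sketch of that two-table idea, using Python's sqlite3 module in place of the Oracle-style types mentioned above. All table names, column names (`text_1` to `text_10`, `num_1` to `num_10`), prompts, and sample rows are assumptions made for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Generic storage: text_1..text_10 and num_1..num_10, as described above.
text_cols = [f"text_{i} TEXT" for i in range(1, 11)]
num_cols = [f"num_{i} NUMERIC" for i in range(1, 11)]
conn.execute(
    "CREATE TABLE address (id INTEGER PRIMARY KEY, country TEXT NOT NULL, "
    + ", ".join(text_cols + num_cols)
    + ")"
)

# Mapping: which generic column holds which prompt for a given country.
conn.execute(
    "CREATE TABLE address_layout ("
    "country TEXT NOT NULL, column_name TEXT NOT NULL, prompt TEXT NOT NULL, "
    "PRIMARY KEY (country, column_name))"
)
conn.executemany(
    "INSERT INTO address_layout VALUES (?, ?, ?)",
    [
        ("US", "text_1", "Street address"),
        ("US", "text_2", "City"),
        ("US", "text_3", "State"),
        ("US", "text_4", "ZIP code"),
        ("GB", "text_1", "Street address"),
        ("GB", "text_2", "Town"),
        ("GB", "text_3", "Postcode"),
    ],
)
conn.execute(
    "INSERT INTO address (country, text_1, text_2, text_3) VALUES (?, ?, ?, ?)",
    ("GB", "1 Sample Street", "Exampletown", "AB1 2CD"),
)


def labelled_address(address_id: int) -> dict:
    """Pair each stored value with the prompt defined for the address's country."""
    cur = conn.execute("SELECT * FROM address WHERE id = ?", (address_id,))
    names = [d[0] for d in cur.description]
    record = dict(zip(names, cur.fetchone()))
    layout = conn.execute(
        "SELECT column_name, prompt FROM address_layout "
        "WHERE country = ? ORDER BY column_name",
        (record["country"],),
    )
    return {prompt: record[col] for col, prompt in layout}


print(labelled_address(1))
```

The same generic row can be presented with different prompts per country, at the cost of column names that carry no meaning on their own.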
Inspired by Database Answers
If you ever have to verify an address or use it to process credit card payments, you'll at least need a little structure. A free-form block of text does not work very well for that.
Zip code is a common optional field for validating payment card transactions without using the whole address. So have a separate and generously sized field for that (at least 10 chars).
At the moment, I'm developing an international ecommerce website.
It should cover almost all addresses in this world as shown below: