国际地理地址应如何存储在关系数据库中?
考虑到在关系表中存储国际地理地址的任务,最灵活的模式是什么? 地址的每个部分都应该分解为各自的字段,还是应该更像自由文本?
将不同格式的地址分成不同的表有什么意义吗? 例如,有一个 USAAddress、CanadianAddress、UKAddress... 的表?
Given the task of storing international geographic addresses in a relational table, what is the most flexible schema? Should every part of the address be broken out into their own fields, or should it be more like free text?
Is there any sense in separating differently formatted address into different tables? For example, have a table for USAAddress, CanadianAddress, UKAddress...?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(10)
我将从我的博客文章中总结我的想法 - 地址存储课程(在 archive.org 上)。
在我当前的项目中[我在一家物流公司工作],我们正在存储国际地址。 在这部分数据库的设计中,我对世界各地的地址进行了研究。 有很多不同的格式。 在西方世界,我们倾向于使用相当统一的格式 - 有一些差异,但主要是:
这似乎涵盖了大多数国家/地区,但字段的顺序可能会有所不同。 您可以在 http://www.bitboost 找到显示格式列表。 com/ref/international-address-formats.html#Formats
例如,在许多国家/地区,邮政编码位于城市名称之前,街道号码位于街道名称之后。 在加拿大、美国和英国,街道号码位于街道名称之前,邮政编码 (或 ZIP) 位于城市名称之后。
在回答您关于将地址分离到不同国家/地区的问题时,我不建议这样做,这只会使其他领域的生活变得更加困难 - 例如报告。 我提供的格式涵盖了我们物流数据库中的所有地址,其中涵盖美国、加拿大、墨西哥和英国,没有任何问题。 它还涵盖了我们所有的欧洲、中国、日本和马来西亚地址。 我不能代表其他国家/地区,但我还没有必须存储这些字段不支持的国家/地区的地址。
我不建议使用其他人建议的以及在许多数据库中看到的 Address1、Address2、Address3 格式,因为从字母数字字符串中解析地址信息并不像乍看起来那么简单 - 特别是在数据输入不正确的情况下,由于错误信息、拼写错误、拼写错误等。如果您将字段分开,您可以使用距离算法来检查可能的含义,使用概率根据邮政编码和街道号码检查街道名称,或根据街道名称检查省份和城市等。尝试当你有一个表示你的整个街道地址的字符串时,就可以执行上述任何操作。 无论怎么想,这都不是一件小事。
地址数据库的质量保证是一件令人头疼的事情。 简化您在这方面的工作的最简单方法是确保所有字段仅包含一条信息,该信息可以在输入时自动验证是否正确。 概率、距离算法和正则表达式可以检查输入的有效性,并向用户提供关于他们的错误的反馈,并建议适当的纠正。
需要注意的一个警告是,道路的名称也是街道类型 - 如果您覆盖加拿大,您需要注意多伦多的“大道路”,如果您使用地址 1、2,这会让您大吃一惊,3格式。 这种情况也可能发生在其他地方,尽管我不知道它们 - 这一个实例就足以让我尖叫 WTF?!
I will summarize my thoughts from my blog post - A lesson in address storage (on archive.org).
On my current project [I work for a logistics company] we're storing international addresses. I've done research on addresses all over the world in the design of this portion of the database. There's a lot of different formats. In the Western world we tend to use a fairly uniform format - a few differences but they're mostly:
This appears to cover most countries but the ordering of the fields may be displayed differently. You can find a list of display formats at http://www.bitboost.com/ref/international-address-formats.html#Formats
For instance, in many countries, the postal code falls before the city name and the street number falls after the street name. In Canada, U.S. and the U.K. the street number precedes the street name and the postal code (or ZIP) comes after the city name.
In answer to your question about separation of the addresses into different countries, I wouldn't suggest it, it will just make life harder in other areas - for instance reporting. The format I've provided covers all the addresses in our logistics database which covers USA, Canada, Mexico and the UK without any problems. It also covers all of our European, Chinese, Japanese and Malaysian addresses. I can't speak for other countries but I haven't yet had to store an address from a country that these fields won't support.
I don't suggest going with the Address1, Address2, Address3 format suggested by others and seen in many databases because parsing address information out of an alphanumeric string isn't as simple as it might first seem - especially if data isn't entered correctly, due to misinformation, typo, misspelling etc. If you separate your fields you can use distance algorithms to check for likely meaning, use probability to check street name against postal code and street number or to check province and city against street name etc. Try doing any of that when you've got a string denoting your whole street address. It's not a trivial matter by any stretch of the imagination.
QA on an address database is a headache, period. The easiest way to simplify your life in this area is to make sure all the fields hold only a single piece of information that can be automatically verified as correct at entry time. Probability, distance algorithms and regular expressions can check for validity of entry and provide feedback to the user as to what their mistake was and suggest suitable corrections.
One caveat to be aware of is roads with names that are also street types - if you're covering Canada you need to be aware of "Avenue Road" in Toronto which will trip you up big time if you're using the Address1, 2, 3 format. This likely occurs in other places too, although I'm not aware of them - this single instance was enough for me to scream WTF?!
注意不要过度分析地址格式。 当您这样做时,您很可能最终得到大多数用户需要解决的规范,有效地迫使他们使用错误的字段,或者只填充主要字段并忽略额外的字段。
让事情变得简单。
当您开始使用与英语或西班牙语等隔离语言不同的语言时,BenAlabaster 提到的 StreetType 会出现问题
。 向您展示野外情况会变得多么糟糕:阿姆斯特丹的“Henriette Roland Holststraat”,由“Henriette”+“Roland Holst”+“straat”组成,可以缩写为“Roland Holststraat”,或“ Roland Holststr.”,或拼写错误为“HRHolststr.” 或“Henriette Roland-Holst straat”,具体取决于天气。 除非您拥有地球上每个国家/地区的最新街道登记册,否则您将无处可去。
最后,请注意,在一些多语言国家,一种语言的名称可能与另一种语言不同! 例如,在布鲁塞尔,许多街道都有法语和荷兰语名称:“Avenu du Port”和“Havenlaan”,具体取决于收件人的首选语言。 (为了安全起见,谷歌地图会交替显示两个名字。)
你可以在这里尝试设计各种巧妙的技巧,但销售代表却是这样。 会明白这个吗?
Be careful not to over-analyze address formats. When you do, you're quite likely to end up with a specification most users will need to work around, effectively forcing them to use the wrong fields, or only filling the primary fields and ignoring the extra fields.
Keep things simple.
A StreetType like mentioned by BenAlabaster will cause problems when you start working with languages different from isolating languages like English or Spanish.
To show you how bad things can get in the wild: the "Henriette Roland Holststraat" in Amsterdam, built up from "Henriette" + "Roland Holst" + "straat", which can be abbreviated as the "Roland Holststraat", or "Roland Holststr.", or misspelled as "H.R.Holststr." or "Henriette Roland-Holst straat", depending on the weather. Unless you've got an up-to-date street register for each country on earth, you'll be going nowhere.
And finally, be careful that in some multilingual countries, names can be different from one language to another! For instance in Brussels where many streets have both a French and a Dutch name: "Avenu du Port" and "Havenlaan", depending on the addressee's preferred language. (Google Maps shows both names alternately, just to be on the safe side.)
You can try to devise all kinds of clever tricks here, but are the sales reps. going to understand this?
对于任何偶然发现这个问题的人来说,这里有一个轶事:
我作为一个在许多大陆(欧洲、亚洲、北美)生活和工作过的人来发言。 根据我的经验以及与我一起工作的人的经验,我们使用执行以下操作的系统要容易得多:
我发现,这样构建的系统让我的生活变得更加轻松。 特别是当我向邮政系统发送邮件时,贵公司几乎没有相关的内部知识。
如果您的公司确实拥有有关特定邮政系统的内部知识,请使用我在第 3 点中的选择来告知您向我显示哪个视图。 很多人都知道美国邮政系统对包装的要求; 如果我在第 3 点中选择“美国”,请随意使视图看起来适合美国地址。 如果我选择贵公司一无所知的国家/地区,请显示通用的三行,然后让我完成其余的工作; 不要强迫我使用 ASCII。
让我们面对现实吧——建立一个包含所有全球邮政系统(公共和私人)的完整的百科全书式数据库,即使不是不可能,也是一项艰巨的任务。 例如,在邮政系统中,只有当地的最后一英里承运人真正知道地址所在的位置。 有时,能够在包装上向承运人传递注释是非常有用的。 将每个边缘情况载体的本地知识映射到数据库中确实是一项不可能完成的任务。
问问哥德尔就知道了。 (然后问问自己,是否尝试使用公理系统来模拟宇宙论域,给出或采用某种算术,例如集合论或关系代数。)
Here's an anecdote for anyone who stumbles on this question:
I speak as a person who has lived and worked on a lot of continents (Europe, Asia, North America). In my experience, and the experience of the people I work with, it has been much easier for us to use systems that do the following:
Systems built like this, I find, make my life easiest. Particularly when I'm sending mail to a postal system about which your firm has virtually no functional internal knowledge.
If your firm does have internal knowledge about particular postal systems, use my selection in Point 3 to inform which view you display to me. A lot of people know what the US postal system expects on packaging; if I select US in Point 3, feel free to make the view look appropriate for a US address. If I select a country about which your firm knows nothing--display a generic three lines and let me do the rest; don't force me to use ASCII.
And let's be real here--building a complete, encyclopedic database of all global postal systems ( public and private ) is a herculean task at best, if not an impossible one. There are, for example, postal systems in which only the local, last-mile carrier really knows where an address is located. Sometimes being able to pass notes to that carrier on the packaging is extremely useful. And mapping the local knowledge of every edge case carrier into your database is indeed an impossible task.
Just ask Gödel. ( And then ask yourself if you're attempting to use an axiomatic system to model a universe of discourse, give or take some sort of arithmetic like set theory or relational algebra. )
这取决于您想用它做什么。
我发现,如果地址是分开的,则将地址用于其他目的(例如验证 USPS 数据或从 UPS/FEDEX 获取运费)总是更容易。
以下是我通常使用的地址:
对编辑的回应: 对于大多数情况,我看不到这种用法。 我上面列出的表格对于大多数国家/地区的地址都有足够的字段(并且足够通用)。
That depends on what you want to do with it.
I've found it always easier to use addresses for other purposes (such as verification against USPS data or getting shipping rates from UPS/FEDEX) if they're separated.
Here's what I typically use for addresses:
In Response to the edit: For most situations I don't see the use. The table I listed above has enough fields (and is generic enough) for most country's addresses.
地址
与 @BenAlabaster 提供的出色答案相反,您可以简单地:
您的客户端表单布局仍然可以像您认为合适的那样复杂(或者使用多行输入,用户可以手动输入他们的地址)。 然后,您可以根据需要在地址中添加换行符。
国家/地区
您的国家/地区表将如下所示:
此外,您可以具有以下其中一项:
然后使用以下列表来设计您的国家/地区表:
Address
As a polar opposite to the excellent answer @BenAlabaster has provided, you could simply have:
Your client-side form layouts can still be as complex as you see fit (or use a multi-line input where the user can manually type their address). You can then add the line breaks in the address where necessary.
Country
Your country table would look as follows:
Additionally, you could have one of the following:
Then use the following lists to design your country table:
Ben Alabaster 的回答评论:
要根据国家/地区格式化地址,您可以使用格式化表,该表将每个国家/地区的列排序为单独的行。
字段顺序也可以编码为使用复杂的网格布局。
按国家/地区分隔地址是没有意义的。 随着国家数量的增加,这将会变得混乱,如果你想找到国际客户的所有地址,你就会遇到麻烦。
当您的地址同时包含建筑物号码和公寓号码时,Ben 建议的地址类型也可能会导致歧义。 我可能在一个公寓大楼里,每栋楼都有不同的名字。 这在印度很常见。
Comment of Ben Alabaster's Answer:
To format addresses based on country, you could use a formatting table that has the ordering of the columns for each country as separate rows.
The field order can be coded to use complex grid layouts also.
There is no point in separating addresses by country. This will be chaotic as the number of countries increases and you will land in trouble if you want to find all the addresses of say, an international client.
Having an Address Type suggested by Ben could also lead to ambiguities when you have an address that has both a building number and an apartment number. I could be in an apartment complex where each building has a different name. This is very common in India.
为了覆盖国际地址,地址字段不应该太具体和太抽象。
它将覆盖这个世界上几乎所有的地址,如下所示:
To cover international addresses, address fields should not be too specific and too abstract.
It will cover almost all addresses in this world as shown below:
我使用 https://github.com/commerceguys/addressing 库来格式化国际地址,它们使用这些元素:
如果您想解析街道(名称、门牌号……),这没有帮助。
顺便提一句。 如果您正在寻找多语言国家/地区列表:https://github.com/umpirsky/country-list
I use https://github.com/commerceguys/addressing library to format international addresses and they use these elements:
This doen't help if you want to parse the street (name, house number, ...).
Btw. if you are looking for a multilanguage country list: https://github.com/umpirsky/country-list
唯一的方法是将它们拆分为:
因为几乎每个国家/地区都有自己的地址数据标准,并且每个国家/地区都有不同的邮政编码格式。
您可以在 我的帖子来自类似的问题。
将每个国家/地区的地址分开是没有意义的,因为有些国家/地区的地址约定很少。 一些流行的惯例包括小村庄没有街道,只有村庄名称和号码,而街道则位于大城市的地址。 据我了解,在匈牙利首都布达佩斯,很少有同名的街道(通过城市区号来区分),而其他城市则没有这样的地址(匈牙利的人可能会证实这是否属实)。 因此,地址格式的总数将是 numer_of_countries 乘以该国家/地区的地址格式数量……可以使用不同的表来完成,但这将是一项可怕的工作。
The only way is to split them to:
since almost every country has it's own standard for having address data, and evey country has a different format of zipcodes.
You can have a small sample of problems in my post from a similiar question.
This should not make sense to separate addresses for every country, since there are countries where you have few address conventions. Some popular conventions include not having streets in small villages, only village name and number, while streets are in larger cities’ addresses. I have learned that in Hungary’s capital – Budapest, there are few streets having the same name (you distinct them by city’s district number), while other cities does not have such addresses (someone from Hungary may actually confirm if this is true). So the total number of address formats will be numer_of_countries multiplied by number of address formats in this country… Can be done with different tables, but it will be horrible work to do.
我知道这是一个非常古老的话题,已经得到了回答,但我想我也应该投入我的两分钱。 这完全取决于您的项目目标以及您期望目标用户如何输入地址。 Ben 的建议将允许您准确地解析地址,但另一方面可能会导致更长(并且可能更令人沮丧)的用户数据输入过程。 Stephen Wrighton 的建议更简单,因此用户可以更轻松地输入地址。
我还看到一些模型只有一个“地址”列,可以将典型的街道号码、类型、街道名称、单元/公寓号码等全部记录在一列中,同时保留城市、国家、地区等。在其他列中。 与 Stephen 的模型类似,除了 Address1、Address2 和 Address3 全部合并到一列中。
我的观点是,最灵活的模型往往是那些限制最少的模型,具体取决于您对灵活性的解释。
I know this is an extremely old topic that is already answered, but I thought that I'd throw my two cents in as well. It all depends on what your project goals and how you expect your target users to enter addresses. Ben's suggestion will allow you to parse addresses accurately, but on the other hand could make for a longer (and possibly more frustrating) user data entry process. Stephen Wrighton's suggestion is simpler, and could be easier for users to enter addresses as a result.
I've also seen some models that simply had an "Address" column that would capture a typical street number, type, street name, unit / apartment number, etc. all in one column, while keeping City, Country, Region, etc. within other columns. Similar to Stephen's model, except Address1, Address2, and Address3 all consolidated into one column.
My opinion is that the most flexible models tend to be those that are least restrictive, depending on your interpretation of flexible.