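Regular expression to match UK postcodes
I'm looking for a regex that validates a complete, complex UK postcode only within an input string. All of the uncommon postcode forms must be covered as well as the usual ones. For instance:
Matches
- CW3 9SS
- SE5 0EG
- SE50EG
- se5 0eg
- WC2H 7LT
No match
- aWC2H 7LT
- WC2H 7LTa
- WC2H
How do I solve this problem?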
I've been looking for a UK postcode regex for the last day or so and stumbled on this thread. I worked my way through most of the suggestions above and none of them worked for me so I came up with my own regex which, as far as I know, captures all valid UK postcodes as of Jan '13 (according to the latest literature from the Royal Mail).
The regex and some simple postcode-checking PHP code are posted below. Note: it allows lowercase or uppercase postcodes and the GIR 0AA anomaly, but to deal with the more than likely presence of a space in the middle of an entered postcode it also uses a simple str_replace to remove the space before testing against the regex. The Royal Mail themselves don't even mention any discrepancies beyond that in their literature (see http://www.royalmail.com/sites/default/files/docs/pdf/programmers_guide_edition_7_v5.pdf and start reading from page 17)!
Note: In the Royal Mail's own literature (link above) there is a slight ambiguity surrounding the 3rd and 4th positions and the exceptions in place if these characters are letters. I contacted Royal Mail directly to clear it up and in their own words "A letter in the 4th position of the Outward Code with the format AANA NAA has no exceptions and the 3rd position exceptions apply only to the last letter of the Outward Code with the format ANA NAA." Straight from the horse's mouth!
I hope it helps anyone else who comes across this thread looking for a solution.
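The PHP listing did not survive in this copy of the thread, so what follows is only a minimal Python sketch of the approach described above: remove the spaces, ignore case, special-case GIR 0AA, then test the compact string. The pattern `COMPACT_POSTCODE` and the function `looks_like_postcode` are names made up for illustration, and the pattern only checks letter versus digit positions, not the Royal Mail letter restrictions the answer refers to.

```python
import re

# Hypothetical, format-only pattern for a postcode whose spaces were removed.
# It is NOT the regex from the answer above, which was not preserved.
COMPACT_POSTCODE = re.compile(
    r"GIR0AA|[A-Z]{1,2}[0-9][0-9A-Z]?[0-9][A-Z]{2}",
    re.IGNORECASE,
)


def looks_like_postcode(raw: str) -> bool:
    """Strip spaces (the str_replace step), then test the whole string."""
    compact = raw.replace(" ", "")
    return COMPACT_POSTCODE.fullmatch(compact) is not None
```

Because the spaces are removed before matching, a single pattern covers both "SE5 0EG" and "SE50EG"; the trade-off is that a space in an odd position is silently accepted.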
Here's a regex based on the format specified in the documents which are linked to marcj's answer:
The only difference between that and the specs is that the last 2 characters cannot be in [CIKMOV] according to the specs.
Edit:
Here's another version which does test for the trailing character limitations.
Some of the regexes above are a little restrictive. Note that the genuine postcode "W1K 7AA" would fail given the rule "Position 3 - AEHMNPRTVXY only used" above, as "K" would be disallowed.
The regex:
Seems a little more accurate, see the Wikipedia article entitled 'Postcodes in the United Kingdom'.
Note that this regex requires uppercase only characters.
The bigger question is whether you are restricting user input to allow only postcodes that actually exist or whether you are simply trying to stop users entering complete rubbish into the form fields. Correctly matching every possible postcode, and future proofing it, is a harder puzzle, and probably not worth it unless you are HMRC.
I wanted a simple regex, where it's fine to allow too much, but not to deny a valid postcode. I went with this (the input is a stripped/trimmed string):
This allows the shortest possible postcodes like "L1 8JQ" as well as the longest ones like "OL14 5ET".
Because it allows up to 8 characters, it will also allow incorrect 8 character postcodes if there is no space: "OL145ETX". But again, this is a simplistic regex, for when that's good enough.
Whilst there are many answers here, I'm not happy with any of them. Most of them are simply broken or too complex.
I looked at @ctwheels' answer and found it very explanatory and correct; we must thank him for that. However, once again, it is too much "data" for me, for something so simple.
Fortunately, I managed to get a database with over 1 million active postcodes for England only and made a small PowerShell script to test and benchmark the results.
UK Postcode specifications: Valid Postcode Format.
This is "my" Regex:
Short, simple and sweet. Even the most inexperienced can understand what is going on.
Explanation:
Result (postcodes checked):
Here's how we have been dealing with the UK postcode issue:
Explanation:
This gets most formats; we then use the db to validate whether the postcode is actually real. This data is driven by openpoint https://www.ordnancesurvey.co.uk/opendatadownload/products.html
Hope this helps.
Basic rules:
Postal codes in the U.K. (or postcodes, as they’re called) are composed of five to seven alphanumeric characters separated by a space. The rules covering which characters can appear at particular positions are rather complicated and fraught with exceptions. The regular expression just shown therefore sticks to the basic rules.
Complete rules:
If you need a regex that ticks all the boxes for the postcode rules at the expense of readability, here you go:
Source: https://www.safaribooksonline.com/library/view/regular-expressions-cookbook/9781449327453/ch04s16.html
Tested against our customers' database and seems perfectly accurate.
I use the following regex that I have tested against all valid UK postcodes. It is based on the recommended rules, but condensed as much as reasonable and does not make use of any special language specific regex rules.
It assumes that the postcode has been converted to uppercase and has no leading or trailing characters, but will accept an optional space between the outcode and incode.
The special "GIR 0AA" postcode is excluded and will not validate, as it's not in the official Post Office list of postcodes and as far as I'm aware will not be used as a registered address. Adding it should be trivial as a special case if required.
First half of postcode Valid formats
Exceptions
Position 1 - QVX not used
Position 2 - IJZ not used except in GIR 0AA
Position 3 - AEHMNPRTVXY only used
Position 4 - ABEHMNPRVWXY
Second half of postcode
Exceptions
Position 2+3 - CIKMOV not used
Remember not all possible codes are used, so this list is a necessary but not sufficient condition for a valid code. It might be easier to just match against a list of all valid codes?
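The positional exceptions above can be turned into a pattern mechanically. The sketch below is one way to do that in Python; the list of valid formats did not survive in this copy, so the six outward layouts used here (A9, A99, AA9, AA99, A9A, AA9A) are an assumption taken from the commonly published format table, and all variable names are illustrative. Note that a pattern built strictly from these rules rejects "W1K 7AA", which another answer in this thread points out is a genuine postcode.

```python
import re

P1 = "[A-PR-UWYZ]"      # position 1: any letter except Q, V, X
P2 = "[A-HK-Y]"         # position 2: any letter except I, J, Z
P3 = "[AEHMNPRTVXY]"    # position 3, when it is a letter (layout A9A)
P4 = "[ABEHMNPRVWXY]"   # position 4, when it is a letter (layout AA9A)
INWARD = "[0-9][ABD-HJLNP-UW-Z]{2}"  # last two letters exclude C I K M O V

# Assumed outward layouts: A9 / A99, AA9 / AA99, A9A, AA9A.
OUTWARD = (
    f"(?:{P1}[0-9]{{1,2}}"
    f"|{P1}{P2}[0-9]{{1,2}}"
    f"|{P1}[0-9]{P3}"
    f"|{P1}{P2}[0-9]{P4})"
)

# GIR 0AA is the one code allowed to break the position-2 rule.
STRICT_POSTCODE = re.compile(f"(?:GIR 0AA|{OUTWARD} {INWARD})")


def is_strictly_valid(postcode: str) -> bool:
    """Expects uppercase input with exactly one space between the halves."""
    return STRICT_POSTCODE.fullmatch(postcode) is not None
```

Keeping each position in its own named fragment makes it easier to relax a single rule (for example position 3) without rewriting the whole expression.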
To check a postcode is in a valid format as per the Royal Mail's programmer's guide:
All postcodes on doogal.co.uk match, except for those no longer in use.
Adding a `?` after the space and using case-insensitive match to answer this question:
This one allows empty spaces and tabs from both sides in case you don't want to fail validation and then trim it server side.
Through empirical testing and observation, as well as confirming with https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation, here is my version of a Python regex that correctly parses and validates a UK postcode:
UK_POSTCODE_REGEX = r'(?P<postcode_area>[A-Z]{1,2})(?P<district>(?:[0-9]{1,2})|(?:[0-9][A-Z]))(?P<sector>[0-9])(?P<postcode>[A-Z]{2})'
This regex is simple and has capture groups. It does not include all of the validations of legal UK postcodes, but only takes into account the letter vs number positions.
Here is how I would use it in code:
Here are unit tests:
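The usage code and unit tests from this answer were not preserved here. As a rough sketch of how the named groups could be used, the snippet below copies `UK_POSTCODE_REGEX` from the answer and wraps it in a hypothetical helper, `parse_postcode`. Removing whitespace and uppercasing before matching are assumptions added so that inputs such as "se5 0eg" work, since the pattern itself has no space between the halves.

```python
import re

# Copied from the answer above.
UK_POSTCODE_REGEX = r'(?P<postcode_area>[A-Z]{1,2})(?P<district>(?:[0-9]{1,2})|(?:[0-9][A-Z]))(?P<sector>[0-9])(?P<postcode>[A-Z]{2})'


def parse_postcode(raw):
    """Return the named parts as a dict, or None if the format is wrong.

    Hypothetical helper: whitespace is removed and the text uppercased
    first, because the pattern contains no space and no lowercase ranges.
    """
    compact = "".join(raw.split()).upper()
    match = re.fullmatch(UK_POSTCODE_REGEX, compact)
    return match.groupdict() if match else None
```

For example, `parse_postcode("WC2H 7LT")` yields the area `WC`, district `2H`, sector `7` and final unit `LT`, while `parse_postcode("WC2H")` returns `None`.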
To add to this list, a more practical regex that I use, which allows the user to enter an empty string, is:
This regex allows capital and lower case letters with an optional space in between.
From a software developer's point of view this regex is useful for software where an address may be optional, for example if a user did not want to supply their address details.
Have a look at the Python code on this page:
http://www.brunningonline.net/simon/blog/archives/001292.html
I've used it to process postcodes for me.
I have a regex for UK postcode validation.
It works for every type of postcode, either inner or outer.
It works for every type of format.
Example:
We were given a spec:
We came up with this:
But note - this allows any number of spaces in between groups.
The accepted answer reflects the rules given by Royal Mail, although there is a typo in the regex. This typo seems to have been in there on the gov.uk site as well (as it is in the XML archive page).
In the format A9A 9AA the rules allow a P character in the third position, whilst the regex disallows this. The correct regex would be:
Shortening this results in the following regex (which uses Perl/Ruby syntax):
It also includes an optional space between the first and second block.
What I have found in nearly all the variations, in the regex from the bulk transfer PDF and in the one on the Wikipedia site is this: specifically for the Wikipedia regex, there needs to be a ^ after the first | (vertical bar). I figured this out by testing for AA9A 9AA, because otherwise the format check for A9A 9AA will validate it. For example, checking EC1D 1BB, which should be invalid, comes back valid because C1D 1BB is a valid format.
Here is what I've come up with for a good regex:
The method below checks the postcode and provides complete info:
I'd recommend taking a look at the UK Government Data Standard for postcodes [link now dead; archive of XML, see Wikipedia for discussion]. There is a brief description about the data and the attached xml schema provides a regular expression. It may not be exactly what you want but would be a good starting point. The RegEx differs from the XML slightly, as a P character in third position in format A9A 9AA is allowed by the definition given.
The RegEx supplied by the UK Government was:
As pointed out on the Wikipedia discussion, this will allow some non-real postcodes (e.g. those starting AA, ZY) and they do provide a more rigorous test that you could try.
I recently posted an answer to this question on UK postcodes for the R language. I discovered that the UK Government's regex pattern is incorrect and fails to properly validate some postcodes. Unfortunately, many of the answers here are based on this incorrect pattern.
I'll outline some of these issues below and provide a revised regular expression that actually works.
Note
My answer (and regular expressions in general):
If you don't care about the bad regex and just want to skip to the answer, scroll down to the Answer section.
The Bad Regex
The regular expressions in this section should not be used.
This is the failing regex that the UK government has provided developers (not sure how long this link will be up, but you can see it in their Bulk Data Transfer documentation):
Problems
Problem 1 - Copy/Paste
See regex in use here.
As many developers likely do, they copy/paste code (especially regular expressions) and paste them expecting them to work. While this is great in theory, it fails in this particular case because copy/pasting from this document actually changes one of the characters (a space) into a newline character as shown below:
The first thing most developers will do is just erase the newline without thinking twice. Now the regex won't match postcodes with spaces in them (other than the `GIR 0AA` postcode).
To fix this issue, the newline character should be replaced with the space character:
Problem 2 - Boundaries
See regex in use here.
The postcode regex improperly anchors the regex. Anyone using this regex to validate postcodes might be surprised if a value like `fooA11 1AA` gets through. That's because they've anchored the start of the first option and the end of the second option (independently of one another), as pointed out in the regex above.
What this means is that `^` (asserts position at start of the line) only works on the first option `([Gg][Ii][Rr] 0[Aa]{2})`, so the second option will validate any strings that end in a postcode (regardless of what comes before).
Similarly, the first option isn't anchored to the end of the line `$`, so `GIR 0AAfoo` is also accepted.
To fix this issue, both options should be wrapped in another group (or non-capturing group) and the anchors placed around that:
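The government regex itself is not reproduced in this copy, so the following Python sketch uses a deliberately simplified stand-in for the second option (`STAND_IN` is an assumption, not the real sub-pattern) purely to show the anchoring effect: with `^` and `$` attached to different alternatives, each anchor constrains only its own branch.

```python
import re

# Simplified stand-in for the "ordinary postcode" option (illustrative only).
STAND_IN = r"[A-Z]{1,2}[0-9]{1,2} [0-9][A-Z]{2}"

# ^ binds to the first alternative only, $ to the second only.
BADLY_ANCHORED = re.compile(rf"^GIR 0AA|{STAND_IN}$")

# Wrapping both alternatives in a non-capturing group fixes that.
WELL_ANCHORED = re.compile(rf"^(?:GIR 0AA|{STAND_IN})$")

for candidate in ("fooA11 1AA", "GIR 0AAfoo", "A11 1AA", "GIR 0AA"):
    bad = BADLY_ANCHORED.search(candidate) is not None
    good = WELL_ANCHORED.search(candidate) is not None
    print(f"{candidate!r}: badly anchored={bad}, well anchored={good}")
```

The two junk inputs are accepted by the badly anchored pattern and rejected by the grouped one, while the two genuine inputs pass both.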
Problem 3 - Improper Character Set
See regex in use here.
The regex is missing a `-` here to indicate a range of characters. As it stands, if a postcode is in the format `ANA NAA` (where `A` represents a letter and `N` represents a number), and it begins with anything other than `A` or `Z`, it will fail.
That means it will match `A1A 1AA` and `Z1A 1AA`, but not `B1A 1AA`.
To fix this issue, the character `-` should be placed between the `A` and `Z` in the respective character set:
Problem 4 - Wrong Optional Character Set
See regex in use here.
I swear they didn't even test this thing before publicizing it on the web. They made the wrong character set optional. They made `[0-9]` optional in the fourth sub-option of option 2 (group 9). This allows the regex to match incorrectly formatted postcodes like `AAA 1AA`.
To fix this issue, make the next character class optional instead (and subsequently make the set `[0-9]` match exactly once):
Problem 5 - Performance
Performance on this regex is extremely poor. First off, they placed the least likely pattern option to match, `GIR 0AA`, at the beginning. How many users will likely have this postcode versus any other postcode; probably never? This means every time the regex is used, it must exhaust this option first before proceeding to the next option. To see how performance is impacted check the number of steps the original regex took (35) against the same regex after having flipped the options (22).
The second issue with performance is due to the way the entire regex is structured. There's no point backtracking over each option if one fails. The way the current regex is structured can greatly be simplified. I provide a fix for this in the Answer section.
Problem 6 - Spaces
See regex in use here
This may not be considered a problem, per se, but it does raise concern for most developers. The spaces in the regex are not optional, which means the users inputting their postcodes must place a space in the postcode. This is an easy fix by simply adding `?` after the spaces to render them optional. See the Answer section for a fix.
Answer
1. Fixing the UK Government's Regex
Fixing all the issues outlined in the Problems section and simplifying the pattern yields the following, shorter, more concise pattern. We can also remove most of the groups since we're validating the postcode as a whole (not individual parts):
See regex in use here
This can further be shortened by removing all of the ranges from one of the cases (upper or lower case) and using a case-insensitive flag. Note: Some languages don't have one, so use the longer one above. Each language implements the case-insensitivity flag differently.
See regex in use here.
Shorter again replacing `[0-9]` with `\d` (if your regex engine supports it):
See regex in use here.
2. Simplified Patterns
Without ensuring specific alphabetic characters, the following can be used (keep in mind the simplifications from 1. Fixing the UK Government's Regex have also been applied here):
See regex in use here.
And even further if you don't care about the special case `GIR 0AA`:
3. Complicated Patterns
I would not suggest over-verification of a postcode as new Areas, Districts and Sub-districts may appear at any point in time. What I will suggest potentially doing is adding support for edge cases. Some special cases exist and are outlined in this Wikipedia article.
Here are complex regexes that include the subsections of 3. (3.1, 3.2, 3.3).
In relation to the patterns in 1. Fixing the UK Government's Regex:
See regex in use here
And in relation to 2. Simplified Patterns:
See regex in use here
3.1 British Overseas Territories
The Wikipedia article currently states (some formats slightly simplified):
- `AI-1111`: Anguilla
- `ASCN 1ZZ`: Ascension Island
- `STHL 1ZZ`: Saint Helena
- `TDCU 1ZZ`: Tristan da Cunha
- `BBND 1ZZ`: British Indian Ocean Territory
- `BIQQ 1ZZ`: British Antarctic Territory
- `FIQQ 1ZZ`: Falkland Islands
- `GX11 1ZZ`: Gibraltar
- `PCRN 1ZZ`: Pitcairn Islands
- `SIQQ 1ZZ`: South Georgia and the South Sandwich Islands
- `TKCA 1ZZ`: Turks and Caicos Islands
- `BFPO 11`: Akrotiri and Dhekelia
- `ZZ 11` & `GE CX`: Bermuda (according to this document)
- `KY1-1111`: Cayman Islands (according to this document)
- `VG1111`: British Virgin Islands (according to this document)
- `MSR 1111`: Montserrat (according to this document)
An all-encompassing regex to match only the British Overseas Territories might look like this:
See regex in use here.
3.2 British Forces Post Office
Although they've recently been changed to `BF#` (where `#` represents a number) to better align with the British postcode system, they're considered optional alternative postcodes. These postcodes follow(ed) the format of `BFPO`, followed by 1-4 digits:
See regex in use here
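The linked pattern is not preserved here. Going only by the description ("BFPO, followed by 1-4 digits"), a minimal Python sketch could look like the following; treating the space as optional and the names `BFPO_PATTERN` and `is_bfpo` are assumptions.

```python
import re

# "BFPO", an optional space (assumed), then one to four digits.
BFPO_PATTERN = re.compile(r"BFPO ?[0-9]{1,4}")


def is_bfpo(value: str) -> bool:
    """True if the whole string is an old-style BFPO code."""
    return BFPO_PATTERN.fullmatch(value) is not None
```

This covers only the older `BFPO` form; the newer `BF#` codes follow the ordinary postcode layout and would be handled by the main pattern.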
3.3 Santa?
There's another special case with Santa (as mentioned in other answers):
`SAN TA1` is a valid postcode. A regex for this is very simply:
It looks like we're going to be using `^(GIR ?0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW]) ?[0-9][ABD-HJLNP-UW-Z]{2})$`, which is a slightly modified version of that suggested by Minglis above.
However, we're going to have to investigate exactly what the rules are, as the various solutions listed above appear to apply different rules as to which letters are allowed.
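As a quick check of the pattern quoted above, the Python sketch below runs it against the examples from the question. The case-insensitive flag is an addition made here so that "se5 0eg" passes; the list names and the `matches` helper are illustrative.

```python
import re

# Pattern quoted in the answer; re.IGNORECASE is added for lowercase input.
PATTERN = re.compile(
    r"^(GIR ?0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)"
    r"|[0-9][A-HJKPS-UW]) ?[0-9][ABD-HJLNP-UW-Z]{2})$",
    re.IGNORECASE,
)

SHOULD_MATCH = ["CW3 9SS", "SE5 0EG", "SE50EG", "se5 0eg", "WC2H 7LT"]
SHOULD_NOT_MATCH = ["aWC2H 7LT", "WC2H 7LTa", "WC2H"]


def matches(postcode: str) -> bool:
    return PATTERN.search(postcode) is not None


for code in SHOULD_MATCH + SHOULD_NOT_MATCH:
    print(f"{code!r}: {matches(code)}")
```

The pattern also accepts "W1K 7AA", the genuine postcode that stricter position-3 rules elsewhere in this thread reject.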
After some research, we've found some more information. Apparently a page on 'govtalk.gov.uk' points you to a postcode specification govtalk-postcodes. This points to an XML schema at XML Schema which provides a 'pseudo regex' statement of the postcode rules.
We've taken that and worked on it a little to give us the following expression:
This makes spaces optional, but does limit you to one space (replace the '&' with '{0,}' for unlimited spaces). It assumes all text must be upper-case.
If you want to allow lower case, with any number of spaces, use:
This doesn't cover overseas territories and only enforces the format, NOT the existence of different areas. It is based on the following rules:
Can accept the following formats:
Where:
Best wishes
Colin
There is no such thing as a comprehensive UK postcode regular expression that is capable of validating a postcode. You can check that a postcode is in the correct format using a regular expression; not that it actually exists.
Postcodes are arbitrarily complex and constantly changing. For instance, the outcode `W1` does not, and may never, have every number between 1 and 99, for every postcode area.
You can't expect what is there currently to be true forever. As an example, in 1990, the Post Office decided that Aberdeen was getting a bit crowded. They added a 0 to the end of AB1-5 making it AB10-50 and then created a number of postcodes in between these.
Whenever a new street is built a new postcode is created. It's part of the process for obtaining permission to build; local authorities are obliged to keep this updated with the Post Office (not that they all do).
Furthermore, as noted by a number of other users, there are the special postcodes such as Girobank, GIR 0AA, and the one for letters to Santa, SAN TA1 - you probably don't want to post anything there but it doesn't appear to be covered by any other answer.
Then, there are the BFPO postcodes, which are now changing to a more standard format. Both formats are going to be valid. Lastly, there are the overseas territories (source: Wikipedia).
Next, you have to take into account that the UK "exported" its postcode system to many places in the world. Anything that validates a "UK" postcode will also validate the postcodes of a number of other countries.
If you want to validate a UK postcode the safest way to do it is to use a look-up of current postcodes. There are a number of options:
Ordnance Survey releases Code-Point Open under an open data licence. It'll be very slightly behind the times but it's free. This will (probably - I can't remember) not include Northern Irish data as the Ordnance Survey has no remit there. Mapping in Northern Ireland is conducted by the Ordnance Survey of Northern Ireland and they have their, separate, paid-for, Pointer product. You could use this and append the few that aren't covered fairly easily.
Royal Mail releases the Postcode Address File (PAF). This includes BFPO, which I'm not sure Code-Point Open does. It's updated regularly but costs money (and they can be downright mean about it sometimes). PAF includes the full address rather than just postcodes and comes with its own Programmers Guide. The Open Data User Group (ODUG) is currently lobbying to have PAF released for free; here's a description of their position.
Lastly, there's AddressBase. This is a collaboration between Ordnance Survey, Local Authorities, Royal Mail and a matching company to create a definitive directory of all information about all UK addresses (they've been fairly successful as well). It's paid-for but if you're working with a Local Authority, government department, or government service it's free for them to use. There's a lot more information than just postcodes included.
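The look-up approach recommended above can be sketched in a few lines of Python. How the list of current postcodes is obtained (Code-Point Open, PAF or AddressBase) is left out; the function names and the tiny sample list are purely hypothetical, and the only assumption about the data is that it can be read as an iterable of postcode strings.

```python
def normalise(postcode: str) -> str:
    """Drop all whitespace and uppercase, so 'se5 0eg' equals 'SE5 0EG'."""
    return "".join(postcode.split()).upper()


def build_index(postcodes):
    """Build a set of normalised postcodes for constant-time membership tests."""
    return {normalise(code) for code in postcodes}


def postcode_exists(postcode: str, index) -> bool:
    return normalise(postcode) in index


# Hypothetical sample standing in for a real postcode dataset.
SAMPLE_INDEX = build_index(["CW3 9SS", "SE5 0EG", "WC2H 7LT"])
```

A set look-up answers "does this postcode currently exist?", which no regex can; the cost is that the index has to be refreshed whenever the source data is updated.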
I had a look into some of the answers above and I'd recommend against using the pattern from @Dan's answer (c. Dec 15 '10), since it incorrectly flags almost 0.4% of valid postcodes as invalid, while the others do not.
Ordnance Survey provide a service called Code Point Open which:
I ran each of the regexes above against the full list of postcodes (Jul 6 '13) from this data using `grep`:
There are 1,686,202 postcodes total.
The following are the numbers of valid postcodes that do not match each `$pattern`:
# => 6016 (0.36%)
# => 0
Of course, these results only deal with valid postcodes that are incorrectly flagged as invalid. So:
I'm saying nothing about which pattern is the best regarding filtering out invalid postcodes.
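The `grep` commands from this answer were not preserved. The same kind of measurement, counting known-valid postcodes that a pattern wrongly rejects, can be sketched in Python as follows; `count_rejected`, the sample pattern and the sample list are all hypothetical and only illustrate the calculation, not the figures reported above.

```python
import re


def count_rejected(pattern: str, postcodes):
    """Return (count, percentage) of postcodes the pattern fails to match."""
    compiled = re.compile(pattern)
    rejected = sum(1 for code in postcodes if compiled.fullmatch(code) is None)
    share = 100.0 * rejected / len(postcodes) if postcodes else 0.0
    return rejected, share


# An overly strict sample pattern: no letter allowed at the end of the outcode.
TOO_STRICT = r"[A-Z]{1,2}[0-9]{1,2} [0-9][A-Z]{2}"
SAMPLE = ["CW3 9SS", "SE5 0EG", "WC2H 7LT", "W1K 7AA"]

rejected, share = count_rejected(TOO_STRICT, SAMPLE)
print(f"# => {rejected} ({share:.2f}%)")
```

As the answer notes, a count of zero here says nothing about how well a pattern filters out invalid input; it only measures false rejections.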
http://regexlib.com/REDetails.aspx?regexp_id=260
According to this Wikipedia table
This pattern covers all the cases
When using it on Android\Java use \\d
Most of the answers here didn't work for all the postcodes I have in my database. I finally found one that validates with all, using the new regex provided by the government:
https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/413338/Bulk_Data_Transfer_-_additional_validation_valid_from_March_2015.pdf
It isn't in any of the previous answers so I post it here in case they take the link down:
UPDATE: Updated regex as pointed by Jamie Bull. Not sure if it was my error copying or it was an error in the government's regex, the link is down now...
UPDATE: As ctwheels found, this regex works with the javascript regex flavor. See his comment for one that works with the pcre (php) flavor.
This is the regex Google serves on their i18napis.appspot.com domain:
An old post but still pretty high in Google results, so I thought I'd update. This Oct 14 doc defines the UK postcode regular expression as:
from:
https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/359448/4__Bulk_Data_Transfer_-_additional_validation_valid.pdf
The document also explains the logic behind it. However, it has an error (bolded) and also allows lower case, which although legal is not usual, so amended version:
This works with new London postcodes (e.g. W1D 5LH) that previous versions did not.
Postcodes are subject to change, and the only true way of validating a postcode is to have the complete list of postcodes and see if it's there.
But regular expressions are useful because they:
But regular expressions tend to be difficult to maintain, especially for someone who didn't come up with it in the first place. So it must be:
That means that most of the regular expressions in this answer aren't good enough. E.g. I can see that `[A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRV-Y]` is going to match a postcode area of the form AA1A, but it's going to be a pain in the neck if and when a new postcode area gets added, because it's difficult to understand which postcode areas it matches.
I also want my regular expression to match the first and second half of the postcode as parenthesised matches.
So I've come up with this:
In PCRE format it can be written as follows:
For me this is the right balance between validating as much as possible, while at the same time future-proofing and allowing for easy maintenance.
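The author's own expression is missing from this copy. To illustrate only the idea of returning the two halves as parenthesised matches, here is a small Python sketch with a deliberately loose, format-only pattern; `SPLIT_PATTERN` and `split_postcode` are hypothetical names, and trimming plus uppercasing the input is an assumption.

```python
import re

# Group 1: outward code (area + district); group 2: inward code.
SPLIT_PATTERN = re.compile(r"([A-Z]{1,2}[0-9][0-9A-Z]?) ?([0-9][A-Z]{2})")


def split_postcode(raw: str):
    """Return (outward, inward), or None when the format does not fit."""
    match = SPLIT_PATTERN.fullmatch(raw.strip().upper())
    return match.groups() if match else None
```

Because the pattern does not list individual area letters, it needs no maintenance when a new postcode area appears; anything stricter is better delegated to a look-up.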