Formatting strings in Ruby (postcodes)

Posted on 2024-08-28 02:20:38

I need to re-format a list of UK postcodes and have started with the following to strip whitespace and capitalize:

postcode.upcase.gsub(/\s/,'')

I now need to change the postcode so the new postcode will be in a format that will match the following regexp:

^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$

I would be grateful for any assistance.

Comments (3)

莳間冲淡了誓言ζ 2024-09-04 02:20:38

If this standards doc is to be believed (and Wikipedia concurs), formatting a valid post code for output is straightforward: the last three characters are the second part, everything before is the first part!

So assuming you have a valid postcode, without any pre-embedded space, you just need

def format_post_code(pc)
  pc.strip.sub(/([A-Z0-9]+)([A-Z0-9]{3})/, '\1 \2')
end
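
A minimal usage sketch, assuming the input is a valid postcode with arbitrary casing and spacing. The helper name `normalize_and_format` is hypothetical; it just chains the clean-up step from the question with the split described above, and anchors the pattern to the whole string.

```ruby
# Hypothetical helper: upcase, drop all whitespace, then put a single
# space before the final three characters (the inward part).
def normalize_and_format(pc)
  compact = pc.upcase.gsub(/\s/, '')
  compact.sub(/\A([A-Z0-9]+)([A-Z0-9]{3})\z/, '\1 \2')
end

puts normalize_and_format(' ec1a1bb ')  # prints EC1A 1BB
puts normalize_and_format('m1  1aa')    # prints M1 1AA
```

This does no validation: anything shorter than four characters, or containing other characters, comes back unchanged apart from the upcasing and whitespace removal.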

If you want to validate an input post code first, then the regex you gave looks like a good starting point. Perhaps something like this?

NORMAL_POSTCODE_RE = /^([A-PR-UWYZ][A-HK-Y0-9][A-HJKS-UW0-9]?[A-HJKS-UW0-9]?)\s*([0-9][ABD-HJLN-UW-Z]{2})$/i
GIROBANK_POSTCODE_RE = /^GIR\s*0AA$/i
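def format_post_code(pc)
  pc = pc.strip.upcase  # normalise once so validation and formatting see the same string
  return pc.sub(NORMAL_POSTCODE_RE, '\1 \2') if pc =~ NORMAL_POSTCODE_RE
  return 'GIR 0AA' if pc =~ GIROBANK_POSTCODE_RE
  nil  # neither pattern matched
end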

Note that I removed the '0-9' part of the first character, which appears unnecessary according to the sources I quoted. I also changed the alpha sets to match the first-cited document. It's still not perfect: a first part made up of three letters validates (for example 'AAA 1AA'), and I think a more complex RE is probably required.

I think this might cover it (constructed in stages for easier fixing!)

A1  = "[A-PR-UWYZ]"
A2  = "[A-HK-Y]"
A34 = "[A-HJKS-UW]"        # assume rule for alpha in fourth char is same as for third
A5  = "[ABD-HJLN-UW-Z]"
N   = "[0-9]"
AANN = A1 + A2 + N + N     # the six possible first-part combos
AANA = A1 + A2 + N + A34
ANA  = A1 + N + A34
ANN  = A1 + N + N
AAN  = A1 + A2 + N
AN   = A1 + N
PART_ONE = [AANN, AANA, ANA, ANN, AAN, AN].join('|') 
PART_TWO = N + A5 + A5

NORMAL_POSTCODE_RE = Regexp.new("^(#{PART_ONE})[ ]*(#{PART_TWO})$", Regexp::IGNORECASE)  
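
To see how the staged pattern behaves, here is a sketch that rebuilds it inside a hypothetical `staged_postcode_re` method (so the constants above are not redefined) and runs it over a few sample codes. It uses `\A` and `\z` instead of `^` and `$`; that is my assumption that whole-string anchoring is wanted, since `^` and `$` match at line boundaries in Ruby.

```ruby
# Hypothetical wrapper around the staged construction shown above.
def staged_postcode_re
  a1  = "[A-PR-UWYZ]"
  a2  = "[A-HK-Y]"
  a34 = "[A-HJKS-UW]"
  a5  = "[ABD-HJLN-UW-Z]"
  n   = "[0-9]"
  # The six outward-part shapes: AANN, AANA, ANA, ANN, AAN, AN
  part_one = [a1 + a2 + n + n, a1 + a2 + n + a34, a1 + n + a34,
              a1 + n + n, a1 + a2 + n, a1 + n].join('|')
  part_two = n + a5 + a5
  Regexp.new("\\A(#{part_one})[ ]*(#{part_two})\\z", Regexp::IGNORECASE)
end

re = staged_postcode_re
['EC1A1BB', 'm1 1aa', 'DN55 1PT', 'AAA 1AA'].each do |pc|
  m = re.match(pc)
  puts "#{pc}: #{m ? "#{m[1].upcase} #{m[2].upcase}" : 'no match'}"
end
```

The last sample, 'AAA 1AA', is rejected because every alternative for the first part requires a digit in the second or third position.
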
久随 2024-09-04 02:20:38

UK Postcodes aren't consistent, but they are finite - you might be better with a look-up table.
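
A rough sketch of the look-up idea, assuming a full list of known postcodes can be loaded from some authoritative source. The three entries below are stand-ins taken from the sample codes quoted elsewhere in this thread, and `KNOWN_POSTCODES` and `known_postcode?` are hypothetical names.

```ruby
require 'set'

# Stand-in data: a real table would be loaded from a postcode data file.
# Codes are stored upcased with no whitespace.
KNOWN_POSTCODES = Set.new(['M11AA', 'EC1A1BB', 'W1A1HQ']).freeze

# True if the postcode, ignoring case and whitespace, is in the table.
def known_postcode?(pc)
  KNOWN_POSTCODES.include?(pc.upcase.gsub(/\s/, ''))
end

puts known_postcode?('ec1a 1bb')
puts known_postcode?('ZZ99 9ZZ')
```

Storing the codes without whitespace means the same clean-up step from the question can be reused before the look-up.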

温柔女人霸气范 2024-09-04 02:20:38

Reformat or pattern match? I suspect the latter, although upcasing it first is a good idea.

Before we proceed, though, I would point out that you are stripping whitespace, but your regex contains " {1,2}", which means "one or two space characters". Because the whitespace has already been stripped, every postcode will fail the match.
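
A quick sketch of that failure, using the regex from the question unchanged; the sample postcode and the variable names are just for illustration.

```ruby
r = /^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$/

original = "ec1a 1bb"
stripped = original.upcase.gsub(/\s/, '')  # "EC1A1BB", the space is gone

puts !!(stripped =~ r)         # false: the regex requires one or two spaces
puts !!(original.upcase =~ r)  # true: upcased, with the space left in place
```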

Given a post code as input, we can check whether it matches the regex using =~.

Here we create some example post codes (taken from the Wikipedia page) and test each one against the regex:

post_codes = ["M1 1AA", "M60 1NW", "CR2 6XH", "DN55 1PT", "W1A 1HQ", "EC1A 1BB", "bad one", "cc93h29r2"]
r = /^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$/

post_codes.each do |pc|
  # pc =~ r returns the index where the match starts (truthy), or nil if there is no match
  # We use !! to display it as true or false
  puts "#{pc}: #{!!(pc =~ r)}"
end

This prints:

M1 1AA: true
M60 1NW: true
CR2 6XH: true
DN55 1PT: true
W1A 1HQ: true
EC1A 1BB: true
bad one: false
cc93h29r2: false
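
Putting the pieces together, here is a sketch of one way to get from raw input to a string that matches the question's regex: strip and upcase as in the question, re-insert a single space before the last three characters, then validate. The method name `to_target_format` and the constant `TARGET_RE` are hypothetical, and returning nil for invalid input is my own choice.

```ruby
# The regex from the question, unchanged.
TARGET_RE = /^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$/

# Returns the postcode in the target format, or nil if it does not validate.
def to_target_format(pc)
  compact = pc.upcase.gsub(/\s/, '')
  # Put one space before the final three characters.
  candidate = compact.sub(/\A(.+)(.{3})\z/, '\1 \2')
  candidate if candidate.match?(TARGET_RE)
end

["m11aa", "EC1A  1BB", "gir0aa", "cc93h29r2"].each do |pc|
  puts "#{pc.inspect} -> #{to_target_format(pc).inspect}"
end
```
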