Regex for universal phone numbers
I have a regex for universal phone numbers:
**/^(\+\d)*\s*(\(\d{3}\)\s*)*\d{3}(-{0,1}|\s{0,1})\d{2}(-{0,1}|\s{0,1})\d{2}$/**
It is accepting the following strings:
339-4248
(095) 2569835
+7 (095) 1452389
+1(963)9632587
+12365874
2365789
But it's not accepting
+12589637412
+1 963 9632587
+1701234567
What's the matter with this? Please help me figure out where I am wrong.
Why do you care where users break up the groups of digits, or what characters they use to do so? Around here (Sweden), it's common to see one person write a given phone number as 046 123 456 789 and someone else write it as 046 123 45 67 89, but both are dialed identically and are equally valid. (So, for that matter, would be 04 61 2345 6 78 9: not a format I've ever seen used, but it still dials identically.)
Just strip out non-numeric characters (other than a leading +, since that's meaningful), check that there's a reasonable number of digits, store that, and render it in your preferred format when displaying the number. Or keep the format as entered by the user, although then you need to take the normal precautions against SQL injection, XSS, XSRF, and similar attacks.
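The strip-and-count approach described above could look roughly like this in Python. This is only a sketch: the function name `normalize_phone` and the 7 to 15 digit bounds are assumptions chosen for illustration (15 is the E.164 maximum; the lower bound is arbitrary), not something the answer specifies.

```python
import re


def normalize_phone(raw, min_digits=7, max_digits=15):
    """Return the number as '+digits' or 'digits', or None if implausible.

    min_digits and max_digits are illustrative assumptions; tune them
    for the numbers you actually expect.
    """
    text = raw.strip()
    # A leading '+' is meaningful (international prefix), so remember it.
    has_plus = text.startswith("+")
    # Drop everything that is not an ASCII digit: spaces, dashes, parentheses.
    digits = re.sub(r"[^0-9]", "", text)
    if not (min_digits <= len(digits) <= max_digits):
        return None
    return ("+" if has_plus else "") + digits


if __name__ == "__main__":
    for sample in ["+1 963 9632587", "046 123 45 67 89", "046 123 456 789", "12"]:
        print(repr(sample), "->", normalize_phone(sample))
```

Both Swedish spellings above collapse to the same stored value, which is the point: grouping is a display concern, not a validation concern.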
It only accepts certain multiples of digits, and it only accepts spaces in some places within a number. My recommendation would be to ditch it and revert to a really simple, relaxed check, or else a documented, supported, internationally tested solution (libphone or some such).
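To make that concrete: after the optional `+d` prefix and any parenthesized three-digit groups, the pattern demands exactly 3 + 2 + 2 = 7 digits, with an optional dash or space only at the two boundaries between those groups. The short Python check below runs the pattern from the question against its own examples; the variable names are just for illustration.

```python
import re

# The pattern from the question, unchanged (without the /.../ delimiters).
PATTERN = re.compile(
    r"^(\+\d)*\s*(\(\d{3}\)\s*)*\d{3}(-{0,1}|\s{0,1})\d{2}(-{0,1}|\s{0,1})\d{2}$"
)

ACCEPTED = [
    "339-4248",
    "(095) 2569835",
    "+7 (095) 1452389",
    "+1(963)9632587",
    "+12365874",
    "2365789",
]

REJECTED = [
    "+12589637412",    # 10 digits after "+1"; the tail must be exactly 7
    "+1 963 9632587",  # "963" has no parentheses, so it counts toward the 7
    "+1701234567",     # 9 digits after "+1"
]

if __name__ == "__main__":
    for number in ACCEPTED + REJECTED:
        verdict = "match" if PATTERN.match(number) else "no match"
        print(f"{number!r}: {verdict}")
```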
One thing you can do is to research all the formats. You have found a few good ones. There are more here: http://en.wikipedia.org/wiki/Local_conventions_for_writing_telephone_numbers
Next you want to find documents in your corpus that have phone numbers in them, and others that have numbers that aren't phone numbers. This matters less if you are mostly dealing with structured data. The idea is that you want a control group to show you aren't overreaching.
Then you want to get something like visual-regexp (a common OS-independent software package), put your text into it, and start creating regexes until you cover all of your cases.
Doing that with just your examples, I came up with this:
regexp -nocase -all -line -- {\+?\(?[0-9]*\)?\ ?[0-9-]*} string match
--Pete
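The control-group idea can be sketched as a tiny harness in Python. Everything here is hypothetical: the `evaluate` helper, the deliberately relaxed candidate pattern, and the non-phone strings are illustrative assumptions, not part of the answer above. Only the phone numbers come from the question.

```python
import re


def evaluate(pattern, should_match, should_not_match):
    """Return (missed, overreach) for a candidate pattern.

    missed:    phone numbers the pattern fails to match in full
    overreach: control-group strings the pattern wrongly matches in full
    """
    rx = re.compile(pattern)
    missed = [s for s in should_match if not rx.fullmatch(s)]
    overreach = [s for s in should_not_match if rx.fullmatch(s)]
    return missed, overreach


# Phone numbers from the question (accepted and rejected alike are valid).
PHONES = [
    "339-4248", "(095) 2569835", "+7 (095) 1452389", "+1(963)9632587",
    "+12365874", "2365789", "+12589637412", "+1 963 9632587", "+1701234567",
]

# Hypothetical control group: number-like text that is not a phone number.
NOT_PHONES = ["2024-01-15", "3.14159", "order 66"]

# Hypothetical relaxed candidate: optional '+', then 7 to 20 characters
# drawn from digits, parentheses, whitespace, and dashes.
CANDIDATE = r"\+?[0-9()\s-]{7,20}"

if __name__ == "__main__":
    missed, overreach = evaluate(CANDIDATE, PHONES, NOT_PHONES)
    print("missed:", missed)
    print("overreach:", overreach)
```

With this candidate, every phone number is covered but the date string is matched too, which is exactly the kind of overreach a control group is meant to expose.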