PHP regex: extracting phone numbers from text/html

Posted 2024-10-19 06:42:23

Possible Duplicate:
php regex, extract like phone number regex from html documents

I'm trying to extract phone numbers from different HTML pages. Basically, the information is a 10-digit number which may take different forms, such as:

000-000-0000
000 - 000 - 0000
0000000000
Please note that 000 - 000 - 0000000 is not a valid phone number, so the regex should not extract a number that contains any additional digits.

I would appreciate any help creating a regex that works for all 3 cases. So far I have only been able to make it work for the last one (the simplest one).

Comments (2)

疏忽 2024-10-26 06:42:23

Here's a good starting point:

<?php 

// all on one line... 
$regex = '/^(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})[. -]?(\d{4})(?: (?i:ext)\.? ?(\d{1,5}))?$/';

// or broken up 
$regex = '/^(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})' 
        .'(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})' 
        .'[. -]?(\d{4})(?: (?i:ext)\.? ?(\d{1,5}))?$/'; 

?> 

Note the non-capturing subpatterns (which look like (?:stuff)). That makes formatting easy:

<?php 

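// Note: this replacement always appends " ext. ", even when no extension matched.
$formatted = preg_replace($regex, '($1) $2-$3 ext. $4', $phoneNumber); 

// or, provided you use the $matches argument in preg_match 

$formatted = "($matches[1]) $matches[2]-$matches[3]"; 
// $matches[4] only exists when an extension was captured.
if (isset($matches[4]) && $matches[4] !== '') $formatted .= " $matches[4]"; 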

?>

And some example results for you:

520-555-5542 :: MATCH 
520.555.5542 :: MATCH 
5205555542 :: MATCH 
520 555 5542 :: MATCH 
520) 555-5542 :: FAIL 
(520 555-5542 :: FAIL 
(520)555-5542 :: MATCH 
(520) 555-5542 :: MATCH 
(520) 555 5542 :: MATCH 
520-555.5542 :: MATCH 
520 555-0555 :: MATCH 
(520)5555542 :: MATCH 
520.555-4523 :: MATCH 
19991114444 :: FAIL 
19995554444 :: MATCH 
514 555 1231 :: MATCH 
1 555 555 5555 :: MATCH 
1.555.555.5555 :: MATCH 
1-555-555-5555 :: MATCH 
520-555-5542 ext.123 :: MATCH 
520.555.5542 EXT 123 :: MATCH 
5205555542 Ext. 7712 :: MATCH 
520 555 5542 ext 5 :: MATCH 
520) 555-5542 :: FAIL 
(520 555-5542 :: FAIL 
(520)555-5542 ext .4 :: FAIL 
(512) 555-1234 ext. 123 :: MATCH 
1(555)555-5555 :: MATCH
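
A list like the one above can be reproduced with a small loop. This is only a sketch: the `classify` helper and the choice of samples are illustrative and not part of the original answer; the pattern itself is copied unchanged.

```php
<?php

// The pattern from the answer, unchanged.
$regex = '/^(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})'
        .'(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})'
        .'[. -]?(\d{4})(?: (?i:ext)\.? ?(\d{1,5}))?$/';

// Hypothetical helper: label an input the way the list above does.
function classify(string $regex, string $input): string
{
    return preg_match($regex, $input) === 1 ? 'MATCH' : 'FAIL';
}

// A handful of inputs taken from the list above.
$samples = [
    '520-555-5542',
    '(520) 555-5542',
    '520) 555-5542',
    '19991114444',
    '520 555 5542 ext 5',
];

foreach ($samples as $sample) {
    echo $sample, ' :: ', classify($regex, $sample), "\n";
}
```

Because the pattern is anchored with `^` and `$`, it checks one candidate string at a time; it will not find a number embedded in a longer piece of text.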

You'll probably get a lot of false positives if you allow spaces and dashes like you're suggesting.
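
For the three forms in the question specifically, one way to limit false positives is to work in two steps: first collect every maximal run of digits, spaces and dashes, then keep only the runs that have exactly the expected shape. The sketch below assumes that approach. The function name `extractPhoneNumbers` and the crude tag removal are illustrative assumptions, not something from the original answer.

```php
<?php

/**
 * Hypothetical helper: pull 10-digit numbers written as 000-000-0000,
 * 000 - 000 - 0000 or 0000000000 out of an HTML string.
 * Returns each number as a plain string of 10 digits.
 */
function extractPhoneNumbers(string $html): array
{
    // Crude tag removal (an assumption, not a real HTML parser): each tag
    // becomes a newline so numbers in neighbouring elements stay separate.
    $text = preg_replace('/<[^>]+>/', "\n", $html);

    // Step 1: maximal runs of digits, spaces and dashes that start and end
    // with a digit.
    preg_match_all('/\d(?:[\d -]*\d)?/', $text, $runs);

    // Step 2: keep a run only if the whole run is 3 + 3 + 4 digits with an
    // optional dash (optionally padded by one space) between the groups.
    $shape = '/^(\d{3})(?: ?- ?)?(\d{3})(?: ?- ?)?(\d{4})$/';

    $found = [];
    foreach ($runs[0] as $run) {
        if (preg_match($shape, $run, $m) === 1) {
            $found[] = $m[1] . $m[2] . $m[3];
        }
    }
    return $found;
}

$html = '<p>Call 520-555-5542 or 520 - 555 - 5542.</p>'
      . '<td>5205555542</td><td>000 - 000 - 0000000</td>';

print_r(extractPhoneNumbers($html));
```

Since each run is checked as a whole, 000 - 000 - 0000000 is rejected rather than partially matched. The trade-off is that a valid number separated from other digits only by spaces or dashes is skipped as well, and mixed separators such as 000-000 - 0000 are accepted.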

堇色安年 2024-10-26 06:42:23

If you want to allow unlimited combinations of exactly 10 digits, then this will do the trick:

^\D?((?:\d\D*){10})$
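
A minimal sketch of how this pattern might be used from PHP. The helper name `tenDigits` is hypothetical; the pattern is the one above, wrapped in `/` delimiters. Note that `\D` matches any non-digit character, so letters between the digits are accepted too, and the anchors mean it checks a whole candidate string rather than searching inside a page.

```php
<?php

// The pattern from the answer: exactly 10 digits, with any non-digit
// characters allowed between and after them.
$pattern = '/^\D?((?:\d\D*){10})$/';

/**
 * Hypothetical helper: return the 10 digits contained in $candidate,
 * or null when the pattern does not match the whole string.
 */
function tenDigits(string $candidate, string $pattern): ?string
{
    if (preg_match($pattern, $candidate, $m) !== 1) {
        return null;
    }
    // Drop every non-digit character from the captured group.
    return preg_replace('/\D/', '', $m[1]);
}

var_dump(tenDigits('000 - 000 - 0000', $pattern));    // the 10 digits
var_dump(tenDigits('000 - 000 - 0000000', $pattern)); // null: 13 digits
```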