Regular expression to find phone numbers

Posted on 2024-09-01 12:34:29

Possible duplicates:
A comprehensive regex for phone number validation
Using a regular expression to find phone numbers

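Hi all,

I'm new to Stackoverflow and I have a quick question. Let's assume we are given a large number of HTML files (large as in theoretically infinite). How can I use regular expressions to extract the list of phone numbers from all those files?

Explanations/expressions will be really appreciated. The phone numbers can be in any of the following formats: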

(123) 456 7899
(123).456.7899
(123)-456-7899
123-456-7899
123 456 7899
1234567899

Thanks a lot for all your help, and I hope you are doing well!

眼眸印温柔 2024-09-08 12:34:29

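/^[-.() ]*([0-9]{3})[-.() ]*([0-9]{3})[-.() ]*([0-9]{4})$/

should accomplish what you are trying to do.

The first part, ^, means "start of the line", which anchors the match there.

The [-.() ]* groups mean "any period, hyphen, parenthesis, or space, appearing 0 or more times". The hyphen comes first inside the brackets so that it is read as a literal character rather than as a range.

The ([0-9]{3}) clusters each match a group of 3 digits (the last one is set to match 4).

Hope that helps!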
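
As a minimal sketch of the explanation above, the following Python snippet runs the six formats from the question through an anchored pattern of this shape. Python itself and the names PATTERN, SAMPLES and digits are assumptions made for illustration; the character class is written as [-.() ] with the hyphen first so that it is taken literally.

```python
import re

# Anchored pattern: optional punctuation, then groups of 3, 3 and 4 digits.
PATTERN = re.compile(r"^[-.() ]*([0-9]{3})[-.() ]*([0-9]{3})[-.() ]*([0-9]{4})$")

# The six formats listed in the question.
SAMPLES = [
    "(123) 456 7899",
    "(123).456.7899",
    "(123)-456-7899",
    "123-456-7899",
    "123 456 7899",
    "1234567899",
]

def digits(text):
    """Return the ten digits of a matching phone number, or None."""
    m = PATTERN.match(text)
    return "".join(m.groups()) if m else None

for s in SAMPLES:
    print(s, "->", digits(s))
```

Because the pattern is anchored with ^ and $, it only accepts a string that consists of nothing but the phone number, so it suits validation more than extraction from a larger document.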

智商已欠费 2024-09-08 12:34:29

Without knowing what language you're using I am unsure whether or not the syntax is correct.

This should match all of your groups with very few false positives:

/\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})/

The groups you will be interested in after the match are groups 1, 3, and 4. Group 2 exists only to make sure the first and second separator characters (space, ., or -) are the same.

For example, a sed command to strip the extra characters and leave phone numbers in the form 1234567890:

sed "s/(\{0,1\}\([0-9]\{3\}\))\{0,1\}\([ .-]\{0,1\}\)\([0-9]\{3\}\)\2\([0-9]\{4\}\)/\1\3\4/"

Here are the false positives of my expression:

(123)4567890
(1234567890
(123 456 7890
(123.456.7890
(123-456-7890
123)4567890
123) 456 7890
123).456.7890
123)-456-7890

Breaking up the expression into two parts, one that matches with parentheses and one that matches without, will eliminate all of these false positives except for the first one:

/\(([0-9]{3})\)([ .-]?)([0-9]{3})\2([0-9]{4})|([0-9]{3})([ .-]?)([0-9]{3})\6([0-9]{4})/

Groups 1, 3, and 4 or 5, 7, and 8 would matter in this case.
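
To make the difference between the two expressions concrete, here is a hedged Python sketch. It assumes whole-string matching with re.fullmatch (the patterns in this answer are unanchored), and it refers to the separator of the second alternative as \6, since that separator is the sixth capturing group. The names LOOSE, STRICT and normalize are illustrative only.

```python
import re

# Single expression: each parenthesis is optional on its own.
LOOSE = re.compile(r"\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})")

# Split expression: either both parentheses or none.
STRICT = re.compile(
    r"\(([0-9]{3})\)([ .-]?)([0-9]{3})\2([0-9]{4})"
    r"|([0-9]{3})([ .-]?)([0-9]{3})\6([0-9]{4})"
)

def normalize(text):
    """Return the ten digits if the whole string is a phone number, else None."""
    m = STRICT.fullmatch(text)
    if m is None:
        return None
    g = m.groups()
    # Groups 1, 3, 4 are set for the parenthesized branch; otherwise 5, 7, 8.
    parts = (g[0], g[2], g[3]) if g[0] is not None else (g[4], g[6], g[7])
    return "".join(parts)

print(normalize("(123)-456-7899"))
print(normalize("123 456 7899"))
# An unbalanced parenthesis passes the loose pattern but not the strict one.
print(bool(LOOSE.fullmatch("(123 456 7890")), normalize("(123 456 7890"))
```

Note that with an unanchored search the second alternative can still match the digits inside a string such as (123 456 7890, so the stricter behaviour shown here depends on matching the whole string.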

若有似无的小暗淡 2024-09-08 12:34:29

This will help you catch the ones with an area code in parentheses

([0-9]\{3\})[ .-][0-9]\{3\}[ .-][0-9]\{4\}

The others are:

[0-9]\{3\}[ -][0-9]\{3\}[ -][0-9]\{4\}
[0-9]\{10\}

I separated the first one and the second one because putting them together without backtracking could lead you to accept (123 456 7890 or 123) 456 7890.

Note also that on my terminal using grep, I had to escape the { } for the repetition. You may not have to, or you may have to escape other characters depending on where you intend to use this.
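
As a rough illustration of keeping the patterns separate, the sketch below translates the three grep expressions into Python syntax, where a literal ( must be escaped and { } must not be. The names PATTERNS and is_phone are hypothetical, and whole-string matching with fullmatch is an assumption made here so that the unbalanced cases are visibly rejected.

```python
import re

# One pattern per shape, tried in turn instead of merged into one expression.
PATTERNS = [
    re.compile(r"\([0-9]{3}\)[ .-][0-9]{3}[ .-][0-9]{4}"),  # area code in parentheses
    re.compile(r"[0-9]{3}[ -][0-9]{3}[ -][0-9]{4}"),        # space or hyphen separated
    re.compile(r"[0-9]{10}"),                               # ten digits in a row
]

def is_phone(text):
    """True if the whole string matches any one of the separate patterns."""
    return any(p.fullmatch(text) is not None for p in PATTERNS)

for s in ["(123) 456 7899", "123-456-7899", "1234567899",
          "(123 456 7890", "123) 456 7890"]:
    print(s, "->", is_phone(s))
```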

北陌 2024-09-08 12:34:29

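^(\(?\d{3}\)?)([ .-])(\d{3})([ .-])(\d{4})$

This should match every pattern except the last one.
For the last one, you can use a separate pattern: ^\d{10}$

There is also a bug: it will match (123 456 7899

Breaking the pattern down:
^(\(?\d{3}\)?): the first character (^) matches the beginning of the text. \(? and \)? accept a parenthesis whether or not it is present. The catch is that you would have to check whether there is an opening parenthesis and, if there is, require the closing one as well, and I don't know whether that is possible using regex alone. \d{3} matches three digits.
([ .-]) matches any one of these characters, exactly once.
(\d{3}) matches three digits.
The second ([ .-]) works the same way as the first.
(\d{4})$ matches four digits followed by the end of the text ($).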

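Since you want to extract the numbers from HTML pages, you have to drop the ^ and $ so that the pattern can match any part of the text, and set the global flag; in JavaScript that is /exp/g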
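
For comparison, here is a small Python sketch of the same idea: the pattern without ^ and $, combined with the separate ten-digit pattern, and re.finditer playing the role of the global flag. The name extract_phones and the sample HTML string are made up for illustration, and the capturing groups are dropped because only the full match is collected.

```python
import re

# Unanchored pattern plus the separate ten-digit alternative.
PHONE = re.compile(r"\(?\d{3}\)?[ .-]\d{3}[ .-]\d{4}|\d{10}")

def extract_phones(html):
    """Return every phone-like substring found anywhere in the text."""
    return [m.group(0) for m in PHONE.finditer(html)]

sample = "<p>Call (123) 456 7899 or 123-456-7899.</p><span>Fax: 1234567899</span>"
print(extract_phones(sample))
```

Without anchors, longer digit runs (an order ID, for example) can also produce matches, so the results may need filtering depending on the pages being scanned.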

You can test the regular expression here