
How can I tell whether a text string is a Facebook URL, an email address, or another URI?

Posted on 2024-11-01 08:02:46

I'm building a system for registering events. For each event it stores an address, which can be one of the following:

A Facebook resource (basically a URL starting with "facebook.com")
An email address (any valid email)
Another URL
(Fake/trash/etc.)

The fourth one doesn't matter.

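I need to take a different action depending on the address type (Facebook API / send an email / post a form). I'm considering simply storing the type, but first I wanted to ask whether there is a regex or something similar that can tell which type an address is.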

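The first one is easy: just check whether it starts with "http://www.facebook.com". For the others, I thought about looking for markers like "http://" or "@", but then realized that either kind of address could contain both.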

薄凉少年不暖心 2024-11-08 08:02:46

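First, @zespri is right in his comment: storing the actual type is a much better design. Even if you use the regular expressions I suggest below, things could still break in the future.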
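To make that design concrete, here is a minimal Python sketch of storing the type next to the address when the event is registered and dispatching on it later. `AddressType`, `Event` and `handle` are hypothetical names, and the returned strings only stand in for the three actions from the question (Facebook API, sending an email, posting a form).

```python
from dataclasses import dataclass
from enum import Enum


class AddressType(Enum):
    """Hypothetical enum: what kind of address an event stores."""
    FACEBOOK = "facebook"
    EMAIL = "email"
    URL = "url"
    OTHER = "other"


@dataclass
class Event:
    name: str
    address: str
    address_type: AddressType  # recorded at registration time, not guessed later


def handle(event: Event) -> str:
    """Dispatch on the stored type; each string stands in for a real action."""
    if event.address_type is AddressType.FACEBOOK:
        return f"call Facebook API for {event.address}"
    if event.address_type is AddressType.EMAIL:
        return f"send email to {event.address}"
    if event.address_type is AddressType.URL:
        return f"post form to {event.address}"
    return "ignore"


print(handle(Event("Launch", "jane.doe@example.com", AddressType.EMAIL)))
```

With the type stored explicitly, nothing has to re-parse the address, so a regex that misfires on an unusual address cannot send an event down the wrong branch.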

But yes, it's possible to use regex in this case:

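The following regex is a typical email detector. It is far from a complete validator, but it's much safer than just looking for an '@' sign: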

([a-zA-Z]+[a-zA-Z0-9._+\-]{3,}(?:@|%40)[a-zA-Z0-9]+[a-zA-Z0-9\.\-]?(?:\.[a-zA-Z]+)+)
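A minimal sketch of applying this pattern with Python's standard `re` module. `looks_like_email` is a hypothetical helper; it uses `re.fullmatch` so the stored address must be an email as a whole, because the pattern itself is unanchored and `re.search` would also accept a URL that merely contains an email in its query string.

```python
import re

# The email pattern from above, compiled once.
EMAIL_RE = re.compile(
    r"([a-zA-Z]+[a-zA-Z0-9._+\-]{3,}(?:@|%40)[a-zA-Z0-9]+[a-zA-Z0-9\.\-]?(?:\.[a-zA-Z]+)+)"
)


def looks_like_email(address: str) -> bool:
    """Hypothetical helper: True only if the whole string matches the pattern."""
    return EMAIL_RE.fullmatch(address.strip()) is not None


print(looks_like_email("jane.doe@example.com"))                          # True
print(looks_like_email("http://example.com/?to=jane.doe@example.com"))  # False
```

Two properties of the pattern are worth knowing: it accepts `%40` (a URL-encoded `@`) as the separator, and the `{3,}` quantifier means local parts shorter than four characters, such as `bob@example.com`, are rejected.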

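The following three find Facebook profiles and pages.
You can strip the path-matching suffix to match just the Facebook domain(s), or do some further research and editing to restrict them to other kinds of Facebook resources: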

facebook\.(?:com?\.|net\.)?[a-z]{2,3}/.+\?id=(\d+)
facebook\.(?:com?\.|net\.)?[a-z]{2,3}/p\.php.+i=(\d+)
facebook\.(?:com?\.|net\.)?[a-z]{2,3}/(\w[\w\.\-]+\w)(?:$|[/\?#])
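A sketch of trying the three patterns in order with Python's `re`. The two numeric-ID patterns come first because the username pattern would otherwise also match `profile.php` in a URL such as `facebook.com/profile.php?id=123`. `facebook_identifier` is a hypothetical helper that returns the first captured group, or `None`.

```python
import re
from typing import Optional

# The three patterns from above, in the order they should be tried:
# numeric ids first, then vanity usernames.
FACEBOOK_PATTERNS = [
    re.compile(r"facebook\.(?:com?\.|net\.)?[a-z]{2,3}/.+\?id=(\d+)"),
    re.compile(r"facebook\.(?:com?\.|net\.)?[a-z]{2,3}/p\.php.+i=(\d+)"),
    re.compile(r"facebook\.(?:com?\.|net\.)?[a-z]{2,3}/(\w[\w\.\-]+\w)(?:$|[/\?#])"),
]


def facebook_identifier(address: str) -> Optional[str]:
    """Hypothetical helper: captured id/username, or None if nothing matches."""
    for pattern in FACEBOOK_PATTERNS:
        # search() rather than match(): scheme and subdomain are optional
        match = pattern.search(address)
        if match:
            return match.group(1)
    return None


print(facebook_identifier("https://www.facebook.com/profile.php?id=123456"))  # 123456
print(facebook_identifier("m.facebook.com/zuck"))                              # zuck
```

Because the patterns are searched rather than anchored, they work with or without a scheme and with any subdomain (`www.`, `m.`), and the optional `co.`/`com.`/`net.` group covers country domains such as `facebook.co.uk`.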

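Avoid relying on the 'http://www.' prefix: you never know which subdomain may be used, and the prefix is often omitted.
Also note that Facebook is reachable under more TLDs than just .com.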

For "other" URLs, you could just check for the scheme, anchored at the start of the string:

^https?://

It's unclear from your question whether users enter these addresses into your system or whether they come from an uncontrolled source. Note that people often omit the http:// prefix, so this isn't a reliable way to detect URLs.
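Putting the pieces together, a rough Python sketch of a classifier for the four cases in the question. `classify` and its return labels are hypothetical, and the Facebook check keeps only the domain part of the Facebook patterns. The email test runs first, on the whole string: an address such as `jane.doe@facebook.com` contains a Facebook domain, whereas a full email match can never contain `/` or `:`, so real URLs fall through to the later checks.

```python
import re

# Patterns adapted from the answer; the Facebook one keeps only the domain part.
EMAIL_RE = re.compile(
    r"[a-zA-Z]+[a-zA-Z0-9._+\-]{3,}(?:@|%40)[a-zA-Z0-9]+[a-zA-Z0-9\.\-]?(?:\.[a-zA-Z]+)+"
)
FACEBOOK_RE = re.compile(r"facebook\.(?:com?\.|net\.)?[a-z]{2,3}(?:/|$)")
URL_RE = re.compile(r"^https?://")


def classify(address: str) -> str:
    """Hypothetical classifier returning 'email', 'facebook', 'url' or 'other'."""
    address = address.strip()
    # Whole-string email check first, so "jane.doe@facebook.com" is an email.
    if EMAIL_RE.fullmatch(address):
        return "email"
    # Any scheme or subdomain in front of the Facebook domain is accepted.
    if FACEBOOK_RE.search(address):
        return "facebook"
    # Only addresses that start with http:// or https:// count as URLs.
    if URL_RE.match(address):
        return "url"
    return "other"


for sample in [
    "https://www.facebook.com/events/123",
    "jane.doe@facebook.com",
    "http://example.com/?to=jane.doe@example.com",
    "example.com/signup",
    "n/a",
]:
    print(sample, "->", classify(sample))
```

The fourth sample illustrates the caveat above: without a scheme, `example.com/signup` ends up as "other".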

If you're looking for URLs as links within HTML pages, they can be detected more reliably by searching for anchor (<a>) tags:

<a\s+(?:.*?)href=['"]?(https?://[^'"\s]+)(?:.*?)>
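A small Python sketch running the anchor-tag pattern with `re.findall`, which returns only the captured URLs. The sample HTML is made up, and the sketch writes the URL character class as `[^'"\s]+`, meaning "stop at a quote or whitespace".

```python
import re

# Anchor-tag pattern; group 1 captures an absolute http(s) URL from href.
ANCHOR_RE = re.compile(r"""<a\s+(?:.*?)href=['"]?(https?://[^'"\s]+)(?:.*?)>""")

html = """
<p>Register on <a class="btn" href="https://www.facebook.com/events/123">Facebook</a>
or on <a href='http://example.com/signup'>our site</a>.</p>
<a href="/relative/path">ignored: no scheme</a>
"""

# Relative links are skipped because the pattern requires http:// or https://.
print(ANCHOR_RE.findall(html))
```

Both quoting styles for `href` are handled, while the relative link is not reported.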

First, @zespri is correct in his comment - it's a much better design to store the actual type. Even if you use the regular expressions I suggest below, things could still break in the future.

But yes, it's possible to use regex in this case:

The following regex is the quintessential email detector. It's much safer to use than just an '@' sign:

([a-zA-Z]+[a-zA-Z0-9._+\-]{3,}(?:@|%40)[a-zA-Z0-9]+[a-zA-Z0-9\.\-]?(?:\.[a-zA-Z]+)+)

The following three find facebook profiles and pages.
You can get rid of the suffix to stay with just the facebook domain(s), or do some further research and edits to limit to other kinds of facebook resources:

facebook\.(?:com?\.|net\.)?[a-z]{2,3}/.+\?id=(\d+)
facebook\.(?:com?\.|net\.)?[a-z]{2,3}/p\.php.+i=(\d+)
facebook\.(?:com?\.|net\.)?[a-z]{2,3}/(\w[\w\.\-]+\w)(?:$|[/\?#])

Avoid the 'http://www.' prefix - you never know what subdomain may be used, plus they're often omitted.
Also note that there are more tld's to facebook than just the .com

For 'other' URLs, you could just look for the anchor

^https?://

If you're looking for URLs as links within HTML pages they can be more reliably detected by searching for anchors:

<a\s+(?:.*?)href=['"]?(https?://[^'^"^\s]+)(?:.*?)>

回复收藏 0 原文

~没有更多了~

关于作者

此刻的回忆

暂无简介

0 文章

0 评论

24 人气

关注发私信

友情链接

文江博客

如何知道文本字符串是 facebook url、电子邮件地址还是其他 uri？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

離殇

小姐丶请自重

Aik

国产ˉ祖宗

猥琐帝

半仙

友情链接

如何知道文本字符串是 facebook url、电子邮件地址还是其他 uri？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

離殇

小姐丶请自重

Aik

国产ˉ祖宗

猥琐帝

半仙

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。