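Going where PHP parse_url() doesn't - Parsing only the domain
PHP's parse_url() has a host field, which includes the full host. I'm looking for the most reliable (and least costly) way to return only the domain and TLD.
Given the examples:
- http://www.google.com/foo, parse_url() returns www.google.com for host
- http://www.google.co.uk/foo, parse_url() returns www.google.co.uk for host
I am looking for only google.com or google.co.uk. I have contemplated a table of valid TLDs/suffixes and only allowing those plus one word. Would you do it any other way? Does anyone know of a pre-canned valid regex for this sort of thing?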
Comments (8)
How about something like this?
It extracts the host using the classic parse_url() and then looks for a valid domain without any subdomain (www being a subdomain). It won't work on things like 'localhost', and it returns false if nothing matches.
Edit: Try it out with a few URLs. Of course, it won't return anything if the input doesn't get through parse_url(), so make sure it's a well-formed URL.
Addendum: Alnitak is right. The solution presented above will work in most cases but not necessarily all, and it needs to be maintained to make sure, for example, that there aren't new TLDs with more than six characters and so on. The only reliable way of extracting the domain is to use a maintained list such as http://publicsuffix.org/. It's more painful at first but easier and more robust in the long term. You need to make sure you understand the pros and cons of each method and how it fits with your project.
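A minimal sketch of the kind of heuristic this answer describes, not the answer's own code. The function name get_domain_heuristic() and the pattern are assumptions made for illustration: the pattern guesses that a suffix is either a single TLD or a short "xx.yy" pair such as co.uk, and it does not consult any maintained list.

```php
<?php
// Hypothetical helper: a regex heuristic, NOT backed by the Public Suffix List.
// Returns the guessed "domain.tld" part of the URL's host, or false.
function get_domain_heuristic(string $url)
{
    // parse_url() yields null or false when there is no usable host.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }

    // One label, a dot, then either a two-part suffix such as "co.uk"
    // or a single TLD of two or more letters, anchored at the end of the host.
    $pattern = '/(?P<domain>[a-z0-9][a-z0-9-]{0,62}\.(?:[a-z]{2,3}\.[a-z]{2}|[a-z]{2,}))$/i';

    if (preg_match($pattern, $host, $m)) {
        return strtolower($m['domain']);
    }
    return false; // e.g. "localhost" has no dot, so nothing matches
}
```

With this pattern, http://www.google.com/foo gives google.com and http://www.google.co.uk/foo gives google.co.uk. It also shows why the addendum matters: a host such as www.tv.it is returned whole, because tv.it happens to look like a two-part suffix.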
Currently the only "right" way to do this is to use a list such as that maintained at http://publicsuffix.org/
BTW, this question is also pretty much a duplicate of:
There are standardisation efforts at IETF looking at DNS methods of declaring whether a particular node in the DNS tree is used for "public" registrations, but they're in their early stages of development. All of the popular non-IE browsers use the publicsuffix.org list.
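To make the list-based approach concrete, here is a rough sketch of how rules in Public Suffix List syntax can be matched against a host. The names PSL_SAMPLE and registrable_domain() are hypothetical, and the five hard-coded rules are only a tiny sample; a real program would load the full, current list from publicsuffix.org.

```php
<?php
// Tiny illustrative sample in Public Suffix List syntax (assumption:
// real code loads the complete list instead of hard-coding entries).
// "*.ck" is a wildcard rule, "!www.ck" is an exception to it.
const PSL_SAMPLE = ['com', 'uk', 'co.uk', '*.ck', '!www.ck'];

// Hypothetical helper: returns the registrable domain (public suffix plus
// one label), or null if the host is itself a public suffix.
function registrable_domain(string $host, array $rules = PSL_SAMPLE): ?string
{
    $labels = explode('.', strtolower(trim($host, '.')));
    $n = count($labels);
    $ruleSet = array_flip($rules);

    $suffixLen = 1;   // default rule "*": the last label is the suffix
    $matched = false;

    // Scan candidates from longest to shortest.
    for ($i = 0; $i < $n; $i++) {
        $len = $n - $i;
        $candidate = implode('.', array_slice($labels, $i));
        $wildcard = '*.' . implode('.', array_slice($labels, $i + 1));

        if (isset($ruleSet['!' . $candidate])) {
            // An exception rule always wins; the suffix drops its left label.
            $suffixLen = $len - 1;
            break;
        }
        if (!$matched && (isset($ruleSet[$candidate]) || isset($ruleSet[$wildcard]))) {
            // First hit while scanning from the left is the longest rule.
            $suffixLen = $len;
            $matched = true;
        }
    }

    if ($n <= $suffixLen) {
        return null; // nothing left in front of the public suffix
    }
    return implode('.', array_slice($labels, $n - $suffixLen - 1));
}
```

Combined with parse_url(), registrable_domain(parse_url('http://www.google.co.uk/foo', PHP_URL_HOST)) gives google.co.uk under these sample rules.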
There is also a very nice port of Python's tldextract module (http://w-shadow.com/blog/2012/08/28/tldextract). It goes beyond parse_url() and lets you get the domain and TLD out without the subdomain.
From the module website:
Dug this up from a related post, for the idea of keeping a table: http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/src/effective_tld_names.dat?raw=1
I'd rather not do that though.
You need a package that uses the Public Suffix List; only that way can you correctly extract domains with second- and third-level TLDs (co.uk, a.bg, b.bg, etc.) and multilevel subdomains. Regex, parse_url(), or string functions will never produce an absolutely correct result.
I recommend using TLD Extract. Here is a code example:
Of course it depends on your specific use case, but generally speaking I would not use a table lookup for TLDs. New TLDs come out and you usually don't want to maintain them anywhere. Just ask me how often my [email protected] has been rejected because of shortsightedness.
I guess I could help better if I knew why you don't want the www. Do you need it for emails? In that case you can query for MX records to verify that the domain (eventually) accepts mail.
You may also find help with PHP functions dealing with DNS records to find out more information about them, see http://php.net/dns_get_record for example.
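As a rough illustration of the MX idea, the sketch below uses PHP's checkdnsrr(). The helper names email_domain() and domain_has_mx() are hypothetical, and the lookup needs network access and a working resolver.

```php
<?php
// Hypothetical helper: returns the lowercased part after the last "@",
// or null if there is no "@" or nothing follows it.
function email_domain(string $email): ?string
{
    $at = strrpos($email, '@');
    if ($at === false || $at === strlen($email) - 1) {
        return null;
    }
    return strtolower(substr($email, $at + 1));
}

// Hypothetical helper: true if DNS publishes at least one MX record.
// A missing MX record does not strictly prove the domain cannot receive
// mail, since mail servers may fall back to the domain's address records.
function domain_has_mx(string $domain): bool
{
    return checkdnsrr($domain, 'MX');
}
```

A typical call would be domain_has_mx(email_domain($address)) after checking that email_domain() did not return null. No TLD table is involved, so new TLDs need no maintenance.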
Just a proof of concept, assuming the allowed TLDs are stored in a hash.
The code can be shortened a lot.
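One way such a hash lookup might look in PHP, as a sketch rather than the answer's own code. The $allowed table and the name domain_from_table() are assumptions, and the table holds only three sample suffixes.

```php
<?php
// Assumed sample of allowed suffixes, used as a hash (only the keys matter).
$allowed = ['com' => true, 'co.uk' => true, 'uk' => true];

// Hypothetical helper: returns the longest allowed suffix plus one label,
// or false if the host is a bare suffix or ends in nothing from the table.
function domain_from_table(string $host, array $allowed)
{
    $host = strtolower($host);
    if (isset($allowed[$host])) {
        return false; // the host is itself a suffix, e.g. "co.uk"
    }

    $labels = explode('.', $host);
    // Start at 1 so at least one label is left in front of the suffix;
    // longer suffixes are tried first, so "co.uk" wins over "uk".
    for ($i = 1; $i < count($labels); $i++) {
        $suffix = implode('.', array_slice($labels, $i));
        if (isset($allowed[$suffix])) {
            return $labels[$i - 1] . '.' . $suffix;
        }
    }
    return false;
}
```

Unlike a regex, this never guesses: a host ending in a suffix that is missing from the table, such as example.museum here, is rejected until the table is updated.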
There is a really easy solution to this:
Surely this will work?