How to identify the top-level domain of a URL object using Java?

Posted on 2024-08-18 20:57:19

Given this:

URL u=new URL("someURL");

How do I identify the top-level domain of the URL?

Comments (4)

遥远的她 2024-08-25 20:57:19

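Guava provides a nice utility for this, InternetDomainName. It works as follows:

InternetDomainName.from("someurl.co.uk").publicSuffix() will get you co.uk
InternetDomainName.from("someurl.de").publicSuffix() will get you de

Note that from() expects a host name rather than a full URL, so pass it u.getHost(). publicSuffix() returns an InternetDomainName (call toString() for the text), or null if the name has no public suffix.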
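Guava's publicSuffix() is backed by the Public Suffix List, which is why it returns co.uk rather than just uk. To illustrate the idea without Guava, here is a minimal sketch that returns the longest suffix of a host found in a known set. The class name SimplePublicSuffix and the four-entry KNOWN_SUFFIXES set are hypothetical; the set is a tiny hand-picked stand-in, not the real list, and the list's wildcard and exception rules are ignored. It needs Java 9+ for Set.of.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified public-suffix lookup for illustration only.
// KNOWN_SUFFIXES is a tiny hand-picked subset, NOT the real Public Suffix List,
// and wildcard ("*.ck") and exception ("!www.ck") rules are not handled.
class SimplePublicSuffix {
    private static final Set<String> KNOWN_SUFFIXES = Set.of("uk", "co.uk", "de", "com");

    /** Returns the longest known suffix of host, or null if none matches. */
    static String publicSuffix(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        if (h.endsWith(".")) {
            h = h.substring(0, h.length() - 1); // drop the trailing root dot, if any
        }
        // Try "www.amazon.co.uk", then "amazon.co.uk", "co.uk", "uk":
        // candidates shrink from the left, so the first hit is the longest match.
        int start = 0;
        while (true) {
            String candidate = h.substring(start);
            if (KNOWN_SUFFIXES.contains(candidate)) {
                return candidate;
            }
            int dot = h.indexOf('.', start);
            if (dot < 0) {
                return null; // no label left to strip
            }
            start = dot + 1;
        }
    }
}
```

For example, SimplePublicSuffix.publicSuffix(u.getHost()) yields "co.uk" for http://www.amazon.co.uk/... In practice you would rely on Guava (or another maintained copy of the list) rather than a hardcoded set, since the list changes over time.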

夜夜流光相皎洁 2024-08-25 20:57:19

So you want to have the top-level domain part only?

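// Requires: import java.net.MalformedURLException; import java.net.URL;
// parameter urlString: a String
// returns: the last dot-separated label of the URL's host (e.g. "uk" for www.amazon.co.uk),
//          or null if urlString is null or malformed
private String getTldString(String urlString) {
    String tldString = null;
    try {
        // Note: the URL(String) constructor is deprecated since JDK 20, but still works.
        URL url = new URL(urlString);
        // Split the host on literal dots ("\\." in a regex) and keep the last label.
        String[] domainNameParts = url.getHost().split("\\.");
        tldString = domainNameParts[domainNameParts.length - 1];
    } catch (MalformedURLException e) {
        // Malformed (or null) input: fall through and return null.
    }
    return tldString;
}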

Let's test it!

@Test 
public void identifyLocale() {
    String ukString = "http://www.amazon.co.uk/Harry-Potter-Sheet-Complete-Series/dp/0739086731";
    logger.debug("ukString TLD: {}", getTldString(ukString));

    String deString = "http://www.amazon.de/The-Essential-George-Gershwin/dp/B00008GEOT";
    logger.debug("deString TLD: {}", getTldString(deString));

    String ceShiString = "http://例子.测试";
    logger.debug("ceShiString TLD: {}", getTldString(ceShiString));

    String dokimeString = "http://παράδειγμα.δοκιμή";
    logger.debug("dokimeString TLD: {}", getTldString(dokimeString));

    String nullString = null;
    logger.debug("nullString TLD: {}", getTldString(nullString));

    String lolString = "lol, this is a malformed URL, amirite?!";
    logger.debug("lolString TLD: {}", getTldString(lolString));

}

Output:

ukString TLD: uk
deString TLD: de
ceShiString TLD: 测试
dokimeString TLD: δοκιμή
nullString TLD: null
lolString TLD: null
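The output shows internationalized TLDs in their Unicode form. If you need to compare them against ASCII-only data, java.net.IDN can convert between the Unicode label and its ASCII-compatible "xn--" form. The sketch below is only an illustration; the class TldForms and its two methods are hypothetical names.

```java
import java.net.IDN;
import java.util.Locale;

// Hypothetical helper: converts a TLD label between its Unicode form and
// its ASCII-compatible ("xn--...") form using java.net.IDN.
class TldForms {
    /** Unicode label -> ASCII (Punycode) label, lower-cased for comparison. */
    static String toAscii(String tld) {
        return IDN.toASCII(tld).toLowerCase(Locale.ROOT);
    }

    /** ASCII label (possibly "xn--...") -> Unicode label for display. */
    static String toUnicode(String tld) {
        return IDN.toUnicode(tld);
    }
}
```

Plain ASCII labels such as "de" pass through unchanged apart from lower-casing, so the same comparison code can handle both kinds of TLD.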

允世 2024-08-25 20:57:19

According to the docs, the host part of the URL conforms to RFC 2732. That implies that simply splitting the string you get from

  String host = u.getHost();

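is not enough: for a literal IPv6 address, getHost() returns the address enclosed in square brackets (for example [::1]), which has no top-level domain at all. You need to handle RFC 2732 hosts when searching the host. Alternatively, if you can guarantee that all addresses are of the form server.com, you can search for the last '.' in the string and take the TLD from there.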
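A minimal sketch of the "last '.'" approach, guarded against the IP-literal hosts described above. The class name LastLabelTld is hypothetical, and the IPv4 check is a rough dotted-quad pattern that does not validate octet ranges. It takes the host string (e.g. u.getHost()) so it does not have to deal with URL parsing.

```java
import java.util.regex.Pattern;

// Hypothetical helper: returns the text after the last '.' of a host name,
// or null when the host is empty, an IP literal, or a single label.
class LastLabelTld {
    // Rough dotted-quad check; does not verify that each octet is <= 255.
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    static String tld(String host) {
        if (host == null || host.isEmpty()) {
            return null; // e.g. file:/// URLs have no host
        }
        if (host.startsWith("[") || IPV4.matcher(host).matches()) {
            return null; // bracketed IPv6 literal (RFC 2732) or IPv4 address: no TLD
        }
        if (host.endsWith(".")) {
            host = host.substring(0, host.length() - 1); // fully qualified form "example.com."
        }
        int lastDot = host.lastIndexOf('.');
        return lastDot < 0 ? null : host.substring(lastDot + 1); // "localhost" -> null
    }
}
```

Usage: LastLabelTld.tld(u.getHost()). Note that this still returns "uk" rather than "co.uk"; the TLD is only the last label.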

醉殇 2024-08-25 20:57:19

Use URL#getHost() and then, if necessary, String#split() on "\\.".

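Update: if the host is actually an IP address, you first need to turn it into a host name yourself, for example with InetAddress#getHostName(). For an address created from an IP literal, getHostName() performs a reverse DNS lookup and returns the textual IP address if the lookup fails.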
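A sketch of that update, assuming you want to resolve IP hosts via reverse DNS before taking the last label. The class name ReverseLookupTld is hypothetical. InetAddress.getByName() only parses an IP literal (bracketed IPv6 is accepted), while getHostName() may block on a network lookup; if the lookup fails the result is still an IP string, which the sketch treats as "no TLD".

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.regex.Pattern;

// Hypothetical helper: resolves IP-literal hosts to a name via reverse DNS,
// then returns the last dot-separated label (or null if there is none).
class ReverseLookupTld {
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    static String tldOf(String host) {
        String name = host;
        if (isIpLiteral(host)) {
            try {
                // No forward lookup for a literal; getHostName() then tries a PTR lookup.
                name = InetAddress.getByName(host).getHostName();
            } catch (UnknownHostException e) {
                return null; // not a valid IP literal
            }
            if (isIpLiteral(name)) {
                return null; // reverse lookup failed: still just an address
            }
        }
        if (name.endsWith(".")) {
            name = name.substring(0, name.length() - 1);
        }
        int lastDot = name.lastIndexOf('.');
        return lastDot < 0 ? null : name.substring(lastDot + 1);
    }

    private static boolean isIpLiteral(String s) {
        return s.startsWith("[") || s.indexOf(':') >= 0 || IPV4.matcher(s).matches();
    }
}
```

Keep in mind that the name returned by a reverse lookup may belong to the hosting provider rather than to the site the URL was written for, so the resulting TLD can differ from what a user would expect.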
