Top-level domain from URL in C#

Posted on 2024-10-11 18:43:23 · 1062 words · 13 views · 0 comments

I am using C# and ASP.NET for this.

We receive a lot of "strange" requests on our IIS 6.0 servers and I want to log and catalog these by domain.

E.g., we get some strange requests like these:

The latter three are kind of obvious, but I would like to sort them all into one group, as "example.com" IS hosted on our servers. The rest isn't, sorry :-)

So I am looking for some good ideas for how to retrieve example.com from the above. Secondly, I would like to match m., wap., iphone, etc. into a group, but that's probably just a quick lookup in a list of mobile shortcuts. I could hand-code this list for a start.

But is a regexp the answer here, or is pure string manipulation the easiest way? I was thinking of "splitting" the URL string by "." and then looking at item[0] and item[1]...

Any ideas?

Comments (7)

深者入戏 2024-10-18 18:43:23

You can use the Nager.PublicSuffix NuGet package. It uses the same data source that browser vendors use (the Public Suffix List).

NuGet

PM> Install-Package Nager.PublicSuffix

Example

var ruleProvider = new LocalFileRuleProvider("public_suffix_list.dat");
await ruleProvider.BuildAsync();

var domainParser = new DomainParser(ruleProvider);
    
var domainInfo = domainParser.Parse("sub.test.co.uk");
//domainInfo.Domain = "test";
//domainInfo.Hostname = "sub.test.co.uk";
//domainInfo.RegistrableDomain = "test.co.uk";
//domainInfo.SubDomain = "sub";
//domainInfo.TLD = "co.uk";
九局 2024-10-18 18:43:23

The following code uses the Uri class to obtain the host name, and then obtains the second-level host (examplecompany.com) from Uri.Host by splitting the host name on periods.

// Parse the URL and keep the last two labels of the host name.
var uri = new Uri("http://www.poker.winner4ever.examplecompany.com/");
var splitHostName = uri.Host.Split('.');
string secondLevelHostName = uri.Host; // fallback for single-label hosts
if (splitHostName.Length >= 2)
{
    secondLevelHostName = splitHostName[splitHostName.Length - 2] + "." +
                          splitHostName[splitHostName.Length - 1];
}
// secondLevelHostName is now "examplecompany.com"
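
When the input comes from arbitrary logged requests, the same split can be wrapped in a helper that tolerates malformed URLs and IP-address hosts. This is only a sketch and not part of the answer above: the name GetLastTwoLabels is hypothetical, it is written with top-level statements (C# 9 or later), and it keeps the limitation that suffixes such as co.uk are not recognized.

```csharp
using System;

// Hypothetical helper: returns the last two labels of the host name,
// or an empty string when the input is not an absolute URL.
static string GetLastTwoLabels(string url)
{
    if (!Uri.TryCreate(url, UriKind.Absolute, out var uri))
        return string.Empty;

    // IP addresses have no domain labels; return them unchanged.
    if (uri.HostNameType != UriHostNameType.Dns)
        return uri.Host;

    string[] labels = uri.Host.Split('.');
    if (labels.Length < 2)
        return uri.Host; // single-label host such as "localhost"

    return labels[labels.Length - 2] + "." + labels[labels.Length - 1];
}

Console.WriteLine(GetLastTwoLabels("http://www.poker.winner4ever.examplecompany.com/"));
```

Returning an empty string for unparseable input lets a logger bucket those requests separately instead of throwing.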
绮烟 2024-10-18 18:43:23

There may be some cases where this returns something other than what you want, but country-code TLDs are the only two-character TLDs, and they may or may not be preceded by a short (2 or 3 character) second-level label. So this will give you what you want in most cases:

string GetRootDomain(string host)
{
    string[] domains = host.Split('.');

    if (domains.Length >= 3)
    {
        int c = domains.Length;
        // handle international country code TLDs 
        // www.amazon.co.uk => amazon.co.uk
        if (domains[c - 1].Length < 3 && domains[c - 2].Length <= 3)
            return string.Join(".", domains, c - 3, 3);
        else
            return string.Join(".", domains, c - 2, 2);
    }
    else
        return host;
}
深白境迁sunset 2024-10-18 18:43:23

This is not possible without an up-to-date database of the different domain levels.

Consider:

s1.moh.gov.cn
moh.gov.cn
s1.google.com
google.com

So at which level do you want to get the domain? It depends entirely on the TLD, SLD, ccTLD... Because ccTLDs are under the control of their countries, those countries may define very special SLDs that are unknown to you.
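
To illustrate why a suffix database is needed, here is a minimal sketch of a longest-suffix lookup. The suffix set is a tiny hand-picked sample standing in for a real list such as the Public Suffix List, the name GetRegistrableDomain is hypothetical, and wildcard or exception rules are not handled.

```csharp
using System;
using System.Collections.Generic;

// Assumption: a tiny hand-picked sample. A real list would be loaded from
// an up-to-date data file such as the Public Suffix List.
var knownSuffixes = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "com", "cn", "gov.cn", "uk", "co.uk"
};

// Hypothetical helper: finds the longest known suffix and returns it together
// with the one label in front of it. Returns the host unchanged when no
// known suffix matches.
string GetRegistrableDomain(string host)
{
    string[] labels = host.Split('.');
    // Starting at i = 1 tries the longest candidate suffix first.
    for (int i = 1; i < labels.Length; i++)
    {
        string candidate = string.Join(".", labels, i, labels.Length - i);
        if (knownSuffixes.Contains(candidate))
            return labels[i - 1] + "." + candidate;
    }
    return host;
}

Console.WriteLine(GetRegistrableDomain("s1.moh.gov.cn")); // moh.gov.cn
Console.WriteLine(GetRegistrableDomain("s1.google.com")); // google.com
```

With "gov.cn" in the set, both s1.moh.gov.cn and moh.gov.cn map to moh.gov.cn; without that entry the result would be gov.cn, which is why the list has to be kept current.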

从﹋此江山别 2024-10-18 18:43:23

I've written a library for use in .NET 2+ to help pick out the domain components of a URL.

More details are on GitHub, but one benefit over the previous options is that it can download the latest data from http://publicsuffix.org automatically (once per month), so the output from the library should be more or less on a par with what web browsers use to establish domain security boundaries (i.e. pretty good).

It's not perfect yet but suits my needs and shouldn't take much work to adapt to other use cases so please fork and send a pull request if you want.

你如我软肋 2024-10-18 18:43:23

Use a regular expression:

^https?://(?:([\w.-]+)\.)?([\w-]+)\.(com|co\.uk|com\.au)(?:[:/?#].*)?$

This will match any URL whose host ends with one of the TLDs you are interested in. Extend the alternation with as many as you want. The capturing groups contain the subdomain, the domain name, and the TLD respectively.
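
A sketch of how such a pattern could be applied with System.Text.RegularExpressions follows. The pattern here escapes the dots, groups the suffix alternation, and allows a port, path, query, or fragment after the host. The three suffixes are only an assumed sample, and the code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: only these three suffixes matter; extend the alternation as needed.
var pattern = new Regex(
    @"^https?://(?:([\w.-]+)\.)?([\w-]+)\.(com|co\.uk|com\.au)(?:[:/?#].*)?$",
    RegexOptions.IgnoreCase);

Match m = pattern.Match("http://www.poker.winner4ever.example.com/");
if (m.Success)
{
    Console.WriteLine(m.Groups[1].Value); // subdomain: www.poker.winner4ever
    Console.WriteLine(m.Groups[2].Value); // name:      example
    Console.WriteLine(m.Groups[3].Value); // suffix:    com
}
```

Any host whose suffix is not in the alternation simply fails to match, so the approach only scales as far as the list you are willing to maintain by hand.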

无语# 2024-10-18 18:43:23
Strip "www." from the host and take everything from the first dot onward:

    string host = uri.Host.ToLower().Replace("www.", "");
    string suffix = host.Substring(host.IndexOf('.'));

  • suffix is ".com" for

    Uri uri = new Uri("http://stackoverflow.com/questions/4643227/top-level-domain-from-url-in-c");

  • suffix is ".co.jp" for

    Uri uri = new Uri("http://stackoverflow.co.jp");

  • suffix is ".s1.moh.gov.cn" for

    Uri uri = new Uri("http://stackoverflow.s1.moh.gov.cn");

etc.
