Top-level domain from URL in C#
I am using C# and ASP.NET for this.
We receive a lot of "strange" requests on our IIS 6.0 servers and I want to log and catalog these by domain.
E.g., we get some strange requests like these:
- http://www.poker.winner4ever.example.com/
- http://www.hotgirls.example.com/
- http://santaclaus.example.com/
- http://m.example.com/
- http://wap.example.com/
- http://iphone.example.com/
The latter three are kinda obvious, but I would like to sort them all into one, as "example.com" IS hosted on our servers. The rest isn't, sorry :-)
So I am looking for some good ideas on how to retrieve example.com from the above. Secondly, I would like to match m., wap., iphone. etc. into a group, but that's probably just a quick lookup in a list of mobile shortcuts. I could hand-code this list for a start.
But is a regexp the answer here, or is pure string manipulation the easiest way? I was thinking of splitting the URL string on "." and then looking at item[0] and item[1]...
Any ideas?
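For the second part of the question, a hand-coded lookup is indeed enough. The sketch below assumes that the leftmost label of the host is what marks a mobile alias. The prefix set contains only the three prefixes from the question. ClassifyRequest is a hypothetical helper, and in ASP.NET you would pass it Request.Url. It is written as a C# 9+ top-level program for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hand-coded list of mobile host prefixes, taken from the examples in the question.
// Extend it as new prefixes show up in the logs.
var mobilePrefixes = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "m", "wap", "iphone" };

// Hypothetical helper: returns "mobile" when the leftmost label of the host is a known
// mobile prefix and there are at least two labels after it, otherwise "other".
string ClassifyRequest(Uri requestUrl)
{
    string[] labels = requestUrl.Host.Split('.');
    bool isMobile = labels.Length > 2 && mobilePrefixes.Contains(labels[0]);
    return isMobile ? "mobile" : "other";
}

Console.WriteLine(ClassifyRequest(new Uri("http://wap.example.com/")));        // mobile
Console.WriteLine(ClassifyRequest(new Uri("http://santaclaus.example.com/")));  // other
```

The check for more than two labels keeps a bare domain such as m.com from being treated as a mobile alias of itself.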
You can use the Nager.PublicSuffix NuGet package. It uses the same data source that browser vendors use.
The following code uses the Uri class to obtain the host name, and then obtains the second-level host (examplecompany.com) from Uri.Host by splitting the host name on periods.
There may be some cases where this returns something other than what is desired, but country codes are the only TLDs that are 2 characters long, and they may or may not have a short second level (2 or 3 characters) in typical use. Therefore, this will give you what you want in most cases:
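The original code did not survive. The following is a sketch of the heuristic just described, using a hypothetical GetSecondLevelHost helper. It keeps the last two labels of Uri.Host, or the last three when the TLD has 2 characters and the label before it has 2 or 3. It is a guess based on label lengths, not a lookup against real registry data.

```csharp
using System;

// Heuristic only: take the last two labels of the host, or the last three when the TLD
// looks like a country code (2 characters) preceded by a short second level (2 or 3
// characters), as in "co.uk".
string GetSecondLevelHost(Uri uri)
{
    string[] labels = uri.Host.Split('.');
    if (labels.Length < 3)
        return uri.Host; // nothing to strip, e.g. "example.com" or "localhost"

    string tld = labels[labels.Length - 1];
    string sld = labels[labels.Length - 2];
    bool looksLikeCountrySecondLevel = tld.Length == 2 && sld.Length >= 2 && sld.Length <= 3;

    int keep = looksLikeCountrySecondLevel ? 3 : 2;
    return string.Join(".", labels, labels.Length - keep, keep);
}
```

For example, http://www.poker.winner4ever.example.com/ gives example.com and http://www.example.co.uk/ gives example.co.uk. However, http://www.abc.de/ gives www.abc.de because abc looks like a short country-level second level, which is the kind of miss mentioned above.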
This is not possible without an up-to-date database of the different domain levels.
Consider:
Then at which level do you want to get the domain? It depends entirely on the TLD, SLD, ccTLD... because ccTLDs are under the control of individual countries, which may define very specific SLDs that are unknown to you.
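The sketch below makes this point concrete with the matching algorithm described at publicsuffix.org. Exception rules (!) win. Otherwise the longest matching rule applies, where "*" matches any single label. The registrable domain is the public suffix plus one more label. The rule set is a tiny hand-picked illustration, not the real data, and GetPublicSuffix and GetRegistrableDomain are hypothetical names. In practice you would load the full list, which is exactly the up-to-date database this answer refers to.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative subset only; a real implementation loads the full, regularly updated list.
var rules = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "com", "uk", "co.uk", "jp", "co.jp", "cn", "gov.cn", "*.ck", "!www.ck"
};

string GetPublicSuffix(string host)
{
    string[] labels = host.TrimEnd('.').ToLowerInvariant().Split('.');

    // 1. Exception rules win: "!www.ck" means "www.ck" is not a suffix, so "ck" is.
    for (int i = 0; i < labels.Length; i++)
    {
        string candidate = string.Join(".", labels.Skip(i));
        if (rules.Contains("!" + candidate))
            return string.Join(".", labels.Skip(i + 1));
    }

    // 2. Otherwise the longest matching normal or wildcard rule wins (longest is tried first).
    for (int i = 0; i < labels.Length; i++)
    {
        string candidate = string.Join(".", labels.Skip(i));
        bool wildcardMatch = i + 1 < labels.Length
            && rules.Contains("*." + string.Join(".", labels.Skip(i + 1)));
        if (rules.Contains(candidate) || wildcardMatch)
            return candidate;
    }

    // 3. No rule matched: the default rule "*" makes the last label the suffix.
    return labels[labels.Length - 1];
}

// The registrable domain is the public suffix plus one more label, or null if the
// host is itself a public suffix.
string? GetRegistrableDomain(string host)
{
    string[] labels = host.TrimEnd('.').ToLowerInvariant().Split('.');
    int suffixLabels = GetPublicSuffix(host).Split('.').Length;
    if (labels.Length <= suffixLabels)
        return null;
    return string.Join(".", labels.Skip(labels.Length - suffixLabels - 1));
}
```

With these rules, www.poker.winner4ever.example.com yields example.com and shop.example.co.uk yields example.co.uk. A naive "last two labels" split would give co.uk for the latter.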
I've written a library for use in .NET 2+ to help pick out the domain components of a URL.
More details are on GitHub, but one benefit over the previous options is that it can download the latest data from http://publicsuffix.org automatically (once per month), so the library's output should be more or less on a par with what web browsers use to establish domain security boundaries (i.e. pretty good).
It's not perfect yet, but it suits my needs and shouldn't take much work to adapt to other use cases, so please fork it and send a pull request if you want.
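The data side of this approach is straightforward. The sketch below assumes you already have the text of the list, for example fetched from publicsuffix.org with HttpClient.GetStringAsync. It follows the published file format and is not the API of the library mentioned above. ParsePublicSuffixList and the 30-day NeedsRefresh policy are hypothetical, chosen to mirror the "once per month" update.

```csharp
using System;
using System.Collections.Generic;

// File format: one rule per line, lines starting with "//" are comments, blank lines
// are ignored, and each line is only read up to the first whitespace.
HashSet<string> ParsePublicSuffixList(string listText)
{
    var rules = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    foreach (string rawLine in listText.Split('\n'))
    {
        string line = rawLine.Trim(); // also removes a trailing '\r'
        if (line.Length == 0 || line.StartsWith("//", StringComparison.Ordinal))
            continue;

        // Keep only the text before the first space or tab.
        int end = line.IndexOfAny(new[] { ' ', '\t' });
        rules.Add(end < 0 ? line : line.Substring(0, end));
    }
    return rules;
}

// Hypothetical refresh policy: re-download when the cached copy is older than 30 days.
bool NeedsRefresh(DateTime lastDownloadUtc, DateTime nowUtc) =>
    nowUtc - lastDownloadUtc > TimeSpan.FromDays(30);
```

The resulting set can stand in for the hand-picked rules in a suffix lookup such as the one sketched for the previous answer.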
Use a regular expression:
This will match any URL ending with a TLD you are interested in. Extend the list with as many TLDs as you want. Further, the capturing groups will contain the subdomain, hostname and TLD respectively.
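The expression itself did not survive. Below is a hypothetical pattern in the same spirit, with named groups for the subdomain, hostname and TLD. It is applied to Uri.Host rather than to the full URL, which is an assumption. The alternation com|net|org stands in for whatever list you care about, and SplitHost is a hypothetical helper.

```csharp
using System;
using System.Text.RegularExpressions;

// Groups: "sub" = optional subdomain(s), "host" = the label before the TLD,
// "tld" = one of the TLDs you are interested in (extend the alternation as needed).
var hostPattern = new Regex(
    @"^(?:(?<sub>.+)\.)?(?<host>[^.]+)\.(?<tld>com|net|org)$",
    RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

// Returns null when the host does not end in one of the listed TLDs.
// "Sub" is an empty string when there is no subdomain.
(string Sub, string Host, string Tld)? SplitHost(Uri uri)
{
    Match m = hostPattern.Match(uri.Host);
    if (!m.Success)
        return null;
    return (m.Groups["sub"].Value, m.Groups["host"].Value, m.Groups["tld"].Value);
}
```

Because the greedy "sub" group takes priority, a multi-label entry like co\.uk only works if the single label uk is not also in the list. This is one reason the list-based answers are more robust.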
Returns ".com" for:
Uri uri = new Uri("http://stackoverflow.com/questions/4643227/top-level-domain-from-url-in-c");
Returns ".co.jp" for:
Uri uri = new Uri("http://stackoverflow.co.jp");
Returns ".s1.moh.gov.cn" for:
Uri uri = new Uri("http://stackoverflow.s1.moh.gov.cn");
And so on.