How can uri.Host throw a UriFormatException?

Posted 2024-09-28 09:22:45

foreach (var node in root.Find("a[href]"))
{
    var href = node.Attributes["href"].Value;
    Uri uri;
    try
    {
        uri = new Uri(item.Value.Uri, href);
    }
    catch(UriFormatException)
    {
        continue;
    }
    // *snip*
    try
    {
        if (_imageHosts.IsMatch(uri.Host)) // <--- problematic line
            priority--;
    }
    catch (UriFormatException)
    {
        MessageBox.Show(uri.OriginalString); // <--- gets displayed when I expected it wouldn't
        continue;
    }
    // *snip*
}

The message box shows up with an address like

mailto: webmaster [ @ ] somehost ?webmaster

That is obviously malformed, but what I don't get is why it wasn't caught by the first catch block.

MSDN says Uri.Host can only throw an InvalidOperationException. This is quite problematic, because it means my app can explode at any time!

[[snip]]

Comments (3)

稀香 2024-10-05 09:22:45

First of all, using exceptions to check validity is not a good idea, because you can use the Uri.TryCreate method instead. That way your code does not depend on which exceptions can be thrown and caught.

So it is better to change your

Uri uri;
try
{
    uri = new Uri(item.Value.Uri, href);
}
catch(UriFormatException)
{
    continue;
}

to

Uri uri;
if (!Uri.TryCreate(item.Value.Uri, href, out uri)) continue;

But this is not a complete check anyway.

As for your question, the answer is relatively simple. You are wrong to assume that this is malformed:

mailto: webmaster [ @ ] somehost ?webmaster

URI stands for Uniform Resource Identifier, so its basic syntax is

{scheme name} : {hierarchical part} [ ? {query} ] [ # {fragment} ]

which your input obviously satisfies. You end up with a resource URI that has the "mailto:" scheme.

When you access the Host property you assume the resource is HTTP, but the parser used by default for the "mailto" scheme can't parse a host component out of the original string, and hence it raises the exception.

So, to write your check correctly, you have to modify your code a bit:

Uri uri;
if (!Uri.TryCreate(item.Value.Uri, href, out uri)) continue;

if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps) continue;

Read up on UriParser for more details.


Here is an update based on @Mark's comments.

"I'm pretty sure it threw an exception when I tried to get the AbsoluteUri property too... why should that fail?"

You can't get past the Scheme check, since the scheme will be "mailto". Here is a quick test:

        var baseUri = new Uri("http://localhost");
        const string href = "mailto: webmaster [ @ ] somehost ?webmaster";

        Uri uri;
        if (!Uri.TryCreate(baseUri, href, out uri))
        {
            Console.WriteLine("Can't create");
            return;
        }

        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
        {
            Console.WriteLine("Wrong scheme");
            return;
        }

        Console.WriteLine("Testing uri: {0}", uri);

It prints "Wrong scheme". Maybe I have misunderstood you?

When you change href to:

        const string href = "http: webmaster [ @ ] somehost ?webmaster";

it passes the check, and the URI is automatically escaped to:

http://localhost/%20webmaster%20%5B%20@%20%5D%20somehost%20?webmaster

All of the URI's components will also be available to you.

The main problem, which I tried to explain in the first part, is the following:

It seems to me that you incorrectly treat every Uniform Resource Identifier as an http(s)-based URL, but that is wrong. mailto:[email protected] or gopher://gopher.hprc.utoronto.ca/ or myreshandler://something@somewhere are also valid URIs which can be successfully parsed. Take a look at the official IANA-registered schemes.

So the Uri constructor's behaviour is expected and correct.

It tries to validate the incoming URI against the known schemes:

  • UriSchemeFile - Specifies that the URI is a pointer to a file.
  • UriSchemeFtp - Specifies that the URI is accessed through the File Transfer Protocol (FTP).
  • UriSchemeGopher - Specifies that the URI is accessed through the Gopher protocol.
  • UriSchemeHttp - Specifies that the URI is accessed through the Hypertext Transfer Protocol (HTTP).
  • UriSchemeHttps - Specifies that the URI is accessed through the Secure Hypertext Transfer Protocol (HTTPS).
  • UriSchemeMailto - Specifies that the URI is an email address and is accessed through the Simple Mail Transfer Protocol (SMTP).
  • UriSchemeNews - Specifies that the URI is an Internet news group and is accessed through the Network News Transport Protocol (NNTP).
  • UriSchemeNntp - Specifies that the URI is an Internet news group and is accessed through the Network News Transport Protocol (NNTP).

The basic URI parser is used when the scheme is not known (see URI scheme generic syntax).


Basically, Uri.TryCreate() and the scheme check are enough to get links which can be passed to .NET's HttpWebRequest, for example. You don't really need to check whether they are well-formed. If the links are bad (not well-formed or nonexistent), you just get the corresponding HTTP error when you try to request them.
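To put the two checks together, here is a minimal sketch of a filter that keeps only the links resolving to http or https URIs. The LinkFilter class and HttpLinks method are names made up for this illustration; only Uri.TryCreate, Uri.Scheme and the Uri.UriScheme* constants come from the framework.

```csharp
using System;
using System.Collections.Generic;

public static class LinkFilter
{
    // Hypothetical helper: resolves each href against baseUri and keeps
    // only the results whose scheme is http or https.
    public static List<Uri> HttpLinks(Uri baseUri, IEnumerable<string> hrefs)
    {
        var result = new List<Uri>();
        foreach (var href in hrefs)
        {
            Uri uri;
            // Skip anything the Uri class cannot parse at all.
            if (!Uri.TryCreate(baseUri, href, out uri)) continue;

            // Skip mailto:, gopher:, custom schemes and so on, so that
            // later code can rely on http-style components such as Host.
            if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps) continue;

            result.Add(uri);
        }
        return result;
    }
}
```

Calling LinkFilter.HttpLinks(new Uri("http://localhost/"), new[] { "/a", "mailto:someone@example.com", "https://example.com/b" }) should return the two http(s) links and drop the mailto one, because the scheme is checked before any other property is read.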

As for your example:

http://www.google.com/search?q=cheesy poof

it passes my check and becomes:

http://www.google.com/search?q=cheesy%20poof
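The escaping described above can be observed directly. This is a small sketch, with EscapeDemo and Escaped as illustrative names; it reads AbsoluteUri, which is the property that returns the escaped form of the URI.

```csharp
using System;

public static class EscapeDemo
{
    // Hypothetical helper: returns the escaped absolute form of raw,
    // or raw unchanged when it cannot be parsed as an absolute URI.
    public static string Escaped(string raw)
    {
        Uri uri;
        return Uri.TryCreate(raw, UriKind.Absolute, out uri) ? uri.AbsoluteUri : raw;
    }
}
```

For example, EscapeDemo.Escaped("http://www.google.com/search?q=cheesy poof") yields the %20 form shown above.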

You don't need to check whether it is well-formed. Just do the basic checks and try the request. Hope it helps.


"Also, the string mailto: webmaster [ @ ] somehost ?webmaster is malformed. I literally mean, that string, with the stupid []s and everything in it."

This string is malformed in the sense that it is not well-formed (it contains excluded characters according to RFC 2396), but it can still be considered valid because it conforms to the generic syntax of a URI scheme (check also how it is escaped when created with http:).

百合的盛世恋 2024-10-05 09:22:45

If you dig deep into the Uri.Host property (really deep), it can eventually call a static function, GetException, which returns UriFormatException objects for different kinds of invalid URI. Print out the full UriFormatException you are getting and compare it to the ones generated by Uri.GetException. You might get more details out of it.
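One way to capture the full exception text is sketched below. HostProbe and Describe are hypothetical names; the sketch assumes that logging the result of Exception.ToString() (type, message and stack trace) is enough detail for the comparison.

```csharp
using System;

public static class HostProbe
{
    // Hypothetical helper: reads uri.Host and, if that fails, returns the
    // full exception text instead of letting the exception escape.
    public static string Describe(Uri uri)
    {
        try
        {
            return "Host = " + uri.Host;
        }
        catch (UriFormatException ex)
        {
            // The case from the question: the URI was created, but the
            // host component could not be parsed later.
            return "Host failed: " + ex;
        }
        catch (InvalidOperationException ex)
        {
            // The documented case: Host is not available on a relative URI.
            return "Host failed: " + ex;
        }
    }
}
```

Console.WriteLine(HostProbe.Describe(uri)) in place of the MessageBox call would then show which exception type and message were produced for the problematic link.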

浸婚纱 2024-10-05 09:22:45

Based on Nick's answer:

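// Requires "using System;" and "using System.Linq;" (for Contains on the array).

// Schemes this application is prepared to handle.
private static readonly string[] SupportedSchemes = { Uri.UriSchemeHttp, Uri.UriSchemeHttps, Uri.UriSchemeFtp, Uri.UriSchemeFile };

// Parses an absolute URI and accepts it only if its scheme is supported.
private static bool TryCreateUri(string uriString, out Uri result)
{
    return Uri.TryCreate(uriString, UriKind.Absolute, out result) && SupportedSchemes.Contains(result.Scheme);
}

// Resolves a relative address against a base and accepts it only if its scheme is supported.
private static bool TryCreateUri(Uri baseAddress, string relativeAddress, out Uri result)
{
    return Uri.TryCreate(baseAddress, relativeAddress, out result) && SupportedSchemes.Contains(result.Scheme);
}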