How can uri.Host throw a UriFormatException?
foreach (var node in root.Find("a[href]"))
{
    var href = node.Attributes["href"].Value;
    Uri uri;
    try
    {
        uri = new Uri(item.Value.Uri, href);
    }
    catch (UriFormatException)
    {
        continue;
    }
    // *snip*
    try
    {
        if (_imageHosts.IsMatch(uri.Host)) // <--- problematic line
            priority--;
    }
    catch (UriFormatException)
    {
        MessageBox.Show(uri.OriginalString); // <--- gets displayed when I expected it wouldn't
        continue;
    }
    // *snip*
}
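The message box shows an address like
mailto: webmaster [ @ ] somehost ?webmaster
which is obviously malformed, but what I don't get is why it wasn't caught by the first catch block.
MSDN says that Uri.Host can only throw an InvalidOperationException. This is quite problematic, because it means my app can blow up at any time!
[[snip]]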
Comments (3)
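First of all, using exceptions to check validity is not a good idea here, because you can use the Uri.TryCreate method instead. That way your code does not depend on which exception may be thrown and caught.
So it is better to replace the try/catch around new Uri(item.Value.Uri, href) with a call to Uri.TryCreate(item.Value.Uri, href, out uri), and skip the link when that call returns false.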
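The code from this part of the answer did not survive on this page. A minimal sketch of the idea, assuming the page's own URI is absolute, might look like the following. The `LinkResolver` class and `ResolveLinks` method are hypothetical names used only for illustration; `Uri.TryCreate(Uri, string, out Uri)` is the real framework call.

```csharp
using System;
using System.Collections.Generic;

public static class LinkResolver
{
    // Hypothetical helper: resolves each href against pageUri and skips
    // anything Uri.TryCreate rejects, instead of catching UriFormatException.
    // Assumes pageUri is an absolute URI.
    public static List<Uri> ResolveLinks(Uri pageUri, IEnumerable<string> hrefs)
    {
        var resolved = new List<Uri>();
        foreach (var href in hrefs)
        {
            Uri uri;
            if (!Uri.TryCreate(pageUri, href, out uri))
                continue; // not parseable as a URI: skip it, no exception involved

            resolved.Add(uri);
        }
        return resolved;
    }
}
```

Note that a successful `TryCreate` only means the string could be parsed as some URI. It says nothing about the scheme, so reading `Host` on the result can still fail.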
But this is still not a complete check.
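As for your question, the answer is relatively simple. You are wrong to assume that the address is malformed.
A URI is a Uniform Resource Identifier, and its generic syntax (a scheme name, a colon, a hierarchical part, then an optional query and an optional fragment)
is clearly satisfied by your input. You end up with a resource URI that uses the "mailto:" scheme.
When you access the Host property you assume the resource is HTTP, but the parser used by default for the "mailto" scheme cannot extract a host component from the original string, so it raises the exception.
So to write the check correctly, you have to modify your code a bit and also check the scheme of the resulting URI before touching Host.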
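The modified check is missing from this page. As a hedged sketch, it could combine `Uri.TryCreate` with a comparison against `Uri.UriSchemeHttp` and `Uri.UriSchemeHttps`. The `HttpLinkFilter` class and `TryGetHttpUri` method are hypothetical names, and the sketch assumes `pageUri` is absolute.

```csharp
using System;

public static class HttpLinkFilter
{
    // Hypothetical helper: returns true only when href resolves to an
    // absolute http or https URI. Callers read uri.Host only in that case.
    // Assumes pageUri is an absolute URI.
    public static bool TryGetHttpUri(Uri pageUri, string href, out Uri uri)
    {
        if (!Uri.TryCreate(pageUri, href, out uri))
            return false; // could not be parsed as a URI at all

        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
        {
            uri = null; // e.g. mailto:, ftp:, javascript: links
            return false;
        }
        return true;
    }
}
```

In the loop from the question, a `false` result would translate to `continue`, so `_imageHosts.IsMatch(uri.Host)` is only ever evaluated for http and https links.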
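Read up on UriParser.
Here is an update based on @Mark's comments.
You cannot pass the scheme check, since the scheme will be "mailto". Here is a quick test:
It ends with "Wrong scheme". Maybe I did not understand you correctly?
When you change the href to:
it passes correctly, and the URI is escaped automatically to:
All of the URI's components will also be available to you.
The main problem is the one I tried to explain in the first part:
It tries to validate the incoming URI against the known schemes: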
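UriSchemeFile - Specifies that the URI is a pointer to a file.
UriSchemeFtp - Specifies that the URI is accessed through the File Transfer Protocol (FTP).
UriSchemeGopher - Specifies that the URI is accessed through the Gopher protocol.
UriSchemeHttp - Specifies that the URI is accessed through the Hypertext Transfer Protocol (HTTP).
UriSchemeHttps - Specifies that the URI is accessed through the Secure Hypertext Transfer Protocol (HTTPS).
UriSchemeMailto - Specifies that the URI is an email address and is accessed through the Simple Mail Transfer Protocol (SMTP).
UriSchemeNews - Specifies that the URI is an Internet news group and is accessed through the Network News Transport Protocol (NNTP).
UriSchemeNntp - Specifies that the URI is an Internet news group and is accessed through the Network News Transport Protocol (NNTP).
The basic URI parser is used when the scheme is not known (see the generic syntax for URI schemes).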
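One way to see whether a scheme has a dedicated parser registered, rather than falling back to the basic parser, is `UriParser.IsKnownScheme`. The sketch below is only an illustration; the `SchemeInfo` class and `HasRegisteredParser` method are hypothetical names, and the example addresses are made up.

```csharp
using System;

public static class SchemeInfo
{
    // Hypothetical helper: reports whether a parser is registered for the
    // scheme of the given absolute URI, i.e. whether it is a "known" scheme.
    public static bool HasRegisteredParser(Uri uri)
    {
        return UriParser.IsKnownScheme(uri.Scheme);
    }
}
```

Both `http` and `mailto` are known schemes, which is why a mailto link gets its own scheme-specific parsing rather than being rejected outright when the Uri is created.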
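Basically, Uri.TryCreate() plus a scheme check is enough to get links that can be passed to, for example, the .NET HttpWebRequest. You do not really need to check whether they are well-formed. If a link is bad (not well-formed, or it does not exist), you will simply get the corresponding HTTP error when you try to request it. As for your example:
it passes my check and becomes:
You do not need to check whether it is well-formed. Just do the basic checks and try the request. Hope it helps.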
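This string is malformed in the sense that it is not well-formed (it contains excluded characters according to RFC 2396), but it can still be considered valid because it conforms to the generic syntax for URI schemes (also check how it is escaped when created with http:).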
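The difference between "parses as a URI" and "is well-formed" can be observed with `Uri.IsWellFormedOriginalString`. The following is a small sketch under the assumption of an http URI containing an unescaped space; the `WellFormedCheck` class and `ParsedButNotWellFormed` method are hypothetical names, and the example address is invented rather than taken from the question.

```csharp
using System;

public static class WellFormedCheck
{
    // Hypothetical helper: true when the Uri object was created successfully
    // but its original text was not well-formed (for example, it contained
    // characters that should have been escaped).
    public static bool ParsedButNotWellFormed(Uri uri)
    {
        return !uri.IsWellFormedOriginalString();
    }
}
```

For instance, `new Uri("http://example.com/a b")` is accepted, `ParsedButNotWellFormed` returns `true` for it, and its `AbsoluteUri` shows the escaped form with `%20` in place of the space.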
If you dig deep into the Uri.Host property (really deep), it can eventually call a static function, GetException, which returns UriFormatException objects for the different ways a URI can be invalid. Print out the full UriFormatException you are getting and compare it to the ones generated by Uri.GetException. You might get more details out of it.
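To capture the full exception for that comparison, something along these lines might help. It is a sketch only; the `HostDiagnostics` class and `DescribeHostFailure` method are hypothetical names, and the helper assumes it is given an absolute URI.

```csharp
using System;

public static class HostDiagnostics
{
    // Hypothetical helper: returns null when uri.Host can be read, otherwise
    // the full text of the UriFormatException (type, message, stack trace).
    // Assumes uri is absolute; Host on a relative URI throws a different
    // exception (InvalidOperationException), which is not caught here.
    public static string DescribeHostFailure(Uri uri)
    {
        try
        {
            string host = uri.Host; // the access that threw in the question
            return null;
        }
        catch (UriFormatException ex)
        {
            return ex.ToString();
        }
    }
}
```

Logging the returned text for the offending link shows the message and the stack frames inside Uri, which is the information needed to match it against what GetException produces.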
Based on Nick's answer: