Looking for an API to parse partially UTF-8-encoded URLs
When parsing HTML for certain web pages (most notably, any Windows Live page), I encounter a lot of URLs in the following format:
http\x3a\x2f\x2fjs.wlxrs.com\x2fjt6xQREgnzkhGufPqwcJjg\x2fempty.htm
These appear to be partially UTF-8-escaped strings (\x2f = /, \x3a = :, etc.). Is there a .NET API that can be used to transform these strings into a System.Uri? It seems easy enough to parse, but I'm trying to avoid reinventing the wheel today.
What you posted is not valid URL encoding: percent-encoding uses %NN, whereas these are \xNN escapes. As such, of course HttpUtility.UrlDecode() won't work. But irrespective of that, you can turn this back into normal text like this:

But notice that this assumes that the encoding is Latin-1 rather than UTF-8. The input you provided is inconclusive in that respect, since every escape in it is below \x80, where Latin-1 and UTF-8 agree. If you need UTF-8 to work, you need a slightly longer route: you'll have to convert the string to bytes, replacing the escape sequences with the relevant bytes in the process (this probably needs a while loop), and then use Encoding.UTF8.GetString() on the resulting byte array.
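Both routes can be sketched as follows. This is a minimal illustration, not necessarily the code from the answer: HexEscapes, DecodeLatin1 and DecodeUtf8 are hypothetical names, and the sketch assumes an escape is always a backslash, the letter x, and exactly two hex digits. Anything else (including a stray backslash) is copied through unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for JavaScript-style \xNN escapes (not a built-in .NET API).
public static class HexEscapes
{
    // A backslash, the letter x, and exactly two hex digits.
    private static readonly Regex EscapePattern = new Regex(@"\\x([0-9A-Fa-f]{2})");

    // Latin-1 route: every \xNN becomes the single character U+00NN.
    public static string DecodeLatin1(string input) =>
        EscapePattern.Replace(input, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());

    // UTF-8 route: build a byte array (escapes become raw bytes, ordinary
    // characters become their UTF-8 bytes), then decode it as UTF-8 once.
    public static string DecodeUtf8(string input)
    {
        var bytes = new List<byte>();
        int i = 0;
        while (i < input.Length)
        {
            bool isEscape = i + 3 < input.Length
                && input[i] == '\\' && input[i + 1] == 'x'
                && Uri.IsHexDigit(input[i + 2]) && Uri.IsHexDigit(input[i + 3]);
            if (isEscape)
            {
                bytes.Add(byte.Parse(input.Substring(i + 2, 2), NumberStyles.HexNumber));
                i += 4;
            }
            else
            {
                // Copy one character (or a surrogate pair) as its UTF-8 bytes.
                int len = char.IsSurrogatePair(input, i) ? 2 : 1;
                bytes.AddRange(Encoding.UTF8.GetBytes(input.Substring(i, len)));
                i += len;
            }
        }
        // Invalid byte sequences decode to U+FFFD instead of throwing.
        return Encoding.UTF8.GetString(bytes.ToArray());
    }
}
```

For the URL in the question both methods return http://js.wlxrs.com/jt6xQREgnzkhGufPqwcJjg/empty.htm, which can then be passed to new Uri(...). They differ only on bytes of \x80 and above: for caf\xc3\xa9, DecodeLatin1 yields "cafÃ©" while DecodeUtf8 yields "café".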
Here is another solution (continuing from @timwi's solution):
Did you try HttpUtility.UrlDecode?
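HttpUtility.UrlDecode only decodes %NN escapes (and turns + into a space), so on its own it leaves \x2f as is. One possible workaround, sketched here as an assumption rather than a built-in feature, is to rewrite each \xNN as %NN and hand the result to a percent-decoder that already treats multi-byte runs as UTF-8. The sketch uses Uri.UnescapeDataString instead of HttpUtility.UrlDecode so that a literal + in a URL is not turned into a space; PercentRoute and DecodeViaPercent are hypothetical names. It assumes escapes always take the form \x plus two hex digits, and it escapes any literal % as %25 first so that it survives decoding.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: rewrite \xNN as %NN, then let the framework percent-decode.
public static class PercentRoute
{
    public static string DecodeViaPercent(string input)
    {
        // Protect literal '%' so it is not mistaken for an escape later.
        string protectedInput = input.Replace("%", "%25");

        // \x3a -> %3a, \x2f -> %2f, and so on.
        string percentEncoded = Regex.Replace(protectedInput, @"\\x([0-9A-Fa-f]{2})", "%$1");

        // Decodes %NN sequences (multi-byte runs as UTF-8); '+' is left alone.
        return Uri.UnescapeDataString(percentEncoded);
    }
}
```

The trade-off versus a hand-written byte loop is that the UTF-8 handling is delegated to the framework, at the cost of an intermediate string.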