Looking for an API to parse partially UTF-8-encoded URLs

Posted on 2024-07-09 12:01:34

While parsing the HTML of certain web pages (notably any Windows Live page), I run into many URLs in the following format.

http\x3a\x2f\x2fjs.wlxrs.com\x2fjt6xQREgnzkhGufPqwcJjg\x2fempty.htm

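These appear to be partially UTF-8 escaped strings (\x2f = /, \x3a = :, etc.). Is there a .NET API that can convert these strings into a System.Uri? They look easy enough to parse, but I'm trying to avoid reinventing the wheel today.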

终陌 2024-07-16 12:01:34

What you posted is not valid URL encoding: URLs escape characters as %XX, not \xXX. As such, HttpUtility.UrlDecode() won't work. Regardless, you can turn it back into normal text like this:

using System.Globalization;
using System.Text.RegularExpressions;

string input = @"http\x3a\x2f\x2fjs.wlxrs.com\x2fjt6xQREgnzkhGufPqwcJjg\x2fempty.htm";
// Replace each \xNN escape with the character whose code point is NN (hex, either case).
string output = Regex.Replace(input, @"\\x([0-9a-fA-F]{2})",
    m => ((char) int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());

Note that this assumes the encoding is Latin-1 rather than UTF-8. The input you provided is inconclusive in that respect, because every escaped character in it is plain ASCII. If you need UTF-8 to work, the route is slightly longer: convert the string to bytes, replacing each escape sequence with the byte it denotes along the way (this probably needs a while loop), and then call Encoding.UTF8.GetString() on the resulting byte array.
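The UTF-8 route described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name HexEscapeDecoder and the method name DecodeUtf8 are made up for this example and are not part of .NET, and the sketch assumes that every \xNN escape denotes one raw byte of a UTF-8 sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper, not a built-in .NET API.
public static class HexEscapeDecoder
{
    // Treats each \xNN escape as one raw byte, then decodes the whole buffer as UTF-8.
    public static string DecodeUtf8(string input)
    {
        var bytes = new List<byte>(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            // An escape is a backslash, 'x', and exactly two hex digits.
            if (i + 3 < input.Length
                && input[i] == '\\' && input[i + 1] == 'x'
                && Uri.IsHexDigit(input[i + 2]) && Uri.IsHexDigit(input[i + 3]))
            {
                bytes.Add(byte.Parse(input.Substring(i + 2, 2),
                    NumberStyles.HexNumber, CultureInfo.InvariantCulture));
                i += 4;
            }
            else
            {
                // Copy a literal character (or a surrogate pair) as its UTF-8 bytes.
                bool pair = char.IsHighSurrogate(input[i])
                    && i + 1 < input.Length && char.IsLowSurrogate(input[i + 1]);
                int len = pair ? 2 : 1;
                bytes.AddRange(Encoding.UTF8.GetBytes(input.Substring(i, len)));
                i += len;
            }
        }
        return Encoding.UTF8.GetString(bytes.ToArray());
    }
}
```

With this sketch, HexEscapeDecoder.DecodeUtf8(@"caf\xc3\xa9") gives "café", whereas the Latin-1 cast in the regex version would turn the same input into "cafÃ©". For the all-ASCII URL in the question, both approaches give the same result.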

海拔太高太耀眼 2024-07-16 12:01:34

Here is another solution (continuing from @timwi's solution):

// Same replacement, but parsing the hex digits with Convert.ToInt32 (base 16).
string output = Regex.Replace(input, @"\\x([0-9a-fA-F]{2})",
            m => ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString());
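Since the question asks for a System.Uri, the decoded string still has to be handed to the Uri class. A minimal sketch of that last step, assuming the Latin-1 style regex replacement from the answers above is good enough for the input; EscapedUrlParser and TryParse are hypothetical names for this example, not an existing .NET API:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not a built-in .NET API.
public static class EscapedUrlParser
{
    // Decodes \xNN escapes, then lets Uri.TryCreate validate the result.
    public static bool TryParse(string escaped, out Uri uri)
    {
        string decoded = Regex.Replace(escaped, @"\\x([0-9a-fA-F]{2})",
            m => ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString());

        // Returns false instead of throwing when the decoded text is not an absolute URI.
        return Uri.TryCreate(decoded, UriKind.Absolute, out uri);
    }
}
```

Using Uri.TryCreate rather than the Uri constructor avoids an exception when a scraped string turns out not to be a well-formed absolute URL.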

仙气飘飘 2024-07-16 12:01:34

Have you tried HttpUtility.UrlDecode?
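For comparison, here is a small sketch of what HttpUtility.UrlDecode does with the two escape styles. It assumes a runtime where System.Web.HttpUtility is available (on .NET Framework this needs a reference to System.Web); UrlDecodeDemo and its members are hypothetical names, and the percent-encoded string is a hand-made variant of the URL from the question.

```csharp
using System;
using System.Web;

// Hypothetical demo class, for illustration only.
public static class UrlDecodeDemo
{
    // Percent-encoded variant of the question's URL: UrlDecode handles this form.
    public static string DecodePercentForm() =>
        HttpUtility.UrlDecode("http%3a%2f%2fjs.wlxrs.com%2fjt6xQREgnzkhGufPqwcJjg%2fempty.htm");

    // The \xNN form from the question: no '%' or '+' to decode, so it comes back unchanged.
    public static string DecodeBackslashForm() =>
        HttpUtility.UrlDecode(@"http\x3a\x2f\x2fjs.wlxrs.com\x2fjt6xQREgnzkhGufPqwcJjg\x2fempty.htm");
}
```

The first call returns the plain URL, while the second returns its input untouched, which is why the regex or byte-based decoding in the other answers is needed for this format.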

About the author: ﹉夏雨初晴づ