URI-escaping a C++ string

Posted on 2024-08-09 18:48:37

I am looking for a good way to do a URI Escape in C++ that would be reasonable for a cross platform project.

I would like a function that would take a string like this:

L"jiayou加油"

And return:

L"jiayou%E5%8A%A0%E6%B2%B9"

I looked at using something like this, with minor modifications to use wchar_t. However, that would require converting from UTF-16 to UTF-8 before the printf call. This has led me down into character encoding hell.

This and all the other approaches I have looked into just feel like the wrong way. Is there a good way to URI Escape a wstring in C++?

Comments (1)

无风消散 2024-08-16 18:48:37

No matter what you do you're in some sort of character encoding hell (that's just the way it is with character encodings).

From http://labs.apache.org/webarch/uri/rfc/rfc3986.html#characters:

The URI syntax provides a method of encoding data, presumably for the sake of identifying a resource, as a sequence of characters. The URI characters are, in turn, frequently encoded as octets for transport or presentation. This specification does not mandate any particular character encoding for mapping between URI characters and the octets used to store or transmit those characters. When a URI appears in a protocol element, the character encoding is defined by that protocol; without such a definition, a URI is assumed to be in the same character encoding as the surrounding text.

So, at some point you need to convert your URI to the encoding that's appropriate to whatever you're sending the URI to. If that's UTF-8 then you might as well do that conversion before you perform percent-encoding so you can use the library routine you've already found. If it's not UTF-8 then you need to know what the recipient of the URI is expecting (again, that's the way it is with charset encodings - you have to know what the other guy is expecting, or be able to tell him) so you can percent-encode the characters in the character set encoding it's expecting.
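
For the UTF-8 case, here is a minimal sketch of the "convert first, then percent-encode" order described above. It is only one way to do it, not a definitive implementation. The function names (to_utf8, percent_encode, uri_escape) are made up for illustration. It assumes the recipient expects UTF-8, that wchar_t holds UTF-16 code units when it is 2 bytes wide and UTF-32 code points otherwise, and that only the RFC 3986 unreserved characters (letters, digits, "-", ".", "_", "~") should be left unescaped.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: convert a wide string to UTF-8 bytes.
// Assumption: 2-byte wchar_t means UTF-16 (e.g. Windows); otherwise UTF-32.
std::string to_utf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);
        if (sizeof(wchar_t) == 2) {
            cp &= 0xFFFFu;
            // Combine a high/low surrogate pair into one code point.
            if (cp >= 0xD800u && cp <= 0xDBFFu && i + 1 < in.size()) {
                std::uint32_t lo = static_cast<std::uint32_t>(in[i + 1]) & 0xFFFFu;
                if (lo >= 0xDC00u && lo <= 0xDFFFu) {
                    cp = 0x10000u + ((cp - 0xD800u) << 10) + (lo - 0xDC00u);
                    ++i;
                }
            }
        }
        // Unpaired surrogates and out-of-range values become U+FFFD.
        if ((cp >= 0xD800u && cp <= 0xDFFFu) || cp > 0x10FFFFu) {
            cp = 0xFFFDu;
        }
        if (cp < 0x80u) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800u) {
            out += static_cast<char>(0xC0u | (cp >> 6));
            out += static_cast<char>(0x80u | (cp & 0x3Fu));
        } else if (cp < 0x10000u) {
            out += static_cast<char>(0xE0u | (cp >> 12));
            out += static_cast<char>(0x80u | ((cp >> 6) & 0x3Fu));
            out += static_cast<char>(0x80u | (cp & 0x3Fu));
        } else {
            out += static_cast<char>(0xF0u | (cp >> 18));
            out += static_cast<char>(0x80u | ((cp >> 12) & 0x3Fu));
            out += static_cast<char>(0x80u | ((cp >> 6) & 0x3Fu));
            out += static_cast<char>(0x80u | (cp & 0x3Fu));
        }
    }
    return out;
}

// Hypothetical helper: percent-encode every byte that is not an
// RFC 3986 unreserved character.
std::string percent_encode(const std::string& bytes) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : bytes) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Convert to UTF-8 first, then escape the resulting bytes.
std::string uri_escape(const std::wstring& in) {
    return percent_encode(to_utf8(in));
}
```

With this sketch, uri_escape(L"jiayou加油") yields "jiayou%E5%8A%A0%E6%B2%B9" as a narrow string. The escaped result is pure ASCII, so if a wstring is really needed it can be widened with std::wstring(s.begin(), s.end()). Because this escapes everything outside the unreserved set, including "/" and "?", it is meant for a single URI component rather than a whole URI.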
