I faced the encoding half of this problem the other day. Unhappy with the available options, and after taking a look at this C sample code, I decided to roll my own C++ url-encode function:
#include <cctype>
#include <iomanip>
#include <sstream>
#include <string>
using namespace std;
string url_encode(const string &value) {
    ostringstream escaped;
    escaped.fill('0');
    escaped << hex;
    for (string::const_iterator i = value.begin(), n = value.end(); i != n; ++i) {
        string::value_type c = (*i);
        // Keep alphanumeric and other accepted characters intact
        // (cast first: passing a negative char to isalnum is undefined behaviour)
        if (isalnum((unsigned char) c) || c == '-' || c == '_' || c == '.' || c == '~') {
            escaped << c;
            continue;
        }
        // Any other characters are percent-encoded
        escaped << uppercase;
        escaped << '%' << setw(2) << int((unsigned char) c);
        escaped << nouppercase;
    }
    return escaped.str();
}
The implementation of the decode function is left as an exercise to the reader. :P
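For readers who would rather not do the exercise, here is a minimal sketch of what a matching decoder might look like. It is not the answer author's code: the names url_decode and hex_value are hypothetical, and it assumes the input was produced by url_encode above, so '+' is left untouched and a malformed '%' is copied through unchanged.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: numeric value of one hex digit, or -1 if c is not one.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Sketch of a decoder: "%XX" becomes one byte, everything else
// (including a '%' not followed by two hex digits) is copied unchanged.
std::string url_decode(const std::string &value) {
    std::string decoded;
    decoded.reserve(value.size());
    for (std::string::size_type i = 0; i < value.size(); ++i) {
        // i + 2 < size() guarantees that both digit positions exist
        if (value[i] == '%' && i + 2 < value.size()) {
            int hi = hex_value(value[i + 1]);
            int lo = hex_value(value[i + 2]);
            if (hi >= 0 && lo >= 0) {
                decoded += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two digits just consumed
                continue;
            }
        }
        decoded += value[i];
    }
    assert(decoded.size() <= value.size());  // decoding never grows the string
    return decoded;
}
```

Parsing the digits by hand avoids stream or sscanf overhead and makes the handling of malformed input explicit; whether malformed sequences should instead be rejected is a design choice left open here.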
Simply prefixing '%' to the decimal int value of a char will not work when encoding; the value is supposed to be the hex equivalent, e.g. '/' is '%2F', not '%47'.
I think this is the best and most concise solution for both URL encoding and decoding (not many header dependencies).
#include <cctype>
#include <cstdio>
#include <string>
using namespace std;
string urlEncode(const string &str){
    string new_str;
    char bufHex[4];
    for (size_t i = 0; i < str.size(); i++){
        // work on unsigned char so bytes >= 0x80 are not sign-extended
        unsigned char c = static_cast<unsigned char>(str[i]);
        // uncomment this if you want to encode spaces with +
        /*if (c==' ') new_str += '+';
        else */if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') new_str += static_cast<char>(c);
        else {
            // '%' followed by two zero-padded uppercase hex digits
            snprintf(bufHex, sizeof bufHex, "%%%02X", static_cast<unsigned>(c));
            new_str += bufHex;
        }
    }
    return new_str;
}
string urlDecode(const string &str){
    string ret;
    size_t len = str.length();
    for (size_t i = 0; i < len; i++){
        if (str[i] == '+'){
            ret += ' ';
        } else if (str[i] == '%' && i + 2 < len
                   && isxdigit(static_cast<unsigned char>(str[i + 1]))
                   && isxdigit(static_cast<unsigned char>(str[i + 2]))){
            // only decode when two hex digits really follow the '%'
            unsigned int ii = 0;
            sscanf(str.substr(i + 1, 2).c_str(), "%x", &ii);
            ret += static_cast<char>(ii);
            i = i + 2;
        } else {
            ret += str[i];
        }
    }
    return ret;
}
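The hex-versus-decimal point made in this answer can be checked in isolation. The following is only an illustrative sketch; percent_encode_byte is a hypothetical helper name, not something from the answer.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: percent-encode one byte as '%' plus two uppercase,
// zero-padded hex digits (so '/' is written as 2F, its hex value, not 47).
std::string percent_encode_byte(unsigned char byte) {
    char buf[4];  // '%', two hex digits, terminating '\0'
    int n = std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(byte));
    assert(n == 3);  // the format always yields exactly three characters
    (void)n;         // silence unused-variable warnings when NDEBUG is set
    return std::string(buf);
}
```

Taking the byte as unsigned char matters for values of 0x80 and above: with a plain signed char they would be sign-extended before formatting.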
I ended up on this question when searching for an API to decode a URL in a Win32 C++ app. Since the question doesn't specify a platform, assuming Windows isn't a bad thing.
InternetCanonicalizeUrl is the API for Windows programs. More info here
// requires <windows.h> and <wininet.h>; link with wininet.lib
LPTSTR lpOutputBuffer = new TCHAR[1];
DWORD dwSize = 1;
BOOL fRes = ::InternetCanonicalizeUrl(strUrl, lpOutputBuffer, &dwSize, ICU_DECODE | ICU_NO_ENCODE);
DWORD dwError = ::GetLastError();
if (!fRes && dwError == ERROR_INSUFFICIENT_BUFFER)
{
    // dwSize now holds the required size; reallocate and retry
    delete [] lpOutputBuffer;
    lpOutputBuffer = new TCHAR[dwSize];
    fRes = ::InternetCanonicalizeUrl(strUrl, lpOutputBuffer, &dwSize, ICU_DECODE | ICU_NO_ENCODE);
    if (fRes)
    {
        //lpOutputBuffer has decoded url
    }
    else
    {
        //failed to decode
    }
}
else
{
    //some other error OR the input string url is just 1 char and was successfully decoded
}
// release the buffer on every path
delete [] lpOutputBuffer;
lpOutputBuffer = NULL;
InternetCrackUrl (here) also seems to have flags to specify whether to decode the URL.
A follow-up to Bill's recommendation of libcurl: great suggestion, but it needs an update.
Three years on, the curl_escape function is deprecated, so for future use it's better to use curl_easy_escape.
This version is pure C and can optionally normalize the resource path. Using it with C++ is trivial:
#include <stddef.h>
#include <string>
#include <iostream>
// declaration of the C function shown below
// (wrap it in extern "C" if the function is compiled as C)
ptrdiff_t urldecode(char* dst, const char* src, int normalize);
int main(int argc, char** argv)
{
    const std::string src("/some.url/foo/../bar/%2e/");
    std::cout << "src=\"" << src << "\"" << std::endl;
    // either do it the C++ conformant way:
    char* dst_buf = new char[src.size() + 1];
    urldecode(dst_buf, src.c_str(), 1);
    std::string dst1(dst_buf);
    delete[] dst_buf;
    std::cout << "dst1=\"" << dst1 << "\"" << std::endl;
    // or in-place with the &[0] trick to skip the new/delete
    std::string dst2;
    dst2.resize(src.size() + 1);
    dst2.resize(urldecode(&dst2[0], src.c_str(), 1));
    std::cout << "dst2=\"" << dst2 << "\"" << std::endl;
}
And the actual function:
#include <stddef.h>
#include <ctype.h>
/**
 * decode a percent-encoded C string with optional path normalization
 *
 * The buffer pointed to by @dst must be at least strlen(@src) + 1 bytes
 * (the terminating null is written as well).
 * Decoding stops at the first character from @src that decodes to null.
 * Path normalization will remove redundant slashes and slash+dot sequences,
 * as well as removing path components when slash+dot+dot is found. It will
 * keep the root slash (if one was present) and will stop normalization
 * at the first questionmark found (so query parameters won't be normalized).
 *
 * @param dst destination buffer
 * @param src source buffer
 * @param normalize perform path normalization if nonzero
 * @return number of valid characters in @dst
 * @author Johan Lindh <[email protected]>
 * @legalese BSD licensed (http://opensource.org/licenses/BSD-2-Clause)
 */
ptrdiff_t urldecode(char* dst, const char* src, int normalize)
{
    char* org_dst = dst;
    int slash_dot_dot = 0;
    char ch, a, b;
    do {
        ch = *src++;
        /* cast before isxdigit: passing a negative char value is undefined */
        if (ch == '%' && isxdigit((unsigned char)(a = src[0]))
                      && isxdigit((unsigned char)(b = src[1]))) {
            if (a < 'A') a -= '0';
            else if(a < 'a') a -= 'A' - 10;
            else a -= 'a' - 10;
            if (b < 'A') b -= '0';
            else if(b < 'a') b -= 'A' - 10;
            else b -= 'a' - 10;
            ch = 16 * a + b;
            src += 2;
        }
        if (normalize) {
            switch (ch) {
            case '/':
                if (slash_dot_dot < 3) {
                    /* compress consecutive slashes and remove slash-dot */
                    dst -= slash_dot_dot;
                    slash_dot_dot = 1;
                    break;
                }
                /* fall-through */
            case '?':
                /* at start of query, stop normalizing */
                if (ch == '?')
                    normalize = 0;
                /* fall-through */
            case '\0':
                if (slash_dot_dot > 1) {
                    /* remove trailing slash-dot-(dot) */
                    dst -= slash_dot_dot;
                    /* remove parent directory if it was two dots */
                    if (slash_dot_dot == 3)
                        while (dst > org_dst && *--dst != '/')
                            /* empty body */;
                    slash_dot_dot = (ch == '/') ? 1 : 0;
                    /* keep the root slash if any */
                    if (!slash_dot_dot && dst == org_dst && *dst == '/')
                        ++dst;
                }
                break;
            case '.':
                if (slash_dot_dot == 1 || slash_dot_dot == 2) {
                    ++slash_dot_dot;
                    break;
                }
                /* fall-through */
            default:
                slash_dot_dot = 0;
            }
        }
        *dst++ = ch;
    } while(ch);
    return (dst - org_dst) - 1;
}
I know the question asks for a C++ method, but for those who might need it, I came up with a very short function in plain C to encode a string. It doesn't create a new string; rather, it alters the existing one in place, meaning the buffer must be large enough to hold the encoded string (up to three times the original length, plus the terminator). Very easy to follow.
#include <string.h>
void urlEncode(char *string)
{
    static const char safe[] = "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_.~";
    unsigned char charToEncode;
    size_t posToEncode;
    /* skip the run of unreserved characters; stop only at the terminator,
       so leading and consecutive reserved characters are encoded too */
    while (string[posToEncode = strspn(string, safe)] != '\0')
    {
        charToEncode = (unsigned char)string[posToEncode];
        /* shift the tail (including the terminator) two bytes to the right */
        memmove(string+posToEncode+3, string+posToEncode+1, strlen(string+posToEncode));
        string[posToEncode]='%';
        string[posToEncode+1]="0123456789ABCDEF"[charToEncode>>4];
        string[posToEncode+2]="0123456789ABCDEF"[charToEncode&0xf];
        string+=posToEncode+3;
    }
}
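Because this function encodes in place, the caller has to size the buffer beforehand. Below is a minimal sketch of how that size might be computed, assuming the same set of unreserved characters as the function above. encoded_size is a hypothetical helper, not part of the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: number of bytes (including the terminating '\0')
// needed to hold the percent-encoded form of s. Assumes every character
// outside the unreserved set grows from one byte to three ("%XX").
std::size_t encoded_size(const char *s) {
    static const char safe[] =
        "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_.~";
    std::size_t n = 1;  // room for the terminator
    for (const char *p = s; *p != '\0'; ++p) {
        // *p is never '\0' inside the loop, so strchr cannot match
        // the terminator of the safe list by accident
        n += (std::strchr(safe, *p) != nullptr) ? 1 : 3;
    }
    assert(n >= std::strlen(s) + 1);  // encoding never shrinks the string
    return n;
}
```

A caller could allocate encoded_size(input) bytes, copy the input into that buffer, and only then hand it to the in-place encoder.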
Answering my own question...
libcurl has curl_easy_escape for encoding.
For decoding, curl_easy_unescape.
Not the best, but working fine ;-)
cpp-netlib has functions that make it very easy to encode and decode URL strings.
[Necromancer mode on]
Stumbled upon this question while looking for a fast, modern, platform-independent and elegant solution. Didn't like any of the above; cpp-netlib would be the winner, but it has a horrific memory vulnerability in its "decoded" function. So I came up with a Boost Spirit Qi/Karma solution.
The usage of the above is as follows:
[Necromancer mode off]
EDIT01: fixed the zero padding stuff - special thanks to Hartmut Kaiser
EDIT02: Live on CoLiRu
CGICC includes methods to do URL encoding and decoding: form_urlencode and form_urldecode.
Inspired by xperroni I wrote a decoder. Thank you for the pointer.
edit: Removed unneeded cctype and iomanip includes.
The Windows API has the functions UrlEscape/UrlUnescape, exported by shlwapi.dll, for this task.
You can simply use the function AtlEscapeUrl() from atlutil.h; just go through its documentation on how to use it.
Another solution is available using Facebook's folly library: folly::uriEscape and folly::uriUnescape.
I couldn't find a URI decode/unescape here that also decodes 2- and 3-byte sequences. Contributing my own version, which converts the C string input to a wstring on the fly:
the juicy bits
noting that
as in
You can use the "g_uri_escape_string()" function provided by glib.h.
https://developer.gnome.org/glib/stable/glib-URI-Functions.html
Compile it with:
Had to do it in a project without Boost. So, ended up writing my own. I will just put it on GitHub: https://github.com/corporateshark/LUrlParser