String corruption when downloading UTF-8 encoded XML data received via libcurl
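In a project that implements an Amazon S3 access library using libcurl, I have problems with UTF-8. The method for listing a bucket's contents sends the appropriate, correctly signed request to the S3 server. I receive an XML document, but the data is corrupted.
I save it into a std::string. For example, it starts with the following fragment:
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult
After the last "t" of "ListBucketResult" there is a "0" (zero) byte in the data, which terminates the std::string. Viewing the contents of the string in the debugger or writing them to a file shows this, along with many more zeros at other positions, e.g. at some (but not all) ">" closing brackets.
I use MS Visual Studio 2008 running on WinXP; the project is compiled with Unicode support.
What should I do to receive proper UTF-8 inside the std::string (which should be Unicode agnostic, according to several sources)? Any hints on this one?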
bool Http::Download(std::string& url, std::string& targetString, std::vector<std::string>* customHeaders)
{
    CURLcode result = CURLE_FAILED_INIT;
    dl = true;
    if (curl)
    {
        curl = curl_easy_init();
        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_HEADER, 0);
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1);
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteData);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &targetString);
        if (unsafe)
        {
            curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
            curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
        }
        if (customHeaders)
        {
            curl_slist* headers = 0;
            for (std::vector<std::string>::const_iterator iter = customHeaders->begin(); iter != customHeaders->end(); iter++)
            {
                headers = curl_slist_append(headers, (*iter).c_str());
                headers = curl_slist_append(headers, "\n");
            }
            curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
        }
        result = curl_easy_perform(curl);
        long http_code = 0;
        curl_easy_getinfo (curl, CURLINFO_RESPONSE_CODE, &http_code);
        lastHttpResult = static_cast<int>(http_code);
        curl_easy_cleanup(curl);
    }
    return (result == CURLE_OK);
};
size_t Http::WriteData(char* data, size_t size, size_t nmemb, void* target)
{
    if(target)
    {
        reinterpret_cast<std::string*>(target)->append(data);
        size_t len = size * nmemb;
        return len;
    }
    return 0;
};
Comments (2)
It is quite likely that this line in WriteData() is part of the problem:
reinterpret_cast<std::string*>(target)->append(data);
data is not NULL terminated, so who knows what you're putting into your string. Replace it with a call that passes the length explicitly:
reinterpret_cast<std::string*>(target)->append(data, size * nmemb);

Seems to me that you should be calling append(data, size * nmemb) in your WriteData() function. The libcurl documentation for CURLOPT_WRITEFUNCTION states that the data passed to the callback is not zero terminated and that its size is size multiplied by nmemb. So you can't rely on append(const char*) to handle the append correctly.
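
The fix both comments point to can be sketched roughly as follows. This is a minimal sketch, not the asker's actual class: it assumes a free function named WriteData instead of the static member Http::WriteData, and it relies only on the callback contract that the buffer is not null terminated and holds size * nmemb bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical free-function version of the write callback from the question.
// It has the signature libcurl expects for CURLOPT_WRITEFUNCTION.
size_t WriteData(char* data, size_t size, size_t nmemb, void* target)
{
    if (!target)
        return 0; // a return value different from the byte count signals an error

    // The buffer is not null terminated, so the length must be passed explicitly.
    const size_t len = size * nmemb;
    static_cast<std::string*>(target)->append(data, len);
    return len;
}
```

It would be registered the same way as in Download() above, with CURLOPT_WRITEFUNCTION set to WriteData and CURLOPT_WRITEDATA set to the address of the target std::string. Because append(data, len) copies exactly len bytes, the string no longer depends on whatever happens to follow the chunk in memory.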