String corruption when downloading UTF-8 encoded XML data received via libcurl

Posted on 2025-01-08 14:22:16

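In a project that implements an Amazon S3 access library using libcurl, I am running into a problem with UTF-8. The method that lists a bucket's contents sends the appropriate, correctly signed request to the S3 server. I receive an XML document, but the data is corrupted.

I save it into a std::string. For example, it starts with the following fragment: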

<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult

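After the last "t" of "ListBucketResult" there is a character with code 0 (zero), which terminates the std::string. Viewing the string's contents in the debugger or writing them to a file shows this, along with many more zeros at other positions, e.g. at some (but not all) ">" closing brackets.

I use MS Visual Studio 2008 on WinXP, and the project is compiled with Unicode support.

What should I do to receive proper UTF-8 in the std::string (which, according to several sources, should be Unicode-agnostic)? Any hints on this one?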

bool Http::Download(std::string& url, std::string& targetString, std::vector<std::string>* customHeaders)
{
    CURLcode result = CURLE_FAILED_INIT;
    dl = true;

    if (curl)
    {
        curl = curl_easy_init();

        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_HEADER, 0);
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1);
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteData);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &targetString);

        if (unsafe)
        {
            curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
            curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
        }

        if (customHeaders)
        {
            curl_slist* headers = 0;

            for (std::vector<std::string>::const_iterator iter = customHeaders->begin(); iter != customHeaders->end(); iter++)
            {
                headers = curl_slist_append(headers, (*iter).c_str());
                headers = curl_slist_append(headers, "\n");
            }

            curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
        }

        result = curl_easy_perform(curl);

        long http_code = 0;
        curl_easy_getinfo (curl, CURLINFO_RESPONSE_CODE, &http_code);
        lastHttpResult = static_cast<int>(http_code);

        curl_easy_cleanup(curl);
    }

    return (result == CURLE_OK);
};

size_t Http::WriteData(char* data, size_t size, size_t nmemb, void* target)
{
    if(target)
    {
        reinterpret_cast<std::string*>(target)->append(data);
        size_t len = size * nmemb;
        return len;
    }

    return 0;
};

吻泪 2025-01-15 14:22:16

It is quite likely that this line is part of the problem:

reinterpret_cast<std::string*>(target)->append(data);

data is not NUL-terminated, so who knows what you're putting into your string. Replace it with this:

reinterpret_cast<std::string*>(target)->append(data, size * nmemb);
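To illustrate the difference between the two append overloads, here is a small sketch that does not use libcurl at all. The helper names AppendUntilNul and AppendExact are made up for this example; the buffers stand in for a receive buffer whose valid data is shorter than what happens to be in memory.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends the way the question's callback does.
// std::string::append(const char*) finds the length by scanning for '\0'.
std::string AppendUntilNul(const char* data)
{
    std::string out;
    out.append(data);
    return out;
}

// Hypothetical helper: appends exactly len bytes, regardless of any '\0'.
std::string AppendExact(const char* data, std::size_t len)
{
    std::string out;
    out.append(data, len);
    return out;
}
```

With a buffer such as "<List>stale", where only the first 6 bytes belong to the current chunk, AppendUntilNul copies all 11 characters while AppendExact(buffer, 6) copies just "<List>". With a real libcurl buffer there may be no terminator at all, in which case the scan reads past the end of the buffer and the behavior is undefined.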


北凤男飞 2025-01-15 14:22:16

It seems to me that you should be calling the following in your WriteData() function:

size_t len = size * nmemb;
reinterpret_cast<std::string*>(target)->append(data, len);

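The libcurl documentation for CURLOPT_WRITEFUNCTION states:

The size of the data pointed to by ptr is size multiplied with nmemb, it will not be zero terminated.

So you can't rely on append(const char*) to handle the append correctly.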
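Putting the suggestion together, a corrected callback might look like the sketch below. It is written as a free function rather than the question's Http::WriteData member, and SimulateTransfer is a hypothetical stand-in for libcurl that hands the callback unterminated chunks, so the sketch can be compiled and checked without the library.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Same parameter list libcurl uses for CURLOPT_WRITEFUNCTION callbacks.
size_t WriteData(char* data, size_t size, size_t nmemb, void* target)
{
    if (!target)
        return 0; // a return value different from the byte count signals an error

    const size_t len = size * nmemb;
    // data is not NUL-terminated, so pass the length explicitly.
    static_cast<std::string*>(target)->append(data, len);
    return len;
}

// Hypothetical test driver (not part of libcurl): delivers body in chunks of
// at most chunkSize bytes, each held in a buffer with no terminator.
std::string SimulateTransfer(const std::string& body, size_t chunkSize)
{
    std::string received;
    if (chunkSize == 0)
        return received;

    for (size_t pos = 0; pos < body.size(); pos += chunkSize)
    {
        const size_t n = std::min(chunkSize, body.size() - pos);
        std::vector<char> chunk(body.begin() + pos, body.begin() + pos + n);
        if (WriteData(&chunk[0], 1, n, &received) != n)
            break; // stop, as a transfer would after a write error
    }
    return received;
}
```

Because the bytes are copied verbatim and std::string does not interpret them, UTF-8 sequences that happen to be split across two chunks are reassembled unchanged.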
