String corruption when downloading UTF-8 encoded XML data with libcurl

Posted on 2025-01-08 14:22:16

In a project that implements an Amazon S3 access library using libcurl, I am having problems with UTF-8. The method for listing a bucket's contents sends the appropriate, correctly signed request to the S3 server. I receive an XML document, but the data is corrupted.

I save it into a std::string.
For example, it starts with the following fragment:

<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult

After the last "t" of "ListBucketResult", there is a zero byte (0) in the data, which terminates the std::string. Viewing the contents of the string in the debugger or writing them to a file shows this, along with many more zeros at other positions, e.g. at some (but not all) ">" closing brackets.

I use MS Visual Studio 2008 running on WinXP; the project is compiled with Unicode support.

What should I do to receive proper UTF-8 inside the std::string (which should be Unicode-agnostic, according to several sources)? Any hints on this one?

bool Http::Download(std::string& url, std::string& targetString, std::vector<std::string>* customHeaders)
{
    CURLcode result = CURLE_FAILED_INIT;
    dl = true;

    if (curl)
    {
        curl = curl_easy_init();

        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_HEADER, 0);
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1);
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteData);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &targetString);

        if (unsafe)
        {
            curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
            curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
        }

        if (customHeaders)
        {
            curl_slist* headers = 0;

            for (std::vector<std::string>::const_iterator iter = customHeaders->begin(); iter != customHeaders->end(); iter++)
            {
                headers = curl_slist_append(headers, (*iter).c_str());
                headers = curl_slist_append(headers, "\n");
            }

            curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
        }

        result = curl_easy_perform(curl);

        long http_code = 0;
        curl_easy_getinfo (curl, CURLINFO_RESPONSE_CODE, &http_code);
        lastHttpResult = static_cast<int>(http_code);

        curl_easy_cleanup(curl);
    }

    return (result == CURLE_OK);
};

size_t Http::WriteData(char* data, size_t size, size_t nmemb, void* target)
{
    if(target)
    {
        reinterpret_cast<std::string*>(target)->append(data);
        size_t len = size * nmemb;
        return len;
    }

    return 0;
};

Comments (2)

吻泪 2025-01-15 14:22:16

It is quite likely that this line is part of the problem:

reinterpret_cast<std::string*>(target)->append(data);

data is not NUL-terminated, so who knows what you're putting into your string. Replace it with this:

reinterpret_cast<std::string*>(target)->append(data, size * nmemb);
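To illustrate why the one-argument append misbehaves, here is a minimal sketch that needs no libcurl. It assumes a receive buffer that still holds bytes from an earlier, longer chunk; the buffer contents and the names append_unbounded, append_bounded and demo_buffer are made up for this example and say nothing about how libcurl manages its buffers internally. In the real callback the memory after the chunk may not contain a terminator at all, in which case the unbounded read is undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Mimics the buggy callback body: relies on a NUL terminator that the
// chunk is not guaranteed to have.
std::string append_unbounded(const char* data)
{
    std::string out;
    out.append(data);  // reads until the first '\0', wherever that is
    return out;
}

// Mimics the fixed callback body: copies exactly len bytes, including
// any zero bytes that happen to be part of the data.
std::string append_bounded(const char* data, std::size_t len)
{
    assert(data != 0);
    std::string out;
    out.append(data, len);
    return out;
}

// Hypothetical scenario: only the first 4 bytes ("<a/>") are valid, the
// rest is left over from an earlier chunk. The trailing '\0' of the
// literal is only there so that this demo stays well-defined.
const char* demo_buffer()
{
    static const char buffer[] = "<a/>stale";
    return buffer;
}
```

With this buffer, append_unbounded(demo_buffer()) yields "<a/>stale", while append_bounded(demo_buffer(), 4) yields just "<a/>".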

北凤男飞 2025-01-15 14:22:16

It seems to me that you should be calling the following in your WriteData() function:

size_t len = size * nmemb;
reinterpret_cast<std::string*>(target)->append(data, len);

The libcurl documentation for CURLOPT_WRITEFUNCTION states:

The size of the data pointed to by ptr is size multiplied with nmemb, it will not be zero terminated.

So you can't rely on append(const char*) to handle the append correctly.
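Putting this together, a corrected callback might look like the sketch below. It is written as a free function rather than a member of Http, and FeedInChunks is a hypothetical helper added only so the callback can be exercised without a network connection; it is not part of libcurl.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same parameter list as the WriteData callback in the question.
std::size_t WriteData(char* data, std::size_t size, std::size_t nmemb, void* target)
{
    if (!target)
        return 0;  // a return value other than size * nmemb signals an error to libcurl

    const std::size_t len = size * nmemb;
    // Append exactly len bytes; the chunk is not zero-terminated.
    static_cast<std::string*>(target)->append(data, len);
    return len;
}

// Hypothetical helper: feeds body to the callback in pieces of at most
// chunk bytes, roughly the way a transfer delivers data.
std::string FeedInChunks(const std::string& body, std::size_t chunk)
{
    assert(chunk > 0);
    std::string received;
    for (std::size_t pos = 0; pos < body.size(); pos += chunk)
    {
        std::string piece = body.substr(pos, chunk);
        const std::size_t n = WriteData(&piece[0], 1, piece.size(), &received);
        assert(n == piece.size());
        (void)n;  // silence unused-variable warnings when asserts are disabled
    }
    return received;
}
```

Because the callback copies raw bytes and never interprets them, multi-byte UTF-8 sequences survive even when a chunk boundary falls in the middle of a character; the std::string simply ends up holding the same byte sequence the server sent.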
