How do I correctly convert a char* to std::string? (Problem when using expat / std::string(char*))
Problem Description
I'm using Expat with a custom C++ wrapper, which I already tested on other projects.
I'm running into problems because the original data (c_str) is not converted to a std::string in the right way. This concerns me, because I did not change the source of the wrapper.
After this conversion, the string seems to contain a null character after every character:
onCharacterData( std::string( pszData, nLength ) ) // --> std::string( char* pszData)
How can I fix this?
Own expat wrapper
// Wrapper defines the class Expat and implements, for example:
void XMLCALL Expat::CharacterDataHandler( void *pUserData, const XML_Char *pszData,
                                          int nLength )
{
    Expat* pThis = static_cast<Expat*>( pUserData );
    // XML_Char is char, therefore this call amounts to e.g.: std::string("hello", 5)
    pThis->onCharacterData( std::string( pszData, nLength ) );
}
Custom parser
// Parser is defined as: class Parser : Expat
void Parser::onCharacterData( const std::string& data )
{
    // data is no longer a char*, but a std::string.
    // It seems to contain \0 after each character, which is wrong!
    // [...]
}
Character data within the expat wrapper (char*)
Character data within the parser (std::string)
Comments (3)
Your pszData appears to be in some implementation-specific Unicode-derived format, where each "character" takes up two chars.
This means the source data is broken; perhaps it should have been a wchar_t buffer.
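To make the "two chars per character" observation concrete, here is a minimal sketch of what UTF-16 text looks like when it is read one byte at a time, as a debugger would show it through a char*. The helper names bytesOfUtf16 and countNulBytes are hypothetical and exist only for this illustration; they are not part of Expat or of the wrapper in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copy a UTF-16 buffer byte by byte, the way it would
// appear if the pointer were (mis)treated as a char*.
std::string bytesOfUtf16(const char16_t* text, std::size_t units)
{
    assert(text != nullptr);
    return std::string(reinterpret_cast<const char*>(text),
                       units * sizeof(char16_t));
}

// Hypothetical helper: count the NUL bytes in a byte string.
std::size_t countNulBytes(const std::string& bytes)
{
    std::size_t count = 0;
    for (char c : bytes) {
        if (c == '\0') {
            ++count;
        }
    }
    return count;
}
```

For plain ASCII text such as u"hello", each 16-bit code unit carries the letter in one byte and zero in the other, so half of the copied bytes are NUL regardless of byte order. That matches the "\0 after each character" symptom described in the question.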
It looks like Expat is using wide chars and/or UTF-16. Try using std::wstring on the way back.
EDIT: I found in the docs that it uses wchar_t if the XML_UNICODE or XML_UNICODE_WCHAR_T macro is defined.
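A rough sketch of the std::wstring suggestion, assuming the library really was built so that XML_Char is wchar_t. The typedef below is a stand-in for the one that would normally come from <expat.h>, and toWideString is a hypothetical helper, not an Expat function.

```cpp
#include <cassert>
#include <string>

// Stand-in for <expat.h> built with XML_UNICODE_WCHAR_T (assumption):
// in that configuration XML_Char is wchar_t rather than char.
typedef wchar_t XML_Char;

// Hypothetical helper: copy exactly nLength characters. The character data
// Expat hands to the handler is not NUL-terminated, so the length has to be
// passed explicitly.
std::wstring toWideString(const XML_Char* pszData, int nLength)
{
    assert(pszData != nullptr && nLength >= 0);
    return std::wstring(pszData,
                        static_cast<std::wstring::size_type>(nLength));
}
```

In the wrapper from the question, this would mean calling onCharacterData with a std::wstring and changing the handler's parameter type to const std::wstring& accordingly.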
As others have pointed out, pszData appears to be a multibyte character string. You should try using std::basic_string<XML_Char> in place of std::string or std::wstring. Use a typedef if that seems too verbose.
Of course, if XML_Char is neither a char nor a wchar_t, you might have to provide a template specialization for std::char_traits.
EDIT:
Some googling revealed that XML_Char is UTF-8 by default; the library can be made to use UTF-16 if you define XML_UNICODE or XML_UNICODE_WCHAR_T.
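The std::basic_string<XML_Char> idea could look roughly like this. The XML_Char typedef is a stand-in for the real one from <expat.h> and assumes the default UTF-8 build; XmlString and makeXmlString are hypothetical names chosen for this sketch.

```cpp
#include <cassert>
#include <string>

// Stand-in for the typedef from <expat.h> (assumption: default UTF-8 build,
// where XML_Char is plain char).
typedef char XML_Char;

// One string type that follows whatever XML_Char happens to be.
typedef std::basic_string<XML_Char> XmlString;

// Hypothetical replacement for the conversion inside CharacterDataHandler.
XmlString makeXmlString(const XML_Char* pszData, int nLength)
{
    assert(pszData != nullptr && nLength >= 0);
    return XmlString(pszData, static_cast<XmlString::size_type>(nLength));
}
```

The advantage of this design is that the wrapper and the parser agree on the character type automatically, as long as both are compiled with the same XML_UNICODE settings as the library itself. If XML_Char turns out to be a 16-bit unsigned short (which, as far as I recall, is what XML_UNICODE alone selects), the standard library provides no std::char_traits for it, which is where the specialization mentioned above would come in.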