How do I embed Unicode string constants in a source file?

Posted 2024-07-11 10:39:33

I'm writing some unit tests that will verify our handling of various resources that use character sets other than the normal Latin alphabet: Cyrillic, Hebrew, etc.

The problem I have is that I cannot find a way to embed the expectations in the test source file: here's an example of what I'm trying to do...

///
/// Protected: TestGetHebrewConfigString
///  
void CPrIniFileReaderTest::TestGetHebrewConfigString()
{
    prwstring strHebrewTestFilePath = GetTestFilePath( strHebrewTestFileName );
    CPrIniFileReader prIniListReader( strHebrewTestFilePath.c_str() );
    prIniListReader.SetCurrentSection( strHebrewSubSection );   

    CPPUNIT_ASSERT( prIniListReader.GetConfigString( L"דונדארןמע" ) == L"דונהשךוק" );
}

This simply doesn't work. Previously I worked around it with a macro that calls a routine to convert a narrow string to a wide string (we use towstring all over the place in our applications, so it's existing code):

#define UNICODE_CONSTANT( CONSTANT ) towstring( CONSTANT )

wstring towstring( LPCSTR lpszValue )
{
    wostringstream os;
    os << lpszValue;
    return os.str();
}

The assertion in the test above then became:

CPPUNIT_ASSERT( prIniListReader.GetConfigString( UNICODE_CONSTANT( "דונדארןמע" ) ) == UNICODE_CONSTANT( "דונהשךוק" ) );

This worked OK on OS X, but now that I'm porting to Linux I'm finding that the tests all fail, and it all feels rather hackish anyway. Can anyone suggest a nicer solution to this problem?

柠北森屋 2024-07-18 10:39:33

A tedious but portable way is to build your strings using numeric escape codes. For example:

wchar_t *string = L"דונדארןמע";

becomes:

wchar_t *string = L"\x05d3\x05d5\x05e0\x05d3\x05d0\x05e8\x05df\x05de\x05e2";

You have to convert all your Unicode characters to numeric escapes. That way your source code becomes encoding-independent.

You can use online tools for conversion, such as this one. It outputs the JavaScript escape format \uXXXX, so just search & replace \u with \x to get the C format.
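As a side note, C++ (and C99) string literals also accept universal character names of the form \uXXXX directly, so the search-and-replace step may not be needed. They also avoid one pitfall of \x: a hex escape keeps consuming hex digits, so an escape followed by a literal character such as "a" or "3" would be read as one larger number. The sketch below spells the first string from the question that way; the function name HebrewKeyEscaped is made up for illustration and is not part of the original code.

```cpp
#include <string>

// Hypothetical helper (not from the original post): the key string from the
// question, written with universal character names instead of raw Hebrew text.
// The source file stays pure ASCII, so its encoding no longer matters.
inline std::wstring HebrewKeyEscaped()
{
    // U+05D3 U+05D5 U+05E0 U+05D3 U+05D0 U+05E8 U+05DF U+05DE U+05E2
    return L"\u05d3\u05d5\u05e0\u05d3\u05d0\u05e8\u05df\u05de\u05e2";
}
```

In a test, the expectation could then be written as something like `GetConfigString( HebrewKeyEscaped() )`, assuming the reader returns strings with the same code point values.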

秋叶绚丽 2024-07-18 10:39:33

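You have to tell GCC which encoding your source file uses for those characters.

Use the option -finput-charset=charset, for example -finput-charset=UTF-8. Then you need to tell it which encoding those string literals should use at run time. That determines the values of the wchar_t items in the strings. You set that encoding with -fwide-exec-charset=charset, for example -fwide-exec-charset=UTF-32. Note that the code unit size of the encoding (UTF-32 needs 32 bits, UTF-16 needs 16 bits) must not exceed the size of the wchar_t that GCC uses.

You can adjust that size. The option is called -fshort-wchar, and it is mainly useful for compiling programs for Wine that are meant to be compatible with Windows. With it, wchar_t will most likely be 16 bits instead of 32 bits, which is its usual width for GCC on Linux.

These options are described in more detail in the GCC man page (man gcc).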
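One way to check that the compiler settings took effect is to inspect the code units the compiler actually stored for a raw literal. The sketch below is only an illustration: the file name wide_check.cpp and both function names are made up, and the command line simply repeats the flags named in this answer.

```cpp
// Possible GCC invocation (file name is hypothetical):
//   g++ -finput-charset=UTF-8 -fwide-exec-charset=UTF-32 wide_check.cpp
#include <climits>
#include <cstddef>

// Width of one wchar_t in bits. With -fshort-wchar this is typically 16.
inline std::size_t WideUnitBits()
{
    return sizeof(wchar_t) * CHAR_BIT;
}

// First code unit of a wide literal written with a raw Hebrew character.
// If the compiler decoded the source file correctly, this is U+05D3 (dalet).
inline unsigned long FirstUnitOfRawLiteral()
{
    return static_cast<unsigned long>(L"ד"[0]);
}
```

If FirstUnitOfRawLiteral() returns something other than 0x05D3, the compiler most likely read the source file with a different input charset than the one it was saved in.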

偏爱你一生 2024-07-18 10:39:33

#define UNICODE_CONSTANT( CONSTANT ) towstring( CONSTANT )

wstring towstring( LPCSTR lpszValue ) {
    wostringstream os;
    os << lpszValue;
    return os.str(); 
}

This does not actually convert between Unicode encodings at all; that requires a dedicated routine. You need to keep your source code and data encodings unified (most people use UTF-8) and then convert to the OS-specific encoding where necessary (such as UTF-16 on Windows).
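To make the point concrete: inserting a const char* into a wide stream widens each byte on its own, so a two-byte UTF-8 sequence becomes two unrelated wide characters rather than one code point. A dedicated routine could look roughly like the sketch below. The name Utf8ToWide is hypothetical, the input is assumed to be UTF-8, and the check is deliberately minimal (it does not reject overlong forms or encoded surrogates), so treat it as an illustration rather than a production decoder.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical replacement for towstring(): decodes UTF-8 bytes into wide
// characters. Throws std::invalid_argument on malformed or truncated input.
inline std::wstring Utf8ToWide(const std::string& utf8)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        unsigned long cp = 0;   // code point being assembled
        std::size_t extra = 0;  // continuation bytes that must follow

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            // Fits in one wchar_t (32-bit wchar_t, or a BMP code point).
            out.push_back(static_cast<wchar_t>(cp));
        } else {
            // 16-bit wchar_t: store the code point as a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With a routine like this, a narrow UTF-8 literal and the corresponding wide string compare equal on platforms with either 16-bit or 32-bit wchar_t, as long as the source file really is saved as UTF-8.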
