Converting wchar_t to int

Posted on 2024-11-08 18:45:02

How can I convert a wchar_t ('9') to a digit in the form of an int (9)?

I have the following code where I check whether or not peek is a digit:

if (iswdigit(peek)) {
    // store peek as numeric
}

Can I just subtract '0', or are there some Unicode specifics I should worry about?

Comments (5)

吖咩 2024-11-15 18:45:02

If the question concerns just '9' (or any other ASCII digit),
simply subtracting '0' is the correct solution. If you're
concerned with anything for which iswdigit returns non-zero,
however, the issue may be far more complex. The standard says
that iswdigit returns a non-zero value if its argument is "a
decimal digit wide-character code [in the current locale]".
This is vague, and leaves it up to the locale to define exactly
what is meant. In the "C" locale or the "Posix" locale, the
"Posix" standard, at least, guarantees that only the ASCII
digits zero through nine are considered decimal digits (if I
understand it correctly), so if you're in the "C" or "Posix"
locale, just subtracting '0' should work.

Presumably, in a Unicode locale, this would be any character
which has the general category Nd. There are a number of
these. The safest solution would be simply to create something
like (variables here with static lifetime):

#include <algorithm>    // std::find
#include <iterator>     // std::begin, std::end

// Each entry lists the ten decimal digits of one script, in order.
wchar_t const* const digitTables[] =
{
    L"0123456789",
    L"\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669",
    // ...
};

//!     \return
//!         wch as a numeric digit, or -1 if it is not a digit
int asNumeric( wchar_t wch )
{
    int result = -1;
    for ( wchar_t const* const* p = std::begin( digitTables );
            p != std::end( digitTables ) && result == -1;
            ++ p ) {
        // The index of wch within a table is its numeric value.
        wchar_t const* q = std::find( *p, *p + 10, wch );
        if ( q != *p + 10 ) {
            result = static_cast<int>( q - *p );
        }
    }
    return result;
}

If you go this way:

  1. you'll definitely want to download the UnicodeData.txt
    file from the Unicode Consortium (the "Unicode Character
    Database" page has links to both the Unicode data files and
    an explanation of the encodings used in them), and
  2. possibly write a simple parser for this file to extract the
    information automatically (e.g. when there is a new version
    of Unicode); the file is designed for simple programmatic
    parsing.
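
As a rough illustration of step 2, here is a minimal sketch of parsing a single line of UnicodeData.txt. It assumes C++17 and the usual semicolon-separated layout (field 0 is the code point in hex, field 2 the general category, field 6 the decimal digit value). The function name parseDecimalDigitLine is made up for this example and is not part of the original answer.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original answer): parses one line of
// UnicodeData.txt and returns {code point, digit value} when the general
// category (field 2) is "Nd"; otherwise returns std::nullopt.
std::optional<std::pair<unsigned long, int>>
parseDecimalDigitLine( std::string const& line )
{
    // Split the line on ';' into its fields.
    std::vector<std::string> fields;
    std::istringstream in( line );
    std::string field;
    while ( std::getline( in, field, ';' ) ) {
        fields.push_back( field );
    }
    // Field 0: code point in hex; field 2: general category;
    // field 6: decimal digit value (empty for non-digits).
    if ( fields.size() < 7 || fields[2] != "Nd" || fields[6].empty() ) {
        return std::nullopt;
    }
    return std::make_pair( std::stoul( fields[0], nullptr, 16 ),
                           std::stoi( fields[6] ) );
}
```

Feeding every line of the file through a function like this and grouping the results by digit value would give the data needed to generate digitTables automatically.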

Finally, note that solutions based on ostringstream and
istringstream (this includes boost::lexical_cast) will not
work for other digits, since the conversions used in streams
are defined to use only the ASCII digits. (On the other hand,
it might be reasonable to restrict your code to just the ASCII
digits. In that case, the test becomes
if ( wch >= L'0' && wch <= L'9' ), and the conversion is done
by simply subtracting L'0', always supposing that the native
encoding of wide character constants in your compiler is
Unicode (the case, I'm pretty sure, for both VC++ and g++). Or
just ensure that the locale is "C" (or "Posix", on a Unix
machine).)

EDIT: I forgot to mention: if you're doing any serious Unicode programming, you
should look into ICU (International Components for Unicode). Handling Unicode
correctly is extremely non-trivial, and ICU already implements a lot of the
functionality you would need.
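
For the ASCII-only option described above, a minimal sketch could look like the following. The name asciiDigitValue is hypothetical, chosen only for this example. The range check replaces iswdigit, so the result does not depend on the current locale.

```cpp
// Hypothetical helper (name chosen for illustration): returns 0..9 for
// L'0'..L'9' and -1 for every other wide character.
int asciiDigitValue( wchar_t wch )
{
    if ( wch >= L'0' && wch <= L'9' ) {
        return static_cast<int>( wch - L'0' );
    }
    return -1;
}
```

In the question's code this would replace both the iswdigit test and the conversion: call asciiDigitValue( peek ) and treat -1 as "not a digit".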

静赏你的温柔 2024-11-15 18:45:02

Look into the atoi family of functions: http://msdn.microsoft.com/en-us/library/hc25t012(v=vs.71).aspx

In particular, _wtoi(const wchar_t *string); seems to be what you're looking for. You would have to make sure the wchar_t string you pass is properly null-terminated, though, so try something like this:

if (iswdigit(peek)) {
    // store peek as numeric
    wchar_t s[2];
    s[0] = peek;
    s[1] = 0;
    int numeric_peek = _wtoi(s);
}
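
Since _wtoi is specific to Microsoft's runtime, a portable variant of the same idea can be sketched with std::wcstol from <cwchar>. The helper name digitViaWcstol is an assumption made for this example. Which characters count as digits is whatever the implementation's wcstol accepts; in the "C" locale that is the ASCII digits.

```cpp
#include <cwchar>

// Hypothetical helper (not from the original answer): converts a single
// wide character by placing it in a null-terminated buffer and parsing it
// with std::wcstol. Returns -1 if nothing was parsed as a number.
int digitViaWcstol( wchar_t wch )
{
    wchar_t const s[2] = { wch, L'\0' };
    wchar_t* end = nullptr;
    long const value = std::wcstol( s, &end, 10 );
    // If no conversion was performed, wcstol leaves end pointing at s.
    return end == s ? -1 : static_cast<int>( value );
}
```

Unlike _wtoi, this lets the caller tell a genuine 0 apart from a character that could not be parsed.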

妥活 2024-11-15 18:45:02

You could use boost::lexical_cast:

const wchar_t c = '9';
int n = boost::lexical_cast<int>( c );

习ぎ惯性依靠 2024-11-15 18:45:02

Despite the MSDN documentation, a simple test suggests that characters outside the range L'0'-L'9' also return true:

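for (wchar_t i = 0; i < 0xFFFF; ++i)
{
    if (iswdigit(i))
    {
        // %lc is the portable conversion for a wide character;
        // the casts match the types that %d and %lc expect.
        wprintf(L"%d : %lc\n", (int)i, (wint_t)i);
    }
}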

That means that subtracting L'0' probably won't work as you might expect.

前事休说 2024-11-15 18:45:02

For most purposes you can just subtract the code for '0'.

However, the Wikipedia article on Unicode numerals mentions that the decimal digits are represented in 23 separate blocks (including twice in Arabic).

If you are not worried about that, then just subtract the code for '0'.
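
If you do care about those other blocks, one possible sketch is to keep a table of each block's zero digit and subtract that instead. This relies on each script's decimal digits occupying ten consecutive code points in ascending order, which is the case for the entries below as far as I know. The table is deliberately incomplete, and the names digitZeros and digitValueFromRanges are invented for this example.

```cpp
// Zero digits of a few decimal-digit ranges. Illustrative only, not an
// exhaustive list of the blocks mentioned above.
wchar_t const digitZeros[] = {
    L'0',        // U+0030, ASCII digits
    L'\u0660',   // Arabic-Indic digits
    L'\u06F0',   // Extended Arabic-Indic digits
    L'\u0966',   // Devanagari digits
    L'\uFF10',   // Fullwidth digits
};

// Hypothetical helper: returns the numeric value of wch if it falls in one
// of the ten-character ranges starting at an entry of digitZeros, else -1.
int digitValueFromRanges( wchar_t wch )
{
    for ( wchar_t zero : digitZeros ) {
        if ( wch >= zero && wch - zero < 10 ) {
            return static_cast<int>( wch - zero );
        }
    }
    return -1;
}
```

For ASCII input this behaves exactly like subtracting '0'; the extra entries only matter when such characters can actually reach your code.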
