Convert wchar_t to int
How can I convert a wchar_t ('9') to a digit in the form of an int (9)?

I have the following code where I check whether or not peek is a digit:

    if (iswdigit(peek)) {
        // store peek as numeric
    }

Can I just subtract '0', or are there some Unicode specifics I should worry about?
If the question concerns just '9' (or one of the other Roman digits, that is, the ASCII digits '0' through '9'), just subtracting '0' is the correct solution. If you're concerned with anything for which iswdigit returns non-zero, however, the issue may be far more complex. The standard says that iswdigit returns a non-zero value if its argument is "a decimal digit wide-character code [in the current locale]". This is vague, and leaves it up to the locale to define exactly what is meant. In the "C" locale or the "POSIX" locale, the POSIX standard, at least, guarantees that only the Roman digits zero through nine are considered decimal digits (if I understand it correctly), so if you're in the "C" or "POSIX" locale, just subtracting '0' should work.

Presumably, in a Unicode locale, this would be any character which has the general category Nd. There are a number of these. The safest solution would be simply to create a table of these digits and their values, held in variables with static lifetime.
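
A table along those lines could be sketched as follows. This is only an illustration under stated assumptions, not the answer's original code: the names digitTables, digitValues and digitValue are hypothetical, and only two of the Nd ranges (ASCII and Arabic-Indic) are listed, so a real table would need one string per range.

```cpp
#include <map>

// Hypothetical, deliberately incomplete table: each string lists the ten
// digits of one script in ascending order of value.
static wchar_t const* const digitTables[] = {
    L"0123456789",                                                    // ASCII
    L"\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669",  // Arabic-Indic
};

// Builds the character-to-value map once, on first use (static lifetime).
static std::map<wchar_t, int> const& digitValues()
{
    static std::map<wchar_t, int> const values = [] {
        std::map<wchar_t, int> m;
        for (wchar_t const* table : digitTables) {
            for (int value = 0; value != 10; ++value) {
                m[table[value]] = value;
            }
        }
        return m;
    }();
    return values;
}

// Returns the numeric value of wch, or -1 if it is not in the table.
int digitValue(wchar_t wch)
{
    auto const it = digitValues().find(wch);
    return it == digitValues().end() ? -1 : it->second;
}
```

With this sketch, digitValue(peek) can replace the iswdigit test entirely: a result of -1 means peek is not one of the listed digits.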

If you go this way, you will want the UnicodeData.txt file from the Unicode Consortium (the "Unicode Character Database" page has links to both the Unicode data file and an explanation of the encodings used in it), and a way of extracting the information automatically (e.g. when there is a new version of Unicode); the file is designed for simple programmatic parsing.
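
As a rough sketch of such parsing: each line of UnicodeData.txt is a list of semicolon-separated fields, where (as far as I know the format) field 0 is the code point in hexadecimal, field 2 is the general category and field 6 is the decimal digit value. The names DigitEntry, splitFields and parseDigitLine below are hypothetical, and the code assumes well-formed input (std::stoul and std::stoi throw on malformed numbers).

```cpp
#include <string>
#include <vector>

// One decimal digit extracted from a UnicodeData.txt line.
struct DigitEntry {
    unsigned long codePoint;
    int value;
};

// Splits a line on ';', keeping empty fields.
static std::vector<std::string> splitFields(std::string const& line)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type const end = line.find(';', start);
        if (end == std::string::npos) {
            fields.push_back(line.substr(start));
            return fields;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1;
    }
}

// Fills entry and returns true if the line describes a character of
// general category Nd; returns false for every other line.
bool parseDigitLine(std::string const& line, DigitEntry& entry)
{
    std::vector<std::string> const fields = splitFields(line);
    if (fields.size() < 7 || fields[2] != "Nd" || fields[6].empty()) {
        return false;
    }
    entry.codePoint = std::stoul(fields[0], nullptr, 16);
    entry.value = std::stoi(fields[6]);
    return true;
}
```

Feeding every line of the file through parseDigitLine and collecting the entries would give the data needed to generate a complete digit table.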

Finally, note that solutions based on ostringstream and istringstream (this includes boost::lexical_cast) will not work, since the conversions used in streams are defined to only use the Roman digits. (On the other hand, it might be reasonable to restrict your code to just the Roman digits. In which case, the test becomes if ( wch >= L'0' && wch <= L'9' ), and the conversion is done by simply subtracting L'0', always supposing that the native encoding of wide character constants in your compiler is Unicode (the case, I'm pretty sure, of both VC++ and g++). Or just ensure that the locale is "C" (or "POSIX", on a Unix machine).)
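
The restricted variant can be written as a small helper. This is a minimal sketch; the function name romanDigitValue and the -1 "not a digit" convention are assumptions made for the example.

```cpp
// Returns 0..9 for L'0'..L'9', and -1 for any other wide character,
// including digits from other scripts.
int romanDigitValue(wchar_t wch)
{
    if (wch >= L'0' && wch <= L'9') {
        return wch - L'0';
    }
    return -1;
}
```

Because the range check and the subtraction use the same literals, the result does not depend on what iswdigit accepts in the current locale.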

EDIT: I forgot to mention: if you're doing any serious Unicode programming, you should look into ICU. Handling Unicode correctly is extremely non-trivial, and they have a lot of functionality already implemented.

Look into the atoi class of functions: http://msdn.microsoft.com/en-us/library/hc25t012(v=vs.71).aspx

Especially _wtoi(const wchar_t *string); seems to be what you're looking for. You would have to make sure your wchar_t is properly null-terminated, though, for example by copying it into a two-element array whose second element is L'\0'.

You could use boost::lexical_cast.

Despite the MSDN documentation, a simple test suggests that iswdigit returns true for more than just the range L'0'-L'9'. That means that subtracting L'0' probably won't work as you might expect.
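
A test of that kind could look like the sketch below, which counts how many characters in a range iswdigit accepts in the current locale. The function name countDigits is hypothetical, and the result for wide ranges depends on the platform and locale, so none is claimed here.

```cpp
#include <cwctype>

// Counts the wide characters in [first, last] for which iswdigit
// returns non-zero in the current locale.
int countDigits(wchar_t first, wchar_t last)
{
    int count = 0;
    for (unsigned long c = static_cast<unsigned long>(first);
         c <= static_cast<unsigned long>(last);
         ++c) {
        if (std::iswdigit(static_cast<std::wint_t>(c))) {
            ++count;
        }
    }
    return count;
}
```

Calling countDigits(0, 0xFFFF) after setting the locale of interest and comparing the result with 10 shows whether the implementation treats anything beyond L'0'-L'9' as a digit.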

For most purposes you can just subtract the code for '0'.

However, the Wikipedia article on Unicode numerals mentions that the decimal digits are represented in 23 separate blocks (including twice in Arabic).

If you are not worried about that, then just subtract the code for '0'.