Can isdigit legitimately be locale dependent in C?

Posted 2024-09-03 06:30:02

In the section covering setlocale, the ANSI C standard states in a footnote that the only ctype.h functions whose behaviour is not affected by the current locale are isdigit and isxdigit.

The Microsoft implementation of isdigit is locale dependent because, for example, in locales using code page 1250 isdigit only returns non-zero for characters in the range 0x30 ('0') - 0x39 ('9'), whereas in locales using code page 1252 isdigit also returns non-zero for the superscript digits 0xB2 ('²'), 0xB3 ('³') and 0xB9 ('¹').
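One way to check this behaviour yourself is to count how many character values isdigit accepts under a given LC_CTYPE locale. The following is only a minimal sketch: the helper name count_isdigit_hits is made up for illustration, and the locale name you pass in is an assumption that depends on the platform (Microsoft's runtime is documented to accept names of the form ".1252" to select a code page, but verify that against your own toolchain).

```c
#include <ctype.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: counts the unsigned char values for which
   isdigit() returns non-zero under the named LC_CTYPE locale.
   Returns -1 if the locale cannot be selected. */
int count_isdigit_hits(const char *locale_name)
{
    char saved[256];
    const char *current = setlocale(LC_CTYPE, NULL);
    int c;
    int hits = 0;

    /* Copy the current locale name, because later setlocale calls
       may overwrite the string it points to. */
    if (current == NULL || strlen(current) >= sizeof saved)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    /* Only values representable as unsigned char (and EOF) are
       valid arguments to isdigit, so probe 0..255. */
    for (c = 0; c <= 255; ++c)
        if (isdigit(c))
            ++hits;

    /* Restore whatever locale was active before the probe. */
    setlocale(LC_CTYPE, saved);
    return hits;
}
```

If isdigit is locale independent, the count should be 10 whatever locale is selected. The behaviour described above would show up as three extra hits under a code page 1252 locale.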

Is Microsoft in violation of the C standard by making isdigit locale dependent?

In this question I am primarily interested in C90, which Microsoft claims to conform to, rather than C99.

Additional background:

Microsoft's own documentation of setlocale incorrectly states that isdigit is unaffected by the LC_CTYPE part of the locale.

The section of the C standard that covers the ctype.h functions contains some wording that I consider ambiguous:

The behavior of these functions is affected by the current locale. Those functions that
have locale-specific aspects only when not in the "C" locale are noted below.

I consider this ambiguous because it is unclear what it is trying to say about functions such as isdigit for which there are no notes about locale-specific aspects. It might be trying to say that such functions must be assumed to be locale dependent, in which case Microsoft's implementation of isdigit would be OK. (Except that the footnote I mentioned earlier seems to contradict this interpretation.)

Comments (3)

奈何桥上唱咆哮 2024-09-10 06:30:02
  1. Microsoft is always right.
  2. If Microsoft is not right see Item 1

Microsoft always has its own interpretation of the spec, and the sentence “but Microsoft is wrong” usually carries no weight with your CEO, so you have to code around Microsoft's bugs and interpretations.

The amount of code to support incorrect behavior of IE and Outlook is staggering.

In many cases, the only solution is to roll your own version of the function that does the right thing and do something like this:

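#include <ctype.h>

/* Locale-independent digit test: non-zero only for '0' through '9'. */
int my_isdigit( int c )
{
#ifdef _WIN32
  /* The C standard guarantees that '0'..'9' are contiguous and ascending. */
  return c >= '0' && c <= '9';
#else
  return isdigit( c );
#endif
}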
森林迷了鹿 2024-09-10 06:30:02

The required character set is defined in section 2.2.1. Section 2.2.1.2 then goes on to describe the behavior of extended characters:

  • The single-byte characters defined in $2.2.1 shall be present.
  • The presence, meaning, and representation of any additional members is locale-specific.
怪我太投入 2024-09-10 06:30:02

The answer is the same for all versions of the C standard, but here I will be using N3054, a draft for C23.

The description of isdigit, in 7.4.1.5, is very simple:

The isdigit function tests for any decimal-digit character (as defined
in 5.2.1).

So we need to look at 5.2.1 to see what a decimal-digit character is. The exact phrase "decimal-digit character" does not appear there, but we do get a description of the characters required to be in the basic character sets, which includes "the 10 decimal digits" followed by an explicit listing of the digits from 0 to 9. This is surely the definition we seek, since there is no other candidate available.

This unambiguously indicates that the isdigit function tests for precisely those 10 characters, and no others. In particular, it cannot be locale-specific.

Incidentally, by similar reasoning, the isxdigit function is also not locale-specific.
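The reading above can be sketched as a pair of tests that depend only on the basic character set and never consult the locale. This is an illustration under that interpretation, not text from the standard, and the names is_decimal_digit and is_hex_digit are made up for the example.

```c
/* Hypothetical helper: non-zero only for the 10 decimal digits.
   The standard guarantees '0'..'9' have contiguous, ascending values,
   so a range comparison is enough. */
int is_decimal_digit(int c)
{
    return c >= '0' && c <= '9';
}

/* Hypothetical helper: non-zero only for 0-9, a-f and A-F.
   The letters are listed explicitly because the standard does not
   promise that they are contiguous in the execution character set. */
int is_hex_digit(int c)
{
    switch (c) {
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
        return 1;
    default:
        return 0;
    }
}
```

Neither function reads the locale, so a value such as 0xB2 (superscript two in code page 1252) is rejected regardless of any setlocale call.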
