Confusing language in the specification of strtol, et al.
The specification for strtol conceptually divides the input string into "initial whitespace", a "subject sequence", and a "final string", and defines the "subject sequence" as:
the longest initial subsequence of the input string, starting with the first non-white-space character that is of the expected form. The subject sequence shall contain no characters if the input string is empty or consists entirely of white-space characters, or if the first non-white-space character is other than a sign or a permissible letter or digit.
At one time I thought the "longest initial subsequence" business was akin to the way scanf works, where "0x@" would scan as "0x", a failed match, followed by "@" as the next unread character. However, after some discussion, I'm mostly convinced that strtol processes the longest initial subsequence that is of the expected form, not the longest initial string which is the initial subsequence of some possible string of the expected form.
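To make the second reading concrete, here is a minimal sketch in C. The helper name strtol_consumed is hypothetical (it is not part of any library); it only reports how far strtol advanced endptr.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: converts s with strtol and returns how many
   characters were consumed (leading white space plus the subject
   sequence). The converted value is stored in *value. */
static size_t strtol_consumed(const char *s, int base, long *value)
{
    char *end;
    *value = strtol(s, &end, base);
    return (size_t)(end - s);
}
```

Under the "longest initial subsequence that is of the expected form" reading, strtol_consumed("0x@", 0, &v) should return 1 with v equal to 0: the subject sequence is "0", and "x@" is the final string.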
What's still confusing me is this language in the specification:
If the subject sequence is empty or does not have the expected form, no conversion is performed; the value of str is stored in the object pointed to by endptr, provided that endptr is not a null pointer.
If we accept what seems to be the correct definition of "subject sequence", there is no such thing as a non-empty subject sequence that does not have the expected form, and instead (to avoid redundancy and confusion) the text should just read:
If the subject sequence is empty, no conversion is performed; the value of str is stored in the object pointed to by endptr, provided that endptr is not a null pointer.
Can anyone clarify these issues for me? Perhaps a link to past discussions or any relevant defect reports would be useful.
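The "no conversion" case quoted above can be observed through endptr. The following is a small sketch, assuming a conforming strtol; the helper name strtol_no_conversion is made up for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: returns true when strtol performed no
   conversion, i.e. it stored the original string pointer in end. */
static bool strtol_no_conversion(const char *s, int base)
{
    char *end;
    (void)strtol(s, &end, base);
    return end == s;
}
```

Note that for an all-white-space input such as "   ", the pointer stored is the start of the string, not a position past the white space. An input such as "0x@" with base 16 is not a "no conversion" case, because the subject sequence "0" is non-empty.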
Comments (4)
I think the C99 language is quite clear:
Given "0x@", "0x@" is not of the expected form; "0x" is not of the expected form; therefore "0" is the longest initial subsequence that is of the expected form.
I agree that this implies that you cannot have a non-empty subject sequence that isn't of the expected form - unless you interpret the following:
...as allowing a locale to define other possible forms that the subject sequence might have, that are nonetheless not of "the expected form".
The wording in the final paragraph seems to be just "belt-and-braces".
It might be easier to understand if you started at §7.20.1.4 (The strtol, strtoll, strtoul, and strtoull functions) ¶2 of the C99 standard, instead of ¶4:
In particular, ¶3 clarifies what a subject sequence is.
The POSIX spec for strtol seems to be more clear:
But of course, it is not normative and "defers to the ISO C standard".
I completely agree with your assessment: By definition, all non-empty subject sequences are of expected form, so the wording of the standard is dubious.
In the case of the floating-point conversion functions, there's another blunder (C99:TC3 section 7.20.1.3, §3):
This implies that the whole input string must be of expected form, defeating the purpose of the endptr parameter. One could argue that the expected form for the input string is different from the expected form for the subject sequence, but it's still pretty confusing.
You are also correct that the semantics of the strto*() and *scanf() family of functions are different: If both match, they will always agree on the value and consume the same number of characters (and any libc implementation where they do not is broken, including newlib and glibc last time I checked), but *scanf() additionally fails to match cases where it would need to backtrack more than one character, as in your examples "0x@" and "1.0e+".
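One way to check the difference described above on a given libc is to run both functions on the same input. This is only a sketch: the struct and function names are hypothetical, and no particular sscanf result is assumed, since the answer itself notes that implementations disagree.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record of what each function reported for one input. */
struct float_parse {
    int    scanf_items;      /* return value of sscanf with "%lf" */
    double strtod_value;     /* value returned by strtod */
    size_t strtod_consumed;  /* characters strtod consumed */
};

/* Parse the same string with sscanf and with strtod. */
static struct float_parse parse_both_ways(const char *s)
{
    struct float_parse r;
    double scanned = 0.0;
    char *end;

    r.scanf_items = sscanf(s, "%lf", &scanned);
    r.strtod_value = strtod(s, &end);
    r.strtod_consumed = (size_t)(end - s);
    return r;
}
```

For "1.0e+", strtod should report the value 1.0 with three characters consumed, because "1.0" is the longest initial subsequence of the expected form. Following the reasoning above, sscanf would report a matching failure (a return value of 0) for the same input; a return value of 1 would indicate the nonconforming behaviour the answer mentions.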