Is scanf's "regex" support standard?

Posted 2024-11-07 15:51:33

Is scanf's "regex" support a standard? I can't find the answer anywhere.

This code works in gcc but not in Visual Studio:

scanf("%[^\n]",a);

Is this a Visual Studio bug or a gcc extension?

EDIT: It looks like VS works, but the difference in line endings between Linux and Windows (\r\n) has to be taken into account.
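One way to read the edit above: if the input carries Windows-style "\r\n" line endings and the '\r' is not translated away before scanf sees it, the scanset [^\n] will happily store the '\r' as the last character. A minimal sketch of a scanset that excludes both characters follows. The helper name first_line and the 64-byte buffer size are assumptions made for illustration, not part of the original question.

```c
#include <stdio.h>

/* Hypothetical helper: copies the first line of `text` into `out`
   (which must hold at least 64 bytes), stopping at '\r' or '\n'.
   Returns 1 if at least one character was stored, 0 otherwise. */
int first_line(const char *text, char *out)
{
    out[0] = '\0';
    /* Listing '\r' as well keeps a "\r\n" ending out of the result.
       The width 63 leaves room for the terminating null. */
    return sscanf(text, "%63[^\r\n]", out) == 1;
}
```

With this helper, the input "hello world\r\nnext" yields "hello world" without a trailing '\r'.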

Comments (2)

辞旧 2024-11-14 15:51:33

That particular format string should work fine in a conforming implementation. The [ character introduces a scanset, which matches a non-empty sequence of characters drawn from a set (a leading ^ inverts the set, so the scanset is every character not listed). In other words, the conversion specification %[^\n] matches a run of one or more characters up to, but not including, the next newline.
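Because the match must be non-empty, %[^\n] fails on a blank line instead of storing an empty string. A small sketch using sscanf shows the three possible return values. The helper name scan_line and the 32-byte buffer are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: tries to read one line of `text` into `out`
   (which must hold at least 32 bytes) and returns sscanf's result:
   1 when characters were stored, 0 when the line is empty,
   EOF when there is no input at all. */
int scan_line(const char *text, char *out)
{
    out[0] = '\0';  /* keep `out` a valid string even when nothing matches */
    return sscanf(text, "%31[^\n]", out);
}
```

Checking the return value is the only reliable way to tell whether the buffer was written.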

From C99 7.19.6.2, slightly paraphrased:

The [ format specifier matches a nonempty sequence of characters from a set of expected characters (the scanset). If no l length modifier is present, the corresponding argument shall be a pointer to the initial element of a character array large enough to accept the sequence and a terminating null character, which will be added automatically.

If an l length modifier is present, the input shall be a sequence of multibyte characters that begins in the initial shift state. Each multibyte character is converted to a wide character as if by a call to the mbrtowc function, with the conversion state described by an mbstate_t object initialized to zero before the first multibyte character is converted. The corresponding argument shall be a pointer to the initial element of an array of wchar_t large enough to accept the sequence and the terminating null wide character, which will be added automatically.
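As a rough illustration of the l length modifier described above, the sketch below reads a line into a wchar_t array. The helper name scan_wide_line and the 16-element buffer are assumptions for illustration, and the example sticks to plain ASCII input so that it does not depend on the current locale.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: reads one line of `text` into the wide buffer
   `out` (which must hold at least 16 wchar_t elements).
   Returns sscanf's result: 1 on success, 0 or EOF otherwise. */
int scan_wide_line(const char *text, wchar_t *out)
{
    out[0] = L'\0';
    /* %l[ converts each multibyte input character to a wide character. */
    return sscanf(text, "%15l[^\n]", out);
}
```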

The conversion specifier includes all subsequent characters in the format string, up to and including the matching right bracket ]. The characters between the brackets (the scanlist) compose the scanset, unless the character after the left bracket is a circumflex ^, in which case the scanset contains all characters that do not appear in the scanlist between the circumflex and the right bracket. If the conversion specifier begins with [] or [^], the right bracket character is in the scanlist and the next following right bracket character is the matching right bracket that ends the specification; otherwise the first following right bracket character is the one that ends the specification. If a - character is in the scanlist and is not the first, nor the second where the first character is a ^, nor the last character, the behavior is implementation-defined.
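The rule about a leading ] can be seen in a scanset such as [^]], which means "everything except a right bracket". The sketch below uses it to pull the name out of a bracketed header like "[core]". The helper name section_name, the input shape, and the 32-byte buffer are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: extracts the text between '[' and ']' from
   `text` into `out` (which must hold at least 32 bytes).
   Returns 1 if a non-empty name was stored, 0 otherwise. */
int section_name(const char *text, char *out)
{
    out[0] = '\0';
    /* Format breakdown: a literal '[', then the conversion %31[^]]
       (the ']' right after '^' belongs to the scanlist, and the next
       ']' closes the conversion), then a literal ']'. */
    return sscanf(text, "[%31[^]]]", out) == 1;
}
```

Note that the example avoids a - inside the scanlist, since ranges such as a-z are implementation-defined according to the passage above.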

It's possible, if MSVC isn't working correctly, that this is just one of the many cases where Microsoft either doesn't conform to the latest standard or thinks it knows better :-)

暖伴 2024-11-14 15:51:33

The "%[" format spec for scanf() is standard and has been since C90.

MSVC does support it.

You can also provide a field width in the format spec to guard against buffer overruns:

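#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[9] = "";   /* 8 characters plus the terminating null */

    /* The width 8 caps how many characters %[ may store in buf. */
    if (scanf("%8[^\n]", buf) != 1) {
        return 1;       /* empty line or end of input: nothing was stored */
    }

    printf("%s\n", buf);
    printf("strlen(buf) == %zu\n", strlen(buf));   /* strlen returns size_t */

    return 0;
}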
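A field width stops the conversion after 8 characters, but whatever remains of a longer line stays in the stream and would be picked up by the next read. One possible way to handle that, sketched here under the assumption that the rest of the line should simply be discarded, is to drain the stream up to and including the newline. The helper name read_line is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads at most 8 characters of the next line
   from `in` into `buf`, then discards the rest of that line.
   Returns 1 if a line (possibly empty) was consumed, 0 at end of input. */
int read_line(FILE *in, char buf[9])
{
    int c;

    buf[0] = '\0';  /* an empty line leaves buf as the empty string */
    if (fscanf(in, "%8[^\n]", buf) == EOF) {
        return 0;
    }
    /* Skip any characters beyond the width, plus the newline itself. */
    while ((c = fgetc(in)) != EOF && c != '\n') {
        /* nothing to do */
    }
    return 1;
}
```

Calling read_line(stdin, buf) in a loop then yields one truncated line per call instead of leftover fragments of the previous one.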

Also note that the "%[" format spec doesn't mean that scanf() supports regular expressions. That particular format spec resembles one capability of regexes, the character class (and was no doubt influenced by them), but it is far more limited than regular expressions.
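To give a feel for what scansets can and cannot do, the sketch below splits a "key = value" line using two scansets plus literal characters. Each scanset only describes which characters a single field may contain; there is no alternation, grouping, or repetition operator as in a real regex. The helper name parse_pair, the input shape, and the buffer sizes are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: splits "key = value" into `key` (at least 16
   bytes) and `val` (at least 32 bytes). Returns 1 if both parts were
   found, 0 otherwise. */
int parse_pair(const char *text, char *key, char *val)
{
    key[0] = '\0';
    val[0] = '\0';
    /* " %15[^= ]" skips leading whitespace, then takes characters up
       to a space or '='; " = " allows optional whitespace around the
       literal '='; "%31[^\n]" takes the rest of the line. */
    return sscanf(text, " %15[^= ] = %31[^\n]", key, val) == 2;
}
```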
