Does [:space:] in grep not include newlines and carriage returns?

Posted on 2025-01-15 09:47:16

I'm currently writing a simple Bash script. The idea is to use grep to find the lines in some files where a certain pattern occurs. The pattern is 3 capital letters followed by 6 digits, so the regex is [A-Z]{3}[0-9]{6}.

However, I need to only include the lines where this pattern is not concatenated with other strings, or in other words, if such a pattern is found, it has to be separated from other strings with spaces.

So if the string which matches the pattern is ABC123456 for example, the line something ABC123456 something should be fine, but somethingABC123456something should fail.

I've extended my regex using the [:space:] character class, like so:

[[:space:]][A-Z]{3}[0-9]{6}[[:space:]]

And this seems to work, except for when the string which matches the pattern is the first or last one in the line.

So, the line something ABC123456 something will match correctly;

The line ABC123456 something won't;

And the line something ABC123456 won't either.

I believe this has something to do with [:space:] not counting new lines and carriage returns as whitespace characters, even though it should from my understanding. Could anyone spot if I'm doing something wrong here?

扛刀软妹 2025-01-22 09:47:16

A common solution to your problem is to normalize the input so that there is a space before and after each word.

sed 's/^/ /;s/$/ /' file |
grep -oE '[[:space:]][A-Z]{3}[0-9]{6}[[:space:]]'
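As a rough check of the normalization idea, here is a small sketch that feeds a few made-up sample lines through the pipeline with printf instead of reading a file. The sample lines and the variable name `matches` are assumptions for illustration. It uses `grep -c` to count matching lines rather than `-o`, and sets `LC_ALL=C` so that `[A-Z]` means ASCII uppercase letters only.

```shell
# Hypothetical sample input: the ID appears mid-line, at the start of a line,
# at the end of a line, and glued to other text (which should not match).
matches=$(printf '%s\n' \
    'something ABC123456 something' \
    'ABC123456 something' \
    'something ABC123456' \
    'somethingABC123456something' |
  sed 's/^/ /;s/$/ /' |
  LC_ALL=C grep -cE '[[:space:]][A-Z]{3}[0-9]{6}[[:space:]]')

# Number of input lines that contain a standalone ID
echo "$matches"
```

With this input the count should be 3: every line except the glued one matches once a space has been padded onto both ends.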

Your question assumes that the newlines are part of what grep sees, but that is not true (or at least not how grep is commonly implemented). Instead, it reads just the contents of each line, without the trailing newline, into a memory buffer, and then applies the regular expression to that buffer. So there is no whitespace character for [[:space:]] to match at the very start or end of a line.
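This can be checked directly. The sketch below (the sample lines and the variable name `trailing` are made up for illustration) counts the lines whose last character is whitespace. Only the second sample line ends with an actual space.

```shell
# Two sample lines: only the second one has a real space before the newline.
trailing=$(printf 'ABC123456\nABC123456 \n' | grep -cE '[[:space:]]$')

# If the newline were part of the text the regex is applied to,
# both lines would be counted here.
echo "$trailing"
```

The count comes out as 1, not 2, which is consistent with the newline being stripped before the regex is applied.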

A similar but different solution is to specify beginning of line or space, and correspondingly space or end of line:

grep -oE '(^|[[:space:]])[A-Z]{3}[0-9]{6}([[:space:]]|$)' file

but this might not be entirely portable.
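A quick way to try the anchored variant without a file is sketched below. The sample lines and the variable name `anchored` are assumptions for illustration; `-c` is used to count matching lines, and `LC_ALL=C` keeps `[A-Z]` limited to ASCII uppercase letters.

```shell
# Same hypothetical sample lines as one might use to test the question's regex.
anchored=$(printf '%s\n' \
    'something ABC123456 something' \
    'ABC123456 something' \
    'something ABC123456' \
    'somethingABC123456something' |
  LC_ALL=C grep -cE '(^|[[:space:]])[A-Z]{3}[0-9]{6}([[:space:]]|$)')

# Number of lines where the ID is delimited by whitespace or a line boundary
echo "$anchored"
```

Here the first three lines match and the glued one does not, so the count should be 3.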

You might want to postprocess the results to trim any spaces from the extracted strings, too; but I have already had to guess several things about what you are actually trying to accomplish, so I'll stop here.
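One possible way to do that trimming, as a sketch only: pipe the `-o` output through a second sed that strips leading and trailing whitespace from each extracted string. The sample lines and the variable name `trimmed` are made up for illustration.

```shell
# grep -o prints each match including the surrounding space(s);
# the final sed removes whitespace at both ends of every extracted string.
trimmed=$(printf '%s\n' \
    'something ABC123456 something' \
    'ABC123456 something' \
    'something ABC123456' |
  LC_ALL=C grep -oE '(^|[[:space:]])[A-Z]{3}[0-9]{6}([[:space:]]|$)' |
  sed 's/^[[:space:]]*//;s/[[:space:]]*$//')

printf '%s\n' "$trimmed"
```

Each of the three sample lines should yield a bare ABC123456 with no spaces attached.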

(Of course, sed can do everything grep can do, and then some, so perhaps switch to sed or Awk entirely rather than build an elaborate normalization pipeline around grep.)
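For instance, an Awk version might look roughly like the sketch below. It relies on Awk's default splitting of each line into whitespace-separated fields and tests every field against the pattern anchored at both ends. The sample lines and the variable name `found` are assumptions for illustration, and the character classes are spelled out instead of using {3} and {6} because some Awk implementations lack interval expressions.

```shell
# Awk splits each line on runs of whitespace, so a field that matches the
# fully anchored pattern is by construction a standalone ID.
found=$(printf '%s\n' \
    'something ABC123456 something' \
    'ABC123456 something' \
    'something ABC123456' \
    'somethingABC123456something' |
  LC_ALL=C awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[A-Z][A-Z][A-Z][0-9][0-9][0-9][0-9][0-9][0-9]$/)
        print $i
  }')

printf '%s\n' "$found"
```

This prints the bare IDs directly, so no padding or trimming step is needed; the glued line produces no output.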
