Regex (grep) for multi-line search needed

Posted on 2024-09-19 03:51:38

I'm running a grep to find any *.sql file that has the word select followed by the word customerName followed by the word from. This select statement can span many lines and can contain tabs and newlines.

I've tried a few variations on the following:

$ grep -liIr --include="*.sql" --exclude-dir="\.svn*" --regexp="select[a-zA-Z0-9+\n\r]*customerName[a-zA-Z0-9+\n\r]*from"

This, however, just runs forever. Can anyone help me with the correct syntax please?

梦里南柯 2024-09-26 03:51:45

Your fundamental problem is that grep works one line at a time - so it cannot find a SELECT statement spread across lines.
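A small demonstration of that point, as a sketch only: the file names and contents below are made up for illustration. The same pattern finds the statement when it sits on one line and misses it once it is split across lines.

```shell
#!/bin/sh
# Hypothetical sample data: the same statement, once on one line and once split up.
tmp=$(mktemp -d)
printf 'select customerName from customers;\n' > "$tmp/one_line.sql"
printf 'select\n\tcustomerName\nfrom customers;\n' > "$tmp/multi_line.sql"

# -l lists matching files, -i ignores case, -E enables extended regex.
# grep tests each line on its own, so only one_line.sql is reported.
found=$(cd "$tmp" && grep -liE 'select.*customerName.*from' one_line.sql multi_line.sql)
echo "$found"

rm -rf "$tmp"
```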

Your second problem is that the regex you are using doesn't deal with the complexity of what can appear between SELECT and FROM - in particular, it omits commas, full stops (periods) and blanks, but also quotes and anything that can be inside a quoted string.

I would likely go with a Perl-based solution, having Perl read 'paragraphs' at a time and applying a regex to that. The downside is having to deal with the recursive search - there are modules to do that, of course, including the core module File::Find.

In outline, for a single file:

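$/ = "\n\n";    # Read one blank-line-separated paragraph at a time

while (<>)
{
     if ($_ =~ m/SELECT.*customerName.*FROM/si)    # /s lets . match newlines; /m would not
     {
         print "$ARGV\n";    # $ARGV holds the name of the file being read
         close ARGV;         # Skip the rest of this file and go to the next one
     }
}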

That needs to be wrapped into a sub that is then invoked by the methods of File::Find.

≈。彩虹 2024-09-26 03:51:44

I'm not very good with grep, but your problem can be solved with awk. For example:

awk '/select/,/from/' *.sql

The command above prints every range of lines that starts at a line containing select and ends at the next line containing from. You then need to check whether the printed statements contain customerName. To do that, pipe the output into another awk or grep.
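One way the pipe could look, as a rough sketch: the loop, the sample files and the `found` variable are assumptions added for illustration. Running the range per file makes it possible to report the file name, which a single `awk ... *.sql` would lose.

```shell
#!/bin/sh
# Hypothetical sample data; the second file mentions customerName only outside the statement.
tmp=$(mktemp -d)
printf 'select\n\tcustomerName\nfrom customers;\n' > "$tmp/match.sql"
printf 'select orderId\nfrom orders;\n-- customerName is unused here\n' > "$tmp/no_match.sql"

found=$(
    for f in "$tmp"/*.sql; do
        # Keep only the select..from ranges, then test those lines for customerName.
        if awk '/select/,/from/' "$f" | grep -q 'customerName'; then
            basename "$f"
        fi
    done
)
echo "$found"

rm -rf "$tmp"
```

The patterns here are case-sensitive and match substrings, so a column named `from_date` would end the range early; treat it as a starting point, not a SQL parser.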

野の 2024-09-26 03:51:42

Without the need to install the grep variant pcregrep, you can do a multiline search with grep.

$ grep -Pzo "(?s)^(\s*)\N*main.*?{.*?^\1}" *.c

Explanation:

-P activates Perl-compatible regular expressions in grep (a powerful extension of the standard syntax).

-z treats the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline. That is, grep still knows where the ends of the lines are, but sees the input as one big line. Beware that this also adds a trailing NUL character to the output when used with -o.

-o prints only the matching part. Because we're using -z, the whole file is like a single big line, so without -o a match would print the entire file.

In regexp:

(?s) activates PCRE_DOTALL, which means that . matches any character, including a newline.

\N matches anything except a newline, even with PCRE_DOTALL activated.

.*? matches . in non-greedy mode, that is, it stops as soon as possible.

^ matches the start of a line.

\1 is a backreference to the first group (\s*). It is an attempt to find the closing brace at the same indentation as the start of the method.

As you can imagine, this search prints the main method in a C (*.c) source file.
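Applied to the original question, the same flags might be combined as follows. This is a sketch, assuming GNU grep built with PCRE support; the sample tree is hypothetical, and -o is swapped for -l because the question asks for file names, not the matched text.

```shell
#!/bin/sh
# Hypothetical sample tree; the names are made up for illustration.
tmp=$(mktemp -d)
mkdir -p "$tmp/reports"
printf 'select\n\tcustomerName,\n\tcity\nfrom customers;\n' > "$tmp/reports/match.sql"
printf 'select orderId\nfrom orders;\n' > "$tmp/no_match.sql"

# -r recurses, -l prints only file names, -i ignores case, -P enables PCRE,
# -z treats each file as one record, (?s) lets . match newlines.
found=$(cd "$tmp" && grep -rliPz --include='*.sql' '(?s)select.*?customerName.*?from' .)
echo "$found"

rm -rf "$tmp"
```

Note the explicit `.` as the directory to search; the non-greedy `.*?` keeps the match as short as possible but does not stop it from running past the end of one statement into the next.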
