regex bbedit

Is there a truly universal wildcard in Grep?

Posted on 2024-08-14 18:34:28

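A really basic question here. I'm told that a dot (.) matches any character except a line break. I'm looking for something that matches any character, including line breaks.

All I want to do is capture all the text in a web page between two specific strings, stripping the header and the footer. Something like HEADER TEXT(.+)FOOTER TEXT, and then extract what's in the parentheses. But I can't find a way to include all the text and the line breaks between the header and the footer. Does this make sense? Thanks in advance!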

深蓝 2024-08-21 18:34:29

When I need to match several characters, including line breaks, I do:

[\s\S]*?

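Note that I'm using a non-greedy quantifier (*?), so the match stops at the first occurrence of whatever follows it in the pattern rather than the last.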
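To make the greedy versus non-greedy difference concrete, here is a minimal sketch in Python (chosen only for illustration; the sample string and variable names are made up). It applies the character class above to the pattern from the question.

```python
import re

# Hypothetical page text with two footers, to show where each match stops.
page = "HEADER TEXT\nline one\nline two\nFOOTER TEXT\ntrailing\nFOOTER TEXT\n"

# [\s\S] means "whitespace or non-whitespace", i.e. any character at all,
# so the match crosses line breaks without any flag.
lazy = re.search(r"HEADER TEXT([\s\S]*?)FOOTER TEXT", page).group(1)

# The greedy form runs on to the last footer instead of the first.
greedy = re.search(r"HEADER TEXT([\s\S]*)FOOTER TEXT", page).group(1)

print(repr(lazy))
print(repr(greedy))
```

The non-greedy capture ends at the first footer, which is usually what you want when stripping a header and footer from a page.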

够钟 2024-08-21 18:34:29

You could do it with Perl:

$ perl -ne 'print if /HEADER TEXT/ .. /FOOTER TEXT/' file.html

To print only the text between the delimiters, use

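$ perl -0777 -lne 'print $1 while /HEADER TEXT(.+?)FOOTER TEXT/sg' file.html

The -0777 switch makes Perl read the whole file as a single record (-000 would select paragraph mode, which splits the input at blank lines and so misses any match that spans one). The /s switch makes the regular expression matcher treat the entire string as a single line, which means dot matches newlines, and /g means match as many times as possible.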
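For readers who would rather use Python, a rough equivalent of the /s and /g combination is re.DOTALL together with finditer. This is only a sketch; the sample input is invented.

```python
import re

# Hypothetical input containing two header/footer pairs.
html = (
    "<p>HEADER TEXT\nfirst\nFOOTER TEXT</p>\n"
    "<p>HEADER TEXT second FOOTER TEXT</p>\n"
)

# re.DOTALL plays the role of Perl's /s: the dot also matches newlines.
pattern = re.compile(r"HEADER TEXT(.+?)FOOTER TEXT", re.DOTALL)

# finditer plays the role of /g: it yields every non-overlapping match.
sections = [m.group(1) for m in pattern.finditer(html)]

for section in sections:
    print(repr(section))
```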

The examples above assume you're working on HTML files on the local disk. If you need to fetch them first, use get from LWP::Simple:

$ perl -MLWP::Simple -le '$_ = get "http://stackoverflow.com";
                          print $1 while m!<head>(.+?)</head>!sg'

Please note that parsing HTML with regular expressions as above does not work in the general case! If you're working on a quick-and-dirty scanner, fine, but for an application that needs to be more robust, use a real parser.
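As one illustration of what "a real parser" can look like, here is a small sketch using Python's standard html.parser module. The class name TagTextCollector and the sample document are hypothetical; this is not a complete scraper, only a starting point.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside every occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # how many matching tags are currently open
        self.chunks = []    # text pieces seen while inside the tag

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


collector = TagTextCollector("title")
collector.feed(
    "<html><head><title>Example\npage</title></head>"
    "<body><p>Hi</p></body></html>"
)
title_text = "".join(collector.chunks)
print(repr(title_text))
```

Unlike the regex, this keeps working when tags carry attributes or differ in case, because the parser handles tokenizing the markup.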

兔小萌 2024-08-21 18:34:29

By definition, grep looks for lines which match; it reads a line, sees whether it matches, and prints the line.

One possible way to do what you want is with sed:

sed -n '/HEADER TEXT/,/FOOTER TEXT/p' "$@"

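This prints from the first line that matches 'HEADER TEXT' to the next line that matches 'FOOTER TEXT', and then repeats for any later header/footer pair; the '-n' stops the default 'print each line' operation. This won't work well if the header and footer text appear on the same line, because sed only starts looking for the footer on the line after the one that matched the header.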
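The range behaviour described above can be sketched in Python as follows. The function name select_range and the sample lines are made up for illustration, and the loop only mimics the sed range address for regex patterns; it is not a sed replacement.

```python
import re


def select_range(lines, start, end):
    """Return lines from each start match through the next end match, inclusive."""
    inside = False
    selected = []
    for line in lines:
        if inside:
            selected.append(line)
            if re.search(end, line):
                inside = False
        elif re.search(start, line):
            selected.append(line)
            # As in sed, the end pattern is first tested on the NEXT line.
            inside = True
    return selected


normal = select_range(
    ["top", "HEADER TEXT", "a", "b", "FOOTER TEXT", "bottom"],
    "HEADER TEXT", "FOOTER TEXT",
)

# Header and footer on one line: the range never closes, so everything
# after it is selected too.
same_line = select_range(
    ["HEADER TEXT x FOOTER TEXT", "later", "end"],
    "HEADER TEXT", "FOOTER TEXT",
)

print(normal)
print(same_line)
```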

To do what you want, I'd probably use Perl (but you could use Python if you prefer). I'd consider slurping the whole file and then using a suitably qualified regex to find the matching portions of the file. However, the Perl one-liner given by '@gbacon' is an almost exact transliteration into Perl of the 'sed' script above and is neater than slurping.

能怎样 2024-08-21 18:34:29

The man page for grep says:

grep, egrep, fgrep, rgrep - print lines matching a pattern

grep is not designed to match across multiple lines. You should try perl or awk for this task.

南城追梦 2024-08-21 18:34:29

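Since this is tagged "bbedit", and BBEdit supports Perl-style pattern modifiers, you can let the dot match line breaks with the (?s) switch.

(?s).

will match any character. And yes,

(?s).+

will match the whole text.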
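The same inline modifier exists in other regex flavours. The sketch below uses Python's re module as a stand-in (it is not BBEdit's engine, and the sample text is invented) to show the pattern from the question failing without (?s) and succeeding with it.

```python
import re

# Hypothetical text with line breaks between header and footer.
text = "HEADER TEXT\nfirst line\nsecond line\nFOOTER TEXT"

# Without the modifier, the dot stops at the first newline: no match.
without_flag = re.search(r"HEADER TEXT(.+)FOOTER TEXT", text)

# With (?s) at the start of the pattern, the dot matches newlines too.
with_flag = re.search(r"(?s)HEADER TEXT(.+)FOOTER TEXT", text)

# (?s).+ on its own spans the whole text.
whole = re.fullmatch(r"(?s).+", text)

print(without_flag)
print(repr(with_flag.group(1)))
```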

滴情不沾 2024-08-21 18:34:29

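As pointed out elsewhere, grep works on single lines.

For multiple lines (in Ruby with Regexp::MULTILINE, or in Python, awk, sed, whatever), "\s" also matches line breaks, so

HEADER TEXT(.*\s*)FOOTER TEXT

might work, but only when the body fits on one line: unless dot-matches-newline is enabled, .* stops at the first line break and \s* can only absorb the whitespace that follows it.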
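A quick way to see where this pattern does and does not match is to try it on two inputs. The sketch below uses Python, with made-up sample strings.

```python
import re

pattern = r"HEADER TEXT(.*\s*)FOOTER TEXT"

# Body on a single line, followed only by whitespace: the pattern matches.
one_line = "HEADER TEXT body \nFOOTER TEXT"
m_one = re.search(pattern, one_line)

# Body spread over two lines: .* cannot cross the first newline and \s*
# cannot consume the word "second", so there is no match.
two_lines = "HEADER TEXT first\nsecond\nFOOTER TEXT"
m_two = re.search(pattern, two_lines)

# With re.DOTALL the dot crosses newlines and the pattern matches again.
m_dotall = re.search(pattern, two_lines, re.DOTALL)

print(repr(m_one.group(1)))
print(m_two)
print(repr(m_dotall.group(1)))
```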

氛圍 2024-08-21 18:34:29

Here's one way to do it with gawk, if you have it:

awk -v RS="FOOTER" '/HEADER/{gsub(/.*HEADER/,"");print}' file
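The idea behind the one-liner is to treat FOOTER as the record separator and then strip everything up to HEADER from each record that contains it. A rough Python rendering of that idea, with a hypothetical helper name between and invented sample text, might look like this:

```python
def between(text, header="HEADER", footer="FOOTER"):
    """Return the text after the last header in each footer-terminated record."""
    results = []
    for record in text.split(footer):          # like RS="FOOTER"
        if header in record:                   # like the /HEADER/ condition
            # Keep only what follows the last header in the record, which is
            # what removing a greedy .*HEADER leaves behind.
            results.append(record.rpartition(header)[2])
    return results


sample = "junk HEADER\nkeep me\nFOOTER tail HEADER again FOOTER end"
parts = between(sample)
print(parts)
```

As with the awk version, text after the final footer is treated as one more record, so it is returned too if it happens to contain the header.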

About the author

花海
