How to extract strings that match a pattern using grep, regular expressions, or perl
I have a file that looks something like this:
<table name="content_analyzer" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
<type="global" />
</table>
I need to extract anything within the quotes that follow name=, i.e., content_analyzer, content_analyzer2 and content_analyzer_items.
I am doing this on a Linux box, so a solution using sed, perl, grep or bash is fine.
Since you need to match content without including it in the result (you must match name=" but it is not part of the desired result), some form of zero-width matching or group capturing is required. This can be done easily with the following tools:
Perl
With Perl you could use the -n option to loop line by line and print the content of a capturing group if it matches.
GNU grep
If you have an improved version of grep, such as GNU grep, you may have the -P option available. This option enables Perl-like regex, allowing you to use \K, which works as a shorthand lookbehind: it resets the start of the reported match, so anything matched before it is left out of the result.
The -o option makes grep print only the matched text, instead of the whole line.
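A sketch combining the two options, assuming a GNU grep built with PCRE support (other grep implementations may lack -P). The temporary file and variable names are hypothetical:

```shell
# Hypothetical sample input, copied from the question.
input=$(mktemp)
cat > "$input" <<'EOF'
<table name="content_analyzer" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
<type="global" />
</table>
EOF

# -P enables Perl-compatible regex, -o prints only the matched text.
# Everything matched before \K (the literal name=") is dropped from the
# reported match, so only the characters up to the next quote remain.
grep_names=$(grep -oP 'name="\K[^"]*' "$input")
printf '%s\n' "$grep_names"
rm -f "$input"
```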
Vim - Text Editor
Another way is to use a text editor directly. With Vim, one of the various ways of accomplishing this would be to delete lines without name= and then extract the content from the resulting lines.
Standard grep
If you don't have access to these tools for some reason, something similar could be achieved with standard grep. However, without the lookaround it will require some cleanup later.
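One way this might look, assuming a grep that supports -o (both GNU and BSD grep do, although POSIX does not require it) and using sed for the cleanup. The choice of sed, the temporary file and the variable names are assumptions for illustration:

```shell
# Hypothetical sample input, copied from the question.
input=$(mktemp)
cat > "$input" <<'EOF'
<table name="content_analyzer" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
<type="global" />
</table>
EOF

# Without lookaround, the match still includes name=" and the closing quote,
# e.g. name="content_analyzer", so sed strips both ends afterwards.
plain_names=$(grep -o 'name="[^"]*"' "$input" | sed 's/^name="//; s/"$//')
printf '%s\n' "$plain_names"
rm -f "$input"
```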
A note about saving results
In all of the commands above the results will be sent to stdout. It's important to remember that you can always save them to a file by appending an output redirection to the end of the command.
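As an illustration, the redirection goes after the last command of the pipeline. The file names here are hypothetical temporary files, and the extraction command is just one of the possible choices:

```shell
# Hypothetical sample input (shortened) and output file.
input=$(mktemp)
output=$(mktemp)
cat > "$input" <<'EOF'
<table name="content_analyzer" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
<type="global" />
</table>
EOF

# "> file" sends stdout of the final command to the file, overwriting it;
# ">> file" would append to the file instead.
grep -o 'name="[^"]*"' "$input" | sed 's/^name="//; s/"$//' > "$output"

saved=$(cat "$output")
printf '%s\n' "$saved"
rm -f "$input" "$output"
```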
The regular expression would be:
Then the grouping would be in the \1
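A sketch of how a capture group and \1 could be used in practice, here with sed. The tool, the exact expression, the temporary file and the variable names are all assumptions for illustration and not necessarily what was originally posted:

```shell
# Hypothetical sample input, copied from the question.
input=$(mktemp)
cat > "$input" <<'EOF'
<table name="content_analyzer" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
<type="global" />
</table>
EOF

# \( ... \) is the capture group; \1 in the replacement refers to it.
# -n suppresses default output and the p flag prints only lines where
# the substitution succeeded, so lines without name= are skipped.
sed_names=$(sed -n 's/.*name="\([^"]*\)".*/\1/p' "$input")
printf '%s\n' "$sed_names"
rm -f "$input"
```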
If you're using Perl, download a module to parse the XML: XML::Simple, XML::Twig, or XML::LibXML. Don't re-invent the wheel.
An HTML parser should be used for this purpose rather than regular expressions, for example a Perl program that makes use of HTML::TreeBuilder.
This could do it:
Here's a solution using HTML tidy & xmlstarlet:
Oops, the sed command has to precede the tidy command of course:
If the structure of your XML (or text in general) is fixed, the easiest way is using cut. For your specific case:
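A minimal sketch of the cut approach, assuming the double quote is used as the field delimiter and that lines are pre-filtered with grep. The filter is needed because lines such as <type="global" /> also contain quotes. The temporary file and variable names are hypothetical:

```shell
# Hypothetical sample input, copied from the question.
input=$(mktemp)
cat > "$input" <<'EOF'
<table name="content_analyzer" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
<type="global" />
</table>
EOF

# Keep only lines containing name=, then split each line on double quotes.
# Field 1 is everything before the first quote, field 2 is the name value.
cut_names=$(grep 'name=' "$input" | cut -d '"' -f 2)
printf '%s\n' "$cut_names"
rm -f "$input"
```

This relies on name being the first quoted attribute on the line. If the attribute order changes, the field number changes with it.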