How to extract strings matching a pattern with grep, regex, or perl

Posted on 2024-10-18 23:17:07

I have a file that looks something like this:

    <table name="content_analyzer" primary-key="id">
      <type="global" />
    </table>
    <table name="content_analyzer2" primary-key="id">
      <type="global" />
    </table>
    <table name="content_analyzer_items" primary-key="id">
      <type="global" />
    </table>

I need to extract anything within the quotes that follow name=, i.e., content_analyzer, content_analyzer2 and content_analyzer_items.

I am doing this on a Linux box, so a solution using sed, perl, grep or bash is fine.

☆獨立☆ 2024-10-25 23:17:07

Since you need to match text without including it in the result (name="
must match, but it is not part of the desired output), some form of
zero-width matching or group capturing is required. This can be done
easily with the following tools:

Perl

With Perl you can use the -n option to loop over the input line by line
and print the content of a capturing group whenever the line matches:

perl -ne 'print "$1\n" if /name="(.*?)"/' filename

GNU grep

If you have an improved version of grep, such as GNU grep, you may have
the -P option available. This option enables Perl-compatible regular
expressions, allowing you to use \K, which works like a shorthand
lookbehind: it resets the start of the reported match, so anything
matched before it is left out of the output.

grep -Po 'name="\K.*?(?=")' filename

The -o option makes grep print only the matched text, instead of the
whole line.
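As a rough illustration of the \K behaviour described above, the sketch below compares it with an explicit lookbehind on two of the sample lines from the question. It uses `[^"]*` instead of the lazy `.*?(?=")`, which is a variant and not the answer's exact pattern, and it probes for -P support first because not every grep build has it.

```shell
# Sample input taken from the question (shortened to two tables).
sample='<table name="content_analyzer" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
  <type="global" />
</table>'

# -P only works when grep was built with PCRE support, so probe first.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    # \K discards 'name="' from the reported match.
    with_k=$(printf '%s\n' "$sample" | grep -Po 'name="\K[^"]*')
    # A fixed-length lookbehind reports the same text.
    with_lookbehind=$(printf '%s\n' "$sample" | grep -Po '(?<=name=")[^"]*')
    printf '%s\n' "$with_k"
else
    echo 'grep -P is not available here' >&2
fi
```

One practical difference is that \K may follow a variable-length prefix, whereas a PCRE lookbehind has to be of fixed length.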

Vim - Text Editor

Another way is to use a text editor directly. With Vim, one of the
various ways of accomplishing this would be to delete lines without
name= and then extract the content from the resulting lines:

:v/.*name="\v([^"]+).*/d|%s//\1

Standard grep

If for some reason you don't have access to these tools, something
similar can be achieved with standard grep. However, without lookaround
support the output still contains name=" and the closing quote, so it
needs some cleanup afterwards:

grep -o 'name="[^"]*"' filename
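The answer does not say how to do that cleanup. One possible way, shown here as a sketch and not as the author's method, is to pipe the matches through cut and keep the text between the first pair of double quotes:

```shell
# Sample input taken from the question.
sample='<table name="content_analyzer" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
  <type="global" />
</table>'

# grep -o prints e.g. name="content_analyzer";
# cut splits on the double quote and keeps field 2, the quoted value.
names=$(printf '%s\n' "$sample" | grep -o 'name="[^"]*"' | cut -d '"' -f 2)
printf '%s\n' "$names"
```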

A note about saving results

In all of the commands above, the results are sent to stdout. Remember
that you can always save them by redirecting the output to a file,
appending:

> result

to the end of the command.
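A small sketch of what that looks like in practice. The file names `input.xml`, `result` and `result_copy` and the temporary directory are made up for the example:

```shell
# Work in a throwaway directory so nothing in the current one is overwritten.
workdir=$(mktemp -d)
printf '%s\n' \
    '<table name="content_analyzer" primary-key="id">' \
    '<table name="content_analyzer2" primary-key="id">' > "$workdir/input.xml"

# '>' creates the file or truncates an existing one; '>>' would append instead.
grep -o 'name="[^"]*"' "$workdir/input.xml" > "$workdir/result"

# tee writes the file and still passes the output through to stdout.
grep -o 'name="[^"]*"' "$workdir/input.xml" | tee "$workdir/result_copy"

# Read both files back before cleaning up.
saved=$(cat "$workdir/result")
saved_copy=$(cat "$workdir/result_copy")
rm -rf "$workdir"
```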


过潦 2024-10-25 23:17:07

The regular expression would be:

.+name="([^"]+)"

The captured group is then available as \1.
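The answer does not name a tool. As one possible reading, the sketch below feeds the same expression to sed with extended regular expressions (the -E flag, supported by GNU and BSD sed) and prints the captured group:

```shell
# Sample input taken from the question.
sample='<table name="content_analyzer" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
  <type="global" />
</table>'

# -n suppresses default output; the trailing p prints only lines where the
# substitution happened. The added .* consumes the rest of the line so that
# only the captured group (\1) remains.
names=$(printf '%s\n' "$sample" | sed -nE 's/.+name="([^"]+)".*/\1/p')
printf '%s\n' "$names"
```

Note that the leading `.+` requires at least one character before `name=`, and because it is greedy, the last `name="..."` on a line is the one captured.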


别再吹冷风 2024-10-25 23:17:07

If you're using Perl, download a module to parse the XML: XML::Simple, XML::Twig, or XML::LibXML. Don't reinvent the wheel.


懵少女 2024-10-25 23:17:07

An HTML parser should be used for this purpose rather than regular expressions. A Perl program that makes use of HTML::TreeBuilder:

Program

#!/usr/bin/env perl

use strict;
use warnings;

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_file( \*DATA );
my @elements = $tree->look_down(
    sub { defined $_[0]->attr('name') }
);

for (@elements) {
    print $_->attr('name'), "\n";
}

__DATA__
<table name="content_analyzer" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
  <type="global" />
</table>

Output

content_analyzer
content_analyzer2
content_analyzer_items


新雨望断虹 2024-10-25 23:17:07

This could do it:

perl -ne 'if(m/name="(.*?)"/){ print $1 . "\n"; }'


云醉月微眠 2024-10-25 23:17:07

Here's a solution using HTML tidy & xmlstarlet:

htmlstr='
<table name="content_analyzer" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
<type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
<type="global" />
</table>
'

echo "$htmlstr" | tidy -q -c -wrap 0 -numeric -asxml -utf8 --merge-divs yes --merge-spans yes 2>/dev/null |
sed '/type="global"/d' |
xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" -T -t -m "//x:table" -v '@name' -n


瞄了个咪的 2024-10-25 23:17:07

Oops, the sed command has to precede the tidy command of course:

echo "$htmlstr" | 
sed '/type="global"/d' |
tidy -q -c -wrap 0 -numeric -asxml -utf8 --merge-divs yes --merge-spans yes 2>/dev/null |
xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" -T -t -m "//x:table" -v '@name' -n


春夜浅 2024-10-25 23:17:07

If the structure of your XML (or text in general) is fixed, the easiest way is to use cut. For your specific case:

echo '<table name="content_analyzer" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
  <type="global" />
</table>' | grep name= | cut -f2 -d '"'
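The "fixed structure" condition matters because cut only counts delimiters. The sketch below runs the same pipeline on one line from the question and on a hypothetical variant, invented for this example, where the attributes are in a different order:

```shell
# Original layout from the question: name is the first attribute.
fixed='<table name="content_analyzer" primary-key="id">'
# Hypothetical variant with the attributes swapped.
reordered='<table primary-key="id" name="content_analyzer">'

# Field 2 is whatever sits between the first pair of double quotes.
from_fixed=$(printf '%s\n' "$fixed" | grep 'name=' | cut -d '"' -f 2)
from_reordered=$(printf '%s\n' "$reordered" | grep 'name=' | cut -d '"' -f 2)

printf 'fixed:     %s\n' "$from_fixed"      # content_analyzer
printf 'reordered: %s\n' "$from_reordered"  # id
```

If the attribute order can vary, one of the regex-based approaches that anchor on `name="` is the safer choice.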
