如何使用awk提取带引号的字段?

发布于 2024-09-14 01:47:05 字数 124 浏览 4 评论 0原文

我用来

awk '{ printf "%s", $3 }'

从空格分隔的行中提取一些字段。当然,当字段引用内部有可用空间时,我会得到部分结果。有人可以提出解决方案吗?

I am using

awk '{ printf "%s", $3 }'

to extract some field from a space delimited line. Of course I get partial results when the field is quoted with free spaces inside. May any body suggest a solution please?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

野心澎湃 2024-09-21 01:47:05

下次显示您的输入文件和所需的输出。要获取引用字段,

$ cat file
field1 field2 "field 3" field4 "field5"

$ awk -F'"' '{for(i=2;i<=NF;i+=2) print $i}' file
field 3
field5

show your input file and desired output next time. To get quoted fields,

$ cat file
field1 field2 "field 3" field4 "field5"

$ awk -F'"' '{for(i=2;i<=NF;i+=2) print $i}' file
field 3
field5
梦萦几度 2024-09-21 01:47:05

这其实是相当困难的。我想出了以下 awk 脚本,该脚本手动分割行并将所有字段存储在数组中。

{
    s = $0
    i = 0
    split("", a)
    while ((m = match(s, /"[^"]*"/)) > 0) {
        # Add all unquoted fields before this field
        n = split(substr(s, 1, m - 1), t)
        for (j = 1; j <= n; j++)
            a[++i] = t[j]
        # Add this quoted field
        a[++i] = substr(s, RSTART + 1, RLENGTH - 2)
        s = substr(s, RSTART + RLENGTH)
        if (i >= 3) # We can stop once we have field 3
            break
    }
    # Process the remaining unquoted fields after the last quoted field
    n = split(s, t)
    for (j = 1; j <= n; j++)
        a[++i] = t[j]
    print a[3]
}

This is actually quite difficult. I came up with the following awk script that splits the line manually and stores all fields in an array.

{
    s = $0
    i = 0
    split("", a)
    while ((m = match(s, /"[^"]*"/)) > 0) {
        # Add all unquoted fields before this field
        n = split(substr(s, 1, m - 1), t)
        for (j = 1; j <= n; j++)
            a[++i] = t[j]
        # Add this quoted field
        a[++i] = substr(s, RSTART + 1, RLENGTH - 2)
        s = substr(s, RSTART + RLENGTH)
        if (i >= 3) # We can stop once we have field 3
            break
    }
    # Process the remaining unquoted fields after the last quoted field
    n = split(s, t)
    for (j = 1; j <= n; j++)
        a[++i] = t[j]
    print a[3]
}
你如我软肋 2024-09-21 01:47:05

这是解决此问题的可能的替代解决方案。它的工作原理是查找以引号开头或结尾的字段,然后将它们连接在一起。最后它会更新字段和 NF,因此如果您在进行合并的模式之后放置更多模式,则可以使用所有正常的 awk 功能来处理(新)字段。

我认为这仅使用 POSIX awk 的功能并且不依赖于 gawk 扩展,但我不完全确定。

# This function joins the fields $start to $stop together with FS, shifting
# subsequent fields down and updating NF.
#
function merge_fields(start, stop) {
    #printf "Merge fields $%d to $%d\n", start, stop;
    if (start >= stop)
        return;
    merged = "";
    for (i = start; i <= stop; i++) {
        if (merged)
            merged = merged OFS $i;
        else
            merged = $i;
    }
    $start = merged;

    offs = stop - start;
    for (i = start + 1; i <= NF; i++) {
        #printf "$%d = $%d\n", i, i+offs;
        $i = $(i + offs);
    }
    NF -= offs;
}

# Merge quoted fields together.
{
    start = stop = 0;
    for (i = 1; i <= NF; i++) {
        if (match($i, /^"/))
            start = i;
        if (match($i, /"$/))
            stop = i;
        if (start && stop && stop > start) {
            merge_fields(start, stop);
            # Start again from the beginning.
            i = 0;
            start = stop = 0;
        }
    }
}

# This rule executes after the one above. It sees the fields after merging.
{
    for (i = 1; i <= NF; i++) {
        printf "Field %d: >>>%s<<<\n", i, $i;
    }
}

在像这样的输入文件上:

thing "more things" "thing" "more things and stuff"

它会生成:

Field 1: >>>thing<<<
Field 2: >>>"more things"<<<
Field 3: >>>"thing"<<<
Field 4: >>>"more things and stuff"<<<

Here's a possible alternative solution to this problem. It works by finding the fields that begin or end with quotes, and then joining those together. At the end it updates the fields and NF, so if you put more patterns after the one that does the merging, you can process the (new) fields using all the normal awk features.

I think this uses only features of POSIX awk and doesn't rely on gawk extensions, but I'm not completely sure.

# This function joins the fields $start to $stop together with FS, shifting
# subsequent fields down and updating NF.
#
function merge_fields(start, stop) {
    #printf "Merge fields $%d to $%d\n", start, stop;
    if (start >= stop)
        return;
    merged = "";
    for (i = start; i <= stop; i++) {
        if (merged)
            merged = merged OFS $i;
        else
            merged = $i;
    }
    $start = merged;

    offs = stop - start;
    for (i = start + 1; i <= NF; i++) {
        #printf "$%d = $%d\n", i, i+offs;
        $i = $(i + offs);
    }
    NF -= offs;
}

# Merge quoted fields together.
{
    start = stop = 0;
    for (i = 1; i <= NF; i++) {
        if (match($i, /^"/))
            start = i;
        if (match($i, /"$/))
            stop = i;
        if (start && stop && stop > start) {
            merge_fields(start, stop);
            # Start again from the beginning.
            i = 0;
            start = stop = 0;
        }
    }
}

# This rule executes after the one above. It sees the fields after merging.
{
    for (i = 1; i <= NF; i++) {
        printf "Field %d: >>>%s<<<\n", i, $i;
    }
}

On an input file like:

thing "more things" "thing" "more things and stuff"

it produces:

Field 1: >>>thing<<<
Field 2: >>>"more things"<<<
Field 3: >>>"thing"<<<
Field 4: >>>"more things and stuff"<<<
扛起拖把扫天下 2024-09-21 01:47:05

如果您只是寻找特定领域,那么

$ cat file
field1 field2 "field 3" field4 "field5"

awk -F"\"" '{print $2}' file

就可以。它按“分割文件,因此上例中的第二个字段就是您想要的字段。

If you are just looking for a specific field then

$ cat file
field1 field2 "field 3" field4 "field5"

awk -F"\"" '{print $2}' file

works. It splits the file by ", so the 2nd field in the example above is the one you want.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文