Complementing a pattern

Posted on 2024-12-24 19:17:38

I have records like these in a file:

1867    121 2 56 
1868    121 1 6 
1868    121 2 65 
1868    122 0 53 
1869    121 0 41 
1869    121 1 41 
1871    121 1 13 
1871    121 2 194

I would like to get this output:

1867    121 2 56 
1868    121 1 6 
1868    121 2 65 
1868    122 0 53 
1869    121 0 41 
1869    121 1 41 
1870    121 0 0
1871    121 1 13 
1871    121 2 194

The difference is the 1870 121 0 0 row.

So, if consecutive values in the first column differ by more than 1, a line must be inserted for each missing number (1870 in the case above), together with the other columns. The second column of an inserted line should be the minimum of the values that occur in that column (in the example, the possible values are 121 and 122), and likewise for the third column. The last column should always be zero.

Can anybody suggest something? Thanks in advance!

I am trying to solve it with awk, but maybe there are other, nicer or more practical solutions...

Comments (4)

指尖上的星空 2024-12-31 19:17:38

Something like this could work -

awk 'BEGIN{getline;a=$1;b=$2;c=$3}
NR==FNR{if (b>$2) b=$2; if (c>$3) c=$3;next} 
{if ($1-a>1) {x=($1-a); for (i=1;i<x;i++) {print (a+1)"\t"b,c,"0";a++};a=$1} else a=$1;print}' file file

Explanation:

  1. BEGIN{getline;a=$1;b=$2;c=$3} -

    In this BEGIN block we read the first line and assign values in column 1 to variable a, column 2 to variable b and column 3 to variable c.

  2. NR==FNR{if (b>$2) b=$2; if (c>$3) c=$3;next} -

    Here we scan through the first copy of the file (NR==FNR is true only while the first file argument is being read), keep track of the lowest values in column 2 and column 3, and store them in variables b and c respectively. We use next to avoid running the second pattern{action} statement.

  3. {if ($1-a>1) {x=($1-a); for (i=1;i<x;i++) {print (a+1)"\t"b,c,"0";a++};a=$1} else a=$1;print} -

    This action statement runs on the second copy of the file. It compares the value in column 1 with a, the column 1 value of the previous line. If the difference is more than 1, a for loop prints all the missing lines, and a is then set to $1. Otherwise we simply assign the value of column 1 to a. In both cases the current line is printed.
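
For readability, the same two-pass idea can be spelled out as a multi-line awk program. This is only a sketch under some assumptions: the input is sorted on column 1, the columns are numeric, and the wrapper name fill_gaps is hypothetical (it does not appear in the one-liner above). Unlike the one-liner, it does not use getline in BEGIN; the minima are initialised on the first record instead.

```shell
#!/bin/sh
# fill_gaps FILE: hypothetical helper name, used here for illustration only.
# The file is passed to awk twice: pass 1 finds the minima of columns 2
# and 3, pass 2 prints the records and inserts a row for each missing key.
fill_gaps() {
    awk '
        # Pass 1 (first copy of the file): track the column minima.
        NR == FNR {
            if (NR == 1 || $2 + 0 < min2) min2 = $2 + 0
            if (NR == 1 || $3 + 0 < min3) min3 = $3 + 0
            next
        }
        # Pass 2: print a filler row for every key strictly between
        # the previous key and the current one.
        FNR > 1 {
            for (k = prev + 1; k < $1 + 0; k++)
                print k "\t" min2, min3, 0
        }
        # Remember the current key and print the record unchanged.
        { prev = $1 + 0; print }
    ' "$1" "$1"
}
```

It would be called as fill_gaps file. The while-style loop over k handles gaps of any width, so several consecutive missing keys each get their own filler row.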

Test:

[jaypal:~/Temp] cat file
1867    121 2 56 
1868    121 1 6 
1868    121 2 65 
1868    122 0 53 
1869    121 0 41 
1869    121 1 41 
1871    121 1 13  # <--- 1870 skipped
1871    121 2 194
1875    120 1 12 # <--- 1872, 1873, 1874 skipped

[jaypal:~/Temp] awk 'BEGIN{getline;a=$1;b=$2;c=$3}
NR==FNR{if (b>$2) b=$2; if (c>$3) c=$3;next} 
{if ($1-a>1) {x=($1-a); for (i=1;i<x;i++) {print (a+1)"\t"b,c,"0";a++};a=$1} else a=$1;print}' file file
1867    121 2 56 
1868    121 1 6 
1868    121 2 65 
1868    122 0 53 
1869    121 0 41 
1869    121 1 41 
1870    120 0 0 # Assigned minimum value in col 2 (120) and col 3 (0).
1871    121 1 13 
1871    121 2 194
1872    120 0 0 # Assigned minimum value in col 2 (120) and col 3 (0).
1873    120 0 0 # Assigned minimum value in col 2 (120) and col 3 (0).
1874    120 0 0 # Assigned minimum value in col 2 (120) and col 3 (0).
1875    120 1 12
初心 2024-12-31 19:17:38

A Perl solution. It should also work for large files, since it does not load the whole file into memory; instead it reads the file twice.

#!/usr/bin/perl
use warnings;
use strict;

my $file = shift;

open my $IN, '<', $file or die $!;

my @mins;
while (<$IN>) {
    my @cols = split;
    for (0, 1) {
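        # Test definedness first so that "use warnings" does not complain
        # about comparing against an uninitialized value on the first line.
        $mins[$_] = $cols[$_ + 1] if ! defined $mins[$_]
                                     or $cols[$_ + 1] < $mins[$_];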
    }
}

seek $IN, 0, 0;
my $last;
while (<$IN>) {
    my @cols = split;
    $last //= $cols[0];

    for my $i ($last .. $cols[0]-2) {
        print $i + 1, "\t@mins 0\n";
    }
    print;
    $last = $cols[0];
}
清醇 2024-12-31 19:17:38

A Bash solution:

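# "$infile" is assumed to hold the path of the input file
# initialize the minima of the 2nd and 3rd columns from the first line
read -r no min2 min3 c4 < "$infile"

# find the minima of the 2nd and 3rd columns
while read -r c1 c2 c3 c4 ; do
  [ "$c2" -lt "$min2" ] && min2=$c2
  [ "$c3" -lt "$min3" ] && min3=$c3
done < "$infile"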

while read c1 c2 c3 c4 ; do
  # insert missing line(s) ?
  while (( c1- no > 1 )) ; do
    ((no++))
    echo -e "$no $min2 $min3 0"
  done
  # now insert existing line
  echo -e "$c1 $c2 $c3 $c4"
  no=$c1
done < "$infile"
箹锭⒈辈孓 2024-12-31 19:17:38

One way using awk:

BEGIN { 
        ## Exactly one input file is expected.
        if ( ARGC != 2 ) {
                print "Usage: awk -f script.awk <file-name>"
                exit 1
        }

        ## Need to process file twice, duplicate the input filename.
        ARGV[2] = ARGV[1]
        ++ARGC

        col2 = -1
        col3 = -1
}

## First processing of file. Get min values of second and third columns.
FNR == NR {
        col2 = col2 < 0 || col2 > $2 ? $2 : col2
        col3 = col3 < 0 || col3 > $3 ? $3 : col3
        next
}

## Second processing of file.
FNR < NR {
        ## Get value of column 1 in first row.
        if ( FNR == 1 ) {
                col1 = $1
                print
                next
        }

        ## Compare current value of column 1 with value of previous row.
        ## Add a new row while difference is bigger than '1'.
        while ( $1 - col1 > 1 ) {
                ++col1
                printf "%d\t%d %d %d\n", col1, col2, col3, 0
        }

        ## Assign new value of column 1.
        col1 = $1
        print
}

Running the script:

awk -f script.awk infile

Result:

1867    121 2 56 
1868    121 1 6 
1868    121 2 65 
1868    122 0 53 
1869    121 0 41 
1869    121 1 41 
1870    121 0 0
1871    121 1 13 
1871    121 2 194
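
To check the output of any of the solutions above, one could verify that the first column no longer has gaps. The following is a minimal sketch, assuming the output has been saved to a file and is sorted on column 1; the function name has_no_gaps is hypothetical.

```shell
#!/bin/sh
# has_no_gaps FILE: hypothetical helper; returns 0 when consecutive
# column 1 values never differ by more than 1, and 1 otherwise.
has_no_gaps() {
    awk '
        # Stop at the first jump larger than 1 and remember it.
        NR > 1 && $1 - prev > 1 { gap = 1; exit }
        { prev = $1 }
        # exit inside a rule still runs END, which sets the exit status.
        END { exit gap + 0 }
    ' "$1"
}
```

For example, has_no_gaps result.txt && echo "no gaps" prints the message only when every missing number has been filled in.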