Summing up two columns the Unix way

Posted 2024-08-03 10:00:30

# To fix the symptom

How can you sum up the following columns effectively?

Column 1

1
3
3
...   

Column 2

2323
343
232
...

This should give me

Expected result

2324
346
235
...

I have the columns in two files.


# Initial situation

I sometimes use too many curly brackets, so my files end up with one more { than }.
I am trying to find where I used the unnecessary curly bracket.
I used the following steps to get the data:

Find commands

 find . * -exec grep '{' {} + > /tmp/1
 find . * -exec grep '}' {} + > /tmp/2

AWK commands

 awk -F: '{ print $2 }' /tmp/1 > /tmp/11
 awk -F: '{ print $2 }' /tmp/2 > /tmp/22

The columns are in the files /tmp/11 and /tmp/22.

I repeat a lot of similar commands in my procedure.
This suggests to me that this is not the right way.

Please suggest any way, such as Python, Perl or any Unix tool, that can decrease the number of steps.

Comments (6)

满意归宿 2024-08-10 10:00:30

If c1 and c2 are your files, you can do this:

$ paste c1 c2 | awk '{print $1 + $2}'

Or (without AWK):

$ paste c1 c2 | while read i j; do echo $(($i+$j)); done
机场等船 2024-08-10 10:00:30

Using Python:

totals = [ int(i)+int(j) for i, j in zip ( open(fname1), open(fname2) ) ]
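
The one-liner above leaves both files open and fails on a trailing blank line. A slightly more defensive sketch follows, assuming one integer per line; the helper name `sum_columns` is made up for illustration and is not part of the original answer.

```python
def sum_columns(lines_a, lines_b):
    """Add two columns of integers line by line.

    Accepts any iterables of text lines (open files work). Blank lines
    are skipped, and a length mismatch raises ValueError.
    """
    a = [int(s) for s in lines_a if s.strip()]
    b = [int(s) for s in lines_b if s.strip()]
    if len(a) != len(b):
        raise ValueError("columns differ in length: %d vs %d" % (len(a), len(b)))
    return [x + y for x, y in zip(a, b)]


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python sum_columns.py c1 c2
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
            for total in sum_columns(f1, f2):
                print(total)
```

With the sample columns from the question (1, 3, 3 and 2323, 343, 232) this returns [2324, 346, 235].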
删除会话 2024-08-10 10:00:30

You can avoid the intermediate steps by using a single command that does the counting and the comparison at the same time:

find . -type f -exec perl -nle 'END { print $ARGV if $h{"{"} != $h{"}"} } $h{$_}++ for /([}{])/g' {} \;

This calls the Perl program once per file. The Perl program counts the number of each type of curly brace and prints the name of the file if the counts don't match.

You must be careful with the /([}{])/ section: find will think it needs to do the replacement on {} if you write /([{}])/.

WARNING: this code will have false positives and negatives if you are trying to run it against source code. Consider the following cases:

balanced, but curlies in strings:

if ($s eq '{') {
    print "I saw a {\n"
}

unbalanced, but curlies in strings:

while (1) {
   print "}";
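
One way to reduce these false results is to skip braces that sit inside quoted strings. The sketch below is only an illustration of that idea, not part of the Perl one-liner. It assumes strings are delimited by ' or " with backslash escapes, and it does not handle comments, here-documents or regex literals. The function name `count_braces_outside_strings` is hypothetical.

```python
def count_braces_outside_strings(text):
    """Return (opening, closing) brace counts, ignoring quoted strings.

    Assumption: strings are delimited by ' or " and use backslash escapes.
    Comments, here-documents and regex literals are NOT handled.
    """
    opening = closing = 0
    quote = None      # quote character of the string we are inside, if any
    escaped = False   # True right after a backslash inside a string
    for ch in text:
        if quote is not None:
            if escaped:
                escaped = False      # this character is escaped, ignore it
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None         # closing quote ends the string
        elif ch in "'\"":
            quote = ch               # opening quote starts a string
        elif ch == "{":
            opening += 1
        elif ch == "}":
            closing += 1
    return opening, closing
```

On the two snippets above it returns (1, 1) for the balanced case and (1, 0) for the unbalanced one, which matches what a reader would expect.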

You can expand the Perl command by using B::Deparse:

perl -MO=Deparse -nle 'END { print $ARGV if $h{"{"} != $h{"}"} } $h{$_}++ for /([}{])/g'

Which results in:

BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    sub END {
        print $ARGV if $h{'{'} != $h{'}'};
    }
    ;
    ++$h{$_} foreach (/([}{])/g);
}

We can now look at each piece of the program:

BEGIN { $/ = "\n"; $\ = "\n"; }

This is caused by the -l option. It sets both the input and output record separators to "\n". This means anything read in will be broken into records based on "\n", and any print statement will have "\n" appended to it.

LINE: while (defined($_ = <ARGV>)) {
}

This is created by the -n option. It loops over every file passed in via the command line (or STDIN if no files are passed), reading each line of those files. This also happens to set $ARGV to the last file read by <ARGV>.

chomp $_;

This removes whatever is in the $/ variable from the line that was just read ($_). It does nothing useful here; it was added by the -l option.

sub END {
    print $ARGV if $h{'{'} != $h{'}'};
}

This is an END block; its code runs at the end of the program. It prints $ARGV (the name of the file last read from, see above) if the values stored in %h for the keys '{' and '}' are not equal.

++$h{$_} foreach (/([}{])/g);

This needs to be broken down further:

/
    (    #begin capture
    [}{] #match any of the '}' or '{' characters
    )    #end capture
/gx

This is a regex that returns a list of the '{' and '}' characters in the string being matched. Since no string was specified, it matches against the $_ variable (which holds the line last read from the file, see above). That list is fed into the foreach statement, which runs the statement in front of it once for each item in the list (hence the name). It also sets $_ (as you can see, $_ is a popular variable in Perl) to the current item from the list.

++$h{$_}

This line increments the value in %h that is associated with $_ (which will be either '{' or '}', see above) by one.
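
For readers more comfortable with Python, the same logic (count each brace type per file, report the file when the counts differ) might be sketched as below. This is a rough equivalent under the same caveats about strings and comments, not a translation produced by any tool. The names `brace_counts` and `unbalanced_files` are invented here.

```python
import os
from collections import Counter


def brace_counts(text):
    """Count '{' and '}' in text, like $h{$_}++ for /([}{])/g."""
    return Counter(ch for ch in text if ch in "{}")


def unbalanced_files(root="."):
    """Yield paths under root whose '{' and '}' counts differ."""
    for path, _dirs, files in os.walk(root):
        for name in files:
            fn = os.path.join(path, name)
            try:
                # errors="replace" avoids decode errors on binary files
                with open(fn, errors="replace") as f:
                    counts = brace_counts(f.read())
            except OSError:
                continue  # unreadable file: skip it
            if counts["{"] != counts["}"]:
                yield fn
```

Looping over `unbalanced_files(".")` and printing each path would play the role of the find command. A missing key in a `Counter` reads as 0, which mirrors how the Perl comparison treats an absent hash entry.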

执妄 2024-08-10 10:00:30

In Python (or Perl, Awk, etc.) you can reasonably do it in a single stand-alone "pass". I'm not sure what you mean by "too many curly brackets", but you can certainly count curly brace use per file. For example (unless you have to worry about multi-GB files), the 10 files using the most curly braces:

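import heapq
import os

# Map each file path to its total number of curly braces.
curliest = dict()

for path, dirs, files in os.walk('.'):
  for afile in files:
    fn = os.path.join(path, afile)
    # errors='replace' keeps binary files from raising UnicodeDecodeError
    with open(fn, errors='replace') as f:
      data = f.read()
      braces = data.count('{') + data.count('}')
    curliest[fn] = braces

# Pick the 10 files with the most braces, then print them in ascending order.
top10 = heapq.nlargest(10, curliest, curliest.get)
top10.sort(key=curliest.get)
for fn in top10:
  print('%6d %s' % (curliest[fn], fn))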
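
Counting per file narrows the search to a file but not to a line. Since the question asks where the extra brace is, a stack-based scan can report line numbers. The sketch below is an illustration only: it does a plain character scan with no language awareness, so braces inside strings or comments count like any other, and the function name `unmatched_braces` is hypothetical.

```python
def unmatched_braces(lines):
    """Return sorted (line_number, brace) pairs for braces with no partner.

    A stack of line numbers tracks open '{'. A '}' seen with an empty
    stack is reported immediately; any '{' still on the stack at the end
    is reported as well.
    """
    open_stack = []
    problems = []
    for lineno, line in enumerate(lines, start=1):
        for ch in line:
            if ch == "{":
                open_stack.append(lineno)
            elif ch == "}":
                if open_stack:
                    open_stack.pop()   # matches the most recent '{'
                else:
                    problems.append((lineno, "}"))
    problems.extend((lineno, "{") for lineno in open_stack)
    return sorted(problems)
```

Passing an open file works, for example `unmatched_braces(open(fn, errors='replace'))`. Note that for a missing `}` the reported line is the outermost `{` left unmatched, which is a starting point for inspection rather than the exact location of the mistake.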
甜警司 2024-08-10 10:00:30

Reply to Lutz's answer

My problem was finally solved by this command:

paste -d: /tmp/1 /tmp/2 | awk -F: '{ print $1 "\t" $2 - $4 }'
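
The same pairing can be sketched in Python. This assumes each line of /tmp/1 and /tmp/2 has the form `name:count` (as `grep -c` would produce, which is an assumption since the question shows plain `grep`) and that both files list the same names in the same order. The function name `count_differences` is made up for illustration.

```python
def count_differences(open_lines, close_lines):
    """Pair 'name:count' lines and return (name, open_count - close_count).

    Assumption: both inputs list the same files in the same order.
    rsplit on the last ':' keeps file names that contain colons intact.
    """
    result = []
    for a, b in zip(open_lines, close_lines):
        name, n_open = a.rstrip("\n").rsplit(":", 1)
        _name_b, n_close = b.rstrip("\n").rsplit(":", 1)
        result.append((name, int(n_open) - int(n_close)))
    return result
```

A nonzero difference marks a file whose `{` and `}` counts disagree.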
记忆之渊 2024-08-10 10:00:30

Your problem can be solved with just one awk command:

awk '{getline i<"file1";print i+$0}'  file2