# To fix the symptom
How can I efficiently sum the following columns row by row?
Column 1
1
3
3
...
Column 2
2323
343
232
...
This should give me
Expected result
2324
346
235
...
I have the columns in two files.
# Initial situation
I sometimes use too many curly brackets, so that my files contain one more { than }.
I am trying to find where I have used the one unnecessary curly bracket.
I have used the following steps to get the data:
Find commands
find . * -exec grep '{' {} + > /tmp/1
find . * -exec grep '}' {} + > /tmp/2
AWK commands
awk -F: '{ print $2 }' /tmp/1 > /tmp/11
awk -F: '{ print $2 }' /tmp/2 > /tmp/22
The columns are in the files /tmp/11 and /tmp/22.
I repeat a lot of similar commands in my procedure.
This suggests to me that this is not the right way.
Please suggest any way, such as Python, Perl or any Unix tool, that can decrease the number of steps.
If c1 and c2 are your files, you can do this:
Or (without AWK):
Using Python:
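A minimal Python sketch of the row-wise sum, assuming each file holds exactly one integer per line and both files have the same number of lines. The function name sum_columns is hypothetical, and the file names c1 and c2 are taken from the answer above.

```python
def sum_columns(path_a, path_b):
    """Return the row-wise sums of two files holding one integer per line."""
    with open(path_a) as file_a, open(path_b) as file_b:
        # zip pairs line 1 with line 1, line 2 with line 2, and so on;
        # int() tolerates the trailing newline on each line.
        return [int(a) + int(b) for a, b in zip(file_a, file_b)]
```

Calling sum_columns("c1", "c2") and printing each element gives one total per line. Note that zip stops at the shorter file, so a length mismatch would go unnoticed.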
You can avoid the intermediate steps by using a single command that does the counting and the comparison at the same time:
This calls the Perl program once per file; the program counts the number of each type of curly brace and prints the name of the file if the counts don't match.
You must be careful with the /([}{])/ section: find will think it needs to do the replacement on {} if you say /([{}])/.
WARNING: this code will have false positives and negatives if you are trying to run it against source code. Consider the following cases:
balanced, but curlies in strings:
unbalanced, but curlies in strings:
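As a rough Python illustration of both cases, the sketch below counts braces the same naive way, without looking at context. The two snippets and the helper brace_counts are hypothetical examples made up for this illustration.

```python
def brace_counts(text):
    """Count every '{' and '}' in text, ignoring context such as strings."""
    return text.count("{"), text.count("}")

# Hypothetical snippet: the block is never closed, but a '}' inside a
# string literal makes the raw counts equal, so the problem is missed.
looks_balanced = 'if (x) { print "}";'

# Hypothetical snippet: the block is closed correctly, but a '{' inside a
# string literal makes the raw counts differ, so the file is flagged.
looks_unbalanced = 'if (x) { print "{"; }'

for snippet in (looks_balanced, looks_unbalanced):
    opened, closed = brace_counts(snippet)
    verdict = "match" if opened == closed else "mismatch"
    print(snippet, opened, closed, verdict)
```

Telling these cases apart would need a tokenizer for the language in question, which plain counting cannot provide.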
You can expand the Perl command by using B::Deparse:
perl -MO=Deparse -nle 'END { print $ARGV if $h{"{"} != $h{"}"} } $h{$_}++ for /([}{])/g'
Which results in:
We can now look at each piece of the program:
The record separator assignment is caused by the -l option. It sets both the input and output record separators to "\n". This means anything read in will be broken into records based on "\n", and any print statement will have "\n" appended to it.
The loop is created by the -n option. It loops over every file passed in via the command line (or STDIN if no files are passed), reading each line of those files. This also happens to set $ARGV to the last file read by <ARGV>.
The next statement removes whatever is in the $/ variable from the line that was just read ($_); it does nothing useful here. It was caused by the -l option.
The END block holds code that will run at the end of the program. It prints $ARGV (the name of the file last read from, see above) if the values stored in %h associated with the keys '{' and '}' are not equal.
The last statement needs to be broken down further:
The pattern match is a regex that returns a list of the '{' and '}' characters that are in the string being matched. Since no string was specified, the $_ variable (which holds the line last read from the file, see above) will be matched against.
That list is fed into the foreach statement, which then runs the statement that precedes it for each item (hence the name) in the list. It also sets $_ (as you can see, $_ is a popular variable in Perl) to be the item from the list.
The increment adds one to the value in %h that is associated with $_ (which will be either '{' or '}', see above).
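For comparison, the same count-and-compare idea can be sketched in Python. This is not a translation of the Perl program, only a rough equivalent that assumes the files are small enough to read whole. The function name unbalanced_files is hypothetical.

```python
import os

def unbalanced_files(root="."):
    """Yield paths under root whose '{' and '}' counts differ."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps undecodable bytes from raising.
                with open(path, errors="replace") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it
            if text.count("{") != text.count("}"):
                yield path
```

Iterating over unbalanced_files(".") and printing each path lists the candidate files. The same caveat about braces inside strings and comments applies.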
In Python (or Perl, Awk, etc.) you can reasonably do it in a single stand-alone "pass". I'm not sure what you mean by "too many curly brackets", but you can surely count curly brace use per file. For example (unless you have to worry about multi-GB files), the 10 files using the most curly braces:
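A minimal sketch of such a single pass, assuming each file fits in memory (as the answer itself notes). The function name curliest_files and its parameters are hypothetical and are not taken from the original answer.

```python
import heapq
import os

def curliest_files(root=".", limit=10):
    """Return up to `limit` (path, brace_count) pairs, largest count first."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="replace") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it
            # Total number of curly braces of either kind in this file.
            counts[path] = text.count("{") + text.count("}")
    return heapq.nlargest(limit, counts.items(), key=lambda item: item[1])
```

heapq.nlargest avoids sorting the whole dictionary when only the top few entries are needed.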
Reply to Lutz's answer
My problem was finally solved by this command
Your problem can be solved with just one awk command...