Deleting a selection of columns in AWK

Posted 2024-12-08 13:15:32


I'd like to delete a selection of columns from a list of CSV files. The awk call is in-line as it is used in a shell script. I don't know beforehand how many columns the files have, only that the columns that I want gone are included in each file of the list.

Let's say I want the first 4 columns removed. Blanking out the column values will leave the separators, which I also want gone.

I thought the following would work: create an array of column numbers to drop, and recreate the corresponding row without those columns.

The value of length(row) below is as expected, but the final loop still iterates over the original column count, not the actual length(row) value.

head $f | awk 'BEGIN{FS=",";split("1,2,3,4",dropers,",")}{split($0,row,FS);for(i in dropers) delete row[i]; print NF "," length(row) "<<<";out=""; print NF "," length(row) ">>>";for(i=1;i<=length(row);i++){print row[i] "lulu"; out = out "," row[i]}; sub(/[ \t]*$/,"",out);print out}' > $g


Here's the output for 2 files: 6 columns going in, 2 left when I've deleted columns 1 through 4, yet the loop iterates over the full 6 cols rather than the expected 2. Thank you for any advice.

Aust.

6,2<<<
6,2>>>
lulu
lulu
lulu
lulu
0000009lulu
461474lulu
,,,,,0000009,461474
6,2<<<
6,2>>>
lulu
lulu
lulu
lulu
0000010lulu
94942lulu
,,,,,0000010,94942

Edit (Belisarius)
Formatted code follows:

BEGIN {FS=",";
       split("1,2,3,4",dropers,",")
      }

{ split($0,row,FS);
  for(i in dropers) delete row[i]; 
  print NF "," length(row) "<<<";
  out=""; 
  print NF "," length(row) ">>>";
  for(i=1;i<=length(row);i++){print row[i] "lulu"; 
                              out = out "," row[i]}; 
  sub(/[ \t]*$/,"",out);
  print out
}
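
A likely explanation for the six iterations, shown as a minimal sketch: in awk, merely reading an array element that does not exist creates it with an empty value. Each `row[i]` in the final loop therefore re-creates a deleted element, `length(row)` grows by one on every pass through indices 1 to 4, and the loop stops only after index 6. The names `show_growth` and `count` below are illustrative assumptions, not part of the original post; `count` stands in for `length(row)` because not every awk accepts an array argument to `length`.

```shell
# Minimal reproduction: reading row[i] for a deleted index re-creates the element.
# show_growth and count are hypothetical helper names used only for this sketch.
show_growth() {
  printf '%s\n' "$1" | awk -F, '
    # Portable element count (stand-in for length(row)).
    function count(arr,   k, n) { n = 0; for (k in arr) n++; return n }
    {
      split($0, row, FS)
      for (i = 1; i <= 4; i++) delete row[i]
      before = count(row)
      for (i = 1; i <= 4; i++) tmp = row[i]   # a plain read re-creates row[1..4]
      after = count(row)
      print before " -> " after
    }'
}

show_growth 'a,b,c,d,0000009,461474'
```

With the six-column sample line this should print `2 -> 6`, matching the two counts seen in the question: two elements right after the deletes, six once the loop has touched the deleted indices.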


童话 2024-12-15 13:15:32

BEGIN {FS=",";
       split("1,2,3,4",dropers,",")
      }

{ split($0,row,FS);
  for(i in dropers) delete row[i]; 
  print NF "," length(row) "<<<";
  out=""; 
  print NF "," length(row) ">>>";
  for(i in row){print row[i] "lulu"; 
                out = out "," row[i]}; 
  out = substr(out,2)
  sub(/[ \t]*$/,"",out);
  print out
}

With input:

a,b,c,d,e,f,g

it prints:

7,3<<<
7,3>>>
elulu
flulu
glulu
e,f,g
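
One caveat with `for (i in row)`: awk does not guarantee the traversal order, so the remaining columns could come out shuffled on some implementations. A hedged alternative sketch is to walk the fields from 1 to NF and skip the unwanted ones, which keeps the original order and never touches a deleted element. The function name `drop_cols` and the `drop` variable are assumptions made for this example, and it only handles simple CSV without quoted fields that contain commas, as the code above does.

```shell
# Sketch: drop the listed columns and keep the rest in their original order.
# drop_cols is a hypothetical helper; $1 is a comma-separated list of column numbers.
drop_cols() {
  awk -v drop="$1" '
    BEGIN {
      FS = OFS = ","
      n = split(drop, list, ",")
      for (k = 1; k <= n; k++) skip[list[k]] = 1   # set of columns to remove
    }
    {
      out = ""; sep = ""
      for (i = 1; i <= NF; i++) {
        if (i in skip) continue      # membership test does not create skip[i]
        out = out sep $i
        sep = OFS                    # separator only between kept fields
      }
      print out
    }'
}

printf 'a,b,c,d,e,f,g\n' | drop_cols 1,2,3,4
```

In a script this could replace the inline call, for example `head "$f" | drop_cols 1,2,3,4 > "$g"`. Building the separator lazily avoids the leading comma, so no `substr` cleanup is needed.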
