BASH - 加入非第一列

发布于 2024-12-08 11:15:28 字数 1498 浏览 0 评论 0原文

我正在尝试将 2 个文件连接在一起 - 两个文件都是 CSV 格式 - 两个文件具有相同的列。以下是每个文件的示例：

文件 1：

CustName,AccountReference,InvoiceDate,InvoiceRefID,TelNo,Rental,GPRS,Mnet,MnetPlus,SMS,CSD,IntRoaming,NetAmount
acme,107309   ,2011-09-24 12:47:11.000,AP/157371,07741992165     ,2.3900,.0000,.0000,.0000,.0000,.0000,.0000,2.3900
acme,107309   ,2011-09-24 12:58:32.000,AP/162874,07740992165     ,2.0000,.0000,.0000,.0000,.0000,.0000,.0000,2.0000
anot,107308   ,2011-09-24 12:58:32.000,AP/162874,07824912428     ,2.0000,.0000,.0000,.0000,.0000,.0000,.0000,2.0000
anot,107308   ,2011-09-24 12:47:11.000,AP/157371,07834919928     ,1.5500,.0000,.0000,.0000,.0000,.0000,.0000,1.5500

文件 2：

CustName,AccountReference,InvoiceDate,InvoiceRefID,TelNo,Rental,GPRS,Mnet,MnetPlus,SMS,CSD,IntRoaming,NetAmount
acme,100046,2011-10-05 08:29:19,AB/020152,07824352342,12.77,0.00,0.00,0.00,0.00,0.00,0.00,12.77
anbe,100046,2011-10-05 08:29:19,AB/020152,07741992165,2.50,0.00,0.00,0.00,0.00,0.00,0.00,2.50
acve,100046,2011-10-05 08:29:19,AB/020152,07740992165,10.00,0.00,0.00,0.00,0.00,0.00,0.00,10.00
asce,100046,2011-10-05 08:29:19,AB/020152,07771335702,2.50,0.00,0.00,0.00,0.00,0.00,0.00,2.50

我想将这两个文件连接在一起 - 但只取其中一些列，其他列可以忽略（有些相同，有些不同） -

AccountRef,telno,rental_file1,rental_file2,gprs_file1,gprs_file2 等等....

连接应该在 telno 列上完成（看起来我有空格）文件 1 - 希望可以忽略吗？

我找到了很多使用 JOIN 的示例，但它们都使用第一列作为连接上的键......任何指针都会很棒 - 谢谢

原文

I am trying to join 2 files together - both files are in CSV format - both files have the same columns. Here is an example of each file :

File 1:

CustName,AccountReference,InvoiceDate,InvoiceRefID,TelNo,Rental,GPRS,Mnet,MnetPlus,SMS,CSD,IntRoaming,NetAmount
acme,107309   ,2011-09-24 12:47:11.000,AP/157371,07741992165     ,2.3900,.0000,.0000,.0000,.0000,.0000,.0000,2.3900
acme,107309   ,2011-09-24 12:58:32.000,AP/162874,07740992165     ,2.0000,.0000,.0000,.0000,.0000,.0000,.0000,2.0000
anot,107308   ,2011-09-24 12:58:32.000,AP/162874,07824912428     ,2.0000,.0000,.0000,.0000,.0000,.0000,.0000,2.0000
anot,107308   ,2011-09-24 12:47:11.000,AP/157371,07834919928     ,1.5500,.0000,.0000,.0000,.0000,.0000,.0000,1.5500

File 2:

CustName,AccountReference,InvoiceDate,InvoiceRefID,TelNo,Rental,GPRS,Mnet,MnetPlus,SMS,CSD,IntRoaming,NetAmount
acme,100046,2011-10-05 08:29:19,AB/020152,07824352342,12.77,0.00,0.00,0.00,0.00,0.00,0.00,12.77
anbe,100046,2011-10-05 08:29:19,AB/020152,07741992165,2.50,0.00,0.00,0.00,0.00,0.00,0.00,2.50
acve,100046,2011-10-05 08:29:19,AB/020152,07740992165,10.00,0.00,0.00,0.00,0.00,0.00,0.00,10.00
asce,100046,2011-10-05 08:29:19,AB/020152,07771335702,2.50,0.00,0.00,0.00,0.00,0.00,0.00,2.50

I would like to join the 2 files together - but just taking some of the columns the other columns can be ignored (some are the same, some are different) -

AccountRef,telno, rental_file1,rental_file2,gprs_file1,gprs_file2 etc etc ....

The join should be done on the telno column (it seems I have white space in file 1 - hope that can be ignored ?

i have found lots of examples using JOIN but all of them use the first column for the key on the join .... any pointers would be great - thanks

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

鹿港小镇 2024-12-15 11:15:28

基本答案是：

join -t , -1 3 -2 4 -1 6 -2 2 file1 file2

这会将 file 第 3 列上的文件 file1 和 file2 与 file2 第 4 列连接起来>，然后在第 6 列和第 2 列上。当然，数据文件必须在相同的列上排序。 -t , 设置 CSV 的分隔符 - 但 join 不会处理带引号的字符串中嵌入的逗号。

如果您的数据很简单（没有带引号的字符串），那么您也可以使用awk。如果您的数据包含可能包含逗号等的引号字符串，那么您需要一个支持 CSV 的工具。我可能会将 Perl 与 Text::CSV 模块（以及 < href="http://search.cpan.org/perldoc?Text%3a%3aCSV_XS" rel="nofollow">Text::CSV_XS 模块以提高性能）。

The basic answer is:

join -t , -1 3 -2 4 -1 6 -2 2 file1 file2

This will join the files file1 and file2 on column 3 from file with column 4 from file2, then on columns 6 and 2. The data files must be sorted on those same columns, of course. The -t , sets the separator for CSV - but join will not handle embedded commas inside quoted strings.

If your data is simple (no quoted strings) then you can also use awk. If your data has quoted strings which may contain commas, etc, then you need a CSV-aware tool. I'd probably use Perl with the Text::CSV module (and the Text::CSV_XS module for performance).

回复收藏 0 原文

鯉魚旗 2024-12-15 11:15:28

awk -F' *, *' 'NR > 1 && NR == FNR {
  _[$5] = $0; next
  }  
NR == 1 {
  print "AccountReference", "TelNo", "Rental_" ARGV[2], \
  "Rental_" ARGV[3], "GPRS_" ARGV[2], "GPRS_" ARGV[3]
  next
  }
$5 in _ {
  split(_[$5], t)
  print $2, $5, $6, t[6], $7, t[7]
  }' OFS=, file2 file1

awk -F' *, *' 'NR > 1 && NR == FNR {
  _[$5] = $0; next
  }  
NR == 1 {
  print "AccountReference", "TelNo", "Rental_" ARGV[2], \
  "Rental_" ARGV[3], "GPRS_" ARGV[2], "GPRS_" ARGV[3]
  next
  }
$5 in _ {
  split(_[$5], t)
  print $2, $5, $6, t[6], $7, t[7]
  }' OFS=, file2 file1

回复收藏 0 原文

别低头，皇冠会掉 2024-12-15 11:15:28

看看 cat 和 cut :-)

例如

cat file1 file2 | cut -d, -f2,5

产量

107309   ,07741992165     
107309   ,07740992165     
107308   ,07824912428     
107308   ,07834919928
100046,07824352342
100046,07741992165
100046,07740992165
100046,07771335702

Have a look at cat and cut :-)

For instance

cat file1 file2 | cut -d, -f2,5

yields

107309   ,07741992165     
107309   ,07740992165     
107308   ,07824912428     
107308   ,07834919928
100046,07824352342
100046,07741992165
100046,07740992165
100046,07771335702

回复收藏 0 原文