如何在 python 中比较两个 CSV 中的列?
你好 我有两个 CSV 文件作为输入,例如:
file1 :
AK163828 chr5 s1 + e1 cttt 4
AK163828 chr5 s2 + e2 gtca 4
AK168688 chr6 s3 + e3 ggcg 4
AK168688 chr6 s4 + e4 tctg 4
file2 :
chr6s3+e3 ggcg
chr5s1+e1 cttt
chr6s4+e4 tata
chr5s2+e2 ggcg
#as you can see the file2 is randomly sorted
我想将 file2 的第 1 列与 file1 的合并后的第 2、3、4、5 列进行比较,同时将 file2 的第 2 列与文件的第 6 列进行比较1、仅选择匹配的行。
所需的输出是
chr6s3+e3 ggcg
chr5s1+e1 cttt
我尝试使用此代码:
import csv
reader1 = csv.reader(open(file1), dialect='excel-tab' )
reader2 = csv.reader(open(file2), dialect='excel-tab' )
for row1, row2 in zip(reader1,reader2):
F1 = row1[1] + row1[2] + row1[3] + row1[4] + '\t' row1[5]
F2 = row2[0] + '\t' + row2[1]
print set(F1) & set(F2)
但它不起作用。你能帮我修复我的代码或给我其他方法吗? 感谢您的帮助!
Hi
I have two CSV files as input, for example:
file1 :
AK163828 chr5 s1 + e1 cttt 4
AK163828 chr5 s2 + e2 gtca 4
AK168688 chr6 s3 + e3 ggcg 4
AK168688 chr6 s4 + e4 tctg 4
file2 :
chr6s3+e3 ggcg
chr5s1+e1 cttt
chr6s4+e4 tata
chr5s2+e2 ggcg
#as you can see the file2 is randomly sorted
I want to compare column 1 of file2 with column 2, 3, 4, 5 merged of file1 and at the same time the column 2 of file2 with column 6 of file 1, and select only the matching lines.
The desired output is
chr6s3+e3 ggcg
chr5s1+e1 cttt
I tried to use this code:
import csv
reader1 = csv.reader(open(file1), dialect='excel-tab' )
reader2 = csv.reader(open(file2), dialect='excel-tab' )
for row1, row2 in zip(reader1,reader2):
F1 = row1[1] + row1[2] + row1[3] + row1[4] + '\t' row1[5]
F2 = row2[0] + '\t' + row2[1]
print set(F1) & set(F2)
But it doesn't work. Can you help me to fix my code or give me an other way to do it?
Thanks for your help!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
快速而肮脏:
输出:
Quick and dirty:
Output:
怎么样:
顺便说一句:您使用的格式 (
dialect='excel-tab'
) 通常称为 TSV,而不是 CSV,尽管它是 CSV 的变体。您还必须确保您的值是用制表符分隔的,而不是像您的帖子中那样用空格分隔的。我猜是的,但由于 Stack Overflow 格式问题,您只有空格?How about this:
By the way: the format you're using (
dialect='excel-tab'
) is usually called TSV, and not CSV, although it is a variant of CSV. You also have to make sure your values are separated by tabs and not by spaces, like in your post. I guess they are, and you only have spaces beacause of Stack Overflow formatting issues?