Reading and extracting CSV file columns with Python

Posted 2024-09-09 03:44:39

I have the following code:

reader=csv.DictReader(open("test1.csv","r"))
allrows = list(reader)

keepcols = [c for c in allrows[0] if all(r[c] != '0' for r in allrows)]

print keepcols
writer=csv.DictWriter(open("output1.csv","w"),fieldnames='keepcols',extrasaction='ignore')
writer.writerows(allrows)

I have a CSV file with about 45 columns. The first column contains names; every other column contains only 0s and 1s. The table also has a header row.
I am trying to read the columns from the CSV file and extract only those that contain 1s.
The problem is that the output file is empty, even though several columns in the table contain 1s.

Could somebody please help me out? I am badly stuck.

Title    3003_contact    3003_backbone   3003_sidechain  3003_polar  3003_hydrophobic    3003_acceptor   3003_donor  3003_aromatic
l1  1   1   0   1   1   0   0   0
l1  1   0   1   0   0   0   1   0
l1  1   0   0   0   0   0   0   0
l1  1   0   0   0   1   0   0   1
l1  1   0   0   0   0   0   0   0
l2  1   0   0   0   1   0   0   0
l2  1   0   0   0   0   1   0   0
l3  1   0   0   0   0   0   0   0
l3  1   0   0   0   0   0   1   0
l3  1   0   0   0   0   0   0   1
l3  1   0   0   0   0   0   0   0
l3  1   0   0   0   0   0   0   0
l4  1   0   0   0   0   0   0   0
l4  1   0   0   0   0   0   0   0
l4  1   0   0   0   0   0   0   0

It returns only column 1. I've tried changing 'keepcols' to keepcols, and then I get column 2 first and column 1 second in the output.


故人的歌 2024-09-16 03:44:39

Edit: If the input file is a comma-separated values file, then to maintain the order of the keys, use reader.fieldnames instead of the keys in allrows[0].

So the solution would be:

keepcols = [c for c in reader.fieldnames if any(r[c] != '0' for r in allrows)]
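For a genuinely comma-separated file, the pieces above might fit together as in the following minimal Python 3 sketch. The function name keep_nonzero_columns and the in-memory sample data are hypothetical, introduced only for illustration. Note that the filter uses any rather than the all from the question, so a column is kept when at least one row has a value other than '0'; the sketch also writes a header row, which the question's code does not do.

```python
import csv
import io


def keep_nonzero_columns(src, dst):
    """Copy CSV rows from src to dst, dropping columns that contain only '0'.

    src and dst are open text file objects. Hypothetical helper, not part
    of the csv module.
    """
    reader = csv.DictReader(src)
    allrows = list(reader)
    # reader.fieldnames keeps the header order of the input file.
    fieldnames = reader.fieldnames or []
    # Keep a column if any row holds something other than '0'.
    keepcols = [c for c in fieldnames if any(r[c] != '0' for r in allrows)]
    # Pass the list itself, not the string 'keepcols'.
    writer = csv.DictWriter(dst, fieldnames=keepcols, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(allrows)
    return keepcols


# Small in-memory example; column "b" is all zeros and is dropped.
sample = "Title,a,b,c\nl1,1,0,0\nl2,1,0,1\n"
out = io.StringIO()
kept = keep_nonzero_columns(io.StringIO(sample), out)
print(kept)
print(out.getvalue())
```

With real files, the same function could be called with two objects returned by open(); in Python 3 the csv documentation recommends opening them with newline=''.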

The input file posted above looks like it has space-separated columns. In this case, I don't think csv is the right tool for parsing it. Instead, you can use split:

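import csv

with open("test1.csv", "r") as f:
    # The first line holds the whitespace-separated column names.
    fields = next(f).split()
    allrows = []
    for line in f:
        values = line.split()
        # Pair each column name with the value in the same position.
        allrows.append(dict(zip(fields, values)))

# Keep every column that has at least one value other than '0'.
keepcols = [c for c in fields if any(row[c] != '0' for row in allrows)]
print(keepcols)

# Closing the output file through "with" ensures the rows are flushed to disk.
with open("output1.csv", "w") as out:
    writer = csv.DictWriter(out, fieldnames=keepcols, extrasaction='ignore')
    writer.writerows(allrows)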

Edit 2: The column order was changing because for c in allrows[0] returns the keys of allrows[0] in an unspecified order. Plain dict keys have no guaranteed order in Python versions before 3.7. The code above works around this by defining fields as a list, not a dict.

Original answer:
Change fieldnames='keepcols' to fieldnames=keepcols.

fieldnames needs to be a sequence of keys, such as ['fieldA','fieldB',...].

A potential pitfall to be aware of in Python is that strings are sequences. When you iterate over a string, you get its characters. So when you write fieldnames='keepcols', you are setting fieldnames to the sequence of characters ['k','e','e','p','c','o','l','s']. You don't get an error because this is a valid sequence of keys. But the dicts in your list allrows don't happen to have these keys, and writer.writerows blithely ignores the mismatch because extrasaction='ignore'.
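The pitfall can be reproduced in a few lines. The following is a small Python 3 sketch; the helper write_rows and the one-row sample are hypothetical and exist only to contrast the two ways of passing fieldnames.

```python
import csv
import io

# Iterating over a string yields its individual characters.
chars = list('keepcols')
print(chars)

# One sample row using two of the column names from the question.
rows = [{'Title': 'l1', '3003_contact': '1'}]


def write_rows(fieldnames):
    """Write rows with the given fieldnames and return the resulting text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction='ignore')
    writer.writerows(rows)
    return buf.getvalue()


# The string is treated as eight one-character keys, none of which exist
# in the row dicts, so no data values reach the output.
wrong = write_rows('keepcols')

# A real list of column names writes the expected values.
right = write_rows(['Title', '3003_contact'])

print(repr(wrong))
print(repr(right))
```

Without extrasaction='ignore', DictWriter would raise a ValueError for row keys missing from fieldnames, which makes this kind of mistake visible sooner.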
