Finding duplicates across multiple lists read from a CSV file (Python)
Title seems confusing, but let's say I'm working with the following CSV file ('names.csv').
name1,name2,name3
Bob,Jane,Joe
Megan,Tom,Jane
Jane,Joe,Rob
My question is, how would I go about making code that returns the string that occurs at least 3 times. So the output should be 'Jane', because that occurs at least 3 times. Really confused here.. perhaps some sample code would help me better understand?
So far I have:
import csv

reader = csv.DictReader(open("names.csv"))
for row in reader:
    # collect the three name columns of this row into a list
    names = [row['name1'], row['name2'], row['name3']]
    print(names)
This returns:
['Bob', 'Jane', 'Joe']
['Megan', 'Tom', 'Jane']
['Jane', 'Joe', 'Rob']
Where do I go from here? Or am I going about this wrong? I'm really new to Python (well, programming altogether), so I have close to no clue what I'm doing..
Cheers
Putting it all together (and showing proper csv.reader usage):
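The code from this answer did not survive on this page, so here is a minimal sketch of what "proper csv.reader usage" could look like for this problem. It is not the answerer's original code. The function name `names_at_least`, the `threshold` parameter, and the use of `collections.Counter` are assumptions made for illustration, and the sample rows are the ones from the question.

```python
import csv
from collections import Counter

def names_at_least(lines, threshold=3):
    """Return the names that occur at least `threshold` times in the CSV data.

    `lines` can be an open file or any iterable of CSV text lines.
    """
    reader = csv.reader(lines)
    next(reader, None)        # skip the header row (name1,name2,name3)
    counts = Counter()
    for row in reader:
        counts.update(row)    # count every cell of the row
    return sorted(name for name, n in counts.items() if n >= threshold)

# Same data as names.csv in the question, given inline so the sketch runs as-is.
sample = [
    "name1,name2,name3",
    "Bob,Jane,Joe",
    "Megan,Tom,Jane",
    "Jane,Joe,Rob",
]
print(names_at_least(sample))  # ['Jane']
```

With a real file you would probably pass the file object instead, e.g. `with open("names.csv", newline="") as f: print(names_at_least(f))`.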
Update in response to comment:
You need to read through the whole CSV file whether you use the DictReader approach or not. If you want to ignore, for example, the 'name2' column (not row), then simply skip it. You don't need to save all the data, as your use of the variable name "rows" suggests. Here is code for a more general approach that doesn't rely on the column headings being in a particular order and allows particular columns to be selected or rejected.
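The code for the more general approach is also missing from this page. The following is only a sketch of the idea described above, assuming `csv.DictReader` so that columns are addressed by heading rather than by position. The function name `frequent_values` and the `skip_columns` parameter are hypothetical names chosen for this example, not taken from the original answer.

```python
import csv
from collections import Counter

def frequent_values(lines, threshold=3, skip_columns=()):
    """Count values column by column, ignoring any heading in `skip_columns`."""
    counts = Counter()
    # DictReader maps each row to {heading: value}, so column order is irrelevant.
    for row in csv.DictReader(lines):
        for column, value in row.items():
            if column not in skip_columns:
                counts[value] += 1
    # Nothing is stored per row; only the running counts are kept.
    return sorted(v for v, n in counts.items() if n >= threshold)

# Same data as names.csv in the question, given inline so the sketch runs as-is.
sample = [
    "name1,name2,name3",
    "Bob,Jane,Joe",
    "Megan,Tom,Jane",
    "Jane,Joe,Rob",
]
print(frequent_values(sample))                            # ['Jane']
print(frequent_values(sample, skip_columns=("name2",)))   # []
```

Ignoring 'name2' drops one of the three "Jane" entries, which is why the second call finds nothing at a threshold of 3.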
I'd do it like this:
It uses a dict with a default value of 0 to count how many times each name occurs in the file, and then filters the dict by the condition (count >= 3).
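This answer's code is missing here too. A rough sketch of the approach it describes, assuming `collections.defaultdict(int)` as the "dict with a default value of 0" (the original may have used something else), could look like this. The helper name `count_names` is made up for the example.

```python
import csv
from collections import defaultdict

def count_names(lines):
    """Return a dict mapping each name to the number of times it occurs."""
    counts = defaultdict(int)       # missing keys start at 0
    for row in csv.DictReader(lines):
        for name in row.values():
            counts[name] += 1
    return counts

# Same data as names.csv in the question, given inline so the sketch runs as-is.
sample = [
    "name1,name2,name3",
    "Bob,Jane,Joe",
    "Megan,Tom,Jane",
    "Jane,Joe,Rob",
]
counts = count_names(sample)
# Filter the dict by the condition count >= 3.
frequent = [name for name, n in counts.items() if n >= 3]
print(frequent)  # ['Jane']
```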