Sorting data and generating table output from a CSV file in Python

Posted on 2024-11-05 09:29:16

Let's say I've got the following CSV file (subjects.csv):

subjects,name1,name2,name3
Chemistry,Tom,Will,Rob
Biology,Megan,Sam,Tim
Physics,Tim,Will,Bob
Maths,Will,Tim,Joe

I want to find which pairs of students share the same class, focusing only on Tim, Tom and Will. How would I go about pairing these in Python?

i.e.

Tim and Will attend 2 classes together.

Tom and Will attend 1 class together.

Furthermore, I want to show this in a table like the one below, with the names on both axes and the number of classes each pair of students shares (names sorted in ascending or descending alphabetical order). I've read about how to generate tables for entire CSV files, but I can't get my head around building a table from scratch while also stripping columns and rows from the CSV file.

            Tim   Tom   Will
    Tim     0     0     0
    Tom     0     0     1
    Will    2     0     0

This is way out of my personal skill level, but I'd still like to know how to do it and try to understand.

Answer by 最偏执的依靠, 2024-11-12 09:29:16

You can build a dictionary that maps each student to the set of classes they take (the sessions below use Python 2 syntax):

>>> import csv
>>> import collections
>>> D = collections.defaultdict(set)
>>> with open('subjects.csv','rb') as f:
...     subject_reader = csv.reader(f)
...     header = subject_reader.next()
...     for row in subject_reader:
...         for name in row[1:]:
...             D[name].add(row[0])
... 
>>> import pprint
>>> pprint.pprint(dict(D))
{'Bob': set(['Physics']),
 'Joe': set(['Maths']),
 'Megan': set(['Biology']),
 'Rob': set(['Chemistry']),
 'Sam': set(['Biology']),
 'Tim': set(['Biology', 'Maths', 'Physics']),
 'Tom': set(['Chemistry']),
 'Will': set(['Chemistry', 'Maths', 'Physics'])}
>>>
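For readers on Python 3, the same idea can be sketched as follows. This is a minimal sketch, not the answer's original code: the helper name `load_classes` is hypothetical, and the sample data is inlined through `io.StringIO` so the block runs without a file on disk. With a real file you would pass `open('subjects.csv', newline='')` instead.

```python
import collections
import csv
import io

# Sample data from the question, inlined so the sketch runs without a file.
SUBJECTS_CSV = """\
subjects,name1,name2,name3
Chemistry,Tom,Will,Rob
Biology,Megan,Sam,Tim
Physics,Tim,Will,Bob
Maths,Will,Tim,Joe
"""


def load_classes(csv_file):
    """Map each student name to the set of subjects they take.

    `load_classes` is a hypothetical helper name used for illustration.
    """
    classes = collections.defaultdict(set)
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row (Python 3 replaces reader.next())
    for row in reader:
        if not row:  # ignore blank lines
            continue
        subject, names = row[0], row[1:]
        for name in names:
            classes[name].add(subject)
    return classes


classes = load_classes(io.StringIO(SUBJECTS_CSV))
print(sorted(classes["Tim"]))
```

Sorting the set before printing keeps the output stable, since set ordering is not guaranteed.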

To count how many classes two people take together, intersect their sets and take the length of the result:

>>> D['Tom'].intersection(D['Will'])
set(['Chemistry'])
>>> len(_)
1
>>> D['Tim'].intersection(D['Will'])
set(['Maths', 'Physics'])
>>> len(_)
2
>>>
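To answer the question for every pair at once rather than one pair at a time, the intersection step can be combined with `itertools.combinations`. The sketch below is in Python 3; the function name `shared_counts` is an assumption made for illustration, and the dictionary is written out by hand so the block is self-contained.

```python
from itertools import combinations

# Student -> classes, restricted to the three names from the question.
classes = {
    "Tim": {"Biology", "Maths", "Physics"},
    "Tom": {"Chemistry"},
    "Will": {"Chemistry", "Maths", "Physics"},
}


def shared_counts(classes, names):
    """Return {(a, b): number of shared classes} for each unordered pair.

    `shared_counts` is a hypothetical helper, not part of any library.
    """
    return {
        (a, b): len(classes[a] & classes[b])  # & is set intersection
        for a, b in combinations(sorted(names), 2)
    }


counts = shared_counts(classes, ["Tim", "Tom", "Will"])
for (a, b), n in counts.items():
    print(f"{a} and {b} attend {n} class(es) together.")
```

Sorting the names first means each pair appears exactly once, in alphabetical order.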

To print out the table in your example you can do something like this:

>>> EXAMPLE_NAMES = ['Tom','Tim','Will']
>>> for y_name in EXAMPLE_NAMES:
...     print '{0:{width}}'.format(y_name,width=5),
...     for x_name in EXAMPLE_NAMES:
...         if y_name==x_name:
...             print '{0:{width}}'.format('-'*5, width=5),
...         else:
...             print '{0:{width}}'.format(len(D[y_name].intersection(D[x_name])), width=5),
...     print
... 
Tom   -----     0     1
Tim       0 -----     2
Will      1     2 -----

A header for the table might look like this:

>>> for x_name in [' ']+EXAMPLE_NAMES:
...     print '{0:{width}}'.format(x_name, width=5),
... 
      Tom   Tim   Will
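The header and body loops can also be put together in Python 3, with the names sorted alphabetically as the question asks. This is a sketch under assumptions: `format_table` and its `width` parameter are hypothetical names, and the diagonal is marked with a single dash rather than the five dashes used above.

```python
def format_table(classes, names, width=6):
    """Return a text table of shared-class counts for the given names.

    `format_table` is a hypothetical helper; `width` is the column width.
    """
    names = sorted(names)
    # Header row: an empty corner cell followed by the column names.
    header = "".join(f"{label:<{width}}" for label in [""] + names)
    lines = [header.rstrip()]
    for row_name in names:
        cells = [f"{row_name:<{width}}"]
        for col_name in names:
            if row_name == col_name:
                cell = "-"  # a student is not paired with themselves
            else:
                cell = str(len(classes[row_name] & classes[col_name]))
            cells.append(f"{cell:<{width}}")
        lines.append("".join(cells).rstrip())
    return "\n".join(lines)


classes = {
    "Tim": {"Biology", "Maths", "Physics"},
    "Tom": {"Chemistry"},
    "Will": {"Chemistry", "Maths", "Physics"},
}
print(format_table(classes, ["Tom", "Tim", "Will"]))
```

Returning a string instead of printing inside the loops makes the table easy to write to a file or compare in a test.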

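As John mentions in the comments, I am hard-coding the names into a list to mimic the example you gave above. To see the entire table, you can get or iterate over the keys of the dictionary you created using .keys() or .iterkeys(). A Python 2 dictionary has no guaranteed key order, so wrap the keys in sorted() if you want the names in alphabetical order: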

>>> import csv
>>> import collections
>>> 
>>> my_d = collections.defaultdict(set)
>>> with open('subjects.csv','rb') as f:
...     subject_reader = csv.reader(f)
...     header = subject_reader.next()
...     for row in subject_reader:
...         for name in row[1:]:
...             my_d[name].add(row[0])
... 
>>> def display_header(D):
...     for x_name in [' ']+D.keys():
...         print '{0:{width}}'.format(x_name, width=5),
...     print
... 
>>> def display_body(D):
...     for y_name in D.iterkeys():
...         print '{0:{width}}'.format(y_name,width=5),
...         for x_name in D.iterkeys():
...             if y_name==x_name:
...                 print '{0:{width}}'.format('-'*5, width=5),
...             else:
...                 print '{0:{width}}'.format(len(D[y_name].intersection(D[x_name])), width=5),
...         print
... 
>>> def display_table(D):
...     display_header(D)
...     display_body(D)
... 
>>> display_table(my_d)
      Sam   Rob   Megan Will  Tim   Joe   Tom   Bob  
Sam   -----     0     1     0     1     0     0     0
Rob       0 -----     0     1     0     0     1     0
Megan     1     0 -----     0     1     0     0     0
Will      0     1     0 -----     2     1     1     1
Tim       1     0     1     2 -----     1     0     1
Joe       0     0     0     1     1 -----     0     0
Tom       0     1     0     1     0     0 -----     0
Bob       0     0     0     1     1     0     0 -----
>>>
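An alternative way to get the same numbers, sketched here in Python 3, is to count pairs row by row instead of intersecting sets afterwards: every pair of names that appears in the same CSV row shares that class. This swaps in a different technique from the answer above; the `rows` list stands in for what `csv.reader` would yield after the header, and `pair_counts` is a name chosen for illustration.

```python
import collections
import itertools

# Rows as csv.reader would yield them after the header (sample data
# from the question, inlined so the sketch is self-contained).
rows = [
    ["Chemistry", "Tom", "Will", "Rob"],
    ["Biology", "Megan", "Sam", "Tim"],
    ["Physics", "Tim", "Will", "Bob"],
    ["Maths", "Will", "Tim", "Joe"],
]

pair_counts = collections.Counter()
for subject, *names in rows:
    # Sort so each pair is always stored as (earlier, later) alphabetically;
    # set() guards against a name listed twice in one row.
    for pair in itertools.combinations(sorted(set(names)), 2):
        pair_counts[pair] += 1

print(pair_counts[("Tim", "Will")])
print(pair_counts[("Tom", "Will")])
print(pair_counts[("Tim", "Tom")])  # a Counter returns 0 for unseen pairs
```

Pairs must be looked up in alphabetical order, since that is how they are stored.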
