Python CSV - 需要根据一键对值进行分组和计算

发布于 2024-10-22 18:23:33 字数 392 浏览 1 评论 0原文

我有一个简单的 3 列 csv 文件，我需要使用 python 根据一个键对每一行进行分组，然后对另一个键的值进行平均并返回它们。文件为标准csv格式，照此设置；

ID, ZIPCODE, RATE
1, 19003, 27.50
2, 19003, 31.33
3, 19083, 41.4
4, 19083, 17.9
5, 19102, 21.40

所以基本上我需要做的是计算该文件中每个唯一邮政编码 col[1] 的平均速率 col[2] 并返回结果。因此，获取 19003 年、19083 年等所有记录的平均汇率。

我已经考虑过使用 csv 模块并将文件读入字典中，然后根据邮政编码列中的唯一值对字典进行排序，但似乎无法取得任何进展。

任何帮助/建议表示赞赏。

原文

I have a simple 3 column csv file that i need to use python to group each row based on one key, then average the values for another key and return them. File is standard csv format, set up as so;

ID, ZIPCODE, RATE
1, 19003, 27.50
2, 19003, 31.33
3, 19083, 41.4
4, 19083, 17.9
5, 19102, 21.40

So basically what I need to do is calculate the average rate col[2] for each unique zipcode col[1] in that file and return the results. So get average rate for all records in 19003, 19083, and so on.

I've looked at using csv module and reading the file into a dictionary, then sorting the dict based on unique values in the zipcode col but can't seem to make any progress.

Any help/suggestions appreciated.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

一身骄傲 2024-10-29 18:23:34

我记录了一些步骤来帮助澄清事情：

import csv
from collections import defaultdict

# a dictionary whose value defaults to a list.
data = defaultdict(list)
# open the csv file and iterate over its rows. the enumerate()
# function gives us an incrementing row number
for i, row in enumerate(csv.reader(open('data.csv', 'rb'))):
    # skip the header line and any empty rows
    # we take advantage of the first row being indexed at 0
    # i=0 which evaluates as false, as does an empty row
    if not i or not row:
        continue
    # unpack the columns into local variables
    _, zipcode, level = row
    # for each zipcode, add the level the list
    data[zipcode].append(float(level))

# loop over each zipcode and its list of levels and calculate the average
for zipcode, levels in data.iteritems():
    print zipcode, sum(levels) / float(len(levels))

输出：

19102 21.4
19003 29.415
19083 29.65

I've documented some steps to help clarify things:

import csv
from collections import defaultdict

# a dictionary whose value defaults to a list.
data = defaultdict(list)
# open the csv file and iterate over its rows. the enumerate()
# function gives us an incrementing row number
for i, row in enumerate(csv.reader(open('data.csv', 'rb'))):
    # skip the header line and any empty rows
    # we take advantage of the first row being indexed at 0
    # i=0 which evaluates as false, as does an empty row
    if not i or not row:
        continue
    # unpack the columns into local variables
    _, zipcode, level = row
    # for each zipcode, add the level the list
    data[zipcode].append(float(level))

# loop over each zipcode and its list of levels and calculate the average
for zipcode, levels in data.iteritems():
    print zipcode, sum(levels) / float(len(levels))

Output:

19102 21.4
19003 29.415
19083 29.65

回复收藏 0 原文

丶情人眼里出诗心の 2024-10-29 18:23:34

通常，如果我必须进行复杂的阐述，我会使用 csv 加载关系数据库表中的行（sqlite 是最快的方法），然后使用标准 sql 方法来提取数据并计算平均值：

import csv
from StringIO import StringIO
import sqlite3

data = """1,19003,27.50
2,19003,31.33
3,19083,41.4
4,19083,17.9
5,19102,21.40
"""

f = StringIO(data)
reader = csv.reader(f)

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''create table data (ID text, ZIPCODE text, RATE real)''')
conn.commit()

for e in reader:
    e[2] = float(e[2])
    c.execute("""insert into data
          values (?,?,?)""", e)

conn.commit()

c.execute('''select ZIPCODE, avg(RATE) from data group by ZIPCODE''')
for row in c:
    print row

Usually if I have to do complicate elaboration I use csv to load the rows in a table of a relational DB (sqlite is the fastest way) then I use the standard sql methods to extract data and calculate average values:

import csv
from StringIO import StringIO
import sqlite3

data = """1,19003,27.50
2,19003,31.33
3,19083,41.4
4,19083,17.9
5,19102,21.40
"""

f = StringIO(data)
reader = csv.reader(f)

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''create table data (ID text, ZIPCODE text, RATE real)''')
conn.commit()

for e in reader:
    e[2] = float(e[2])
    c.execute("""insert into data
          values (?,?,?)""", e)

conn.commit()

c.execute('''select ZIPCODE, avg(RATE) from data group by ZIPCODE''')
for row in c:
    print row

回复收藏 0 原文

~没有更多了~