
Python: Merging tally data

Posted on 2024-11-06 04:50:57


Okay - I'm sure this has been answered here before but I can't find it....

My problem: I have a list of lists with this composition

0.2 A

0.1 A

0.3 A

0.3 B

0.2 C

0.5 C

My goal is to output the following:

0.6 A

0.3 B

0.7 C

In other words, I need to merge the data from multiple lines together.

Here's the code I'm using:

unique_percents = []

for line in percents:
    new_percent = float(line[0])
    for inner_line in percents:
        if line[1] == inner_line[1]:
           new_percent += float(inner_line[0])
        else:
            temp = []
            temp.append(new_percent)
            temp.append(line[1])
            unique_percents.append(temp)
            break

I think it should work, but it's not adding the percents up and still has the duplicates. Perhaps I'm not understanding how "break" works?

I'll also take suggestions of a better loop structure or algorithm to use. Thanks, David.
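On the `break` question: `break` only leaves the innermost loop. In the code above, the `else` branch fires at the first inner row whose label differs from the outer row, so one entry is appended per outer row rather than one per label. The sketch below traces this on the sample data. It assumes `percents` holds `[value, label]` pairs with string values (the question does not show the exact structure) and uses Python 3 syntax.

```python
# Assumed shape of the asker's data: [value, label] pairs.
percents = [['0.2', 'A'], ['0.1', 'A'], ['0.3', 'A'],
            ['0.3', 'B'], ['0.2', 'C'], ['0.5', 'C']]

unique_percents = []
for line in percents:
    new_percent = float(line[0])
    for inner_line in percents:
        if line[1] == inner_line[1]:
            # Also matches the outer row itself, so its value is counted twice.
            new_percent += float(inner_line[0])
        else:
            # Runs at the first non-matching inner row, once per outer row.
            unique_percents.append([new_percent, line[1]])
            break  # exits only the inner loop; the outer loop continues

labels = [row[1] for row in unique_percents]
print(labels)
```

The result has six rows, one for every input row, which is where the duplicates come from. For the B and C rows the very first inner row (an A) already differs, so they are appended before anything is added.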


守望孤独 2024-11-13 04:50:57

You want to use a dict, but collections.defaultdict can come in really handy here so that you don't have to worry about whether the key exists in the dict or not -- it just defaults to 0.0:

import collections

lines = [[0.2, 'A'], [0.1, 'A'], [0.3, 'A'], [0.3, 'B'], [0.2, 'C'], [0.5, 'C']]
amounts = collections.defaultdict(float)
for amount, letter in lines:
    amounts[letter] += amount

for letter, amount in sorted(amounts.iteritems()):
    print amount, letter
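The snippet above is written for Python 2 (`iteritems`, the `print` statement). A rough Python 3 equivalent might look like the following; it is a sketch of the same `defaultdict` approach, not a different algorithm.

```python
from collections import defaultdict

lines = [[0.2, 'A'], [0.1, 'A'], [0.3, 'A'], [0.3, 'B'], [0.2, 'C'], [0.5, 'C']]

# Missing keys start at float() == 0.0, so no existence check is needed.
amounts = defaultdict(float)
for amount, letter in lines:
    amounts[letter] += amount

# items() replaces the Python 2 iteritems(); print is a function in Python 3.
for letter, amount in sorted(amounts.items()):
    print(amount, letter)
```

Because the totals are floats, the printed values may show binary rounding noise rather than exactly 0.6 and 0.7.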


绅刃 2024-11-13 04:50:57

Try this out:

result = {}
for line in percents:
    value, key = line
    result[key] = result.get(key, 0) + float(value)
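To get from the `result` dict back to the "0.6 A" style of output asked for in the question, one option is to format each total to one decimal place. This is a sketch in Python 3; the sample `percents` list and the one-decimal format are assumptions made for illustration.

```python
# Assumed input: [value, label] pairs, values as strings.
percents = [['0.2', 'A'], ['0.1', 'A'], ['0.3', 'A'],
            ['0.3', 'B'], ['0.2', 'C'], ['0.5', 'C']]

result = {}
for value, key in percents:
    # get() supplies 0 the first time a label is seen.
    result[key] = result.get(key, 0) + float(value)

# Format to one decimal place to hide float rounding noise.
formatted = ["%.1f %s" % (total, key) for key, total in result.items()]
for row in formatted:
    print(row)
```

On Python 3.7 and later, dicts keep insertion order, so the rows come out in order of first appearance.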


暗恋未遂 2024-11-13 04:50:57

total = {}
data = [('0.1', 'A'), ('0.2', 'A'), ('.3', 'B'), ('.4', 'B'), ('-10', 'C')]
for amount, key in data:
    total[key] = total.get(key, 0.0) + float(amount)

for key, amount in total.items():
    print key, amount


唯憾梦倾城 2024-11-13 04:50:57

Since all of the letter grades are grouped together, you can use itertools.groupby (and if not, just sort the list ahead of time to make them so):

data = [
    [0.2, 'A'],
    [0.1, 'A'],
    [0.3, 'A'],
    [0.3, 'B'],
    [0.2, 'C'],
    [0.5, 'C'],
]

from itertools import groupby

summary = dict((k, sum(i[0] for i in items)) 
                for k,items in groupby(data, key=lambda x:x[1]))

print summary

Gives:

{'A': 0.60000000000000009, 'C': 0.69999999999999996, 'B': 0.29999999999999999}
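The sorting caveat matters because `itertools.groupby` only groups consecutive items with equal keys. A small sketch of what can go wrong on unsorted input (Python 3; `unsorted_data` and `summarize` are made-up names for illustration):

```python
from itertools import groupby

# Hypothetical input where the 'A' rows are not adjacent.
unsorted_data = [[0.2, 'A'], [0.3, 'B'], [0.1, 'A']]

def summarize(rows):
    # Each run of equal labels becomes one group; a later run with the
    # same label overwrites the earlier dict entry.
    return {label: sum(value for value, _ in group)
            for label, group in groupby(rows, key=lambda row: row[1])}

naive = summarize(unsorted_data)
fixed = summarize(sorted(unsorted_data, key=lambda row: row[1]))

print(naive)  # the first 'A' run (0.2) is lost
print(fixed)  # both 'A' rows are summed
```

Sorting costs O(n log n), so for unsorted data the plain dict approaches in the other answers do the same job in a single pass.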


温柔一刀 2024-11-13 04:50:57

If you have a list of lists like this:
[ [0.2, A], [0.1, A], ...] (in fact it looks like a list of tuples :)

res_dict = {}

for pair in lst:
    letter = pair[1]
    val = pair[0]
    try:
        res_dict[letter] += val
    except KeyError:
        res_dict[letter] = val

res_lst = [(val, letter) for letter, val in res_dict.items()] # note, a list of tuples!


溺ぐ爱和你が 2024-11-13 04:50:57

Using collections.defaultdict to tally values
(assuming text data in d):

>>> import collections
>>> s=collections.defaultdict(float)
>>> for ln in d:
...     v,k=ln.split()
...     s[k] += float(v)
>>> s
defaultdict(<type 'float'>, {'A': 0.60000000000000009, 'C': 0.69999999999999996, 'B': 0.29999999999999999})
>>> ["%s %s" % (v,k) for k,v in s.iteritems()]
['0.6 A', '0.7 C', '0.3 B']
>>>


眼泪都笑了 2024-11-13 04:50:57

If you are using Python 3.1 or newer, you can use collections.Counter. Also I suggest using decimal.Decimal instead of floats:

# Counter requires python 3.1 and newer
from collections import Counter
from decimal import Decimal

lines = ["0.2 A", "0.1 A", "0.3 A", "0.3 B", "0.2 C", "0.5 C"]
results = Counter()
for line in lines:
    percent, label = line.split()
    results[label] += Decimal(percent)
print(results)

The result is:

Counter({'C': Decimal('0.7'), 'A': Decimal('0.6'), 'B': Decimal('0.3')})
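The reason for preferring `Decimal` shows up in the float outputs of the other answers (0.60000000000000009 instead of 0.6). A minimal comparison on the three A values, as a sketch:

```python
from decimal import Decimal

values = ["0.2", "0.1", "0.3"]

# Binary floats cannot represent 0.1, 0.2 or 0.3 exactly, so the sum drifts.
float_total = sum(float(v) for v in values)

# Decimal built from strings keeps the decimal digits exactly.
decimal_total = sum(Decimal(v) for v in values)

print(float_total == 0.6)
print(decimal_total == Decimal("0.6"))
```

Note that `Decimal` has to be constructed from the string, not from an existing float, to avoid carrying the float's rounding error along.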


凉月流沐 2024-11-13 04:50:57

This is verbose, but works:

# Python 2.7
lines = """0.2 A
0.1 A
0.3 A
0.3 B
0.2 C
0.5 C"""

lines = lines.split('\n')
#print(lines)
pctg2total = {}
thing2index = {}
index = 0
for line in lines:
    pctg, thing = line.split()
    pctg = float(pctg)
    if thing not in thing2index:
        thing2index[thing] = index
        index = index + 1
        pctg2total[thing] = pctg
    else:
        pctg2total[thing] = pctg2total[thing] + pctg
output = ((pctg2total[thing], thing) for thing in pctg2total)
# Let's sort by the first occurrence.
output = list(sorted(output, key = lambda thing: thing2index[thing[1]]))
print(output)

>>> 
[(0.60000000000000009, 'A'), (0.29999999999999999, 'B'), (0.69999999999999996, 'C')]
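On Python 3.7 and later, plain dicts preserve insertion order, so the separate `thing2index` bookkeeping may not be needed to keep first-occurrence order. A shorter sketch under that assumption, using a reordered sample (not the original data) so the ordering is visible:

```python
# Reordered sample input, chosen for illustration only.
lines = """0.2 C
0.3 B
0.5 C
0.1 A""".splitlines()

pctg2total = {}
for line in lines:
    pctg, thing = line.split()
    # A key is inserted the first time its label is seen; later updates
    # do not change its position in the dict.
    pctg2total[thing] = pctg2total.get(thing, 0.0) + float(pctg)

output = [(total, thing) for thing, total in pctg2total.items()]
print(output)
```

On Python 2.7, which the answer targets, dict order is arbitrary, so the explicit index there is still required.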


沦落红尘 2024-11-13 04:50:57

letters = {}
for line in open("data", "r"):
    lineStrip = line.strip().split()
    percent = float(lineStrip[0])
    letter = lineStrip[1]
    if letter in letters:
        letters[letter] = percent + letters[letter]
    else:
        letters[letter] = percent

for letter, percent in letters.items():
    print letter, percent

A 0.6
C 0.7
B 0.3


风苍溪 2024-11-13 04:50:57

Let's say we have this:

data =[(b, float(a)) for a,b in 
    (line.split() for line in
        """
        0.2 A
        0.1 A
        0.3 A
        0.3 B
        0.2 C
        0.5 C""".splitlines()
        if line)]
print data 
# [('A', 0.2), ('A', 0.1), ('A', 0.3), ('B', 0.3), ('C', 0.2), ('C', 0.5)]

You can now just go through this and sum:

counter = {}
for letter, val in data:
    if letter in counter:
        counter[letter]+=val
    else:
        counter[letter]=val

print counter.items()

Or group values together and use sum:

from itertools import groupby
# you want the name and the sum of the values
print [(name, sum(value for k,value in grp)) 
    # from each group
    for name, grp in 
    # where the group name of a item `p` is given by `p[0]`
    groupby(sorted(data), key=lambda p:p[0])]


梦晓ヶ微光ヅ倾城 2024-11-13 04:50:57

>>> from itertools import groupby, imap
>>> from operator import itemgetter
>>> data = [['0.2', 'A'], ['0.1', 'A'], ['0.3', 'A'], ['0.3', 'B'], ['0.2', 'C'], ['0.5', 'C']]
>>> # data = sorted(data, key=itemgetter(1))
... 
>>> for k, g in groupby(data, key=itemgetter(1)):
...     print sum(imap(float, imap(itemgetter(0), g))), k
... 
0.6 A
0.3 B
0.7 C
>>>

