Adding new keys to a dictionary while incrementing existing values

Posted 2024-09-29 00:45:13

I am processing a CSV file and counting the unique values of column 4. So far I have coded this three ways. The first uses "if key in dictionary", the second traps the KeyError and the third uses defaultdict. For example (where x[3] is the value from the file and a, b and c are the counting dictionaries):

First way:

if x[3] in a:
    a[x[3]] += 1
else:
    a[x[3]] = 1

Second way:

try:
    b[x[3]] += 1
except KeyError:
    b[x[3]] = 1

Third way:

from collections import defaultdict
c = defaultdict(int)
c[x[3]] += 1
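To check the claim that the three snippets give the same answer, here is a minimal sketch that wraps each one in a loop over rows. The `rows` list is hypothetical sample data standing in for the parsed CSV file; each row is a list whose index 3 is column 4.

```python
from collections import defaultdict

# Hypothetical rows standing in for the parsed CSV file (index 3 = column 4).
rows = [
    ["1", "x", "y", "red"],
    ["2", "x", "y", "blue"],
    ["3", "x", "y", "red"],
    ["4", "x", "y", "green"],
    ["5", "x", "y", "red"],
]

# First way: test membership before incrementing.
a = {}
for x in rows:
    if x[3] in a:
        a[x[3]] += 1
    else:
        a[x[3]] = 1

# Second way: try the increment and catch the KeyError raised for a new key.
b = {}
for x in rows:
    try:
        b[x[3]] += 1
    except KeyError:
        b[x[3]] = 1

# Third way: defaultdict(int) supplies 0 for any missing key.
c = defaultdict(int)
for x in rows:
    c[x[3]] += 1

print(a == b == dict(c))  # True
```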

My question is: which way is more efficient... cleaner... better... etc.? Or is there a better way? All three ways work and give the same answer, but I thought I would tap the hive mind as a learning case.

Thanks -

甲如呢乙后呢 2024-10-06 00:45:21

Use setdefault.

a[x[3]] = a.setdefault(x[3], 0) + 1

setdefault gets the value of the specified key (x[3] in this case), or if it does not exist, the specified value (0 in this case).
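In context, the setdefault line runs once per row. A minimal sketch, assuming the column-4 values have already been collected into a list (`values` is hypothetical sample data):

```python
values = ["red", "blue", "red", "green", "red"]  # hypothetical column-4 values

a = {}
for v in values:
    # setdefault returns a[v] if v is present; otherwise it stores 0 under v and returns 0.
    a[v] = a.setdefault(v, 0) + 1

print(a)  # {'red': 3, 'blue': 1, 'green': 1}
```

For a new key, setdefault first writes 0 and the assignment then overwrites it, so the key is stored twice; `a.get(v, 0) + 1` gives the same result without the intermediate write.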

纵情客 2024-10-06 00:45:19

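Since you don't have access to Counter, your best option is the third approach. It is cleaner and easier to read. In addition, it avoids the test (and branch) on every item that the first two approaches perform, which makes it more efficient.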
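The third approach can skip the explicit test because defaultdict handles a missing key itself: it calls its factory (here `int`, which returns 0 when called with no arguments), stores the result, and the `+=` then increments it. A small sketch of that behaviour, using hypothetical values:

```python
from collections import defaultdict

counts = defaultdict(int)
print(int())  # 0: the value the factory produces for a missing key

for v in ["red", "blue", "red"]:  # hypothetical column-4 values
    # No membership test: a missing key is created as int() == 0, then incremented.
    counts[v] += 1

print(dict(counts))  # {'red': 2, 'blue': 1}

# Side effect worth knowing: merely reading a missing key also inserts it.
print(counts["green"])    # 0
print("green" in counts)  # True
```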

一场春暖 2024-10-06 00:45:18

from collections import Counter
Counter(a)
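Counter accepts any iterable and counts its items in a single call. A sketch, assuming `a` in the snippet above holds the column-4 values themselves rather than the count dictionary from the question (the sample list is hypothetical):

```python
from collections import Counter

a = ["red", "blue", "red", "green", "red"]  # hypothetical column-4 values
counts = Counter(a)

print(counts["red"])          # 3
print(counts.most_common(1))  # [('red', 3)]
```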

丢了幸福的猪 2024-10-06 00:45:17

You asked which was more efficient. Assuming that you are talking about execution speed: If your data is small, it doesn't matter. If it is large and typical, the "already exists" case will happen much more often than the "not in dict" case. This observation explains some of the results.
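To see why the "already exists" case dominates, here is a sketch that counts how often each branch of the `if item in d` approach would be taken on the benchmark string from the timing runs in this answer (iterating over a string yields its characters). The helper name `hit_miss_counts` is hypothetical.

```python
def hit_miss_counts(iterable):
    """Return (hits, misses): how often an item was already seen vs. new."""
    seen = set()
    hits = misses = 0
    for item in iterable:
        if item in seen:
            hits += 1
        else:
            seen.add(item)
            misses += 1
    return hits, misses

t = 1000 * 'now is the winter of our discontent made glorious summer by this son of york'
hits, misses = hit_miss_counts(t)
print(hits, misses)
```

Misses equal the number of distinct characters (a few dozen at most), while hits number in the tens of thousands, so the "not in dict" branch is almost never taken on this data.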

Below is some code which can be used with the timeit module to explore speed without file-reading overhead. I have taken the liberty of adding a 5th method, which is not uncompetitive and will run on any Python from at least 1.5.2 [tested] onwards.

from collections import defaultdict, Counter

def tally0(iterable):
    # Deliberately wrong (every count ends up as 1): a common baseline for timing
    d = {}
    for item in iterable:
        d[item] = 1
    return d

def tally1(iterable):
    # Test membership on every item
    d = {}
    for item in iterable:
        if item in d:
            d[item] += 1
        else:
            d[item] = 1
    return d

def tally2(iterable):
    # Increment first; handle a new key in the KeyError handler
    d = {}
    for item in iterable:
        try:
            d[item] += 1
        except KeyError:
            d[item] = 1
    return d

def tally3(iterable):
    # defaultdict(int) creates missing keys with value 0
    d = defaultdict(int)
    for item in iterable:
        d[item] += 1
    return d

def tally4(iterable):
    # Counter used item by item, like a defaultdict
    d = Counter()
    for item in iterable:
        d[item] += 1
    return d

def tally5(iterable):
    # Plain dict; bind d.get once outside the loop
    d = {}
    dg = d.get
    for item in iterable:
        d[item] = dg(item, 0) + 1
    return d

Typical run, with the code above saved as tally_bench.py (in a Windows XP "Command Prompt" window):

prompt>\python27\python -mtimeit -s"t=1000*'now is the winter of our discontent made glorious summer by this son of york';import tally_bench as tb" "tb.tally1(t)"
10 loops, best of 3: 29.5 msec per loop

Here are the results (msec per loop):

0 base case   13.6
1 if k in d   29.5
2 try/except  26.1
3 defaultdict 23.4
4 Counter     79.4
5 d.get(k, 0) 29.2
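The runs above use Python 2.7 from the command line. A minimal sketch of the same kind of comparison driven from Python 3 through `timeit.repeat`; it redefines three of the tally functions so it runs on its own (tally6 here is the `Counter(iterable)` variant discussed further down). The numbers it prints depend on your machine and Python version and are not expected to match the table.

```python
import timeit
from collections import Counter, defaultdict

def tally1(iterable):
    d = {}
    for item in iterable:
        if item in d:
            d[item] += 1
        else:
            d[item] = 1
    return d

def tally3(iterable):
    d = defaultdict(int)
    for item in iterable:
        d[item] += 1
    return d

def tally6(iterable):
    return Counter(iterable)

t = 1000 * 'now is the winter of our discontent made glorious summer by this son of york'

for f in (tally1, tally3, tally6):
    # Best of 3 repeats, 10 calls each, reported in msec per call.
    best = min(timeit.repeat(lambda: f(t), repeat=3, number=10))
    print(f"{f.__name__}: {best / 10 * 1000:.1f} msec per call")
```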

Another timing trial:

prompt>\python27\python -mtimeit -s"from collections import defaultdict;d=defaultdict(int)" "d[1]+=1"
1000000 loops, best of 3: 0.309 usec per loop

prompt>\python27\python -mtimeit -s"from collections import Counter;d=Counter()" "d[1]+=1"
1000000 loops, best of 3: 1.02 usec per loop

The slowness of Counter is possibly due to its being implemented partly in Python code, whereas defaultdict is written entirely in C (in 2.7, at least).

Note that Counter() is NOT just "syntactic sugar" for defaultdict(int) -- it implements a full bag aka multiset object -- see the docs for details; they may save you from reinventing the wheel if you need some fancy post-processing. If all you want to do is count things, use defaultdict.
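A few of the multiset features that set Counter apart from defaultdict(int), sketched with made-up data: `most_common`, arithmetic between counters, and the fact that looking up a missing key returns 0 without inserting it.

```python
from collections import Counter, defaultdict

week1 = Counter(["red", "blue", "red"])     # hypothetical tallies
week2 = Counter(["red", "green", "green"])

print((week1 + week2).most_common(2))     # [('red', 3), ('green', 2)]
print(sorted((week1 - week2).items()))    # [('blue', 1), ('red', 1)]
print(sorted((week1 & week2).items()))    # [('red', 1)]

# Missing keys: Counter returns 0 but does not store it; defaultdict stores it.
c = Counter()
d = defaultdict(int)
print(c["x"], "x" in c)  # 0 False
print(d["x"], "x" in d)  # 0 True
```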

Update in response to a question from @Steven Rumbalski: """ I'm curious, what happens if you move the iterable into the Counter constructor: d = Counter(iterable)? (I have python 2.6 and cannot test it.) """

tally6: just does d = Counter(iterable); return d, and takes 60.0 msec.

You could look at the source (collections.py in the SVN repository) ... here's what my Python27\Lib\collections.py does when iterable is not a Mapping instance:

self_get = self.get
for elem in iterable:
    self[elem] = self_get(elem, 0) + 1
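That excerpt is the same `d.get(k, 0) + 1` pattern as tally5. A sketch pulling it out as a standalone function (the name `count_like_counter` is hypothetical) and checking that it agrees with `Counter(iterable)`. Newer Python 3 versions may implement this step differently (partly in C), so treat the excerpt as the 2.7 behaviour described in this answer.

```python
from collections import Counter

def count_like_counter(iterable):
    # Mirrors the excerpt: bind get once, then store old count (default 0) + 1.
    counts = {}
    counts_get = counts.get
    for elem in iterable:
        counts[elem] = counts_get(elem, 0) + 1
    return counts

words = "the cat and the hat and the bat".split()
print(count_like_counter(words) == dict(Counter(words)))  # True
```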

Seen that code anywhere before? There's a whole lot of carry-on just to call code that's runnable in Python 1.5.2 :-O

不忘初心 2024-10-06 00:45:16

Use collections.Counter. Counter is syntactic sugar for defaultdict(int), but what's cool about it is that it accepts an iterable in the constructor, thus saving an extra step (I assume all of your examples above are wrapped in a for-loop.)

from collections import Counter
count = Counter(x[3] for x in my_csv_reader)
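`my_csv_reader` is not defined in the answer. A runnable sketch, assuming it is a `csv.reader` over the file; an in-memory `io.StringIO` with made-up rows stands in for the real CSV so the example is self-contained.

```python
import csv
import io
from collections import Counter

# Made-up CSV content standing in for the real file.
data = io.StringIO(
    "id,name,city,colour\n"
    "1,ann,york,red\n"
    "2,bob,leeds,blue\n"
    "3,cat,york,red\n"
)

my_csv_reader = csv.reader(data)
next(my_csv_reader)  # skip the header row (assumes the file has one)

count = Counter(x[3] for x in my_csv_reader)
print(count.most_common())  # [('red', 2), ('blue', 1)]
```

With a real file, open it with `newline=''` (as the csv module documentation recommends) and pass the file object to `csv.reader`.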

Prior to the introduction of collections.Counter, collections.defaultdict was the most idiomatic tool for this task, so on Python versions before 2.7, use defaultdict.

from collections import defaultdict
count = defaultdict(int)
for x in my_csv_reader:
    count[x[3]] += 1

About the author: 甚是思念