Adding new keys to a dictionary while incrementing existing values
I am processing a CSV file and counting how many times each unique value occurs in column 4. So far I have coded this three ways. The first uses "if key in dictionary", the second traps the KeyError, and the third uses collections.defaultdict. For example (where x[3] is the value from the file and "a" is a dictionary):
First way:
if x[3] in a:
    a[x[3]] += 1
else:
    a[x[3]] = 1
Second way:
try:
    b[x[3]] += 1
except KeyError:
    b[x[3]] = 1
Third way:
from collections import defaultdict
c = defaultdict(int)
c[x[3]] += 1
My question is: which way is more efficient, cleaner, or otherwise better? Or is there a better way altogether? All three work and give the same answer, but I thought I would tap the hive mind as a learning case.
Thanks -
Comments (5)
Use setdefault. setdefault returns the value of the specified key (x[3] in this case); if the key does not exist, it inserts it with the specified default (0 in this case) and returns that.
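The code for the setdefault suggestion is not shown in this copy of the thread, so here is a minimal sketch of how it could look. The rows list is made-up sample data standing in for parsed CSV rows, and Python 3 is assumed.

```python
# Count occurrences of the 4th column using dict.setdefault.
# `rows` is invented sample data, not taken from the original post.
rows = [
    ["1", "a", "b", "red"],
    ["2", "a", "b", "blue"],
    ["3", "a", "b", "red"],
]

a = {}
for x in rows:
    # setdefault returns a[x[3]] if the key exists;
    # otherwise it stores 0 under that key and returns 0.
    a[x[3]] = a.setdefault(x[3], 0) + 1

print(a)
```

Every iteration still does a lookup plus an assignment, so this is mainly a compactness win over the if/else version.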
Since you don't have access to Counter, your best bet is your third approach. It's much cleaner and easier to read. In addition, it doesn't have the perpetual testing (and branching) that the first two approaches have, which makes it more efficient.
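For reference, the third approach wrapped in the loop it presumably lives in might look like the sketch below. The CSV content is invented and fed through io.StringIO so the example runs on its own; with a real file you would pass an open file object to csv.reader instead. Python 3 is assumed.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV content standing in for the real file.
sample = io.StringIO("1,a,b,red\n2,a,b,blue\n3,a,b,red\n")

c = defaultdict(int)       # missing keys start at int() == 0
for x in csv.reader(sample):
    c[x[3]] += 1           # no membership test, no exception handling

print(dict(c))
```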
You asked which was more efficient. Assuming that you are talking about execution speed: If your data is small, it doesn't matter. If it is large and typical, the "already exists" case will happen much more often than the "not in dict" case. This observation explains some of the results.
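A rough way to check this on your own machine is sketched here. This is an illustrative stand-in, not the benchmark this answer refers to: the data set, the function names, and the repeat counts are all made up, and Python 3 is assumed. The data deliberately has few distinct keys, so the "already exists" case dominates.

```python
import timeit
from collections import defaultdict

# Made-up data: 100,000 values drawn from only 10 distinct keys,
# so almost every lookup hits an existing key.
data = [str(i % 10) for i in range(100_000)]

def tally_if(values):
    d = {}
    for v in values:
        if v in d:
            d[v] += 1
        else:
            d[v] = 1
    return d

def tally_try(values):
    d = {}
    for v in values:
        try:
            d[v] += 1
        except KeyError:
            d[v] = 1
    return d

def tally_defaultdict(values):
    d = defaultdict(int)
    for v in values:
        d[v] += 1
    return d

for func in (tally_if, tally_try, tally_defaultdict):
    # Best of 3 repeats, 5 calls each; reported as msec per call.
    best = min(timeit.repeat(lambda: func(data), number=5, repeat=3)) / 5
    print(f"{func.__name__}: {best * 1000:.1f} msec per loop")
```

Changing data so that most keys are unique (for example, str(i) instead of str(i % 10)) shifts the balance toward the "not in dict" case and can change the ranking.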
Below is some code which can be used with the timeit module to explore speed without file-reading overhead. I have taken the liberty of adding a 5th method, which is not uncompetitive and will run on any Python from at least 1.5.2 [tested] onwards.
Typical run (in Windows XP "Command Prompt" window):
Here are the results (msec per loop):
Another timing trial:
The slower speed of Counter is possibly due to it being implemented partly in Python code, whereas defaultdict is entirely in C (in 2.7, at least).
Note that Counter() is NOT just "syntactic sugar" for defaultdict(int): it implements a full bag (aka multiset) object. See the docs for details; they may save you from reinventing the wheel if you need some fancy post-processing. If all you want to do is count things, use defaultdict.
Update in response to a question from @Steven Rumbalski: "I'm curious, what happens if you move the iterable into the Counter constructor: d = Counter(iterable)? (I have python 2.6 and cannot test it.)"
tally6: just does d = Counter(iterable); return d, takes 60.0 msecs
You could look at the source (collections.py in the SVN repository) ... here's what my Python27\Lib\collections.py does when iterable is not a Mapping instance:
Seen that code anywhere before? There's a whole lot of carry-on just to call code that's runnable in Python 1.5.2 :-O
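To illustrate the bag/multiset point made above, here is a small sketch of a few things Counter offers beyond plain counting. The sample values are invented; the methods and operators used are part of the documented Counter API.

```python
from collections import Counter

first = Counter(["red", "blue", "red", "green"])
second = Counter(["red", "green", "green"])

# Most frequent items first, as (value, count) pairs.
print(first.most_common(1))

# Multiset arithmetic: counts are added or subtracted per key;
# subtraction drops keys whose count would fall to zero or below.
print(first + second)
print(first - second)

# Missing keys read as 0 without being inserted.
print(first["purple"])
```

If none of this post-processing is needed, a defaultdict(int) does the counting itself just as well.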
Use collections.Counter. Counter is syntactic sugar for defaultdict(int), but what's cool about it is that it accepts an iterable in the constructor, thus saving an extra step (I assume all of your examples above are wrapped in a for-loop).
Prior to the introduction of collections.Counter, collections.defaultdict was the most idiomatic tool for this task, so on Python versions before 2.7, use defaultdict.
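A minimal sketch of the constructor form, assuming Python 2.7 or later. The rows list is invented sample data standing in for the parsed CSV rows.

```python
from collections import Counter

# Invented sample rows; x[3] is the 4th column, as in the question.
rows = [
    ["1", "a", "b", "red"],
    ["2", "a", "b", "blue"],
    ["3", "a", "b", "red"],
]

# The generator expression replaces the explicit for-loop.
counts = Counter(x[3] for x in rows)

print(counts)
```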