I have a generated file with thousands of lines like the following:
CODE,XXX,DATE,20101201,TIME,070400,CONDITION_CODES,LTXT,PRICE,999.0000,QUANTITY,100,TSN,1510000001
Some lines have more fields and others have fewer, but all follow the same pattern of key-value pairs and each line has a TSN field.
When doing some analysis on the file, I wrote a loop like the following to read the file into a dictionary:
#!/usr/bin/env python
from sys import argv

records = {}
for line in open(argv[1]):
    fields = line.strip().split(',')
    record = dict(zip(fields[::2], fields[1::2]))
    records[record['TSN']] = record

print 'Found %d records in the file.' % len(records)
...which is fine and does exactly what I want it to (the print is just a trivial example).
However, it doesn't feel particularly "pythonic" to me, and this line:
dict(zip(fields[::2], fields[1::2]))
just feels "clunky" (how many times does it iterate over the fields?).
Is there a better way of doing this in Python 2.6 with just the standard modules to hand?
Comments (4)
In Python 2 you could use izip in the itertools module and the magic of generator objects to write your own function to simplify the creation of pairs of values for the dict records. I got the idea for pairwise() from a similarly named (although functionally different) recipe in the Python 2 itertools docs.

To use the approach in Python 3, you can just use plain zip(), since it does what izip() did in Python 2, which led to the latter's removal from itertools. The example below addresses this and should work in both versions.

Which can be used like this in your file-reading for loop:

But wait, there's more!

It's possible to create a generalized version I'll call grouper(), which again corresponds to a similarly named itertools recipe (listed right below pairwise()):

Which could be used like this in your for loop:

Of course, for specific cases like this, it's easy to use functools.partial() and create a similar pairwise() function with it (which will work in both Python 2 & 3):

Postscript

Unless there's a really huge number of fields, you could instead create an actual sequence out of the pairs of line items (rather than using a generator expression, which has no len()):

The advantage being that it would allow the grouping to be done using simple slicing:
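
As a rough illustration of the grouper() and pairwise() helpers described in this answer, here is a minimal sketch. It is a reconstruction under assumptions, not the answer's own listing: the argument order of grouper(), the izip fallback, and the sample line are all chosen here for the example.

```python
try:
    from itertools import izip  # Python 2: lazy version of zip()
except ImportError:
    izip = zip  # Python 3: the built-in zip() is already lazy

from functools import partial


def grouper(n, iterable):
    """Yield successive non-overlapping n-tuples from iterable."""
    iterator = iter(iterable)
    # The same iterator object appears n times, so each output tuple
    # consumes n consecutive items from the input.
    return izip(*[iterator] * n)


# Specialise grouper() for pairs, in the spirit of functools.partial().
pairwise = partial(grouper, 2)

# Hypothetical sample line in the format shown in the question.
line = 'CODE,XXX,DATE,20101201,TSN,1510000001\n'
fields = line.strip().split(',')
record = dict(pairwise(fields))
```

Note that with this sketch a trailing incomplete group is silently dropped, so a line with an odd number of fields would lose its last item.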
Not so much better as just more efficient...
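
A minimal sketch of one such more efficient variant, assuming the single-iterator idiom is what is meant: passing the same iterator to zip() twice makes it take a key and then a value on each step, so the fields are walked once and no sliced copies are built. The sample line is hypothetical.

```python
# Hypothetical sample line in the format shown in the question.
line = 'CODE,XXX,DATE,20101201,TSN,1510000001\n'

# One iterator shared by both zip() arguments: each pair is made of
# two consecutive items, so the list is traversed only once.
fields = iter(line.strip().split(','))
record = dict(zip(fields, fields))
```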
Full explanation
source
If we're going to abstract it into a function anyway, it's not too hard to write "from scratch":
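
A minimal sketch of what such a from-scratch function might look like. The name pairs() and the choice to drop a dangling last item are assumptions made for this example.

```python
def pairs(iterable):
    """Yield (key, value) tuples from consecutive items of iterable."""
    iterator = iter(iterable)
    for key in iterator:
        try:
            value = next(iterator)  # the item right after the key
        except StopIteration:
            return  # odd number of items: ignore the dangling key
        yield key, value


# Hypothetical sample line in the format shown in the question.
line = 'CODE,XXX,DATE,20101201,TSN,1510000001\n'
record = dict(pairs(line.strip().split(',')))
```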
robert's recipe version definitely wins points for flexibility, though.