How do I create a dictionary from a line of text?

Posted on 2024-10-06 05:35:20

I have a generated file with thousands of lines like the following:

CODE,XXX,DATE,20101201,TIME,070400,CONDITION_CODES,LTXT,PRICE,999.0000,QUANTITY,100,TSN,1510000001

Some lines have more fields and others have fewer, but all follow the same pattern of key-value pairs and each line has a TSN field.

When doing some analysis on the file, I wrote a loop like the following to read the file into a dictionary:

#!/usr/bin/env python

from sys import argv

records = {}
for line in open(argv[1]):
    fields = line.strip().split(',')
    record = dict(zip(fields[::2], fields[1::2]))
    records[record['TSN']] = record

print 'Found %d records in the file.' % len(records)

...which is fine and does exactly what I want it to (the print is just a trivial example).

However, it doesn't feel particularly "pythonic" to me, especially this line:

dict(zip(fields[::2], fields[1::2]))

It just feels "clunky" (how many times does it iterate over the fields?).

Is there a better way of doing this in Python 2.6 with just the standard modules to hand?

Comments (4)

萌无敌 2024-10-13 05:35:20

In Python 2 you could use izip in the itertools module and the magic of generator objects to write your own function to simplify the creation of pairs of values for the dict records. I got the idea for pairwise() from a similarly named (although functionally different) recipe in the Python 2 itertools docs.

To use the approach in Python 3, you can just use plain zip(), since it does what izip() did in Python 2 (which is why izip() was removed from itertools). The example below handles this and should work in both versions.

try:
    from itertools import izip
except ImportError:  # Python 3
    izip = zip

def pairwise(iterable):
    "s -> (s0,s1), (s2,s3), (s4, s5), ..."
    a = iter(iterable)
    return izip(a, a)

Which can be used like this in your file reading for loop:

from sys import argv

records = {}
for line in open(argv[1]):
    fields = (field.strip() for field in line.split(','))  # generator expr
    record = dict(pairwise(fields))
    records[record['TSN']] = record

print('Found %d records in the file.' % len(records))
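
As a rough illustration of why this works: both arguments passed to izip() are the same iterator object, so each tuple it builds consumes two consecutive items. The sketch below assumes Python 3 (plain zip()) and uses a shortened version of the sample line from the question instead of a file; the line and fields values are only for demonstration.

```python
def pairwise(iterable):
    "s -> (s0,s1), (s2,s3), (s4, s5), ..."
    a = iter(iterable)
    return zip(a, a)  # same iterator twice: each tuple takes two consecutive items

# Shortened sample line (illustrative), including the trailing newline a file would have
line = "CODE,XXX,DATE,20101201,TIME,070400,TSN,1510000001\n"
fields = (field.strip() for field in line.split(','))  # generator expr
record = dict(pairwise(fields))
print(record['TSN'])
```

Note that a trailing unpaired item is silently dropped, because zip() stops as soon as the shared iterator is exhausted.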

But wait, there's more!

It's possible to create a generalized version I'll call grouper(), which again corresponds to a similarly named itertools recipe (which is listed right below pairwise()):

def grouper(n, iterable):
    "s -> (s0,s1,...sn-1), (sn,sn+1,...s2n-1), (s2n,s2n+1,...s3n-1), ..."
    return izip(*[iter(iterable)]*n)

Which could be used like this in your for loop:

    record = dict(grouper(2, fields))

Of course, for specific cases like this, it's easy to use functools.partial() and create a similar pairwise() function with it (which will work in both Python 2 & 3):

import functools
pairwise = functools.partial(grouper, 2)
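
To make the [iter(iterable)]*n trick a little more concrete, here is a minimal sketch, assuming Python 3 (so zip() stands in for izip()). The sample strings are made up for illustration.

```python
import functools

def grouper(n, iterable):
    "s -> (s0,s1,...sn-1), (sn,sn+1,...s2n-1), ..."
    # [iter(iterable)] * n is a list holding the SAME iterator n times,
    # so zip() draws n consecutive items for every tuple it builds.
    return zip(*[iter(iterable)] * n)

pairwise = functools.partial(grouper, 2)

triples = list(grouper(3, 'ABCDEFG'))  # the incomplete trailing group ('G') is dropped
fields = 'CODE,XXX,TSN,1510000001'.split(',')
record = dict(pairwise(fields))
print(triples)
print(record)
```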

Postscript

Unless there's a really huge number of fields, you could instead create an actual sequence out of the line items (rather than using a generator expression, which has no len()):

fields = tuple(field.strip() for field in line.split(','))

The advantage being that it would allow the grouping to be done using simple slicing:

try:
    xrange
except NameError:  # Python 3
    xrange = range

def grouper(n, sequence):
    for i in xrange(0, len(sequence), n):
        yield sequence[i:i+n]

pairwise = functools.partial(grouper, 2)
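
One difference worth knowing about the slicing version, sketched here under the assumption of Python 3 (range() in place of xrange()): an odd number of fields is not dropped but produces a final one-item slice, which dict() rejects with a ValueError. The sample tuples are illustrative.

```python
def grouper(n, sequence):
    # Yield consecutive slices of length n; the last slice may be shorter
    for i in range(0, len(sequence), n):
        yield sequence[i:i+n]

even = tuple('CODE,XXX,TSN,1510000001'.split(','))
record = dict(grouper(2, even))  # every slice is a (key, value) tuple

odd = even + ('EXTRA',)          # hypothetical malformed line with a dangling key
try:
    dict(grouper(2, odd))
    error = None
except ValueError as exc:        # the last slice has length 1, not 2
    error = exc
print(record)
print(error)
```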

醉殇 2024-10-13 05:35:20

Not so much better as just more efficient...

Full explanation

对不⑦ 2024-10-13 05:35:20

import itertools

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return itertools.izip_longest(fillvalue=fillvalue, *args)

record = dict(grouper(2, line.strip().split(",")))

source
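
For readers on Python 3, where izip_longest was renamed zip_longest, a minimal sketch of the same recipe might look like this. The sample line with a dangling ORPHAN key is invented to show what fillvalue does.

```python
from itertools import zip_longest  # Python 3 name for izip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# Illustrative line with an odd number of fields
line = "CODE,XXX,TSN,1510000001,ORPHAN\n"
record = dict(grouper(2, line.strip().split(",")))
print(record)
```

Unlike the zip()-based variants, an unpaired trailing key is kept and mapped to the fill value (None by default) rather than being dropped.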

请别遗忘我 2024-10-13 05:35:20

If we're going to abstract it into a function anyway, it's not too hard to write "from scratch":

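def pairs(iterable):
    iterator = iter(iterable)
    while True:
        try:
            # next() works in Python 2.6+ and 3; a trailing unpaired item is dropped
            yield (next(iterator), next(iterator))
        except StopIteration:  # catch only exhaustion, not every error
            return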

robert's recipe version definitely wins points for flexibility, though.
