Python dictionary, find similarities

Posted 2024-10-13 19:08:19 · 384 words · 3 views · 0 comments

I have a Python dictionary with a thousand items. Each item is itself a dictionary. I'm looking for a clean and elegant way to go through the items, find the ones that share values, and create templates from them.

Here's a simplified example of the structure of the individual dictionaries:

{'id': 1,
 'template': None,
 'height': 80,
 'width': 120,
 'length': 75,
 'weight': 100}

From this, I want to make a single pass and, if 500 of the 1000 share the same height and width, detect that, so I can build a template from that data and assign the template ID to 'template'. I can build a gigantic reference hash, but I'm hoping there's a cleaner, more elegant way to accomplish this.

The actual data includes closer to 30 keys, of which a small subset needs to be excluded from the template checking.

Comments (2)

梦与时光遇 2024-10-20 19:08:19

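Given a dict of dicts named items:

import itertools as it

# NOTE: groupby() only groups consecutive items, so this assumes the values
# are already ordered by (height, width).
for (height, width), itemIter in it.groupby(items.values(), lambda x: (x['height'], x['width'])):
    similar = list(itemIter)  # all items with dimensions (height, width)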
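
To see how many items share each (height, width) pair before deciding on templates, a simple tally may be enough. The following is a minimal sketch, not part of the original answer: it uses collections.Counter from the standard library, and the sample `items` data and the names `counts`, `pair`, and `n` are made up for illustration.

```python
from collections import Counter

# Hypothetical sample data in the same shape as the question's dict of dicts.
items = {
    'a': {'id': 1, 'template': None, 'height': 80, 'width': 120},
    'b': {'id': 2, 'template': None, 'height': 30, 'width': 40},
    'c': {'id': 3, 'template': None, 'height': 30, 'width': 40},
}

# Tally how often each (height, width) pair occurs; no sorting is needed.
counts = Counter((x['height'], x['width']) for x in items.values())

# The most frequent pair and the number of items that share it.
pair, n = counts.most_common(1)[0]
print(pair, n)
```

A tally like this only reports the counts; assigning template IDs still requires collecting the matching items themselves.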

梦在深巷 2024-10-20 19:08:19

@eumiro had an excellent core idea: use itertools.groupby() to arrange the items with common values together in batches. However, besides neglecting to sort the items first with the same key function, as @Jochen Ritzel pointed out (and as the documentation also mentions), he didn't address several other things you mentioned wanting to do.
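
The sorting requirement is easy to see on a tiny input. This is an illustrative sketch with made-up data (the names `pairs`, `unsorted_groups`, and `sorted_groups` are hypothetical): groupby() starts a new group every time the key changes, so equal keys that are not adjacent end up in separate groups.

```python
import itertools

# Hypothetical (height, width) pairs; the two (30, 40) entries are not adjacent.
pairs = [(30, 40), (80, 100), (30, 40)]

# Unsorted input: the repeated pair is split into two separate groups.
unsorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(pairs)]

# Sorted input: equal pairs are adjacent, so each pair appears exactly once.
sorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(pairs))]

print(unsorted_groups)  # [((30, 40), 1), ((80, 100), 1), ((30, 40), 1)]
print(sorted_groups)    # [((30, 40), 2), ((80, 100), 1)]
```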

Below is a more complete and somewhat longer answer. It determines the templates and assigns them in one pass through the dict of dicts. To do this, it first creates a sorted list of the items, then uses groupby() to batch them; if a group has enough members, it creates a template and assigns its ID to each member.

inventory = {
    'item1': {'id': 1, 'template': None, 'height': 80, 'width': 120, 'length': 75, 'weight': 100},
    'item2': {'id': 2, 'template': None, 'height': 30, 'width': 40,  'length': 20, 'weight': 20},
    'item3': {'id': 3, 'template': None, 'height': 80, 'width': 100, 'length': 96, 'weight': 150},
    'item4': {'id': 4, 'template': None, 'height': 30, 'width': 40,  'length': 60, 'weight': 75},
    'item5': {'id': 5, 'template': None, 'height': 80, 'width': 100, 'length': 36, 'weight': 33}
}

import itertools as itools

def print_inventory():
    print('inventory:')
    for key in sorted(inventory):
        print('  {}: {}'.format(key, inventory[key]))

print("-- BEFORE --")
print_inventory()

THRESHOLD = 2  # minimum number of matching items needed to create a template
ALLKEYS = ['template', 'height', 'width', 'length', 'weight']
EXCLUDEDKEYS = ['template', 'length', 'weight']
INCLUDEDKEYS = [key for key in ALLKEYS if key not in EXCLUDEDKEYS]

# determines which keys make up a template
sortby = lambda item, keys=INCLUDEDKEYS: tuple(item[key] for key in keys)

templates = {}
templateID = 0
# groupby() only batches adjacent items, so sort with the same key function first
sortedinventory = sorted(inventory.values(), key=sortby)
for templatetuple, similariter in itools.groupby(sortedinventory, sortby):
    similaritems = list(similariter)
    if len(similaritems) >= THRESHOLD:
        # create and assign a template
        templateID += 1
        templates[templateID] = templatetuple # tuple of values of INCLUDEDKEYS
        for item in similaritems:
            item['template'] = templateID
print()
print("-- AFTER --")
print_inventory()
print()
print('templates:', templates)
print()

Run under Python 3.7 or later (where dicts keep insertion order), it prints:

-- BEFORE --
inventory:
  item1: {'id': 1, 'template': None, 'height': 80, 'width': 120, 'length': 75, 'weight': 100}
  item2: {'id': 2, 'template': None, 'height': 30, 'width': 40, 'length': 20, 'weight': 20}
  item3: {'id': 3, 'template': None, 'height': 80, 'width': 100, 'length': 96, 'weight': 150}
  item4: {'id': 4, 'template': None, 'height': 30, 'width': 40, 'length': 60, 'weight': 75}
  item5: {'id': 5, 'template': None, 'height': 80, 'width': 100, 'length': 36, 'weight': 33}

-- AFTER --
inventory:
  item1: {'id': 1, 'template': None, 'height': 80, 'width': 120, 'length': 75, 'weight': 100}
  item2: {'id': 2, 'template': 1, 'height': 30, 'width': 40, 'length': 20, 'weight': 20}
  item3: {'id': 3, 'template': 2, 'height': 80, 'width': 100, 'length': 96, 'weight': 150}
  item4: {'id': 4, 'template': 1, 'height': 30, 'width': 40, 'length': 60, 'weight': 75}
  item5: {'id': 5, 'template': 2, 'height': 80, 'width': 100, 'length': 36, 'weight': 33}

templates: {1: (30, 40), 2: (80, 100)}
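
For comparison, here is a sketch of a different technique from the groupby() answer: bucketing items in a dict keyed by their template-relevant values, which avoids the sort. It is essentially the "reference hash" the question mentions, kept small with collections.defaultdict. The function name `assign_templates` and its parameters `excluded_keys` and `threshold` are hypothetical, and the sketch assumes every compared value is hashable.

```python
from collections import defaultdict

def assign_templates(inventory, excluded_keys, threshold=2):
    """Bucket items by their non-excluded values; tag buckets with >= threshold items."""
    groups = defaultdict(list)
    for item in inventory.values():
        # Build a signature from every key that is not excluded, in sorted key order
        # so that two items with the same values always produce the same signature.
        included = sorted(k for k in item if k not in excluded_keys)
        signature = tuple((k, item[k]) for k in included)
        groups[signature].append(item)

    templates = {}
    for signature, members in groups.items():
        if len(members) >= threshold:
            template_id = len(templates) + 1
            templates[template_id] = dict(signature)
            for item in members:
                item['template'] = template_id
    return templates

inventory = {
    'item1': {'id': 1, 'template': None, 'height': 80, 'width': 120, 'length': 75, 'weight': 100},
    'item2': {'id': 2, 'template': None, 'height': 30, 'width': 40,  'length': 20, 'weight': 20},
    'item3': {'id': 3, 'template': None, 'height': 80, 'width': 100, 'length': 96, 'weight': 150},
    'item4': {'id': 4, 'template': None, 'height': 30, 'width': 40,  'length': 60, 'weight': 75},
    'item5': {'id': 5, 'template': None, 'height': 80, 'width': 100, 'length': 36, 'weight': 33},
}

templates = assign_templates(inventory, excluded_keys={'id', 'template', 'length', 'weight'})
print(templates)
```

Listing only the excluded keys keeps the call short when the real items have around 30 keys, since the included keys are derived from each item rather than spelled out.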