从字典中提取键值对的子集?

发布于 2024-10-22 22:40:14 字数 244 浏览 6 评论 0原文

我有一个大字典对象,有几个键值对(大约 16 个),但我只对其中 3 个感兴趣。对此类字典进行子集化的最佳方法(最短/有效/最优雅)是什么?

我所知道的最清楚的是:

bigdict = {'a':1,'b':2,....,'z':26} 
subdict = {'l':bigdict['l'], 'm':bigdict['m'], 'n':bigdict['n']}

我确信有比这更优雅的方式。

I have a big dictionary object that has several key value pairs (about 16), but I am only interested in 3 of them. What is the best way (shortest/efficient/most elegant) to subset such dictionary?

The best I know is:

bigdict = {'a':1,'b':2,....,'z':26} 
subdict = {'l':bigdict['l'], 'm':bigdict['m'], 'n':bigdict['n']}

I am sure there is a more elegant way than this.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(14

待"谢繁草 2024-10-29 22:40:15

你可以尝试:

dict((k, bigdict[k]) for k in ('l', 'm', 'n'))

...或者在 Python 2.7 或更高版本中:

{k: bigdict[k] for k in ('l', 'm', 'n')}

我假设你知道键将在字典中。请参阅 如果您不这样做,请由 Håvard S 回答。

或者,正如 timbo 在评论中指出的那样,如果您想要 bigdict 中缺少的密钥要映射到 None,您可以这样做:

{k: bigdict.get(k, None) for k in ('l', 'm', 'n')}

如果您使用的是 Python 3,并且您想要新字典中实际存在于原始字典中的键,您可以使用事实来查看对象实现一些集合操作:

{k: bigdict[k] for k in bigdict.keys() & {'l', 'm', 'n'}}

You could try:

dict((k, bigdict[k]) for k in ('l', 'm', 'n'))

... or in Python versions 2.7 or later:

{k: bigdict[k] for k in ('l', 'm', 'n')}

I'm assuming that you know the keys are going to be in the dictionary. See the answer by Håvard S if you don't.

Alternatively, as timbo points out in the comments, if you want a key that's missing in bigdict to map to None, you can do:

{k: bigdict.get(k, None) for k in ('l', 'm', 'n')}

If you're using Python 3, and you only want keys in the new dict that actually exist in the original one, you can use the fact to view objects implement some set operations:

{k: bigdict[k] for k in bigdict.keys() & {'l', 'm', 'n'}}
烂人 2024-10-29 22:40:15

至少短一点:

wanted_keys = ['l', 'm', 'n'] # The keys you want
dict((k, bigdict[k]) for k in wanted_keys if k in bigdict)

A bit shorter, at least:

wanted_keys = ['l', 'm', 'n'] # The keys you want
dict((k, bigdict[k]) for k in wanted_keys if k in bigdict)
孤芳又自赏 2024-10-29 22:40:15

对所有提到的方法进行一些速度比较:

于 2020 年 7 月 13 日更新(感谢@user3780389):
仅适用于来自 bigdict 的键。

 IPython 5.5.0 -- An enhanced Interactive Python.
Python 2.7.18 (default, Aug  8 2019, 00:00:00) 
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux2
import numpy.random as nprnd
  ...: keys = nprnd.randint(100000, size=10000)
  ...: bigdict = dict([(_, nprnd.rand()) for _ in range(100000)])
  ...: 
  ...: %timeit {key:bigdict[key] for key in keys}
  ...: %timeit dict((key, bigdict[key]) for key in keys)
  ...: %timeit dict(map(lambda k: (k, bigdict[k]), keys))
  ...: %timeit {key:bigdict[key] for key in set(keys) & set(bigdict.keys())}
  ...: %timeit dict(filter(lambda i:i[0] in keys, bigdict.items()))
  ...: %timeit {key:value for key, value in bigdict.items() if key in keys}
100 loops, best of 3: 2.36 ms per loop
100 loops, best of 3: 2.87 ms per loop
100 loops, best of 3: 3.65 ms per loop
100 loops, best of 3: 7.14 ms per loop
1 loop, best of 3: 577 ms per loop
1 loop, best of 3: 563 ms per loop

正如预期的那样:字典理解是最好的选择。

A bit of speed comparison for all mentioned methods:

UPDATED on 2020.07.13 (thx to @user3780389):
ONLY for keys from bigdict.

 IPython 5.5.0 -- An enhanced Interactive Python.
Python 2.7.18 (default, Aug  8 2019, 00:00:00) 
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux2
import numpy.random as nprnd
  ...: keys = nprnd.randint(100000, size=10000)
  ...: bigdict = dict([(_, nprnd.rand()) for _ in range(100000)])
  ...: 
  ...: %timeit {key:bigdict[key] for key in keys}
  ...: %timeit dict((key, bigdict[key]) for key in keys)
  ...: %timeit dict(map(lambda k: (k, bigdict[k]), keys))
  ...: %timeit {key:bigdict[key] for key in set(keys) & set(bigdict.keys())}
  ...: %timeit dict(filter(lambda i:i[0] in keys, bigdict.items()))
  ...: %timeit {key:value for key, value in bigdict.items() if key in keys}
100 loops, best of 3: 2.36 ms per loop
100 loops, best of 3: 2.87 ms per loop
100 loops, best of 3: 3.65 ms per loop
100 loops, best of 3: 7.14 ms per loop
1 loop, best of 3: 577 ms per loop
1 loop, best of 3: 563 ms per loop

As it was expected: dictionary comprehensions are the best option.

寄人书 2024-10-29 22:40:15
interesting_keys = ('l', 'm', 'n')
subdict = {x: bigdict[x] for x in interesting_keys if x in bigdict}
interesting_keys = ('l', 'm', 'n')
subdict = {x: bigdict[x] for x in interesting_keys if x in bigdict}
小女人ら 2024-10-29 22:40:15

该答案使用与所选答案类似的字典理解,但不会排除缺少的项目。

python 2 版本:

{k:v for k, v in bigDict.iteritems() if k in ('l', 'm', 'n')}

python 3 版本:

{k:v for k, v in bigDict.items() if k in ('l', 'm', 'n')}

This answer uses a dictionary comprehension similar to the selected answer, but will not except on a missing item.

python 2 version:

{k:v for k, v in bigDict.iteritems() if k in ('l', 'm', 'n')}

python 3 version:

{k:v for k, v in bigDict.items() if k in ('l', 'm', 'n')}
温折酒 2024-10-29 22:40:15

如果您想保留大部分键并删除一些键,则可以采用另一种方法:

{k: bigdict[k] for k in bigdict.keys() if k not in ['l', 'm', 'n']}

An alternative approach for if you want to retain the majority of the keys while removing a few:

{k: bigdict[k] for k in bigdict.keys() if k not in ['l', 'm', 'n']}
酒解孤独 2024-10-29 22:40:15

也许:

subdict=dict([(x,bigdict[x]) for x in ['l', 'm', 'n']])

Python 3 甚至支持以下内容:

subdict={a:bigdict[a] for a in ['l','m','n']}

请注意,您可以按如下方式检查字典中是否存在:

subdict=dict([(x,bigdict[x]) for x in ['l', 'm', 'n'] if x in bigdict])

resp。对于Python 3

subdict={a:bigdict[a] for a in ['l','m','n'] if a in bigdict}

Maybe:

subdict=dict([(x,bigdict[x]) for x in ['l', 'm', 'n']])

Python 3 even supports the following:

subdict={a:bigdict[a] for a in ['l','m','n']}

Note that you can check for existence in dictionary as follows:

subdict=dict([(x,bigdict[x]) for x in ['l', 'm', 'n'] if x in bigdict])

resp. for python 3

subdict={a:bigdict[a] for a in ['l','m','n'] if a in bigdict}
稀香 2024-10-29 22:40:15

您还可以使用map(无论如何,这是一个非常有用的函数):

sd = dict(map(lambda k: (k, l. get(k, None)), l))

示例:

large_dictionary = {'a1':123, 'a2':45, 'a3':344}
list_of_keys = ['a1', 'a3']
small_dictionary = dict(map(lambda key: (key, large_dictionary.get(key, None)), list_of_keys))

PS:我从之前的答案中借用了 .get(key, None) :)

You can also use map (which is a very useful function to get to know anyway):

sd = dict(map(lambda k: (k, l.get(k, None)), l))

Example:

large_dictionary = {'a1':123, 'a2':45, 'a3':344}
list_of_keys = ['a1', 'a3']
small_dictionary = dict(map(lambda key: (key, large_dictionary.get(key, None)), list_of_keys))

PS: I borrowed the .get(key, None) from a previous answer :)

捎一片雪花 2024-10-29 22:40:15

好吧,这个问题已经困扰了我好几次了,所以谢谢 Jayesh 的提问。

上面的答案似乎是一个很好的解决方案,但如果您在整个代码中使用它,那么包装功能恕我直言是有意义的。此外,这里有两种可能的用例:一种是您关心所有关键字是否都在原始字典中。还有一个你不知道的地方。平等对待两者就好了。

因此,对于我的二便士价值,我建议编写一个字典的子类,例如

class my_dict(dict):
    def subdict(self, keywords, fragile=False):
        d = {}
        for k in keywords:
            try:
                d[k] = self[k]
            except KeyError:
                if fragile:
                    raise
        return d

现在您可以使用以下命令拉出一个子字典

orig_dict.subdict(keywords)

用法示例:

#
## our keywords are letters of the alphabet
keywords = 'abcdefghijklmnopqrstuvwxyz'
#
## our dictionary maps letters to their index
d = my_dict([(k,i) for i,k in enumerate(keywords)])
print('Original dictionary:\n%r\n\n' % (d,))
#
## constructing a sub-dictionary with good keywords
oddkeywords = keywords[::2]
subd = d.subdict(oddkeywords)
print('Dictionary from odd numbered keys:\n%r\n\n' % (subd,))
#
## constructing a sub-dictionary with mixture of good and bad keywords
somebadkeywords = keywords[1::2] + 'A'
try:
    subd2 = d.subdict(somebadkeywords)
    print("We shouldn't see this message")
except KeyError:
    print("subd2 construction fails:")
    print("\toriginal dictionary doesn't contain some keys\n\n")
#
## Trying again with fragile set to false
try:
    subd3 = d.subdict(somebadkeywords, fragile=False)
    print('Dictionary constructed using some bad keys:\n%r\n\n' % (subd3,))
except KeyError:
    print("We shouldn't see this message")

如果运行上述所有代码,您应该看到(类似)以下输出(抱歉格式错误):

原始词典:
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3, 'g': 6, 'f': 5,
“i”:8、“h”:7、“k”:10、“j”:9、“m”:12、“l”:11、“o”:14、
“n”:13,“q”:16,“p”:15,“s”:18,“r”:17,“u”:20,
“t”:19、“w”:22、“v”:21、“y”:24、“x”:23、“z”:25}

奇数键的字典:
{'a': 0, 'c': 2, 'e': 4, 'g': 6, 'i': 8, 'k': 10, 'm': 12, 'o': 14, ' q':16,'s':18,'u':20,'w':22,'y':24}

subd2 构建失败:
原始字典不包含某些键

使用一些坏键构造的字典:
{'b':1,'d':3,'f':5,'h':7,'j':9,'l':11,'n':13,'p':15,' r':17,'t':19,'v':21,'x':23,'z':25}

Okay, this is something that has bothered me a few times, so thank you Jayesh for asking it.

The answers above seem like as good a solution as any, but if you are using this all over your code, it makes sense to wrap the functionality IMHO. Also, there are two possible use cases here: one where you care about whether all keywords are in the original dictionary. and one where you don't. It would be nice to treat both equally.

So, for my two-penneth worth, I suggest writing a sub-class of dictionary, e.g.

class my_dict(dict):
    def subdict(self, keywords, fragile=False):
        d = {}
        for k in keywords:
            try:
                d[k] = self[k]
            except KeyError:
                if fragile:
                    raise
        return d

Now you can pull out a sub-dictionary with

orig_dict.subdict(keywords)

Usage examples:

#
## our keywords are letters of the alphabet
keywords = 'abcdefghijklmnopqrstuvwxyz'
#
## our dictionary maps letters to their index
d = my_dict([(k,i) for i,k in enumerate(keywords)])
print('Original dictionary:\n%r\n\n' % (d,))
#
## constructing a sub-dictionary with good keywords
oddkeywords = keywords[::2]
subd = d.subdict(oddkeywords)
print('Dictionary from odd numbered keys:\n%r\n\n' % (subd,))
#
## constructing a sub-dictionary with mixture of good and bad keywords
somebadkeywords = keywords[1::2] + 'A'
try:
    subd2 = d.subdict(somebadkeywords)
    print("We shouldn't see this message")
except KeyError:
    print("subd2 construction fails:")
    print("\toriginal dictionary doesn't contain some keys\n\n")
#
## Trying again with fragile set to false
try:
    subd3 = d.subdict(somebadkeywords, fragile=False)
    print('Dictionary constructed using some bad keys:\n%r\n\n' % (subd3,))
except KeyError:
    print("We shouldn't see this message")

If you run all the above code, you should see (something like) the following output (sorry for the formatting):

Original dictionary:
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3, 'g': 6, 'f': 5,
'i': 8, 'h': 7, 'k': 10, 'j': 9, 'm': 12, 'l': 11, 'o': 14,
'n': 13, 'q': 16, 'p': 15, 's': 18, 'r': 17, 'u': 20,
't': 19, 'w': 22, 'v': 21, 'y': 24, 'x': 23, 'z': 25}

Dictionary from odd numbered keys:
{'a': 0, 'c': 2, 'e': 4, 'g': 6, 'i': 8, 'k': 10, 'm': 12, 'o': 14, 'q': 16, 's': 18, 'u': 20, 'w': 22, 'y': 24}

subd2 construction fails:
original dictionary doesn't contain some keys

Dictionary constructed using some bad keys:
{'b': 1, 'd': 3, 'f': 5, 'h': 7, 'j': 9, 'l': 11, 'n': 13, 'p': 15, 'r': 17, 't': 19, 'v': 21, 'x': 23, 'z': 25}

2024-10-29 22:40:15

解决方案

from operator import itemgetter
from typing import List, Dict, Union


def subdict(d: Union[Dict, List], columns: List[str]) -> Union[Dict, List[Dict]]:
    """Return a dict or list of dicts with subset of 
    columns from the d argument.
    """
    getter = itemgetter(*columns)

    if isinstance(d, list):
        result = []
        for subset in map(getter, d):
            record = dict(zip(columns, subset))
            result.append(record)
        return result
    elif isinstance(d, dict):
        return dict(zip(columns, getter(d)))

    raise ValueError('Unsupported type for `d`')

使用示例

# pure dict

d = dict(a=1, b=2, c=3)
print(subdict(d, ['a', 'c']))

>>> In [5]: {'a': 1, 'c': 3}
# list of dicts

d = [
    dict(a=1, b=2, c=3),
    dict(a=2, b=4, c=6),
    dict(a=4, b=8, c=12),
]

print(subdict(d, ['a', 'c']))

>>> In [5]: [{'a': 1, 'c': 3}, {'a': 2, 'c': 6}, {'a': 4, 'c': 12}]

solution

from operator import itemgetter
from typing import List, Dict, Union


def subdict(d: Union[Dict, List], columns: List[str]) -> Union[Dict, List[Dict]]:
    """Return a dict or list of dicts with subset of 
    columns from the d argument.
    """
    getter = itemgetter(*columns)

    if isinstance(d, list):
        result = []
        for subset in map(getter, d):
            record = dict(zip(columns, subset))
            result.append(record)
        return result
    elif isinstance(d, dict):
        return dict(zip(columns, getter(d)))

    raise ValueError('Unsupported type for `d`')

examples of use

# pure dict

d = dict(a=1, b=2, c=3)
print(subdict(d, ['a', 'c']))

>>> In [5]: {'a': 1, 'c': 3}
# list of dicts

d = [
    dict(a=1, b=2, c=3),
    dict(a=2, b=4, c=6),
    dict(a=4, b=8, c=12),
]

print(subdict(d, ['a', 'c']))

>>> In [5]: [{'a': 1, 'c': 3}, {'a': 2, 'c': 6}, {'a': 4, 'c': 12}]
不再让梦枯萎 2024-10-29 22:40:15

py3.8+ 中避免 big_dict 中缺少键的 None 值的另一种方法是使用 walrus:

small_dict = {key: val for key in ('l', 'm', 'n') if (val := big_dict.get(key))}

Another way in py3.8+ that avoids None values for keys missing from big_dict uses the walrus:

small_dict = {key: val for key in ('l', 'm', 'n') if (val := big_dict.get(key))}
極樂鬼 2024-10-29 22:40:15

还有一个(我更喜欢Mark Longair的回答

di = {'a':1,'b':2,'c':3}
req = ['a','c','w']
dict([i for i in di.iteritems() if i[0] in di and i[0] in req])

Yet another one (I prefer Mark Longair's answer)

di = {'a':1,'b':2,'c':3}
req = ['a','c','w']
dict([i for i in di.iteritems() if i[0] in di and i[0] in req])
彻夜缠绵 2024-10-29 22:40:15

使用地图(halfdanrump的答案)对我来说是最好的,虽然还没有计时......

但是如果你去找一本字典,并且如果你有一个big_dict:

  1. 请绝对确定你循环了req。这是至关重要的,并且会影响算法的运行时间(大O,theta,你能想到的)
  2. 将其编写得足够通用,以避免在键不存在时出现错误。

所以例如:

big_dict = {'a':1,'b':2,'c':3,................................................}
req = ['a','c','w']

{k:big_dict.get(k,None) for k in req )
# or 
{k:big_dict[k] for k in req if k in big_dict)

请注意,在相反的情况下,req很大,但my_dict很小,你应该循环遍历my_dict。

一般来说,我们正在做一个交集,问题的复杂度是 O(min(len(dict)),min (len(req)))。 Python 的自己的交集实现考虑两个集合的大小,所以这看起来是最优的。此外,在 c 语言中并且是核心库的一部分,可能比大多数未优化的 python 语句更快。
因此,我会考虑的一个解决方案是:

dict = {'a':1,'b':2,'c':3,................................................}
req = ['a','c','w',...................]

{k:dic[k] for k in set(req).intersection(dict.keys())}

它将关键操作移到 python 的 c 代码中,并且适用于所有情况。

Using map (halfdanrump's answer) is best for me, though haven't timed it...

But if you go for a dictionary, and if you have a big_dict:

  1. Make absolutely certain you loop through the the req. This is crucial, and affects the running time of the algorithm (big O, theta, you name it)
  2. Write it generic enough to avoid errors if keys are not there.

so e.g.:

big_dict = {'a':1,'b':2,'c':3,................................................}
req = ['a','c','w']

{k:big_dict.get(k,None) for k in req )
# or 
{k:big_dict[k] for k in req if k in big_dict)

Note that in the converse case, that the req is big, but my_dict is small, you should loop through my_dict instead.

In general, we are doing an intersection and the complexity of the problem is O(min(len(dict)),min(len(req))). Python's own implementation of intersection considers the size of the two sets, so it seems optimal. Also, being in c and part of the core library, is probably faster than most not optimized python statements.
Therefore, a solution that I would consider is:

dict = {'a':1,'b':2,'c':3,................................................}
req = ['a','c','w',...................]

{k:dic[k] for k in set(req).intersection(dict.keys())}

It moves the critical operation inside python's c code and will work for all cases.

晌融 2024-10-29 22:40:15

如果有人想要字典的前几项 n 而不知道键:

n = 5 # First Five Items
ks = [*dikt.keys()][:n]
less_dikt = {i: dikt[i] for i in ks}

In case someone wants first few items n of the dictionary without knowing the keys:

n = 5 # First Five Items
ks = [*dikt.keys()][:n]
less_dikt = {i: dikt[i] for i in ks}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文