Create a dictionary from multiple columns based on the position of values

Posted on 2025-01-11 02:04:34

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame(
    {
        'C1': list('aabbab'),
        'C2': list('abbbaa'),
        'value': range(11, 17)
    }
)

  C1 C2  value
0  a  a     11
1  a  b     12
2  b  b     13
3  b  b     14
4  a  a     15
5  b  a     16

and I would like to generate a dictionary like this:

{'C1': {'a': {1: 11, 2: 12, 3: 15}, 'b': {1: 13, 2: 14, 3: 16}},
'C2': {'a': {1: 11, 2: 15, 3: 16}, 'b': {1: 12, 2: 13, 3: 14}}}

Logic is as follows:

In df, I go to column C1. The first a I find in that column corresponds to value 11, the second to value 12, and the third to value 15. The position of each a and the corresponding value should be stored in the dictionary under the keys C1 and a.

I could do something like this:

df_ss = df.loc[df['C1'] == 'a', 'value']
d = {ind: val for ind, val in enumerate(df_ss.values, 1)}

which yields the following for d:

{1: 11, 2: 12, 3: 15}

which is indeed the desired output. I could then put this into a loop and generate all required dictionaries.
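For reference, the loop described above might look roughly like the sketch below. The function name build_by_loop and its parameters are hypothetical, introduced only for illustration; it repeats the boolean-mask selection for every distinct key in every column.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'C1': list('aabbab'),
        'C2': list('abbbaa'),
        'value': range(11, 17)
    }
)


def build_by_loop(frame, cols, value_col='value'):
    """Hypothetical helper: nested dict of 1-based positions per column and key."""
    result = {}
    for col in cols:
        result[col] = {}
        # One boolean-mask selection per distinct key in the column.
        for key in frame[col].unique():
            values = frame.loc[frame[col] == key, value_col]
            result[col][key] = {i: v for i, v in enumerate(values.tolist(), 1)}
    return result


d = build_by_loop(df, ['C1', 'C2'])
print(d)
```

Each inner selection scans the whole column again, so this baseline does one pass per distinct key per column.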

Does anyone see something more efficient than this?

Comments (3)

﹉夏雨初晴づ 2025-01-18 02:04:34

You could use a groupby and a nested dict comprehension:

import pandas as pd

df = pd.DataFrame(
    {
        'C1': list('aabbab'),
        'C2': list('abbbaa'),
        'value': range(11, 17)
    }
)

d = {
    c: {k: dict(enumerate(g["value"], 1)) for k, g in df.groupby(c)}
    for c in ["C1", "C2"]
}

Which outputs:

{'C1': {'a': {1: 11, 2: 12, 3: 15}, 'b': {1: 13, 2: 14, 3: 16}},
 'C2': {'a': {1: 11, 2: 15, 3: 16}, 'b': {1: 12, 2: 13, 3: 14}}}

ヤ经典坏疍 2025-01-18 02:04:34

For each C column, use a dictionary comprehension over the groupby, combined with enumerate:

cols = ['C1', 'C2']
# or, programmatically
# cols = df.filter(regex='^C').columns

out = {c: {k: dict(enumerate(g, start=1)) for k, g in df.groupby(c)['value']}
       for c in cols}

output:

{'C1': {'a': {1: 11, 2: 12, 3: 15}, 'b': {1: 13, 2: 14, 3: 16}},
 'C2': {'a': {1: 11, 2: 15, 3: 16}, 'b': {1: 12, 2: 13, 3: 14}}}
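
As a possible extension, the comprehension above can be wrapped in a small function that picks the grouping columns programmatically, using the filter call shown in the comment. The function name positional_dict and its parameters are assumptions made for this sketch, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'C1': list('aabbab'),
        'C2': list('abbbaa'),
        'value': range(11, 17)
    }
)


def positional_dict(frame, pattern='^C', value_col='value'):
    """Hypothetical helper: group by every column whose name matches `pattern`."""
    cols = frame.filter(regex=pattern).columns
    return {
        # Iterating a grouped column yields (key, Series) pairs; enumerate
        # numbers the values of each group starting at 1, in row order.
        c: {k: dict(enumerate(g, start=1)) for k, g in frame.groupby(c)[value_col]}
        for c in cols
    }


out = positional_dict(df)
print(out)
```

Selecting the columns by regex means any additional column whose name starts with C would be picked up without editing the list by hand.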

通知家属抬走 2025-01-18 02:04:34

This is probably a bit slower than the methods from @mozway and @Alex:

{
    c: df.set_index([df.groupby(c).cumcount() + 1, c])["value"]
         .unstack(0)
         .to_dict("index")
    for c in ["C1", "C2"]
}

Output:

{'C1': {'a': {1: 11, 2: 12, 3: 15}, 'b': {1: 13, 2: 14, 3: 16}},
 'C2': {'a': {1: 11, 2: 15, 3: 16}, 'b': {1: 12, 2: 13, 3: 14}}}
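
One way to check the speed guess is to time both approaches with timeit, as in the rough sketch below. The function names with_groupby and with_unstack are hypothetical wrappers around the two snippets, and timings on a six-row frame say little about large data, so treat the numbers as indicative only. Note also that if the groups had unequal sizes, unstack would fill the gaps with NaN, so the two results would no longer match.

```python
import timeit

import pandas as pd

df = pd.DataFrame(
    {
        'C1': list('aabbab'),
        'C2': list('abbbaa'),
        'value': range(11, 17)
    }
)
cols = ['C1', 'C2']


def with_groupby(frame):
    # groupby + enumerate, as in the other answers
    return {
        c: {k: dict(enumerate(g, start=1)) for k, g in frame.groupby(c)['value']}
        for c in cols
    }


def with_unstack(frame):
    # cumcount + unstack, as in this answer
    return {
        c: frame.set_index([frame.groupby(c).cumcount() + 1, c])['value']
                .unstack(0)
                .to_dict('index')
        for c in cols
    }


# Both approaches should agree on this example (all groups have equal size).
same = with_groupby(df) == with_unstack(df)
print('identical results:', same)

runs = 20
for fn in (with_groupby, with_unstack):
    seconds = timeit.timeit(lambda: fn(df), number=runs)
    print(f'{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call')
```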