Convert a dictionary of lists of dictionaries to a DataFrame

Posted on 2025-01-19 09:56:32

I have the following subsample of a dictionary of lists of dictionaries (taken from a larger dictionary with millions of items):

bool_dict = {0: [{0: 4680}, {1: 1185}], 
             1: [{0: 172}, {1: 9}], 
             2: [{0: 149}, {1: 1282}], 
             3: [{0: 20}, {1: 127}], 
             4: [{0: 0}, {1: 0}]}

which I converted to a dataframe of the form:

          0          1
0  {0: 4680}  {1: 1185}
1   {0: 172}     {1: 9}
2   {0: 149}  {1: 1282}
3    {0: 20}   {1: 127}
4     {0: 0}     {1: 0}

by doing the following:

import pandas as pd
test = pd.DataFrame(bool_dict.values(), columns=['0', '1'], index=bool_dict.keys()).sort_index()

The problem is that I only need each cell's value, not the key, in the dataframe. So, the desired output is:

       0          1
0      4680       1185
1       172          9
2       149       1282
3        20        127
4         0          0

I tried the following:

test['0'] = test['0'].apply(lambda x: x[0])

but then I get a key error on what I thought was a dictionary.

To make sure it indeed was a dictionary, I then tried

from ast import literal_eval
test['0']=test['0'].apply(lambda x: literal_eval(str(x)))

then tried this again

test['0'] = test['0'].apply(lambda x: x[0])

with no success (I also tried the key as '0').

UPDATE: to make sure the lambda was the issue, this works just fine:

test['0'].head(): 

0    {0: 4680}
1     {0: 247}
2       {0: 0}
3       {0: 0}
4     {0: 104}

I could do the hacky thing of a split by the : and then remove extraneous stuff, but that just feels wrong for so many reasons.
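
One way to narrow down a KeyError like the one described above is to check which cells actually contain the key being looked up. The sketch below rebuilds the sample frame and lists the cells in column '0' that are not dicts or that lack key 0. The helper name missing_key_rows is hypothetical, and the idea that the full data contains such cells is only an assumption: the five-row sample shown here has none.

```python
import pandas as pd

bool_dict = {0: [{0: 4680}, {1: 1185}],
             1: [{0: 172}, {1: 9}],
             2: [{0: 149}, {1: 1282}],
             3: [{0: 20}, {1: 127}],
             4: [{0: 0}, {1: 0}]}

# Rebuild the frame from the question, with string column labels.
test = pd.DataFrame(list(bool_dict.values()),
                    columns=['0', '1'],
                    index=list(bool_dict.keys())).sort_index()


def missing_key_rows(series, key):
    """Hypothetical helper: return cells that are not dicts or lack `key`."""
    mask = series.apply(lambda cell: not (isinstance(cell, dict) and key in cell))
    return series[mask]


# Cells in column '0' for which x[0] would fail.
bad = missing_key_rows(test['0'], 0)
print(len(bad))
print(bad)
```

If this returns any rows on the full data, those are the cells for which x[0] raises, and they can be inspected directly instead of guessing at the cause.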

Comments (2)

心房敞 2025-01-26 09:56:32

One way is to merge each inner list into a single dictionary, then pass the result to DataFrame.from_dict:

bool_dict_flattened = {i: {k:v for d in lst for k,v in d.items()} for i, lst in bool_dict.items()}
df = pd.DataFrame.from_dict(bool_dict_flattened, orient='index')
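
A self-contained version of the flattening approach, using the sample data from the question, might look like the sketch below. Note that the resulting column labels are the integer keys 0 and 1; the final step, which converts them to the strings '0' and '1' used in the question, is an optional addition and not part of the original answer.

```python
import pandas as pd

bool_dict = {0: [{0: 4680}, {1: 1185}],
             1: [{0: 172}, {1: 9}],
             2: [{0: 149}, {1: 1282}],
             3: [{0: 20}, {1: 127}],
             4: [{0: 0}, {1: 0}]}

# Merge the single-key dicts in each inner list into one dict per row,
# e.g. [{0: 4680}, {1: 1185}] becomes {0: 4680, 1: 1185}.
bool_dict_flattened = {
    i: {k: v for d in lst for k, v in d.items()}
    for i, lst in bool_dict.items()
}

# Outer keys become the index, inner keys become the columns.
df = pd.DataFrame.from_dict(bool_dict_flattened, orient='index')

# Optional: use string column labels, as in the question.
df.columns = df.columns.astype(str)
print(df)
```

Because the merging happens in plain Python before the frame is built, the cells never hold dict objects at all.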

Another option is to apply the str accessor to each column, using the fact that each column's name matches the key of the dictionaries it holds:

out = pd.DataFrame.from_dict(bool_dict, orient='index').apply(lambda x: x.str[x.name])

Output:

      0     1
0  4680  1185
1   172     9
2   149  1282
3    20   127
4     0     0
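
The str accessor variant can be spelled out as the sketch below. Here x.str[x.name] is written as the equivalent .str.get(col.name), which looks the key up in each dict cell. The trailing astype(int) is an addition, not part of the answer: it assumes every cell holds an integer, since the accessor returns object-typed columns.

```python
import pandas as pd

bool_dict = {0: [{0: 4680}, {1: 1185}],
             1: [{0: 172}, {1: 9}],
             2: [{0: 149}, {1: 1282}],
             3: [{0: 20}, {1: 127}],
             4: [{0: 0}, {1: 0}]}

# Each cell is still a dict here; the columns are labelled 0 and 1.
raw = pd.DataFrame.from_dict(bool_dict, orient='index')

# For every column, look up the key equal to the column's own label.
out = raw.apply(lambda col: col.str.get(col.name))

# Assumption: all extracted values are integers, so the cast is safe.
out = out.astype(int)
print(out)
```

This relies on the column label and the dict key being the same object type (both integers here), so it would not work unchanged on a frame whose columns were renamed to '0' and '1'.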
月隐月明月朦胧 2025-01-26 09:56:32

You can iterate over each column with the outer lambda and over each cell in that column with the inner lambda, reading the single value out of each dictionary:

df = pd.DataFrame(bool_dict).T
# apply returns a new frame, so assign the result back
df = df.apply(lambda x: x.apply(lambda y: list(y.values())[0]))
df

      0     1
0  4680  1185
1   172     9
2   149  1282
3    20   127
4     0     0
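
A runnable sketch of the nested apply, with the result assigned to a new name, is shown below. It swaps list(y.values())[0] for next(iter(cell.values())), which takes the first value of each dict without building an intermediate list. That substitution is a stylistic choice, not something the answer requires, and it assumes every cell is a non-empty dict.

```python
import pandas as pd

bool_dict = {0: [{0: 4680}, {1: 1185}],
             1: [{0: 172}, {1: 9}],
             2: [{0: 149}, {1: 1282}],
             3: [{0: 20}, {1: 127}],
             4: [{0: 0}, {1: 0}]}

# Outer keys become columns first, so transpose to get one row per key.
df = pd.DataFrame(bool_dict).T

# Outer apply walks the columns; inner apply walks the cells of a column.
# Assumption: each cell is a dict with at least one entry.
result = df.apply(
    lambda col: col.apply(lambda cell: next(iter(cell.values())))
)
print(result)
```

Unlike the str accessor approach, this does not depend on the dict key matching the column label, because it ignores the key entirely.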