Pivot a pandas DataFrame from a single row into multiple rows

Posted 2025-02-01 22:30:56

I recently applied a transformation to flatten a nested JSON so that I would have a flat dataset to work with. The transformation works, but the final format is not the one I am looking for: it compressed all the data into a single row and added numeric suffixes to the column names, instead of producing a separate row for each id_prop.

My dataset in JSON format to replicate with Pandas:

import pandas as pd
json = {"id_prop.0":{"0":1},"id_prop.1":{"0":2},"id_prop.2":{"0":3},"prop_number.0":{"0":123},"prop_number.1":{"0":325},"prop_number.2":{"0":754},"prop_value.0":{"0":1},"prop_value.1":{"0":1},"prop_value.2":{"0":1}}
df = pd.DataFrame.from_dict(json, orient='columns')

My result:

	id_prop.0	id_prop.1	id_prop.2	prop_number.0	prop_number.1	prop_number.2	prop_value.0	prop_value.1	prop_value.2
0	1	2	3	123	325	754	1	1	1

The result I expect:

	id_prop	prop_number	prop_value
0	1	123	1
1	2	325	1
2	3	754	1

Is there any way to pivot the dataframe into the format I need, where each row represents the values of a single id_prop?

Attempts

I have already extracted the names of the columns I need without suffixes:

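def extract_cols(columns):
    # Remove duplicates while keeping the first-seen order
    myset = set()
    myset_add = myset.add
    return [x for x in columns if not (x in myset or myset_add(x))]

# Strip the ".<number>" suffix (raw string avoids an invalid-escape warning; "+" allows multi-digit suffixes)
cols = extract_cols(df.columns.str.replace(r"\.[0-9]+", "", regex=True))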
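As a side note, the order-preserving de-duplication that extract_cols performs can also be written with dict.fromkeys, because dict keys are unique and keep insertion order in Python 3.7+. A minimal sketch on the question's data (the variable is named data instead of json here only to avoid shadowing the standard-library module name):

```python
import pandas as pd

data = {"id_prop.0": {"0": 1}, "id_prop.1": {"0": 2}, "id_prop.2": {"0": 3},
        "prop_number.0": {"0": 123}, "prop_number.1": {"0": 325}, "prop_number.2": {"0": 754},
        "prop_value.0": {"0": 1}, "prop_value.1": {"0": 1}, "prop_value.2": {"0": 1}}
df = pd.DataFrame.from_dict(data, orient="columns")

# Drop the ".<number>" suffix, then de-duplicate while keeping first-seen order
stripped = df.columns.str.replace(r"\.\d+", "", regex=True)
cols = list(dict.fromkeys(stripped))
print(cols)
```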

And also "verticalized" the results I need using stack():

df_stacked = df.stack().reset_index(level=1, drop=True)

But I haven't figured out how to combine that info yet. Any help would be highly appreciated.
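One way to combine the suffix-free names with the data, not taken from the answers in this thread, is pandas.wide_to_long, which targets this "stub.<number>" layout directly. A minimal sketch, assuming every label is a stub name, a ".", and a numeric suffix; the level name n is an arbitrary choice:

```python
import pandas as pd

data = {"id_prop.0": {"0": 1}, "id_prop.1": {"0": 2}, "id_prop.2": {"0": 3},
        "prop_number.0": {"0": 123}, "prop_number.1": {"0": 325}, "prop_number.2": {"0": 754},
        "prop_value.0": {"0": 1}, "prop_value.1": {"0": 1}, "prop_value.2": {"0": 1}}
df = pd.DataFrame.from_dict(data, orient="columns")

stubs = ["id_prop", "prop_number", "prop_value"]  # e.g. the cols list extracted above
long = pd.wide_to_long(
    df.reset_index(),  # wide_to_long needs an id column; reset_index adds one named "index"
    stubnames=stubs,
    i="index",
    j="n",             # name given to the numeric-suffix level
    sep=".",
)
# Drop the single original row label, keep one row per suffix
result = long.droplevel("index").sort_index()
print(result)
```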

Extra:

If there is also a way to do this with PySpark, even better!
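For the PySpark part, one option (not from the answers in this thread) is Spark SQL's stack() generator, called through selectExpr. This is a sketch under stated assumptions: the helper names build_stack_expr, unpivot and main are hypothetical, and it assumes every stub has the same suffixes 0..n-1. The expression builder is plain Python, so it can be inspected without a Spark install:

```python
def build_stack_expr(columns):
    """Build a Spark SQL stack() expression from flattened 'stub.<n>' column names."""
    parsed = [col.rpartition(".") for col in columns]
    stubs = list(dict.fromkeys(stub for stub, _, _ in parsed))  # first-seen order
    # Assumption: every stub has suffixes 0..n_results-1
    n_results = len({num for _, _, num in parsed})
    # stack(n, a0, b0, c0, a1, b1, c1, ...) emits n rows of len(stubs) values each.
    # Backticks are required because the column names contain dots.
    args = ", ".join(f"`{stub}.{i}`" for i in range(n_results) for stub in stubs)
    return f"stack({n_results}, {args}) as ({', '.join(stubs)})"


def unpivot(sdf):
    # sdf: a pyspark.sql.DataFrame with the question's flattened columns
    return sdf.selectExpr(build_stack_expr(sdf.columns))


def main():
    # Imported here so build_stack_expr can be used without pyspark installed
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    columns = ["id_prop.0", "id_prop.1", "id_prop.2",
               "prop_number.0", "prop_number.1", "prop_number.2",
               "prop_value.0", "prop_value.1", "prop_value.2"]
    sdf = spark.createDataFrame([(1, 2, 3, 123, 325, 754, 1, 1, 1)], columns)
    unpivot(sdf).show()
```

Calling main() where pyspark is available should show one row per result number with the columns id_prop, prop_number and prop_value. stack() requires the values that land in the same output column to have compatible types, which holds here because they are all integers.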

Answer by 就是爱搞怪, 2025-02-08 22:30:56

Here's a way:

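df = df.T  # one row per flattened label, e.g. "id_prop.0"
df.index = pd.MultiIndex.from_arrays([[x[x.find('.')+1:] for x in df.index],   # suffix: result number
                                      [x[:x.find('.')] for x in df.index]])    # prefix: desired label
df = df.unstack()  # move the label level into the columns
df.columns = df.columns.droplevel(0)  # drop the leftover top level (the original row label "0")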

Input:

   id_prop.0  id_prop.1  id_prop.2  prop_number.0  prop_number.1  prop_number.2  prop_value.0  prop_value.1  prop_value.2
0          1          2          3            123            325            754             1             1             1

Output:

  id_prop prop_number prop_value
0       1         123          1
1       2         325          1
2       3         754          1

Explanation:

1. Transpose, so that we can work with the index instead of the columns.
2. Parse each label into the desired label (the prefix) and the result number (the suffix), splitting at the . character.
3. Update the df's index to be a MultiIndex with two levels: a list of result numbers and a list of desired labels.
4. Call unstack to pivot one level of the MultiIndex (the desired labels) into column headings.
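One detail the steps above leave implicit: the parsed result numbers are strings, so with ten or more results "10" would sort before "2" after unstack. A minimal sketch of the same transpose/MultiIndex/unstack steps that converts the suffix to int, using hypothetical data with eleven results and assuming every label is "prefix.<number>":

```python
import pandas as pd

# Hypothetical data in the question's shape, with 11 results to show ordering past 9
data = {f"{stub}.{i}": {"0": i * 10} for stub in ("id_prop", "prop_number") for i in range(11)}
df = pd.DataFrame.from_dict(data, orient="columns")

t = df.T  # one row per flattened label
prefix = [label[: label.find(".")] for label in t.index]
suffix = [int(label[label.find(".") + 1:]) for label in t.index]  # int, not str
t.index = pd.MultiIndex.from_arrays([suffix, prefix])
result = t.unstack()                          # labels become columns
result.columns = result.columns.droplevel(0)  # drop the original row label level
print(result)
```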

UPDATE: To handle labels where the result number is the second .-separated token and further tokens follow it (as described in the OP's comment), we can do this:

import pandas as pd
json = {
    "building.0.description.bedrooms":{"0":"qrs"}, 
    "building.1.description.bedrooms":{"0":"tuv"}, 
    "building.2.description.bedrooms":{"0":"xyz"}, 

    "id_prop.0":{"0":1},"id_prop.1":{"0":2},"id_prop.2":{"0":3},
    "prop_number.0":{"0":123},"prop_number.1":{"0":325},"prop_number.2":{"0":754},
    "prop_value.0":{"0":1},"prop_value.1":{"0":1},"prop_value.2":{"0":1}}
df = pd.DataFrame.from_dict(json, orient='columns')
print(df.to_string())

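df = df.T
# Level 0: the result number (2nd token); level 1: the label with that token removed
df.index = pd.MultiIndex.from_arrays([[x.split('.')[1] for x in df.index],
                                      ['.'.join(x.split('.')[0:1] + x.split('.')[2:]) for x in df.index]])
df = df.unstack()
# unstack() leaves a top column level (the original row label "0"); keep only the labels
df.columns = df.columns.get_level_values(1)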
print(df)

Input:

  building.0.description.bedrooms building.1.description.bedrooms building.2.description.bedrooms  id_prop.0  id_prop.1  id_prop.2  prop_number.0  prop_number.1  prop_number.2  prop_value.0  prop_value.1  prop_value.2
0                             qrs                             tuv                             xyz          1          2          3            123            325            754             1             1             1

Output:

  building.description.bedrooms id_prop prop_number prop_value
0                           qrs       1         123          1
1                           tuv       2         325          1
2                           xyz       3         754          1
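The same idea can be sketched with a single regular expression via Index.str.extract, which covers both label shapes shown above. This is a variant, not the answer's own code; the function name unflatten_rows is hypothetical, and it assumes the frame has exactly one row and that the result number is always the second token:

```python
import pandas as pd


def unflatten_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Turn a one-row frame with labels like 'name.<n>[.rest]' into one row per <n>."""
    # head = first token, num = the result number, tail = optional remaining tokens
    parts = df.columns.str.extract(r"^(?P<head>[^.]+)\.(?P<num>\d+)(?P<tail>\..*)?$")
    labels = parts["head"] + parts["tail"].fillna("")
    index = pd.MultiIndex.from_arrays([parts["num"].astype(int), labels], names=["n", "label"])
    long = pd.Series(df.iloc[0].to_numpy(), index=index)
    # Mixed dtypes make the series object-typed; infer_objects restores int columns
    wide = long.unstack("label").infer_objects()
    wide.index.name = None
    wide.columns.name = None
    return wide


data = {
    "building.0.description.bedrooms": {"0": "qrs"},
    "building.1.description.bedrooms": {"0": "tuv"},
    "building.2.description.bedrooms": {"0": "xyz"},
    "id_prop.0": {"0": 1}, "id_prop.1": {"0": 2}, "id_prop.2": {"0": 3},
    "prop_number.0": {"0": 123}, "prop_number.1": {"0": 325}, "prop_number.2": {"0": 754},
    "prop_value.0": {"0": 1}, "prop_value.1": {"0": 1}, "prop_value.2": {"0": 1}}
result = unflatten_rows(pd.DataFrame.from_dict(data, orient="columns"))
print(result)
```

Labels that do not match the pattern would produce NaN entries, so a real dataset may need the pattern adjusted.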

Answer by 叹倦, 2025-02-08 22:30:56

You can split the column labels into a MultiIndex and then stack it:

df.set_axis(
    pd.MultiIndex.from_tuples([tuple(i) for i in df.columns.str.split(".")]), axis=1
).stack().droplevel(0)
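A slightly shorter variant of the same approach: on an Index, str.split with expand=True already returns a MultiIndex, so the tuple conversion can be skipped. A minimal sketch, assuming labels of the form "name.<number>" (with n=1 only the first "." is split, so it does not cover labels like building.0.description.bedrooms):

```python
import pandas as pd

data = {"id_prop.0": {"0": 1}, "id_prop.1": {"0": 2}, "id_prop.2": {"0": 3},
        "prop_number.0": {"0": 123}, "prop_number.1": {"0": 325}, "prop_number.2": {"0": 754},
        "prop_value.0": {"0": 1}, "prop_value.1": {"0": 1}, "prop_value.2": {"0": 1}}
df = pd.DataFrame.from_dict(data, orient="columns")

wide = df.copy()
# Split each label at its first "."; expand=True on an Index yields a MultiIndex
wide.columns = wide.columns.str.split(".", n=1, expand=True)
# Stack the suffix level into the rows, then drop the original single-row label
result = wide.stack().droplevel(0)
print(result)
```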

About the author: 写下不归期
