Collapse duplicate index values and collect the affected values into a list per cell (Python pandas)

Posted on 2025-01-13 13:44:27

This is a bit weird to describe, basically I have this initial dataframe:

test_df
Out[149]: 
                           value
timestamp                       
2019-01-01 00:00:00+00:00  0.640
2019-01-01 01:00:00+00:00  0.224
2019-01-01 02:00:00+00:00  0.320
2019-01-01 03:00:00+00:00  0.304
2019-01-01 04:00:00+00:00  0.736
                         ...
2019-12-30 19:00:00+00:00  0.704
2019-12-30 20:00:00+00:00  0.272
2019-12-30 21:00:00+00:00  0.288
2019-12-30 22:00:00+00:00  0.272
2019-12-30 23:00:00+00:00  0.496

[8736 rows x 1 columns]

Then, based on the timestamp index, I create a new column (timestamp_type) that encodes these attributes (hour, daytype, month):

                           value timestamp_type
timestamp                                      
2019-01-01 00:00:00+00:00  0.640          0,1,1
2019-01-01 01:00:00+00:00  0.224          1,1,1
2019-01-01 02:00:00+00:00  0.320          2,1,1
2019-01-01 03:00:00+00:00  0.304          3,1,1
2019-01-01 04:00:00+00:00  0.736          4,1,1
                         ...            ...
2019-12-30 19:00:00+00:00  0.704        19,0,12
2019-12-30 20:00:00+00:00  0.272        20,0,12
2019-12-30 21:00:00+00:00  0.288        21,0,12
2019-12-30 22:00:00+00:00  0.272        22,0,12
2019-12-30 23:00:00+00:00  0.496        23,0,12
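For readers who want to reproduce this step, here is a minimal sketch of how such a column could be built. The question does not show the actual code, so this is an assumption: the random `value` data is a made-up stand-in, and "daytype" is taken to be the day of the week with Monday = 0, which is consistent with the sample rows (2019-01-01 is a Tuesday and shows 1, 2019-12-30 is a Monday and shows 0).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for test_df: 8736 hourly UTC timestamps
# (2019-01-01 00:00 through 2019-12-30 23:00) with random values.
idx = pd.date_range("2019-01-01", periods=8736, freq="h", tz="UTC", name="timestamp")
rng = np.random.default_rng(0)
test_df = pd.DataFrame({"value": rng.random(len(idx)).round(3)}, index=idx)

# Build the "hour,dayofweek,month" key for every timestamp in the index.
# Assumption: daytype == dayofweek (Monday=0 ... Sunday=6).
test_df["timestamp_type"] = [
    f"{t.hour},{t.dayofweek},{t.month}" for t in test_df.index
]

print(test_df.head())
```

The list comprehension is not the fastest option, but for a few thousand rows it keeps the intent easy to read.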

Now I would like the timestamp_type column to be the index. Since there are usually four (or five) timestamps in a year that share the same (hour, weekday, month) attributes, I don't need the same index value to appear four or five times. Instead, I want to put those four or five values into a list, and that list should be the value of the cell in the resulting dataframe.

So the goal is to get something that looks like this:

                  values 
timestamp_type
0,1,1             [somevalue, somevalue, somevalue, somevalue]       
1,1,1             [somevalue, somevalue, somevalue, somevalue, somevalue]         
2,1,1             [somevalue, somevalue, somevalue, somevalue, somevalue]          
  ...

I hope I explained the issue well enough. I have gone through the pandas docs but couldn't find anything on this. Any input is greatly appreciated!

柳絮泡泡 2025-01-20 13:44:27

Figured it out:

df2 = test_df.groupby('timestamp_type')['value'].apply(list)

df2
Out[33]: 
timestamp_type
0,0,1              [0.784, 0.8, 0.352, 0.784]
0,0,10           [0.336, 0.608, 0.624, 0.336]
0,0,11            [0.752, 0.32, 0.736, 0.512]
0,0,12     [0.72, 0.768, 0.752, 0.624, 0.608]
0,0,2              [0.368, 0.352, 0.8, 0.352]
                      ...
9,6,5            [2.432, 0.272, 2.528, 2.432]
9,6,6      [2.432, 0.224, 2.256, 2.256, 2.64]
9,6,7              [2.336, 0.24, 0.144, 0.56]
9,6,8            [0.784, 0.688, 0.736, 0.704]
9,6,9     [0.576, 2.784, 0.672, 0.576, 2.992]
Name: value, Length: 2016, dtype: object
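Two things may be worth adding to the above. The result is a Series named `value`, not a dataframe with a `values` column as sketched in the question, and because the keys are strings the groups sort lexicographically (which is why `0,0,10` comes before `0,0,2` in the output). The sketch below shows one possible way to address both. It rebuilds a made-up `test_df` with random data so that it runs on its own, and it assumes "daytype" means day of week with Monday = 0; the names `by_parts`, `hour`, `dayofweek` and `month` are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for test_df (random values, same shape as in the question).
idx = pd.date_range("2019-01-01", periods=8736, freq="h", tz="UTC", name="timestamp")
test_df = pd.DataFrame(
    {"value": np.random.default_rng(0).random(len(idx)).round(3)}, index=idx
)
test_df["timestamp_type"] = [
    f"{t.hour},{t.dayofweek},{t.month}" for t in test_df.index
]

# Same grouping as in the answer, then turned into a one-column
# dataframe whose column is called "values".
df2 = test_df.groupby("timestamp_type")["value"].agg(list).to_frame("values")

# Alternative: group on integer keys instead of the string key, so the
# resulting MultiIndex sorts numerically (0,0,1 then 0,0,2, ...).
ts = test_df.index
by_parts = (
    test_df.groupby(
        [ts.hour.to_numpy(), ts.dayofweek.to_numpy(), ts.month.to_numpy()]
    )["value"]
    .agg(list)
    .rename_axis(["hour", "dayofweek", "month"])
)

print(df2.head())
print(by_parts.head())
```

With the integer keys, a single group can be looked up with a tuple, for example `by_parts.loc[(0, 1, 1)]` for midnight on Tuesdays in January.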
