Concat the index and create a list as the cell value, holding the values affected by the concat (python pandas)

Posted 2025-01-13 13:44:27

This is a bit awkward to describe. Basically, I have this initial DataFrame:

test_df
Out[149]: 
                           value
timestamp                       
2019-01-01 00:00:00+00:00  0.640
2019-01-01 01:00:00+00:00  0.224
2019-01-01 02:00:00+00:00  0.320
2019-01-01 03:00:00+00:00  0.304
2019-01-01 04:00:00+00:00  0.736
                         ...
2019-12-30 19:00:00+00:00  0.704
2019-12-30 20:00:00+00:00  0.272
2019-12-30 21:00:00+00:00  0.288
2019-12-30 22:00:00+00:00  0.272
2019-12-30 23:00:00+00:00  0.496

[8736 rows x 1 columns]

Then, based on the timestamp index, I create a new column (timestamp_type) that holds these attributes (hour, daytype, month):

                           value timestamp_type
timestamp                                      
2019-01-01 00:00:00+00:00  0.640          0,1,1
2019-01-01 01:00:00+00:00  0.224          1,1,1
2019-01-01 02:00:00+00:00  0.320          2,1,1
2019-01-01 03:00:00+00:00  0.304          3,1,1
2019-01-01 04:00:00+00:00  0.736          4,1,1
                         ...            ...
2019-12-30 19:00:00+00:00  0.704        19,0,12
2019-12-30 20:00:00+00:00  0.272        20,0,12
2019-12-30 21:00:00+00:00  0.288        21,0,12
2019-12-30 22:00:00+00:00  0.272        22,0,12
2019-12-30 23:00:00+00:00  0.496        23,0,12
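
One way such a timestamp_type column could be built from the index is sketched below. This is an assumption, not the code that produced the table above: it takes daytype to be pandas' dayofweek (Monday = 0), which is at least consistent with 2019-01-01 (a Tuesday) showing 1 and 2019-12-30 (a Monday) showing 0. The small test_df here is stand-in data using the first five values shown.

```python
import pandas as pd

# Stand-in data: five hourly UTC timestamps with the first values from the question.
idx = pd.date_range(
    "2019-01-01", periods=5, freq=pd.Timedelta(hours=1), tz="UTC", name="timestamp"
)
test_df = pd.DataFrame({"value": [0.640, 0.224, 0.320, 0.304, 0.736]}, index=idx)

# Assumption: daytype is the day of the week, with Monday = 0.
test_df["timestamp_type"] = [
    f"{ts.hour},{ts.dayofweek},{ts.month}" for ts in test_df.index
]

print(test_df)
```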

Now I would like the timestamp_type column to be the index. Since there are usually four (or five) data points in a year that share the same (hour, weekday, month) attributes, I don't need the same index value repeated four times. Instead, I want to put those four or five values in a list that becomes the value of the cell in that DataFrame.
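
The "four or five per key" expectation can be checked by counting group sizes before building any lists. A minimal sketch, using synthetic stand-in data (8736 hourly random values starting 2019-01-01, not the real series) and again assuming daytype means dayofweek:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 8736 hourly rows, matching the row count in the question.
idx = pd.date_range(
    "2019-01-01", periods=8736, freq=pd.Timedelta(hours=1), tz="UTC", name="timestamp"
)
rng = np.random.default_rng(0)
test_df = pd.DataFrame({"value": rng.random(len(idx))}, index=idx)

# Assumption: daytype is the day of the week, with Monday = 0.
test_df["timestamp_type"] = [
    f"{ts.hour},{ts.dayofweek},{ts.month}" for ts in test_df.index
]

# Number of rows sharing each (hour, daytype, month) key.
sizes = test_df.groupby("timestamp_type").size()
print(sizes.value_counts())
```

If any key turned up with an unexpected count, that would point to gaps or duplicates in the timestamps rather than a grouping problem.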

So the goal is to get something that looks like this:

                  values 
timestamp_type
0,1,1             [somevalue, somevalue, somevalue, somevalue]       
1,1,1             [somevalue, somevalue, somevalue, somevalue, somevalue]         
2,1,1             [somevalue, somevalue, somevalue, somevalue, somevalue]          
  ...

I hope I have explained the issue well enough. I have gone through the pandas docs but couldn't find anything on this. Any input is greatly appreciated!

Comments (1)

柳絮泡泡 2025-01-20 13:44:27

Figured it out:

df2 = test_df.groupby('timestamp_type')['value'].apply(list)

df2
Out[33]: 
timestamp_type
0,0,1              [0.784, 0.8, 0.352, 0.784]
0,0,10           [0.336, 0.608, 0.624, 0.336]
0,0,11            [0.752, 0.32, 0.736, 0.512]
0,0,12     [0.72, 0.768, 0.752, 0.624, 0.608]
0,0,2              [0.368, 0.352, 0.8, 0.352]
                         ...
9,6,5            [2.432, 0.272, 2.528, 2.432]
9,6,6      [2.432, 0.224, 2.256, 2.256, 2.64]
9,6,7              [2.336, 0.24, 0.144, 0.56]
9,6,8            [0.784, 0.688, 0.736, 0.704]
9,6,9     [0.576, 2.784, 0.672, 0.576, 2.992]
Name: value, Length: 2016, dtype: object
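
Two follow-ups, sketched on stand-in data (two weeks of hourly integers, not the real series). First, the result above is a Series named value; `.to_frame("values")` turns it into a one-column DataFrame like the one asked for. Second, the string keys sort lexicographically (0,0,10 comes before 0,0,2 in the output above); grouping by numeric columns instead gives a numerically sorted MultiIndex. The column names hour, daytype and month, and the reading of daytype as dayofweek, are assumptions made for illustration.

```python
import pandas as pd

# Stand-in data: 14 days of hourly values 0..335, starting on Tuesday 2019-01-01.
idx = pd.date_range(
    "2019-01-01", periods=24 * 14, freq=pd.Timedelta(hours=1), tz="UTC", name="timestamp"
)
test_df = pd.DataFrame({"value": range(len(idx))}, index=idx)
test_df["timestamp_type"] = [
    f"{ts.hour},{ts.dayofweek},{ts.month}" for ts in test_df.index
]

# Same grouping as above, returned as a DataFrame with a "values" column.
df2 = test_df.groupby("timestamp_type")["value"].apply(list).to_frame("values")

# Alternative: numeric key columns (hypothetical names) instead of a string key.
keyed = test_df.assign(
    hour=test_df.index.hour,
    daytype=test_df.index.dayofweek,  # assumption: Monday = 0
    month=test_df.index.month,
)
df3 = keyed.groupby(["hour", "daytype", "month"])["value"].apply(list).to_frame("values")

print(df2.head())
print(df3.head())
```

With the MultiIndex version, a single cell can be looked up by tuple, for example `df3["values"].loc[(0, 1, 1)]`.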