Python Pandas: resampling based on only one of the columns

Posted on 2025-01-20 06:45:48

I have the following data, and I'm resampling it to find out how many bikes arrive at each station every 15 minutes. However, my code is aggregating the stations too, and I only want to resample on the variable "dtm_end_trip".

Sample data:

id_trip  dtm_start_trip       dtm_end_trip         start_station  end_station
1        2018-10-01 10:15:00  2018-10-01 10:17:00  A              B
2        2018-10-01 10:17:00  2018-10-01 10:18:00  B              A
...      ...                  ...                  ...            ...
999999   2021-12-31 23:58:00  2022-01-01 00:22:00  C              A
1000000  2021-12-31 23:59:00  2022-01-01 00:29:00  A              D

Trial code:

df2 = df.groupby(['end_station', 'dtm_end_trip']).size().to_frame(name='count').reset_index()
df2 = df2.sort_values(by='count', ascending=False)

df2 = df2.set_index('dtm_end_trip')

df2 = df2.resample('15T').count()

Output I get:

dtm_end_trip         end_station  count
2018-10-01 00:15:00  2            2
2018-10-01 00:30:00  0            0
2018-10-01 00:45:00  1            1
2018-10-01 01:00:00  2            2
2018-10-01 01:15:00  1            1

Desired output:

dtm_end_trip         end_station  count
2018-10-01 00:15:00  A            2
2018-10-01 00:15:00  B            0
2018-10-01 00:15:00  C            1
2018-10-01 00:15:00  D            2
2018-10-01 00:30:00  A            3
2018-10-01 00:30:00  B            2

The values in the count column above are random numbers; their only purpose is to illustrate the structure of the desired output.

Comments (1)

自控 2025-01-27 06:45:48

You can use pd.Grouper like this:

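import pandas as pd

# Bin dtm_end_trip (must be a datetime column) into 15-minute intervals, per end_station.
out = df.groupby([
    pd.Grouper(freq='15min', key='dtm_end_trip'),
    'end_station',
]).size()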

>>> out
dtm_end_trip         end_station
2018-10-01 10:15:00  A              1
                     B              1
2022-01-01 00:15:00  A              1
                     D              1
dtype: int64

The result is a Series, but you can easily convert it to a DataFrame with the same headings as your desired output:

>>> out.to_frame('count').reset_index()
         dtm_end_trip end_station  count
0 2018-10-01 10:15:00           A      1
1 2018-10-01 10:15:00           B      1
2 2022-01-01 00:15:00           A      1
3 2022-01-01 00:15:00           D      1
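
For anyone who wants to run the two steps above end to end, here is a minimal sketch. It rebuilds only the four visible rows of the sample data, and it assumes the timestamps arrive as strings, so it converts them with pd.to_datetime first (pd.Grouper with freq needs a datetime column). The variable name result is just an illustrative choice.

```python
import pandas as pd

# Only the four rows that are visible in the question's sample data.
df = pd.DataFrame({
    "id_trip": [1, 2, 999999, 1000000],
    "dtm_start_trip": ["2018-10-01 10:15:00", "2018-10-01 10:17:00",
                       "2021-12-31 23:58:00", "2021-12-31 23:59:00"],
    "dtm_end_trip": ["2018-10-01 10:17:00", "2018-10-01 10:18:00",
                     "2022-01-01 00:22:00", "2022-01-01 00:29:00"],
    "start_station": ["A", "B", "C", "A"],
    "end_station": ["B", "A", "A", "D"],
})

# Assumption: the timestamps were loaded as strings, so convert them.
for col in ("dtm_start_trip", "dtm_end_trip"):
    df[col] = pd.to_datetime(df[col])

# Bin arrivals into 15-minute intervals, keeping end_station as its own key.
out = df.groupby([
    pd.Grouper(freq="15min", key="dtm_end_trip"),
    "end_station",
]).size()

# Turn the Series into a DataFrame with a 'count' column.
result = out.to_frame("count").reset_index()
print(result)
```

Each trip lands in the 15-minute bin that contains its end time, so 10:17 and 10:18 fall into the 10:15 bin, and 00:22 and 00:29 fall into the 00:15 bin.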

Note: this is the result from the four rows in your sample input data.
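
The desired output in the question also lists stations with a count of 0, while the groupby above only returns combinations that actually occur. One possible way to fill in the missing combinations, sketched here under assumptions, is to unstack the stations into columns, reindex onto a complete 15-minute range, and melt back to long form. The three rows below are made-up illustrative data (not from the question), chosen so that one bin is empty; the names wide, full_index and result are likewise hypothetical.

```python
import pandas as pd

# Made-up illustrative data: no trips end between 10:30 and 10:45.
df = pd.DataFrame({
    "dtm_end_trip": pd.to_datetime([
        "2018-10-01 10:17:00",
        "2018-10-01 10:18:00",
        "2018-10-01 10:50:00",
    ]),
    "end_station": ["B", "A", "A"],
})

# Same grouping as in the answer: only observed (bin, station) pairs appear.
counts = df.groupby([
    pd.Grouper(freq="15min", key="dtm_end_trip"),
    "end_station",
]).size()

# One column per station; unobserved pairs in existing bins become 0.
wide = counts.unstack(fill_value=0)

# Add the bins in which no station received any bike.
full_index = pd.date_range(
    wide.index.min(), wide.index.max(), freq="15min", name="dtm_end_trip"
)
wide = wide.reindex(full_index, fill_value=0)

# Back to long form: one row per (bin, station), zeros included.
result = (
    wide.reset_index()
        .melt(id_vars="dtm_end_trip", var_name="end_station", value_name="count")
        .sort_values(["dtm_end_trip", "end_station"], ignore_index=True)
)
print(result)
```

Note that this only covers stations that appear at least once in end_station, and only the time span between the first and last observed bin. With many stations and a long time span the wide table can get large, so this is a sketch rather than a tuned solution.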
