Python Pandas: resample based on only one of the columns
I have the following data and I'm resampling my data to find out how many bikes arrive at each of the stations every 15 minutes. However, my code is aggregating my stations too, and I only want to aggregate the variable "dtm_end_trip"
Sample data:
id_trip | dtm_start_trip | dtm_end_trip | start_station | end_station |
---|---|---|---|---|
1 | 2018-10-01 10:15:00 | 2018-10-01 10:17:00 | A | B |
2 | 2018-10-01 10:17:00 | 2018-10-01 10:18:00 | B | A |
... | ... | ... | ... | ... |
999999 | 2021-12-31 23:58:00 | 2022-01-01 00:22:00 | C | A |
1000000 | 2021-12-31 23:59:00 | 2022-01-01 00:29:00 | A | D |
Trial code:
df2 = df.groupby(['end_station', 'dtm_end_trip']).size().to_frame(name='count').reset_index()
df2 = df2.sort_values(by='count', ascending=False)
df2 = df2.set_index('dtm_end_trip')
df2 = df2.resample('15T').count()
Output I get:
dtm_end_trip | end_station | count |
---|---|---|
2018-10-01 00:15:00 | 2 | 2 |
2018-10-01 00:30:00 | 0 | 0 |
2018-10-01 00:45:00 | 1 | 1 |
2018-10-01 01:00:00 | 2 | 2 |
2018-10-01 01:15:00 | 1 | 1 |
Desired output:
dtm_end_trip | end_station | count |
---|---|---|
2018-10-01 00:15:00 | A | 2 |
2018-10-01 00:15:00 | B | 0 |
2018-10-01 00:15:00 | C | 1 |
2018-10-01 00:15:00 | D | 2 |
2018-10-01 00:30:00 | A | 3 |
2018-10-01 00:30:00 | B | 2 |
The counts in the table above are random numbers, used only to illustrate the structure of the desired output.
Comments (1)
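You can use pd.Grouper inside groupby to bin dtm_end_trip into 15-minute intervals while keeping end_station as a separate grouping key, so that only the timestamps are aggregated.
The result is a Series, but you can easily convert it to a DataFrame with the same headings as your desired output.
Note: the result in this answer was computed from the four rows in your sample input data.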
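The code from the original answer did not survive on this page, so what follows is only a minimal sketch of the pd.Grouper approach it describes, not the answerer's exact code. It assumes dtm_end_trip is already a datetime column, rebuilds just the four rows visible in the sample data, and uses the '15min' frequency alias in place of the older '15T'. Calling size() and reset_index(name='count') is one way to get the Series and the DataFrame the answer mentions.

```python
import pandas as pd

# The four rows visible in the question's sample data.
df = pd.DataFrame({
    "id_trip": [1, 2, 999999, 1000000],
    "dtm_start_trip": pd.to_datetime([
        "2018-10-01 10:15:00", "2018-10-01 10:17:00",
        "2021-12-31 23:58:00", "2021-12-31 23:59:00",
    ]),
    "dtm_end_trip": pd.to_datetime([
        "2018-10-01 10:17:00", "2018-10-01 10:18:00",
        "2022-01-01 00:22:00", "2022-01-01 00:29:00",
    ]),
    "start_station": ["A", "B", "C", "A"],
    "end_station": ["B", "A", "A", "D"],
})

# Bin dtm_end_trip into 15-minute intervals and keep end_station as its own
# grouping key, so stations are not collapsed into a single count.
counts = df.groupby(
    [pd.Grouper(key="dtm_end_trip", freq="15min"), "end_station"]
).size()

# counts is a Series with a (dtm_end_trip, end_station) MultiIndex;
# reset_index turns it into a DataFrame with a named count column.
df2 = counts.reset_index(name="count")
print(df2)
```

With these four rows, each (interval, station) pair that actually occurs gets one row with a count of 1. Pairs with no arrivals, such as the zero count for station B in the desired output, are not produced by this sketch and would need an extra reindexing step.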