How to categorize time ranges in Pandas?
In my project I am trying to create a new column that categorizes records by hour range. I have a column in the DataFrame called 'TowedTime' that holds time values, and I want another column that categorizes each record by the full hour, without minutes. For example, if the value in 'TowedTime' is 09:32:10, it should be categorized as 9 AM; if it is 12:45:10, it should be categorized as 12 PM; and so on for all other values. I've read about pd.cut and its bins argument, but I can't get the result I want.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime

# Load the towing records from the Excel workbook
df = pd.read_excel("Baltimore Towing Division.xlsx", sheet_name="TowingData")

# Abbreviated month name ("Jan") and weekday name ("Mon") of each tow date
df['Month'] = pd.DatetimeIndex(df['TowedDate']).strftime("%b")
df['Week day'] = pd.DatetimeIndex(df['TowedDate']).strftime("%a")

monthOrder = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
dayOrder = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Count tows per exact 'TowedTime' (rows) and weekday (columns).
# Every distinct time gets its own row, which is why grouping by hour is needed.
pivotHours = pd.pivot_table(df, values='TowedDate', index='TowedTime',
                            columns='Week day',
                            fill_value=0,
                            aggfunc='count',
                            margins=False, margins_name='Total').reindex(dayOrder, axis=1)
print(pivotHours)
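Since the question mentions pd.cut and bins, here is a minimal sketch of that route. It assumes 'TowedTime' holds strings in HH:MM:SS form and uses a small hypothetical sample instead of the Excel file; the 'HourRange' column name and the label format are illustrative choices, not part of the original code.

```python
import pandas as pd

# Hypothetical sample standing in for the 'TowedTime' column (HH:MM:SS strings).
df = pd.DataFrame({"TowedTime": ["09:32:10", "12:45:10", "00:05:00", "23:59:59"]})

# Hour of day as an integer 0-23 (the date part defaults to 1900-01-01 and is ignored).
hours = pd.to_datetime(df["TowedTime"], format="%H:%M:%S").dt.hour

# One label per hour: 0 -> "12 AM", 9 -> "9 AM", 12 -> "12 PM", 23 -> "11 PM".
labels = [f"{(h % 12) or 12} {'AM' if h < 12 else 'PM'}" for h in range(24)]

# 24 left-closed bins [0, 1), [1, 2), ..., [23, 24); right=False puts hour 9 in [9, 10).
df["HourRange"] = pd.cut(hours, bins=list(range(25)), right=False, labels=labels)
print(df)
```

Because the hours are already integers, hours.map(dict(enumerate(labels))) would produce the same labels. The pd.cut version returns a categorical whose categories run from 12 AM to 11 PM, so sorting on it follows clock order rather than alphabetical order.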
Comments (2)
First, make sure the 'TowedTime' column has a datetime dtype. Then you can easily extract the hour from it with the .dt.hour accessor.
Hope this answers your question.
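A minimal sketch of this answer, assuming the times arrive as HH:MM:SS strings (a hypothetical three-value Series stands in for the real column):

```python
import pandas as pd

# Hypothetical column of time strings, as it might look after loading the data.
towed_time = pd.Series(["09:32:10", "12:45:10", "17:03:44"], name="TowedTime")

# Step 1: convert to a datetime dtype (the date part defaults to 1900-01-01).
as_datetime = pd.to_datetime(towed_time, format="%H:%M:%S")

# Step 2: the .dt accessor now exposes the hour as an integer.
hour = as_datetime.dt.hour
print(hour.tolist())  # [9, 12, 17]
```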
With the help of @Fabien C, I was able to solve the problem.
First, I checked the data type of the 'TowedTime' column with the dtypes attribute and found that it was object.
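A sketch of that check. The sample frame is hypothetical and assumes that the column holds Python datetime.time objects, which can happen when read_excel loads time-only cells; pandas has no time-of-day dtype, so it stores such values as object. Looking at the Python types inside the column shows what object actually means here.

```python
import datetime

import pandas as pd

# Hypothetical frame mimicking a time-only Excel column loaded as datetime.time objects.
df = pd.DataFrame({
    "TowedDate": pd.to_datetime(["2017-01-02", "2017-01-03"]),
    "TowedTime": [datetime.time(9, 32, 10), datetime.time(12, 45, 10)],
})

print(df.dtypes)  # TowedTime is reported as 'object'

# 'object' only means "arbitrary Python objects"; inspect which ones they are.
print(df["TowedTime"].map(type).value_counts())
```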
I then tried to convert 'TowedTime' to datetime:
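A sketch of one conversion that works on such a column, assuming it holds datetime.time values whose text form is HH:MM:SS (no fractional seconds). Passing datetime.time objects straight to pd.to_datetime may not work, so the values are turned into strings first and parsed with an explicit format.

```python
import datetime

import pandas as pd

# Hypothetical object column of datetime.time values (assumed read_excel output).
towed_time = pd.Series([datetime.time(9, 32, 10), datetime.time(12, 45, 10)])

# str(datetime.time(9, 32, 10)) is "09:32:10", so parse the text with an explicit format.
# errors="coerce" turns values that do not match (e.g. "09:32:10.500000") into NaT.
converted = pd.to_datetime(towed_time.astype(str), format="%H:%M:%S", errors="coerce")
print(converted)
```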
Then I created a new column in the DataFrame containing only the hour:
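A sketch of the hour column and of the question's pivot regrouped by it. The rows are hypothetical, and 'TowedTime' is assumed to be HH:MM:SS text; datetime.time values would need to go through strings first.

```python
import pandas as pd

# Hypothetical rows standing in for the towing data (2017-01-02 is a Monday).
df = pd.DataFrame({
    "TowedDate": pd.to_datetime(["2017-01-02", "2017-01-02", "2017-01-03", "2017-01-09"]),
    "TowedTime": ["09:32:10", "09:05:00", "12:45:10", "09:59:59"],
})

# New column holding only the hour (0-23) of each tow.
df["Hour"] = pd.to_datetime(df["TowedTime"], format="%H:%M:%S").dt.hour
df["Week day"] = df["TowedDate"].dt.strftime("%a")

# Same pivot as in the question, but one row per hour instead of per exact time.
dayOrder = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
pivotHours = pd.pivot_table(df, values="TowedDate", index="Hour", columns="Week day",
                            aggfunc="count", fill_value=0)
# fill_value=0 here also zero-fills the weekdays that have no tows at all.
pivotHours = pivotHours.reindex(dayOrder, axis=1, fill_value=0)
print(pivotHours)
```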
And the result was this:
As the image shows, the 'TowedTime' column still has the object dtype, but the new 'Hour' column correctly contains the hour values.
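One possible reason the column stays object, assuming the converted result was kept in a separate variable or column rather than written back: pd.to_datetime returns a new Series and leaves the DataFrame unchanged. A minimal sketch with a hypothetical two-row frame:

```python
import datetime

import pandas as pd

df = pd.DataFrame({"TowedTime": [datetime.time(9, 32, 10), datetime.time(12, 45, 10)]})

# pd.to_datetime returns a NEW Series; the DataFrame is not modified in place.
converted = pd.to_datetime(df["TowedTime"].astype(str), format="%H:%M:%S")
before = df["TowedTime"].dtype
print(before)  # still object

# Only an explicit assignment replaces the column (and its dtype).
df["TowedTime"] = converted
print(df["TowedTime"].dtype)  # a datetime64 dtype
```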
Originally, the dataset already had the date and time in separate columns. I think some method was used in Excel to split the date and time, and this left the time ('TowedTime') as an object that I could not convert, or at least that's what dtypes shows me.
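If the column really holds datetime.time objects, the conversion can be skipped entirely: each value already has an .hour attribute. A minimal sketch on a hypothetical frame, assuming there are no missing cells (a blank cell would arrive as NaN, which has no .hour):

```python
import datetime

import pandas as pd

df = pd.DataFrame({"TowedTime": [datetime.time(9, 32, 10), datetime.time(12, 45, 10)]})

# Read the hour straight off each datetime.time object; no dtype conversion needed.
df["Hour"] = df["TowedTime"].map(lambda t: t.hour)
print(df)
```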
I tried all of these pandas methods for converting the object column to datetime:
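Another route, not necessarily one of the methods listed above: since the date and time live in separate columns, they can be recombined into one full timestamp using pd.to_timedelta, after which every .dt accessor is available. The sample frame and the 'TowedAt' column name are hypothetical, and 'TowedTime' is assumed to hold datetime.time values.

```python
import datetime

import pandas as pd

df = pd.DataFrame({
    "TowedDate": pd.to_datetime(["2017-01-02", "2017-01-03"]),
    "TowedTime": [datetime.time(9, 32, 10), datetime.time(12, 45, 10)],
})

# "09:32:10" parses as a Timedelta of 9 h 32 min 10 s (the time since midnight).
time_of_day = pd.to_timedelta(df["TowedTime"].astype(str))

# Midnight of the tow date plus the time of day gives the full tow timestamp.
df["TowedAt"] = df["TowedDate"] + time_of_day
df["Hour"] = df["TowedAt"].dt.hour
print(df)
```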