Pandas DataFrame to_datetime() raises an error

Posted on 2025-01-13 12:13:51

Goal:
I read measurement data from a .csv file into a DataFrame. Then I prepend the date from the filename to the time string that is already in the DataFrame. The last step is to convert this combined date-and-time string into a datetime object.

First steps that worked:

import pandas as pd
filename = '2022_02_14_data_0.csv'
path = 'C:/Users/ma1075116/switchdrive/100_Schaltag/100_Digitales_Model/Messungen/'
measData = pd.read_csv(path+filename, sep = '\t', header = [0,1], encoding = 'ISO-8859-1')
# add the date to the timestamp string
measData['Timestamp'] = filename[:11]+measData['Timestamp']

Each entry in measData['Timestamp'] is now a string with exactly the following pattern:

'2022_02_14_00:00:06'

Now I want to convert this string to datetime:

measData['Timestamp'] = pd.to_datetime(measData['Timestamp'], format= '%Y_%m_%d_%H:%M:%S')

This raises the error:

ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing

Why do I get this error and how can I avoid it? I am pretty sure that the format is correct.

Edit:
I wrote a sample code which should do exactly the same, and it works:

filename = '2022_02_14_data_0.csv'
timestamps = {'Timestamp': ['00:00:00', '00:00:01', '00:00:04']}
testFrame = pd.DataFrame(timestamps)
testFrame['Timestamp'] = filename[:11]+testFrame['Timestamp']
testFrame['Timestamp'] = pd.to_datetime(testFrame['Timestamp'], format= '%Y_%m_%d_%X') 

My next step is now to check if all timestamp entries in the dataframe have the same format.
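
A minimal sketch of such a format check, assuming the combined strings should look like '2022_02_14_00:00:06'. The sample values and the names stamps, pattern, matches and bad_rows are made up for illustration and are not taken from the real measurement file:

```python
import pandas as pd

# Hypothetical sample; the last entry is deliberately malformed.
stamps = pd.Series(['2022_02_14_00:00:06',
                    '2022_02_14_00:00:07',
                    '2022_02_14_0:00'])

# Regular expression for YYYY_MM_DD_HH:MM:SS (checks the shape only, not the value ranges).
pattern = r'\d{4}_\d{2}_\d{2}_\d{2}:\d{2}:\d{2}'

# fullmatch is True only where the entire string fits the pattern.
matches = stamps.str.fullmatch(pattern)

# Rows that do not follow the expected format.
bad_rows = stamps[~matches]
print(bad_rows)
```

If bad_rows is empty, every entry has the expected shape and the format string is unlikely to be the problem.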

Solution:
I do not understand the error, but I found a working solution. I now parse the time directly in the read_csv call and add the date information from the filename there. This works; the timestamp column of measData now has the dtype datetime64.

filename = '2022_02_14_data_0.csv'
path = 'C:/Users/ma1075116/switchdrive/100_Schaltag/100_Digitales_Model/Messungen/'
measData = pd.read_csv(path+filename, sep = '\t', header = [0,1], 
                       parse_dates=[0], # parse for the time in the first column
                       date_parser = lambda col: pd.to_datetime(filename[:11]+col, format= '%Y_%m_%d_%X'),
                       encoding = 'ISO-8859-1')
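
Note that the date_parser argument used above has been deprecated since pandas 2.0. A possible alternative, sketched here under assumptions, is to read the time column as plain strings and convert it afterwards. The in-memory sample, its second header row ('hh:mm:ss', 'V') and the Voltage column are hypothetical stand-ins for the real file:

```python
import io
import pandas as pd

filename = '2022_02_14_data_0.csv'

# Hypothetical tab-separated content with two header rows, standing in for the real file.
raw = ('Timestamp\tVoltage\n'
       'hh:mm:ss\tV\n'
       '00:00:06\t1.0\n'
       '00:00:07\t1.1\n')

meas = pd.read_csv(io.StringIO(raw), sep='\t', header=[0, 1])

# With header=[0, 1] the columns form a MultiIndex, so the full key is a tuple.
ts_col = meas.columns[0]

# Prepend the date from the filename, then parse the combined strings.
date_prefix = filename[:11]  # '2022_02_14_'
meas[ts_col] = pd.to_datetime(date_prefix + meas[ts_col],
                              format='%Y_%m_%d_%H:%M:%S')
print(meas.dtypes)
```

Selecting the column by its full tuple key yields a Series, which is what to_datetime needs for string parsing.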

Comments (2)

可爱咩 2025-01-20 12:13:51

Your format seems to be missing an underscore after day.

This works for me:

import pandas as pd
date_str = '2022_02_14_00:00:06'
pd.to_datetime(date_str, format= '%Y_%m_%d_%H:%M:%S')

EDIT:

This works fine for me (measData["Timestamp"] is a pd.Series):

import pandas as pd

measData = pd.DataFrame({"Timestamp": ['2022_02_14_00:00:06', '2022_02_14_13:55:06', '2022_02_14_12:00:06']})
pd.to_datetime(measData["Timestamp"], format= '%Y_%m_%d_%H:%M:%S')

The only way I found to reproduce your error is this (measData is a pd.DataFrame):

import pandas as pd

measData = pd.DataFrame({"Timestamp": ['2022_02_14_00:00:06', '2022_02_14_13:55:06', '2022_02_14_12:00:06']})
pd.to_datetime(measData, format= '%Y_%m_%d_%H:%M:%S')

So make sure that what you are putting into to_datetime is a pd.Series. If this does not help, please provide a small sample of your data.
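
One plausible reason why measData['Timestamp'] is not a Series in the question (an assumption, since no sample data was posted) is the header=[0,1] argument: it produces MultiIndex columns, and selecting only the top level returns a DataFrame. The sketch below uses made-up sub-headers ('hh:mm:ss', 'V') and a made-up Voltage column to show the effect:

```python
import pandas as pd

# Hypothetical two-level header, similar to what header=[0, 1] produces.
columns = pd.MultiIndex.from_tuples([('Timestamp', 'hh:mm:ss'),
                                     ('Voltage', 'V')])
meas = pd.DataFrame([['00:00:06', 1.0], ['00:00:07', 1.1]], columns=columns)

# Selecting only the top level gives a DataFrame, not a Series.
selected = meas['Timestamp']
print(type(selected).__name__)

# to_datetime on a DataFrame tries to assemble dates from year/month/day columns.
error_message = None
try:
    pd.to_datetime('2022_02_14_' + selected, format='%Y_%m_%d_%H:%M:%S')
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Selecting with the full tuple key gives a Series, which parses as expected.
series = meas[('Timestamp', 'hh:mm:ss')]
parsed = pd.to_datetime('2022_02_14_' + series, format='%Y_%m_%d_%H:%M:%S')
print(parsed)
```

Checking type(measData['Timestamp']) on the real data would confirm or rule out this explanation.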

寂寞花火° 2025-01-20 12:13:51

You can do it like this, using datetime.datetime.strptime with apply on the DataFrame.

Recreating your dataset:

import datetime
import pandas as pd

# Use lists rather than sets: pd.DataFrame rejects unordered sets as data.
data = ['2016_03_29_08:15:27', '2017_03_29_08:18:27',
        '2018_06_30_08:15:27', '2019_07_29_08:15:27']
columns = ['time']

df = pd.DataFrame(data=data, columns=columns)

Applying the desired transformation:

df['time'] = df.apply(lambda row : datetime.datetime.strptime(row['time'], 
                      '%Y_%m_%d_%H:%M:%S'), axis=1)