pandas.read_excel: reading Excel date formats

Posted 2022-09-11 17:53:30 · 321 characters · 18 views · 0 comments

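I'm using pandas to analyze data from an Excel file. It has an Excel date field, date, whose values look like 2018/12/31. I read all the data with pd.read_excel. Because every value in the sheet was entered by hand, I set converters={'date': str} in read_excel to avoid potential read errors. Even so, the dates in the resulting DataFrame come out as “2018-12-31 00:00:00” rather than the values as stored in Excel. Is there a way to keep the dates as their original text?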


葬﹪忆之殇 2022-09-18 17:53:30

You can use the datetime library:

import datetime

def time_convert(time_str):
    # Parse the "2018-12-31 00:00:00" form, then format it back as "2018/12/31"
    time_obj = datetime.datetime.strptime(str(time_str), '%Y-%m-%d %H:%M:%S')
    return time_obj.strftime('%Y/%m/%d')

# dataset is the DataFrame returned by pd.read_excel
dataset['日期'] = dataset['日期'].apply(time_convert)
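
A vectorised variant of the same idea, as a minimal sketch. It assumes the column already holds datetime values; the small `dataset` frame built here is a hypothetical stand-in for what `pd.read_excel` returns.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by pd.read_excel
dataset = pd.DataFrame({"日期": pd.to_datetime(["2018-12-31", "2019-01-05"])})

# Format the whole column at once; the result is a column of plain strings
dataset["日期"] = dataset["日期"].dt.strftime("%Y/%m/%d")

print(dataset["日期"].tolist())
```

Note that `%m` and `%d` are zero-padded, so a cell shown in Excel as 2019/1/5 would come out as 2019/01/05.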

Edited 2019.2.24 12:27

I took a quick look at the official API documentation.
There are three things worth trying:

1. The dtype parameter

dtype : Type name or dict of column -> type, default None
Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
Use `object` to preserve data as stored in Excel and not interpret dtype.
If converters are specified, they will be applied INSTEAD of dtype conversion.

And the converters parameter:

converters : dict, default None
  Dict of functions for converting values in certain columns. Keys can
  either be integers or column labels, values are functions that take one
  input argument, the Excel cell content, and return the transformed
  content.

So:
you could set dtype={'date': str} in read_excel() instead of using the converters parameter.
I tried it, and it does not solve the problem.

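Reading further in the docs, it seems to come down to the parser: by default it uses `dateutil.parser.parser` to parse your dates and returns `pd.Timestamp` objects, whose default form is always `xxxx-xx-xx xx:xx:xx`. You could override the `str()` and `repr()` functions of `pd.Timestamp`.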
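
To see where the `xxxx-xx-xx xx:xx:xx` form comes from, here is a small sketch. It assumes that a date-formatted Excel cell reaches the converter as a datetime object rather than as text; that is my reading of the behaviour, not something stated in the docs quoted here.

```python
import datetime
import pandas as pd

# Assumption: this is what the converter receives for a date-formatted cell
cell_value = datetime.datetime(2018, 12, 31)

# converters={'date': str} simply calls str() on that object
text_from_datetime = str(cell_value)

# pd.Timestamp prints the same way
text_from_timestamp = str(pd.Timestamp("2018-12-31"))

print(text_from_datetime)
print(text_from_timestamp)
```

Under that assumption, `str` never sees the text 2018/12/31 at all, which would explain why neither `converters={'date': str}` nor `dtype` brings it back.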

2. The parse_dates parameter

parse_dates : bool, list-like, or dict, default False

The behavior is as follows:

* bool. If True -> try parsing the index.
* list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
* list of lists. e.g.  If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
* dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'

 If a column or index contains an unparseable date, the entire column or
 index will be returned unaltered as an object data type. For non-standard
 datetime parsing, use `pd.to_datetime` after `pd.read_csv`

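Let me just comment on the last paragraph:

If a column or index contains a date that cannot be parsed, the whole column or index is returned unchanged with the object data type. For non-standard date formats, the docs suggest calling pd.to_datetime() after reading the data.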
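
As a rough illustration of that last sentence, the sketch below parses a non-standard date format after loading. The sample values and the `%d.%m.%Y` format are made up for the example; they are not from the question.

```python
import pandas as pd

# Hypothetical column that came back as plain strings (object dtype)
raw = pd.Series(["31.12.2018", "05.01.2019"])

# Parse with an explicit format instead of relying on automatic detection
parsed = pd.to_datetime(raw, format="%d.%m.%Y")

print(parsed.dt.year.tolist())
```

Once the column is datetime, it can be analysed as dates and formatted back to text only for display.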

After reading the last of the date-related parameters, I think it finally clicked...

3. The date_parser parameter

date_parser : function, optional
Function to use for converting a sequence of string columns to an array of datetime instances.
The default uses dateutil.parser.parser to do the conversion.
Pandas will try to call date_parser in three different ways, advancing to the next if an exception occurs:
 1) Pass one or more arrays (as defined by `parse_dates`) as arguments; 
 2) concatenate (row-wise) the string values from the columns defined by `parse_dates` into a single array and pass that; 
 3) call `date_parser` once for each row using one or more strings (corresponding to the columns defined by `parse_dates`) as arguments.


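In other words, unless you supply a function that leaves the date string unchanged, pandas will always parse your date string into a format it can analyse (which is the “2018-12-31 00:00:00” you are seeing).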
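
One way to supply such a function is through `converters`, sketched below. The helper names `keep_date_text` and `load_with_text_dates` are hypothetical, `path` is a placeholder, and the sketch assumes date cells arrive at the converter as datetime objects.

```python
import datetime
import pandas as pd

def keep_date_text(value):
    """Return date-like cells as 'YYYY/MM/DD' text, everything else via str()."""
    if isinstance(value, (datetime.datetime, datetime.date)):
        return value.strftime("%Y/%m/%d")
    return str(value)

def load_with_text_dates(path):
    # 'date' is the column name from the question; path is a placeholder
    return pd.read_excel(path, converters={"date": keep_date_text})

print(keep_date_text(datetime.datetime(2018, 12, 31)))
```

Cells that were typed in as real text pass through `str()` untouched, so a mixed column still ends up as strings.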