用XLSXWriter导出的Excel中的数据时间格式识别

发布于 2025-01-26 19:54:35 字数 1629 浏览 4 评论 0原文

我没有找到解决方案：

从数据框架中，我生成了Excel，有些列需要以HH：MM：SS的格式（无限制为24H，例如，值为'28：39：13' ，

所有内容都可以。

## Example code
import pandas as pd
import numpy as np
from xlsxwriter.utility import xl_rowcol_to_cell
example_list = ["12:35:25", "", "", "", "27:36:11"]
example_df = pd.DataFrame(example_list)
writer = pd.ExcelWriter('enhanced.xlsx', engine='xlsxwriter')
example_df.to_excel(writer, index=False, header=False, sheet_name='report')
workbook = writer.book
worksheet = writer.sheets['report']
total_hformat = workbook.add_format({'num_format': '[h]:mm:ss', 'bold': False})
worksheet.set_column('A:C', 12, total_hformat)
writer.save()
writer.close()

当

生成Excel时 a href =“ https://i.sstatic.net/gcz3b.png” rel =“ nofollow noreferrer”> a1+a2+a3+a3+a4+a4+a5+a6返回>

sum（a1：a6）返回错误的结果

应用“应用”文本到列“：

text_to_to_columns

text_to_to_columns_2

将“文本应用于列”：

一切正常

为什么会发生这种情况？我如何从XLSXWriter求解它，并避免导出Excel后转换所有列的需要？

谢谢。

编辑：我解决了！ Excel具有自己的数字时间格式，该格式由整数（天） +分数（一天中的百分比）组成。示例1,25 = 30小时。然后，我们只需要以秒为单位（UNIX格式）的时间，然后将时间分配在一天的秒数（60*60*24）之间。这就是数据框中应有的值。用XLSXWriter导出时，只需像上面的示例中应用格式'[h]：mm：ss'。

原文

I didn't find a solution for this:

From a dataframe I generate an excel and some columns need to be in format hh:mm:ss (with no limit to 24h, for example a value can be '28:39:13'.

When generating the excel everything looks okay. But when operating with the values of the cells isn't working properly until I apply the "text to columns" option.

## Example code
import pandas as pd
import numpy as np
from xlsxwriter.utility import xl_rowcol_to_cell
example_list = ["12:35:25", "", "", "", "27:36:11"]
example_df = pd.DataFrame(example_list)
writer = pd.ExcelWriter('enhanced.xlsx', engine='xlsxwriter')
example_df.to_excel(writer, index=False, header=False, sheet_name='report')
workbook = writer.book
worksheet = writer.sheets['report']
total_hformat = workbook.add_format({'num_format': '[h]:mm:ss', 'bold': False})
worksheet.set_column('A:C', 12, total_hformat)
writer.save()
writer.close()

Before applying "text to columns":

A1+A2+A3+A4+A5+A6 returns correct result.

sum(A1:A6) returns wrong result

Applying "text to columns":

text_to_columns

text_to_columns_2

After applying "text to columns":

Everything is working fine

Why does this happen? How can I solve it from xlsxwriter and avoid the need of converting all the columns after exporting an excel??

Thanks.

EDIT: I SOLVED IT!!
Excel has own numeric time format which is composed of a integer (days) + fraction (percentage time of the day). Example 1,25 = 30 hours.
Then we just need the time in seconds (unix format) then divide the time between the amount of seconds of a day (60*60*24). That is the value which sould be in the dataframe. When exporting with xlsxwriter, just apply the format '[h]:mm:ss' like i shown in the example above.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

不回头走下去 2025-02-02 19:54:35

我解决了它：Excel具有自己的数字时间格式，该格式由整数（天） +分数（一天中的百分比）组成。示例1,25 = 30小时。然后，我们只需要以秒为单位（UNIX格式）的时间，然后将时间分配在一天的秒数（60*60*24）之间。这就是数据框中应有的值。用XLSXWriter导出时，只需像上面的示例中应用格式'[h]：mm：ss'。

回复收藏 0 原文

差↓一点笑了 2025-02-02 19:54:35

解决此问题的最佳方法是将时间字符串转换为dateTime对象，然后将'[h]：mm：ss'设置为 pandas.excelwriter 。但是，我认为不可能使用 datetime.strptime（）。

在这种情况下，最容易将时间字符串转换为 norofollow noreferrer“> excel serial serial date 。类似的内容：

import pandas as pd
import re

example_list = ['12:35:25', '', '', '', '27:36:11']

# Convert the time string to Excel serial dates.
for i, date in enumerate(example_list):
    m = re.match('^(\d+):(\d+):([\d\.]+)
 输出：
   
除了for（）循环，代码的其余部分与您的相同。我允许十进制数秒，这可能不是必需的。
, date)

    if m:
        seconds = (int(m.group(1)) * 60 * 60 +
                   int(m.group(2)) * 60 +
                   float(m.group(3)))
        serial_date = seconds / (24 * 60 * 60)
        example_list[i] = serial_date

example_df = pd.DataFrame(example_list)
writer = pd.ExcelWriter('enhanced.xlsx', engine='xlsxwriter')

example_df.to_excel(writer, index=False, header=False, sheet_name='report')

workbook = writer.book
worksheet = writer.sheets['report']
total_hformat = workbook.add_format({'num_format': '[h]:mm:ss', 'bold': False})
worksheet.set_column('A:C', 12, total_hformat)

writer.close()

输出：

除了for（）循环，代码的其余部分与您的相同。我允许十进制数秒，这可能不是必需的。

The best way to solve this would be to convert the time strings to datetime objects and then set '[h]:mm:ss' as the datetime_format in pandas.ExcelWriter. However, I don't think it is possible to parse 24h+ times with datetime.strptime().

In which case it is probably easiest to convert the time string to an Excel serial date. Something like this:

import pandas as pd
import re

example_list = ['12:35:25', '', '', '', '27:36:11']

# Convert the time string to Excel serial dates.
for i, date in enumerate(example_list):
    m = re.match('^(\d+):(\d+):([\d\.]+)
Output:

Apart from the for() loop the rest of the code is the same as yours. I've allowed for decimal seconds which may not be necessary.
, date)

    if m:
        seconds = (int(m.group(1)) * 60 * 60 +
                   int(m.group(2)) * 60 +
                   float(m.group(3)))
        serial_date = seconds / (24 * 60 * 60)
        example_list[i] = serial_date

example_df = pd.DataFrame(example_list)
writer = pd.ExcelWriter('enhanced.xlsx', engine='xlsxwriter')

example_df.to_excel(writer, index=False, header=False, sheet_name='report')

workbook = writer.book
worksheet = writer.sheets['report']
total_hformat = workbook.add_format({'num_format': '[h]:mm:ss', 'bold': False})
worksheet.set_column('A:C', 12, total_hformat)

writer.close()