pyExcelerator has problems reading some files
I've run into a problem using pyExcelerator to read some XLS files.
I wrote some Python scripts that use this library to parse XLS files and populate a database with the data.
The templates of the files these scripts parse can vary, and I sometimes reconfigure the scripts to handle them. With one of the templates I ran into a problem: pyExcelerator just raises an exception:
Traceback (most recent call last):
  File "/home/* * */parsexls.py", line 64, in handle_label
    parser.parse()
  File "/home/* * */parsers.py", line 335, in parse
    self.contents = pyExcelerator.parse_xls(self.file_record.file, self.encoding)
  File "/usr/local/lib/python2.6/dist-packages/pyExcelerator/ImportXLS.py", line 327, in parse_xls
    ole_streams = CompoundDoc.Reader(filename).STREAMS
  File "/usr/local/lib/python2.6/dist-packages/pyExcelerator/CompoundDoc.py", line 67, in __init__
    self.__build_short_sectors_data()
  File "/usr/local/lib/python2.6/dist-packages/pyExcelerator/CompoundDoc.py", line 256, in __build_short_sectors_data
    dentry_start_sid, stream_size) = self.dir_entry_list[0]
IndexError: list index out of range
Some of the problem XLS files contained empty sheets, and removing those sheets helped, but many of the files can't be handled even without empty sheets. There's nothing unusual in these files: they contain no formulas or pictures, just strings, numbers and dates.
As far as I can see, pyExcelerator has been abandoned by its author :(
Any suggestions on fixing this issue are much appreciated.
I'm the author of xlrd. It reads XLS files and is not a fork of anything. I maintain a package called xlwt which writes XLS files and is a fork of pyExcelerator. The parse_xls functionality in pyExcelerator was deprecated to the point of removal from xlwt. Use xlrd instead.
Given the traceback that you reproduced, it looks like the file may be corrupted. What it is doing there happens well before the sheet data is parsed. What software produces these files? Can you open them with Excel or OpenOffice.org's Calc or Gnumeric? xlrd may give you a more meaningful error message. You may like to send me (insert_punctuation('sjmachin', 'lexicon', 'net')) copies of your failing file(s); please include some with and some without empty sheets. By the way, what are you using to remove empty sheets? What error message do you get from pyExcelerator when processing files with empty sheets?
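The traceback dies inside pyExcelerator's `CompoundDoc.Reader`, i.e. while reading the OLE2 (Compound File Binary) container that wraps the workbook, not while reading sheet data. A cheap first check on a failing file is therefore whether it even has a sane OLE2 header. Below is a minimal standard-library sketch; the function name `inspect_ole2_header` is hypothetical, and the field offsets follow the header layout in Microsoft's MS-CFB specification. It reads only the fixed 512-byte header, so a file that passes is not proven intact: the `IndexError` in the question comes from an empty directory-entry list, which lies further into the file.

```python
import struct

# Every OLE2 / Compound File Binary (CFB) container starts with these 8 bytes;
# BIFF .xls workbooks are stored inside such a container.
OLE2_SIGNATURE = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"
HEADER_SIZE = 512


def inspect_ole2_header(path):
    """Sanity-check the fixed 512-byte CFB header of `path`.

    Returns a dict of a few header fields, or raises ValueError when the
    file is clearly not a well-formed OLE2 container. The FAT, directory
    and streams are NOT examined.
    """
    with open(path, "rb") as f:
        header = f.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE:
        raise ValueError("file is shorter than a 512-byte OLE2 header")
    if header[:8] != OLE2_SIGNATURE:
        # e.g. HTML, XML or CSV data saved under an .xls name
        raise ValueError("no OLE2 signature; first bytes are %r" % header[:8])

    # Little-endian fields at fixed offsets (MS-CFB header layout):
    # 26 major version, 28 byte-order mark, 30 sector shift, 32 mini sector shift
    major_version, byte_order, sector_shift, mini_sector_shift = struct.unpack_from(
        "<4H", header, 26)
    # 44 number of FAT sectors, 48 first directory sector
    num_fat_sectors, first_dir_sector = struct.unpack_from("<2I", header, 44)
    # 60 first mini-FAT sector, 64 number of mini-FAT sectors
    first_minifat_sector, num_minifat_sectors = struct.unpack_from("<2I", header, 60)

    if byte_order != 0xFFFE:
        raise ValueError("unexpected byte-order mark 0x%04X" % byte_order)
    # The spec pairs version 3 with 512-byte sectors and version 4 with 4096.
    if (major_version, sector_shift) not in ((3, 9), (4, 12)):
        raise ValueError("unusual version/sector size: v%d, 2**%d bytes"
                         % (major_version, sector_shift))

    return {
        "major_version": major_version,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_sector_shift,
        "num_fat_sectors": num_fat_sectors,
        "first_dir_sector": first_dir_sector,
        "first_minifat_sector": first_minifat_sector,
        "num_minifat_sectors": num_minifat_sectors,
    }
```

A file that raises `ValueError` here was most likely not produced as a real BIFF/OLE2 workbook, which bears on the question of what software generates these files; a file that passes but still fails in pyExcelerator is a better candidate for the corruption theory, and for trying with xlrd.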
You might want to give xlrd a try... it started (I believe) as a fork of pyExcelerator, so switching to it should require only a few code changes, and it is actively maintained:
http://pypi.python.org/pypi/xlrd
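One way to keep the code changes small is to wrap xlrd so it returns the same shape of data the existing parser already consumes. The sketch below assumes, as I understand it, that `pyExcelerator.parse_xls` returns a list of `(sheet_name, {(row, col): value})` pairs holding only populated cells; check that against your `parsers.py` before relying on it. The helper names `book_to_parse_xls_format` and `parse_xls_with_xlrd` are hypothetical. The xlrd calls used (`open_workbook(..., encoding_override=...)`, `Book.sheets()`, `Sheet.name`, `nrows`, `ncols`, `cell_type()`, `cell_value()`) are part of xlrd's public API; it also assumes the file argument is a filesystem path.

```python
# xlrd cell types that carry no value: xlrd.XL_CELL_EMPTY (0) and
# xlrd.XL_CELL_BLANK (6, a formatted but empty cell).
_NO_VALUE_CTYPES = (0, 6)


def book_to_parse_xls_format(book):
    """Convert an xlrd Book into [(sheet_name, {(row, col): value}), ...]."""
    contents = []
    for sheet in book.sheets():
        values = {}
        for row in range(sheet.nrows):
            for col in range(sheet.ncols):
                # Skip cells with no value, keeping only populated cells.
                if sheet.cell_type(row, col) in _NO_VALUE_CTYPES:
                    continue
                # Date cells come back as Excel serial numbers (floats).
                values[(row, col)] = sheet.cell_value(row, col)
        contents.append((sheet.name, values))
    return contents


def parse_xls_with_xlrd(filename, encoding=None):
    """Hypothetical stand-in for pyExcelerator.parse_xls(filename, encoding)."""
    import xlrd  # imported lazily so the converter above works without xlrd
    # encoding_override only matters for old files lacking a CODEPAGE record.
    book = xlrd.open_workbook(filename, encoding_override=encoding)
    return book_to_parse_xls_format(book)
```

With this in place, the call in `parsers.py` would become `parse_xls_with_xlrd(self.file_record.file, self.encoding)`. Empty sheets simply yield an empty dict, so they should not need to be removed by hand. If a file is genuinely broken, `xlrd.open_workbook` raises its own exception (typically `xlrd.XLRDError`), which is usually more descriptive than pyExcelerator's `IndexError`.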