How can my Perl script determine whether an Excel file is in XLS or XLSX format?
I have a Perl script that reads data from an Excel (xls) binary file. But the client that sends us these files has started sending us XLSX format files at times. I've updated the script to be able to read those as well. However, the client sometimes likes to name the XLSX files with an .xls extension, which currently confuses the heck outta my script since it uses the file name to determine which file type it is.
An XLSX file is a zip file that contains XML stuff. Is there a simple way for my script to look at the file and tell whether it's a zip file or not? If so, I can make my script go by that instead of just the file name.
Yes, it is possible by checking the file's magic number.
There are quite a few Perl modules for checking the magic number of a file.
An example using File::LibMagic:
Another example, using File::Type:
.xlsx files have 'PK' as their first 2 bytes, so simply opening the file and examining the first 2 characters will do.
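A minimal core-Perl sketch of that check follows. It compares the first four bytes against "PK\x03\x04", the ZIP local file header signature that an .xlsx file starts with, which is slightly stricter than testing only 'PK'. The helper name looks_like_zip is hypothetical and not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $path begins with the ZIP local file header
# signature "PK\x03\x04", which is how an .xlsx (Office Open XML) file starts.
sub looks_like_zip {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $got = read($fh, my $magic, 4);    # number of bytes actually read
    die "Cannot read $path: $!" unless defined $got;
    close $fh;
    return $got == 4 && $magic eq "PK\x03\x04";
}

# Usage: perl sniff.pl report.xls another.xls
for my $file (@ARGV) {
    printf "%s: %s\n", $file, looks_like_zip($file) ? 'zip (XLSX)' : 'not a zip';
}
```

Opening with the :raw layer matters here: it keeps Perl from applying any newline or encoding translation to the bytes being compared.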
Edit: using Archive::Zip is a better approach.
Use File::Type: I just tested it with a .xlsx file, and mime_type() returned application/zip. Similarly, for a .xls file, mime_type() is application/octet-stream.
You can detect an xls file by checking the first bytes of the file for an Excel header.
A list of valid older Excel headers can be obtained here (unless you know the exact Excel version they use, check for all applicable possibilities):
http://toorcon.techpathways.com/uploads/headersig.txt
Zip headers are described here: http://en.wikipedia.org/wiki/ZIP_(file_format)#File_headers, but I'm not sure whether .xlsx files have the same headers.
File::Type seems to use "PK\003\004" as the file header that identifies zip files, but I'm not certain that logic works for .xlsx, since I don't have a file to test with.
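Combining both signatures gives a small classifier, sketched below under the assumption that the legacy files are Excel 97 to 2003 workbooks: those are OLE2 compound documents that begin with the 8 bytes D0 CF 11 E0 A1 B1 1A E1, while .xlsx files begin with the ZIP signature "PK\003\004". The helper excel_format is hypothetical. Other OLE2 files (a Word .doc, for example) share the same signature, so treat the result as a hint about which reader to try, not as proof that the file is a valid workbook.

```perl
use strict;
use warnings;

# Signatures checked at offset 0 (an assumption that is sufficient for this sketch).
my %SIGNATURE = (
    xlsx => "PK\x03\x04",                          # ZIP local file header
    xls  => "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1",    # OLE2 compound document
);

# Hypothetical helper: returns 'xlsx', 'xls', or undef if neither matches.
sub excel_format {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    defined(read($fh, my $head, 8)) or die "Cannot read $path: $!";
    close $fh;
    for my $type (sort keys %SIGNATURE) {
        my $sig = $SIGNATURE{$type};
        # substr on a too-short $head just yields fewer bytes, so no false match
        return $type if substr($head, 0, length $sig) eq $sig;
    }
    return;
}

# Usage: perl classify.pl *.xls
for my $file (@ARGV) {
    my $type = excel_format($file) // 'unknown';
    print "$file: $type\n";
}
```

The script could then pick its XLSX or XLS reading code from this return value instead of from the extension.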
Hence, comparing the output of file with application/zip would probably do the trick of detecting zips. Of course, you need to have file installed, which is quite usual on UNIX systems. I'm afraid I cannot provide a Perl example, since all knowledge of Perl has evaporated from my memory and I have no examples at hand.
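A minimal Perl sketch of the file approach, assuming a Unix-like system with file(1) installed. It runs file --brief --mime-type through a list-form pipe open, so no shell is involved and unusual characters in the path are passed through safely. Depending on the libmagic version, file may report a real .xlsx as application/vnd.openxmlformats-officedocument.spreadsheetml.sheet rather than application/zip, so the sketch accepts both. Both helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: ask file(1) for the MIME type of $path.
# Returns undef if file(1) cannot be run or exits with an error.
sub file_mime_type {
    my ($path) = @_;
    open(my $fh, '-|', 'file', '--brief', '--mime-type', '--', $path)
        or return;
    my $type = <$fh>;
    close $fh;                      # sets $? to file(1)'s exit status
    return unless defined $type && $? == 0;
    chomp $type;
    return $type;
}

# Assumption: newer libmagic may name the OOXML type instead of plain zip.
my %ZIP_TYPES = map { $_ => 1 } (
    'application/zip',
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
);

# Hypothetical helper: 1 if file(1) says $path is a zip/XLSX, else 0.
sub is_zip_per_file_cmd {
    my ($path) = @_;
    my $type = file_mime_type($path);
    return defined $type && $ZIP_TYPES{$type} ? 1 : 0;
}

for my $file (@ARGV) {
    print "$file: ", (is_zip_per_file_cmd($file) ? 'zip' : 'not a zip'), "\n";
}
```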
I can't speak for Perl, but for the framework I use, .NET, there are a number of libraries available for manipulating zip files that you could use.
Another thing I've seen people use is the command-line version of WinZip. It gives a return value of 0 when a file unzips successfully and non-zero when there is an error.
This may not be the best way to do this, but it's a start.
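Perl ships a zip reader in its core distribution, IO::Uncompress::Unzip (part of IO-Compress), so the "try to unzip it and check the result" idea can be done in-process instead of shelling out to an external tool. This is a swapped-in technique, not WinZip itself, sketched under the assumption that the file is on local disk; unzips_cleanly is a hypothetical name. Passing Transparent => 0 makes the constructor fail on non-zip input instead of passing it through as plain data, which is the default. Walking every member is slower than a header check, and success only shows that the file is a readable zip, not that it is a valid workbook.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip ();

# Hypothetical helper: 1 if every member of $path decompresses without error,
# 0 otherwise (roughly what checking an unzip tool's exit status tells you).
sub unzips_cleanly {
    my ($path) = @_;
    # Transparent => 0: refuse non-zip input rather than reading it as plain data.
    my $z = IO::Uncompress::Unzip->new($path, Transparent => 0)
        or return 0;
    my $status;
    # nextStream() returns 1 for another member, 0 at the end, -1 on error.
    for ($status = 1; $status > 0; $status = $z->nextStream()) {
        my $buffer;
        # read() returns bytes read, 0 at the end of this member, < 0 on error.
        while (($status = $z->read($buffer)) > 0) { }
        last if $status < 0;
    }
    $z->close();
    return $status == 0 ? 1 : 0;
}

for my $file (@ARGV) {
    print "$file: ", (unzips_cleanly($file) ? 'zip' : 'not a zip'), "\n";
}
```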