How can my Perl script determine whether an Excel file is in XLS or XLSX format?

Posted on 2024-09-29 05:02:46

I have a Perl script that reads data from an Excel (xls) binary file. But the client that sends us these files has started sending us XLSX format files at times. I've updated the script to be able to read those as well. However, the client sometimes likes to name the XLSX files with an .xls extension, which currently confuses the heck outta my script since it uses the file name to determine which file type it is.

An XLSX file is a zip file that contains XML stuff. Is there a simple way for my script to look at the file and tell whether it's a zip file or not? If so, I can make my script go by that instead of just the file name.

Comments (7)

菊凝晚露 2024-10-06 05:02:46

Yes, you can do this by checking the file's magic number.

Perl has quite a few modules for checking the magic number of a file.

An example using File::LibMagic:

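use strict;
use warnings;

use File::LibMagic;

my $filename = shift @ARGV;    # path of the file to inspect
my $lm       = File::LibMagic->new();

# Look the type up once and reuse it in both comparisons.
# Depending on the libmagic version, an .xlsx file may be reported
# with a more specific Office Open XML MIME type instead.
my $type = $lm->checktype_filename($filename);

if ( $type eq 'application/zip; charset=binary' ) {
    # XLSX format
}
elsif ( $type eq 'application/vnd.ms-office; charset=binary' ) {
    # XLS format
}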

Another example, using File::Type:

use strict;
use warnings;

use File::Type;

my $file = shift @ARGV;    # path of the file to inspect
my $ft   = File::Type->new();

if ( $ft->mime_type($file) eq 'application/zip' ) {
    # XLSX format
}
else {
    # probably XLS format
}
枯寂 2024-10-06 05:02:46

.xlsx files start with the two bytes 'PK', so simply opening the file and examining its first 2 bytes will do.
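A minimal sketch of that check, assuming the stricter four-byte zip local-file-header signature "PK\x03\x04" rather than just 'PK'. The function name starts_with_zip_signature is made up for illustration:

```perl
use strict;
use warnings;

# Returns true if the file at $path begins with the zip
# local-file-header signature "PK\x03\x04".
sub starts_with_zip_signature {
    my ($path) = @_;

    # Open in raw mode so no encoding or newline layer alters the bytes.
    open( my $fh, '<:raw', $path ) or die "Cannot open $path: $!";
    my $bytes_read = read( $fh, my $header, 4 );
    close $fh;

    # Files shorter than four bytes cannot carry the signature.
    return 0 unless defined $bytes_read && $bytes_read == 4;
    return $header eq "PK\x03\x04" ? 1 : 0;
}
```

A script could then pick its reader with something like starts_with_zip_signature($file) instead of looking at the extension. Note that this only says the file is a zip container, not that it is specifically an Excel workbook.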

放肆 2024-10-06 05:02:46

Edit: Archive::Zip is a better solution.

use Archive::Zip qw( :ERROR_CODES );    # exports AZ_OK

# Read a Zip file
my $somezip = Archive::Zip->new();
unless ( $somezip->read( 'someZip.zip' ) == AZ_OK ) {
    die 'read error';
}
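
Building on the Archive::Zip idea, here is a hedged sketch that goes one step further and checks for a workbook part inside the archive, so that an arbitrary zip file is not mistaken for a spreadsheet. The function name looks_like_xlsx is hypothetical, and the member path xl/workbook.xml is an assumption based on where Excel conventionally stores the workbook:

```perl
use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES );

# Returns true if $path is a readable zip archive that contains
# the workbook part normally found in an .xlsx file.
sub looks_like_xlsx {
    my ($path) = @_;

    # Silence Archive::Zip's default error output for non-zip input.
    Archive::Zip::setErrorHandler( sub { } );

    my $zip = Archive::Zip->new();
    return 0 unless $zip->read($path) == AZ_OK;

    # Assumption: the workbook part lives at xl/workbook.xml, which is
    # the conventional location but not strictly guaranteed.
    return defined $zip->memberNamed('xl/workbook.xml') ? 1 : 0;
}
```

This reads the archive's directory rather than just its first bytes, so it costs more than a signature check but is less likely to accept an unrelated zip file.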
我不吻晚风 2024-10-06 05:02:46

Use File::Type:

use File::Type;

my $file = "foo.zip";
my $filetype = File::Type->new( );

if( $filetype->mime_type( $file ) eq 'application/zip' ) {
  # File is a zip archive.
  ...
}

I just tested it with a .xlsx file, and the mime_type() returned application/zip. Similarly, for a .xls file the mime_type() is application/octet-stream.

哭泣的笑容 2024-10-06 05:02:46

You can detect an xls file by checking its first bytes for the Excel headers.

A list of valid older Excel headers is available here (unless you know the exact version of their Excel, check for all applicable possibilities):

http://toorcon.techpathways.com/uploads/headersig.txt


Zip headers are described here: http://en.wikipedia.org/wiki/ZIP_(file_format)#File_headers
but I'm not sure whether .xlsx files have the same headers.

File::Type's logic seems to use "PK\003\004" as the file header for identifying zip files, but I'm not certain that logic works for .xlsx, as I don't have a file to test with.
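
As a rough illustration of checking headers on both sides, the sketch below compares the leading bytes against two signatures: the zip signature mentioned above, and the eight-byte OLE2 compound-document signature used by Excel 97-2003 .xls files. It is an assumption that those are the only two cases of interest; older Excel formats from the linked list are not covered, and the function name sniff_excel_format is invented for this example:

```perl
use strict;
use warnings;

# Signature of the OLE2 compound-document container used by
# Excel 97-2003 .xls files, and the zip local-file-header signature.
my $OLE2_SIGNATURE = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";
my $ZIP_SIGNATURE  = "PK\x03\x04";

# Returns 'xlsx', 'xls', or 'unknown' based only on the leading bytes.
sub sniff_excel_format {
    my ($path) = @_;

    open( my $fh, '<:raw', $path ) or die "Cannot open $path: $!";
    read( $fh, my $header, 8 );
    close $fh;
    $header = '' unless defined $header;

    # Any zip container is reported as 'xlsx' here; the check does
    # not look inside the archive.
    return 'xlsx' if index( $header, $ZIP_SIGNATURE ) == 0;
    return 'xls'  if $header eq $OLE2_SIGNATURE;
    return 'unknown';
}
```

Returning 'unknown' rather than guessing lets the caller fall back to the file extension or reject the file outright.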

芯好空 2024-10-06 05:02:46
The-Evil-MacBook:~ ivucica$ file --mime-type --brief file.zip 
application/zip

Hence, comparing the output of

`file --mime-type --brief $filename`

with application/zip would probably do the trick of detecting zips. Of course, you need to have file installed, which is quite usual on UNIX systems. I'm afraid I cannot provide a Perl example, since all knowledge of Perl has evaporated from my memory and I have no examples at hand.
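
For what it's worth, a Perl version of that idea might look like the sketch below. It assumes the file utility is on the PATH and uses the list form of a piped open so the file name is never interpreted by a shell. The function name mime_type_via_file is made up for illustration:

```perl
use strict;
use warnings;

# Runs `file --mime-type --brief` on $path and returns the reported
# MIME type, or undef if the command produced no output.
sub mime_type_via_file {
    my ($path) = @_;

    # List-form pipe open: arguments go straight to the program,
    # so odd characters in $path need no shell quoting.
    open( my $fh, '-|', 'file', '--mime-type', '--brief', '--', $path )
        or die "Cannot run file: $!";
    my $mime = <$fh>;
    close $fh;

    return unless defined $mime;
    chomp $mime;
    return $mime;
}
```

The caller would then compare the result with 'application/zip', keeping in mind that the exact string reported for an .xlsx file can vary between versions of file.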

沙沙粒小 2024-10-06 05:02:46

I can't speak for Perl, but for the framework I use, .NET, there are a number of libraries available for manipulating zip files that you could use.

Another thing that I've seen people use is the command-line version of WinZip. It gives a return value of 0 when a file is unzipped and non-zero when there is an error.

This may not be the best way to do this, but it's a start.
