How to read a WinZip self-extracting (exe) zip file in Java?

Posted 2024-12-12 08:13:19

Is there an existing way to do this, or do I need to manually parse and skip the exe block before passing the data to ZipInputStream?


自找没趣 2024-12-19 08:13:20

TrueZIP works best in this case (at least it did in my case).

A self-extracting zip has the layout code1 header1 file1, whereas a normal zip is just header1 file1. The leading code is the executable part that knows how to extract the zip.

That said, the TrueZIP extraction utility does complain about the extra bytes and throws an exception.

Here is the code:

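import java.io.IOException;

import de.schlichtherle.truezip.file.TArchiveDetector;
import de.schlichtherle.truezip.file.TFile;

private void Extract(String src, String dst, String incPath) {
    // src is the self-extracting .exe; incPath is a path inside it ("" = the whole archive)
    TFile srcFile = new TFile(src, incPath);
    TFile dstFile = new TFile(dst);
    try {
        // Recursively copy the archive contents to dst; with TArchiveDetector.NULL,
        // no files in the copied trees are treated as (nested) archives
        TFile.cp_rp(srcFile, dstFile, TArchiveDetector.NULL);
    } catch (IOException e) {
        // Handle the exception (log it, rethrow it, ...)
    }
}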

You can call this method as Extract("C:\\2006Production.exe", "C:\\", ""); (backslashes must be doubled inside Java string literals, and wrapping the literals in new String(...) is unnecessary).

The archive contents are extracted to the C: drive; from there you can do whatever you need with the files. I hope this helps.

Thanks.

星 2024-12-19 08:13:19

After reviewing the EXE and ZIP file formats and testing various options, the easiest solution appears to be to simply ignore any preamble up to the first ZIP local file header.

[Figures: ZIP file layout; ZIP local file header]

I wrote an input stream filter to bypass the preamble and it works perfectly:

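import java.io.FileInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

try (ZipInputStream zis = new ZipInputStream(
        new WinZipInputStream(
            new FileInputStream("test.exe")))) {
    ZipEntry ze;
    while ((ze = zis.getNextEntry()) != null) {
        // process ze here; its data can be read from zis
        zis.closeEntry();
    }
}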

WinZipInputStream.java

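import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Skips any preamble (such as the EXE stub of a self-extracting archive)
 * up to the first ZIP local file header signature "PK\3\4".
 */
public class WinZipInputStream extends FilterInputStream {
    public static final byte[] ZIP_LOCAL = { 0x50, 0x4b, 0x03, 0x04 };
    protected int ip; // signature bytes matched so far in the input
    protected int op; // signature bytes already handed back to the caller

    public WinZipInputStream(InputStream is) {
        super(is);
    }

    @Override
    public int read() throws IOException {
        // Scan forward until the complete local file header signature has been seen
        while (ip < ZIP_LOCAL.length) {
            int c = super.read();
            if (c == -1) {
                return -1; // EOF before any ZIP data
            }
            if (c == ZIP_LOCAL[ip]) {
                ip++;
            } else {
                // Restart the match; the mismatching byte may itself begin a signature
                ip = (c == ZIP_LOCAL[0]) ? 1 : 0;
            }
        }
        // Replay the consumed signature bytes, then pass the stream through
        if (op < ZIP_LOCAL.length) {
            return ZIP_LOCAL[op++];
        }
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (op == ZIP_LOCAL.length) {
            return super.read(b, off, len);
        }
        if (len == 0) {
            return 0;
        }
        // Still in the preamble/signature phase: go through read() byte by byte
        int l = 0;
        while (l < len && op < ZIP_LOCAL.length) {
            int c = read();
            if (c == -1) {
                return l == 0 ? -1 : l;
            }
            b[off + l++] = (byte) c;
        }
        return l;
    }

    @Override
    public boolean markSupported() {
        return false;
    }
}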
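For archives small enough to load into memory, the same preamble-skipping idea can be sketched without a custom stream: locate the first PK\3\4 signature in a byte array and hand the remainder to ZipInputStream. This is a minimal sketch, not the answer's code; the class name PreambleSkipDemo and the fake "stub" bytes are illustrative assumptions, and (as another answer below points out) a real EXE stub might contain a false signature.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class PreambleSkipDemo {
    private static final byte[] ZIP_LOCAL = { 0x50, 0x4b, 0x03, 0x04 };

    /** Index of the first local file header signature, or -1 if there is none. */
    static int firstLocalHeader(byte[] data) {
        outer:
        for (int i = 0; i + ZIP_LOCAL.length <= data.length; i++) {
            for (int j = 0; j < ZIP_LOCAL.length; j++) {
                if (data[i + j] != ZIP_LOCAL[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    /** Lists entry names, ignoring everything before the first local file header. */
    static List<String> listEntries(byte[] data) throws IOException {
        int start = firstLocalHeader(data);
        if (start < 0) {
            throw new IOException("No ZIP local file header found");
        }
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(
                new ByteArrayInputStream(data, start, data.length - start))) {
            ZipEntry ze;
            while ((ze = zis.getNextEntry()) != null) {
                names.add(ze.getName());
                zis.closeEntry();
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Build a small ZIP in memory and prepend fake "EXE" bytes to mimic an SFX file
        ByteArrayOutputStream zipBytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(zipBytes)) {
            zos.putNextEntry(new ZipEntry("hello.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        ByteArrayOutputStream sfx = new ByteArrayOutputStream();
        sfx.write("MZ fake executable stub".getBytes(StandardCharsets.US_ASCII));
        sfx.write(zipBytes.toByteArray());

        System.out.println(listEntries(sfx.toByteArray())); // prints [hello.txt]
    }
}
```

For large files the streaming WinZipInputStream above is the better fit, since this variant holds the whole file in memory.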

梦一生花开无言 2024-12-19 08:13:19

The nice thing about ZIP files is their sequential structure: every entry is an independent bunch of bytes, and at the end is a Central Directory Index that lists all entries and their offsets in the file.

The bad thing is that java.util.zip.ZipInputStream ignores that index: it just starts reading the file from the beginning and expects the first entry to be a Local File Header block, which isn't the case for self-extracting ZIP archives (these start with the EXE part).

Some years ago, I wrote a custom ZIP parser to extract individual ZIP entries (LFH + data) that relied on the CDI to find where these entries were in the file. I just checked, and it can actually list the entries of a self-extracting ZIP archive without further ado and give you the offsets, so you could either:

~~use that code to find the first LFH after the EXE part, copy everything after that offset to a different File, and feed that new File to java.util.zip.ZipFile~~
Edit: Just skipping the EXE part doesn't seem to work. ZipFile still won't read the result, and my native ZIP program complains that the new ZIP file is damaged, reporting exactly the number of bytes I skipped as "missing" (so it does read the CDI). I guess some headers would need to be rewritten, so the second approach below looks more promising; or
use that code for the full ZIP extraction (it's similar to java.util.zip). This would require some additional plumbing, because the code wasn't originally intended as a replacement ZIP library but had a very specific use case (differential updating of ZIP files over HTTP).

The code is hosted at SourceForge (project page, website) and licensed under the Apache License 2.0, so commercial use is fine; AFAIK there's a commercial game using it as an updater for its game assets.

The interesting part for getting the offsets from a ZIP file is Indexer.parseZipFile, which returns a LinkedHashMap<Resource, Long> (so the first map entry has the lowest offset in the file). Here's the code I used to list the entries of a self-extracting ZIP archive (created from an acra release file with the WinZip SE creator, running under Wine on Ubuntu):

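import java.io.File;
import java.util.Map;
import java.util.Map.Entry;
// Resource and parseZipFile (static import of Indexer.parseZipFile) come from the SourceForge project

public static void main(String[] args) throws Exception {
    File archive = new File("/home/phil/downloads", "acra-4.2.3.exe");
    // LinkedHashMap in file order: the first entry has the lowest offset
    Map<Resource, Long> resources = parseZipFile(archive);
    for (Entry<Resource, Long> resource : resources.entrySet()) {
        System.out.println(resource.getKey() + ": " + resource.getValue());
    }
}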

You can probably rip out most of the code except for the Indexer class and the zip package, which contains all the header-parsing classes.
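The project's Indexer is not reproduced here. As a rough, dependency-free stand-in (the class CentralDirectoryLister and method entryOffsets are hypothetical, not the project's API), the sketch below finds the End of Central Directory record, walks the central directory, and returns entry names mapped to local-header offsets in a LinkedHashMap sorted by offset. Assumptions: no ZIP64, a single-disk archive, ASCII/UTF-8 entry names, and offsets reported exactly as recorded. The central directory is located from its size rather than its recorded offset, so prepended EXE bytes don't break the walk.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CentralDirectoryLister {
    private static final int EOCD_SIG = 0x06054b50; // "PK\5\6" read little-endian
    private static final int CEN_SIG = 0x02014b50;  // "PK\1\2" read little-endian
    private static final int EOCD_MIN = 22;         // EOCD size without a comment

    /** Maps entry name to recorded local header offset, in ascending offset order. */
    static Map<String, Long> entryOffsets(byte[] data) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        // Scan backwards for the EOCD; it may be followed by a comment of up to 65535 bytes
        int eocd = -1;
        for (int i = data.length - EOCD_MIN; i >= 0 && i >= data.length - EOCD_MIN - 0xFFFF; i--) {
            if (buf.getInt(i) == EOCD_SIG) {
                eocd = i;
                break;
            }
        }
        if (eocd < 0) {
            throw new IOException("No End of Central Directory record found");
        }
        int entries = Short.toUnsignedInt(buf.getShort(eocd + 10));
        long cdSize = Integer.toUnsignedLong(buf.getInt(eocd + 12));
        // The central directory ends where the EOCD starts
        int pos = (int) (eocd - cdSize);
        List<Map.Entry<String, Long>> found = new ArrayList<>();
        for (int n = 0; n < entries; n++) {
            if (buf.getInt(pos) != CEN_SIG) {
                throw new IOException("Bad central directory header at " + pos);
            }
            int nameLen = Short.toUnsignedInt(buf.getShort(pos + 28));
            int extraLen = Short.toUnsignedInt(buf.getShort(pos + 30));
            int commentLen = Short.toUnsignedInt(buf.getShort(pos + 32));
            long lfhOffset = Integer.toUnsignedLong(buf.getInt(pos + 42));
            String name = new String(data, pos + 46, nameLen, StandardCharsets.UTF_8);
            found.add(Map.entry(name, lfhOffset));
            pos += 46 + nameLen + extraLen + commentLen;
        }
        found.sort(Map.Entry.comparingByValue());
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : found) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Path.of(args[0]));
        for (Map.Entry<String, Long> e : entryOffsets(data).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Whether the printed offsets count from the start of the .exe or from the start of the ZIP data depends on the tool that built the archive; the edit above suggests WinZip SE counts from the start of the .exe.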

无语# 2024-12-19 08:13:19

Some self-extracting ZIP files contain fake Local File Header markers, so I think it's best to scan the file backwards to find the End Of Central Directory (EOCD) record. The EOCD record contains the offset of the Central Directory (CD), and the first CD record contains the offset of the first Local File Header. If you start reading from the first byte of that Local File Header, ZipInputStream works fine.

Obviously the code below is not the fastest solution. If you are going to process large files, you should implement some kind of buffering or use memory-mapped files.

import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.commons.io.EndianUtils;

public class ZipHandler {
    // The EOCD signature "PK\5\6" in reverse, because the file is scanned backwards
    private static final byte[] EOCD_MARKER = { 0x06, 0x05, 0x4b, 0x50 };

    public InputStream openExecutableZipFile(Path zipFilePath) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(zipFilePath.toFile(), "r")) {
            long position = raf.length() - 1;
            int markerIndex = 0;
            byte[] buffer = new byte[4];
            while (position > EOCD_MARKER.length) {
                raf.seek(position);
                raf.readFully(buffer, 0, 1);
                if (buffer[0] == EOCD_MARKER[markerIndex]) {
                    markerIndex++;
                } else {
                    // The mismatching byte may itself start a new match
                    markerIndex = (buffer[0] == EOCD_MARKER[0]) ? 1 : 0;
                }
                if (markerIndex == EOCD_MARKER.length) {
                    // position is the start of the EOCD record; offset 16 holds the CD offset
                    raf.seek(position + 16);
                    raf.readFully(buffer, 0, 4);
                    int centralDirectoryOffset = EndianUtils.readSwappedInteger(buffer, 0);
                    // Offset 42 of the first CD file header holds its local file header offset
                    raf.seek(centralDirectoryOffset + 42L);
                    raf.readFully(buffer, 0, 4);
                    int localFileHeaderOffset = EndianUtils.readSwappedInteger(buffer, 0);
                    return new SkippingInputStream(Files.newInputStream(zipFilePath), localFileHeaderOffset);
                }
                position--;
            }
            throw new IOException("No EOCD marker found");
        }
    }
}

public class SkippingInputStream extends FilterInputStream {
    private final int bytesToSkip;
    private int bytesAlreadySkipped;

    public SkippingInputStream(InputStream inputStream, int bytesToSkip) {
        super(inputStream);
        this.bytesToSkip = bytesToSkip;
        this.bytesAlreadySkipped = 0;
    }

    // Discards the preamble on first use; returns false if EOF comes first
    private boolean skipPreamble() throws IOException {
        while (bytesAlreadySkipped < bytesToSkip) {
            if (super.read() == -1) {
                return false;
            }
            bytesAlreadySkipped++;
        }
        return true;
    }

    @Override
    public int read() throws IOException {
        return skipPreamble() ? super.read() : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // After the preamble, delegate directly to the underlying stream (honoring off)
        return skipPreamble() ? super.read(b, off, len) : -1;
    }

    @Override
    public long skip(long n) throws IOException {
        return skipPreamble() ? super.skip(n) : 0;
    }
}
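Following the buffering suggestion above, here is a sketch that reads only the file tail into memory (the EOCD must lie within the last 22 + 65,535 bytes) instead of seeking byte by byte. It also computes how many bytes precede the ZIP data as (EOCD position − central directory size) − recorded central directory offset, which should cover both archives whose offsets count from the start of the .exe and stubs simply concatenated in front of a ZIP. The class FirstHeaderLocator is hypothetical; assumptions: no ZIP64, and the lowest LFH offset in the central directory marks the first entry.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Path;

public class FirstHeaderLocator {
    private static final int EOCD_SIG = 0x06054b50; // "PK\5\6" read little-endian
    private static final int EOCD_MIN = 22;         // EOCD size without a comment
    private static final int MAX_COMMENT = 0xFFFF;

    /** Absolute file position of the lowest-offset local file header (no ZIP64). */
    static long firstLocalHeader(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            int tailLen = (int) Math.min(length, EOCD_MIN + MAX_COMMENT);
            long tailStart = length - tailLen;
            byte[] tail = new byte[tailLen];
            raf.seek(tailStart);
            raf.readFully(tail);
            ByteBuffer t = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);

            // Search the buffered tail backwards for the EOCD signature
            for (int i = tailLen - EOCD_MIN; i >= 0; i--) {
                if (t.getInt(i) != EOCD_SIG) {
                    continue;
                }
                int entries = Short.toUnsignedInt(t.getShort(i + 10));
                if (entries == 0) {
                    throw new IOException("Archive has no entries");
                }
                long cdSize = Integer.toUnsignedLong(t.getInt(i + 12));
                long cdOffset = Integer.toUnsignedLong(t.getInt(i + 16));
                long cdStart = tailStart + i - cdSize; // where the CD actually starts
                long prefix = cdStart - cdOffset;      // bytes the recorded offsets don't count

                byte[] cd = new byte[(int) cdSize];
                raf.seek(cdStart);
                raf.readFully(cd);
                ByteBuffer c = ByteBuffer.wrap(cd).order(ByteOrder.LITTLE_ENDIAN);
                long first = Long.MAX_VALUE;
                int pos = 0;
                for (int n = 0; n < entries; n++) {
                    // Offset 42: local header offset; 28/30/32: name/extra/comment lengths
                    first = Math.min(first, Integer.toUnsignedLong(c.getInt(pos + 42)));
                    pos += 46 + Short.toUnsignedInt(c.getShort(pos + 28))
                            + Short.toUnsignedInt(c.getShort(pos + 30))
                            + Short.toUnsignedInt(c.getShort(pos + 32));
                }
                return prefix + first;
            }
            throw new IOException("No EOCD marker found");
        }
    }
}
```

The returned position could then be used with the SkippingInputStream above, e.g. new SkippingInputStream(Files.newInputStream(path), (int) FirstHeaderLocator.firstLocalHeader(path)), for files under 2 GB.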
