Adding non-ASCII file names to a zip in Java

Posted 2024-07-06 10:53:28

What is the best way to add non-ASCII file names to a zip file using Java, in such a way that the files can be properly read in both Windows and Linux?

Here is one attempt, adapted from https://truezip.dev.java.net/tutorial-6.html#Example, which works in Windows Vista but fails in Ubuntu Hardy. In Hardy the file name is shown as abc-ЖДФ.txt in file-roller.

import java.io.IOException;
import java.io.PrintStream;

import de.schlichtherle.io.File;
import de.schlichtherle.io.FileOutputStream;

public class Main {

    public static void main(final String[] args) throws IOException {

        try {
            PrintStream ps = new PrintStream(new FileOutputStream(
                    "outer.zip/abc-åäö.txt"));
            try {
                ps.println("The characters åäö works here though.");
            } finally {
                ps.close();
            }
        } finally {
            File.umount();
        }
    }
}

Unlike java.util.zip, truezip allows specifying the zip file encoding. Here's another sample, this time explicitly specifying the encoding. None of IBM437, UTF-8 or ISO-8859-1 works in Linux. IBM437 works in Windows.

import java.io.IOException;

import de.schlichtherle.io.FileOutputStream;
import de.schlichtherle.util.zip.ZipEntry;
import de.schlichtherle.util.zip.ZipOutputStream;

public class Main {

    public static void main(final String[] args) throws IOException {

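        // Write one archive per candidate encoding so the results can be compared.
        for (String encoding : new String[] { "IBM437", "UTF-8", "ISO-8859-1" }) {
            ZipOutputStream zipOutput = new ZipOutputStream(
                    new FileOutputStream(encoding + "-example.zip"), encoding);
            try {
                // The entry name is encoded with the charset passed to the stream.
                ZipEntry entry = new ZipEntry("abc-åäö.txt");
                zipOutput.putNextEntry(entry);
                zipOutput.closeEntry();
            } finally {
                // Close the archive even if writing the entry fails.
                zipOutput.close();
            }
        }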
    }
}

灰色世界里的红玫瑰 2024-07-13 10:53:29

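Did it actually fail, or was it just a font issue (for example, the font having different glyphs for those character codes)? I've seen similar issues in Windows where rendering "broke" because the font didn't support the charset, but the data was actually intact and correct.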


书间行客 2024-07-13 10:53:29

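Non-ASCII file names are not reliable across ZIP implementations and are best avoided. There is no provision for storing a charset setting in ZIP files; clients tend to guess "the current system code page", which is unlikely to be what you want. Many combinations of client and code page can result in inaccessible files.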
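
The guessing described above can be reproduced in plain Java. The sketch below is only an illustration: it assumes the name was written as IBM437 bytes and that the reading tool decoded them as IBM866 (a Cyrillic DOS code page). That pairing is a guess that happens to match the symptom in the question; the question does not confirm it. The class and method names are made up for the example, and both charsets have to be available in the JDK in use.

```java
import java.nio.charset.Charset;

class CodePageMismatch {

    /**
     * Encodes a name with one charset and decodes the same bytes with another,
     * which is what happens when a reader guesses the wrong code page.
     */
    static String reinterpret(String name, String writtenAs, String readAs) {
        byte[] raw = name.getBytes(Charset.forName(writtenAs));
        return new String(raw, Charset.forName(readAs));
    }

    public static void main(String[] args) {
        // "abc-åäö.txt", written with Unicode escapes so the source file
        // encoding does not matter.
        String original = "abc-\u00e5\u00e4\u00f6.txt";

        // Assumption: writer used IBM437, reader guessed IBM866.
        System.out.println(reinterpret(original, "IBM437", "IBM866"));
    }
}
```

With these two code pages the bytes for åäö (0x86, 0x84, 0x94) come back as ЖДФ, the same characters file-roller showed. The stored bytes are unchanged; only the reader's guess differs.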

Sorry!


奈何桥上唱咆哮 2024-07-13 10:53:28

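The encoding for file entries in ZIP was originally specified as IBM Code Page 437. Many characters used in other languages cannot be represented that way.

The PKWARE specification acknowledges the problem and adds a flag bit for it, but that is a later addition (from 2007; thanks to Cheeso for clearing that up, see the comments). If that bit is set, the file name entry must be encoded in UTF-8. This extension is described in "APPENDIX D - Language Encoding (EFS)" at the end of the linked document.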
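
For readers on Java 7 or later, java.util.zip.ZipOutputStream accepts a Charset, and as far as I know it sets this flag (general purpose bit 11, 0x0800) when the charset is UTF-8. The following is a minimal sketch of that, not a fix for the TrueZIP code in the question. The class and method names are hypothetical, and the flag is read from a fixed offset that is only valid because the archive is produced by the sketch itself and starts with a local file header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class Utf8ZipNames {

    /** Writes an in-memory zip with one empty entry whose name is UTF-8 encoded. */
    static byte[] writeZip(String entryName) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer, StandardCharsets.UTF_8)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.closeEntry();
        }
        return buffer.toByteArray();
    }

    /**
     * Checks bit 11 of the general purpose bit flag in the first local file
     * header. The flag is a little-endian 16-bit value at byte offset 6
     * (after the 4-byte signature and the 2-byte "version needed" field).
     */
    static boolean hasUtf8Flag(byte[] zipBytes) {
        int flags = (zipBytes[6] & 0xFF) | ((zipBytes[7] & 0xFF) << 8);
        return (flags & 0x0800) != 0;
    }

    /** Reads the name of the first entry back, decoding it as UTF-8. */
    static String firstEntryName(byte[] zipBytes) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(
                new ByteArrayInputStream(zipBytes), StandardCharsets.UTF_8)) {
            ZipEntry entry = zip.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        // "abc-åäö.txt", written with Unicode escapes.
        byte[] zip = writeZip("abc-\u00e5\u00e4\u00f6.txt");
        System.out.println("UTF-8 flag set: " + hasUtf8Flag(zip));
        System.out.println("Name read back: " + firstEntryName(zip));
    }
}
```

Whether a given unzip tool honours the flag is a separate matter; older tools may ignore it and fall back to a code page guess.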

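For Java, trouble with non-ASCII characters is a known bug. See bug #4244499 and the large number of related bugs.

My colleague used URL encoding for the file names before storing them in the ZIP as a workaround, and decoded them after reading. If you control both the storing and the reading, that may be a workaround for you too.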
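
A minimal sketch of that URL-encoding workaround, using java.net.URLEncoder and java.net.URLDecoder. The class and method names are hypothetical, and this is only one way the colleague's approach could look, since the details are not given here. The encoded name is pure ASCII, so it does not depend on the encoding any zip tool assumes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UrlEncodedEntryNames {

    /** Turns a file name into an ASCII-only name that is safe to store in a zip. */
    static String toEntryName(String fileName) throws UnsupportedEncodingException {
        return URLEncoder.encode(fileName, "UTF-8");
    }

    /** Restores the original file name after the entry has been read. */
    static String fromEntryName(String entryName) throws UnsupportedEncodingException {
        return URLDecoder.decode(entryName, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "abc-åäö.txt", written with Unicode escapes.
        String original = "abc-\u00e5\u00e4\u00f6.txt";

        String stored = toEntryName(original);
        System.out.println(stored); // prints abc-%C3%A5%C3%A4%C3%B6.txt

        // Decoding gives back the original name.
        System.out.println(fromEntryName(stored).equals(original)); // prints true
    }
}
```

Note that URLEncoder also encodes "/" and turns spaces into "+", so for entries inside directories each path segment would need to be encoded separately. Anyone opening the archive with an ordinary zip tool sees the encoded names, which is why this only helps when you control both ends.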

EDIT: In that bug report someone suggests using the ZipOutputStream from Apache Ant as a workaround. That implementation allows specifying an encoding.


听你说爱我 2024-07-13 10:53:28

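In zip files, according to the spec owned by PKWare, the encoding of file names and file comments is IBM437. In 2007 PKWare extended the spec to also allow UTF-8. This says nothing about the encoding of the files contained within the zip, only about the encoding of the file names.

I think all tools and libraries (Java and non-Java) support IBM437 (which is a superset of ASCII), and fewer tools and libraries support UTF-8. Some tools and libraries support other code pages. For example, if you zip something using WinRar on a computer running in Shanghai, you will get the Big5 code page. This is not "allowed" by the zip spec, but it happens anyway.

The DotNetZip library for .NET does Unicode, but of course that doesn't help you if you are using Java!

Using Java's built-in support for ZIP, you will always get IBM437. If you want an archive with something other than IBM437, use a third-party library, or create a JAR.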
