How to open a zip file generated by a Java program using UTF-8 encoding
Our product has an export function that uses ZipOutputStream to zip a directory. However, when the directory contains file names with Chinese or Japanese characters, the export doesn't work properly: the files inside the zip end up with different names. Here is an example of our zipping code:
    ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zipFileName));
    out.setEncoding("UTF-8");
    // program to add directory to zip
    // program to add/create file in zip
    out.close();
My import algorithm, also built in Java, can import the zipped file correctly, even if it contains Chinese/Japanese characters in file/directory names.
    ZipFile zipFile = new ZipFile(zipPath, "UTF-8");
    Enumeration e = zipFile.getEntries();
    while (e.hasMoreElements()) {
        ZipEntry entry = (ZipEntry) e.nextElement();
        String name = entry.getName();
        ....
Is the unzip software having trouble with UTF-8 encoded file names, or is something special needed to create a UTF-8 encoded zip file that existing software can open easily?
I have written an example program:
    package ZipFile;

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;

    import org.apache.tools.zip.ZipEntry;
    import org.apache.tools.zip.ZipOutputStream;

    public class ZipFolder {

        public static void main(String[] a) throws Exception {
            String srcFolder = "D:/9.4_work/openscript_repo/中文124.All/中文";
            String destZipFile = "D:/Eclipse_Projects/OpenScriptDebuggingProject/src/ZipFile/demo.zip";
            zipFolder(srcFolder, destZipFile);
        }

        // Zips srcFolder recursively into destZipFile.
        public static void zipFolder(String srcFolder, String destZipFile) throws Exception {
            ZipOutputStream zip = new ZipOutputStream(new FileOutputStream(destZipFile));
            try {
                zip.setEncoding("UTF-8");
                // With GBK encoding, the Chinese names are displayed correctly after unzipping:
                // zip.setEncoding("GBK");
                addFolderToZip("", srcFolder, zip);
                zip.flush();
            } finally {
                zip.close();
            }
        }

        // Adds one file under the entry path "path", or recurses if srcFile is a directory.
        private static void addFileToZip(String path, String srcFile, ZipOutputStream zip) throws Exception {
            File file = new File(srcFile);
            if (file.isDirectory()) {
                addFolderToZip(path, srcFile, zip);
            } else {
                byte[] buf = new byte[1024];
                int len;
                FileInputStream in = new FileInputStream(srcFile);
                try {
                    zip.putNextEntry(new ZipEntry(path + "/" + file.getName()));
                    while ((len = in.read(buf)) > 0) {
                        zip.write(buf, 0, len);
                    }
                    zip.closeEntry();
                } finally {
                    in.close(); // always release the file handle
                }
            }
        }

        // Adds every child of srcFolder, prefixing entry names with the folder's own name.
        private static void addFolderToZip(String path, String srcFolder, ZipOutputStream zip) throws Exception {
            File folder = new File(srcFolder);
            String[] fileNames = folder.list();
            if (fileNames == null) {
                return; // not a directory, or it could not be read
            }
            String entryPath = path.equals("") ? folder.getName() : path + "/" + folder.getName();
            for (String fileName : fileNames) {
                addFileToZip(entryPath, srcFolder + "/" + fileName, zip);
            }
        }
    }
Comments (2)
The top answer here may answer your question; unfortunately, it seems to suggest that the zip format doesn't really allow for creating a zip file that displays file names properly on every computer:
https://superuser.com/questions/60379/linux-zip-tgz-filenames-encoding-problem
I expect it works when you set the encoding to GBK because that is your system's default encoding, so 7-Zip uses it for every zip file it opens.
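To illustrate that guess, here is a minimal sketch of what happens when a name stored as UTF-8 bytes is decoded with a different charset, which is presumably what an unzip tool does when it assumes the system code page. The class and method names are made up for this example, and it assumes the Java runtime ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the question's code.
class NameEncodingDemo {

    // Encode the name as UTF-8 (as the zip writer did), then decode the bytes
    // with the charset the reading tool assumes.
    static String decodeUtf8BytesAs(String name, Charset assumed) {
        byte[] stored = name.getBytes(StandardCharsets.UTF_8);
        return new String(stored, assumed);
    }

    public static void main(String[] args) {
        // Two Chinese characters written as escapes so the source file's own
        // encoding cannot interfere with the demonstration.
        String name = "\u4e2d\u6587.txt";
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JRE

        String seenByGbkTool = decodeUtf8BytesAs(name, gbk);
        String seenByUtf8Tool = decodeUtf8BytesAs(name, StandardCharsets.UTF_8);

        System.out.println("GBK reader sees the original name:   " + seenByGbkTool.equals(name));
        System.out.println("UTF-8 reader sees the original name: " + seenByUtf8Tool.equals(name));
    }
}
```

The same reasoning would explain why writing the names as GBK looks right on a machine whose default code page is GBK but would likely look wrong elsewhere.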
The linked answer also suggests that the rar and 7z formats have better support.

I found a blog entry specifically about UTF-8 in zips with Java. It suggests there is a newer version of the ZIP specification that current versions of Java may not write, but that Java 7 will. I don't know whether the Apache classes use it too.
http://blogs.oracle.com/xuemingshen/entry/non_utf_8_encoding_in
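For what it's worth, here is a rough sketch of the Java 7 route using the standard java.util.zip classes instead of the Apache Ant classes from the question. As far as I know, the newer ZIP specification marks UTF-8 names with bit 11 of the general-purpose flag in each entry header, and the Java 7+ ZipOutputStream constructor that takes a Charset is expected to set it for UTF-8. The class and helper names below are invented for the example, and whether a given unzip tool honours the flag is a separate question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch; uses java.util.zip, not org.apache.tools.zip.
class Utf8ZipSketch {

    // Write a zip with a single entry whose name is encoded as UTF-8.
    static byte[] zipOneEntry(String entryName, byte[] content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read back the name of the first entry, decoding it as UTF-8.
    static String firstEntryName(byte[] zipBytes) {
        try (ZipInputStream in =
                 new ZipInputStream(new ByteArrayInputStream(zipBytes), StandardCharsets.UTF_8)) {
            ZipEntry entry = in.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The first local file header starts at offset 0; its general-purpose flag
    // is the little-endian 16-bit value at offsets 6 and 7. Bit 11 means UTF-8 names.
    static boolean hasUtf8Flag(byte[] zipBytes) {
        int flags = (zipBytes[6] & 0xFF) | ((zipBytes[7] & 0xFF) << 8);
        return (flags & 0x0800) != 0;
    }

    public static void main(String[] args) {
        String name = "\u4e2d\u6587/demo.txt"; // folder named with two Chinese characters
        byte[] zipBytes = zipOneEntry(name, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("name round-trips: " + name.equals(firstEntryName(zipBytes)));
        System.out.println("UTF-8 flag set:   " + hasUtf8Flag(zipBytes));
    }
}
```

An archive written this way should show correct names in tools that read the flag; tools that ignore it will still fall back to their default code page.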
The following utility class allows you to compress and decompress strings using the GZIP compression algorithm. This can be useful, for example, if you want to save long strings in a database.
Here is a TestCase which provides example use of the class above:
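As a rough illustration only, a utility of this kind together with a small usage check could look like the sketch below. The class name GzipStrings and its two methods are hypothetical and are not the answer author's code; the sketch assumes the strings are encoded as UTF-8 before compression.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical utility: GZIP-compress a string to bytes and back.
class GzipStrings {

    // Encode the text as UTF-8 and compress it with GZIP.
    static byte[] compress(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decompress GZIP data and decode the result as UTF-8.
    static String decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[1024];
            int len;
            while ((len = gzip.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Build a long, repetitive string of the kind that compresses well.
        StringBuilder longText = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            longText.append("some long text ");
        }
        byte[] packed = compress(longText.toString());
        System.out.println("original chars:   " + longText.length());
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round-trips:      " + longText.toString().equals(decompress(packed)));
    }
}
```

The compressed bytes would need a binary column (or Base64 encoding) when stored in a database, since they are not valid text.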