ZIP traditionally encodes filenames in IBM437. However, to my knowledge many tools (incorrectly) use the system's default encoding instead, which is likely to cause problems in a situation like this, because the two ends may use different encodings.
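To make the mismatch concrete, here is a minimal Python sketch. The filename and the choice of cp1252 as the "system default" are assumptions for illustration only; the point is just that the same name becomes different bytes, and that reading them with the wrong code page fails silently.

```python
# Illustration only: one accented name, two tools that assume different code pages.
name = "résumé.txt"  # hypothetical filename

# A tool that follows the ZIP tradition stores the name as IBM437 (cp437) bytes.
stored_cp437 = name.encode("cp437")

# A tool that uses a Western European Windows default stores cp1252 bytes instead.
stored_cp1252 = name.encode("cp1252")

# Same name, different bytes: the reader has to know which encoding was used.
print(stored_cp437)   # b'r\x82sum\x82.txt'
print(stored_cp1252)  # b'r\xe9sum\xe9.txt'

# Reading cp1252 bytes as cp437 raises no error, it just yields wrong characters.
garbled = stored_cp1252.decode("cp437")
print(garbled)        # rΘsumΘ.txt
```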
In theory, ZIP now also supports UTF-8, which should resolve these problems, but tool support is again the weak point. For example, as far as I know the ZIP support built into Windows Explorer can't handle UTF-8 encoded filenames.
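If you want to check whether a given archive actually marks its names as UTF-8, something like the following Python sketch may help. It relies on the ZIP general purpose flag bit 11 (0x800) and on the standard zipfile module; the helper name and the sample filenames are made up for the example.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general purpose bit 11: name and comment are UTF-8

def names_with_encoding_hint(archive):
    """Return (filename, is_flagged_utf8) for each entry in a ZIP archive."""
    with zipfile.ZipFile(archive) as zf:
        return [(info.filename, bool(info.flag_bits & UTF8_NAME_FLAG))
                for info in zf.infolist()]

# Build a small archive in memory. Python's zipfile stores non-ASCII names
# as UTF-8 and sets the flag; pure ASCII names are left unflagged.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("résumé.txt", b"accented name")
    zf.writestr("plain.txt", b"ascii name")

for filename, flagged in names_with_encoding_hint(buffer):
    print(filename, "UTF-8 flag set" if flagged else "no UTF-8 flag")
```

An entry without the flag tells you nothing about which legacy code page was used, only that the packer did not declare UTF-8.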
So we end up with this: both ends have to agree on the encoding used for filenames, and you need an encoding that supports all the characters you have (any Unicode encoding will do; I'm not sure about IBM437). ZIP has been around for a long time, so there are many tools, and they tend to disagree about encoding. If possible, explicitly specify the encoding and prefer Unicode. For compatibility with arbitrary tools, you might be better off with a newer format that was designed with Unicode in mind.
According to the change log, 7-Zip has supported UTF-8 filenames since 4.58 beta, but it only uses UTF-8 when the local code page can't represent the required characters. The -mcu command-line switch makes it use UTF-8 for anything that isn't ASCII. Local encodings usually differ only in the non-ASCII range, so this will most likely do the trick, provided the tool used for unpacking also supports UTF-8 (which is more likely for 7-Zip than for ZIP, because it isn't as old as ZIP and there are fewer unpacking tools).
WinRAR might also be worth a try.
Try an archive program that lets you specify the character encoding (say, UTF-8), or figure out how to do it with the one you have. This forum thread might help, because it's similar to what you're asking, albeit in reverse and for German rather than French: http://sourceforge.net/projects/sevenzip/forums/forum/45797/topic/3710172
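If the unpacking side happens to be a script, "specifying the encoding" can look roughly like this Python sketch. It assumes the packer wrote names in a known legacy code page (cp1252 here, purely as an example) without setting the UTF-8 flag; the function names are hypothetical.

```python
import zipfile

UTF8_NAME_FLAG = 0x800  # entries with this bit set are already UTF-8

def repaired_name(info, legacy_encoding):
    """Return the entry name decoded with the encoding the packer really used.

    Python's zipfile decodes unflagged names as cp437. cp437 maps every byte
    to a distinct character, so encoding back to cp437 recovers the raw bytes,
    which can then be decoded with the encoding you specify.
    """
    if info.flag_bits & UTF8_NAME_FLAG:
        return info.filename
    raw = info.filename.encode("cp437")
    return raw.decode(legacy_encoding)

def list_repaired_names(archive, legacy_encoding="cp1252"):
    """List entry names of a ZIP archive, assuming a given legacy encoding."""
    with zipfile.ZipFile(archive) as zf:
        return [repaired_name(info, legacy_encoding) for info in zf.infolist()]
```

On Python 3.11 or later, zipfile.ZipFile also accepts a metadata_encoding argument when reading, which should make the manual round trip unnecessary. Either way you still have to know or guess the right code page.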
Alternatively, you could nuke the accented characters. If francophones are on the receiving end of the file transfer, they may or may not be sympathetic (ask your users!).
French doesn't have all that many accents to worry about, really. You have [aeu]-grave, e-acute (aigu), [aeiou]-circumflex, c-cedilla and the occasional diaeresis (ë, ï, ü) to worry about, capital and lowercase (though that's more likely for the grave and acute ones, unless someone hit the Caps Lock key).
GNU tar has a --transform option. If you write a sed expression that turns every ISO Latin-1 accented a, e, i, o, u and c into its unaccented version, you'll probably be okay.
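If you would rather rename the files before archiving than write the sed expression, the same mapping can be sketched in Python. Note this is a different technique from tar's --transform (Unicode decomposition instead of a hand-written character table), and the function name and sample filename are made up for the example.

```python
import unicodedata

def strip_accents(name):
    """Replace accented letters with their unaccented base letters."""
    # NFD splits "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", name)
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("Élève à Noël, ça coûte.txt"))  # Eleve a Noel, ca coute.txt
```

Ligatures such as œ don't decompose this way, so they would need an explicit replacement if they occur in your filenames.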
I think you should go with compression in 7z format.
Under Linux this can be done with PeaZip, or by installing p7zip and using it through a UI like Ark or File Roller, depending on your desktop (I prefer PeaZip because it can be used on any desktop).
The 7z format was designed from the ground up with Unicode filenames in mind (the author is Russian), and in my experience it has never failed.