Batch-converting latin-1 files to utf-8 using iconv
I have a PHP project on my OS X machine that is encoded in latin1. Now I need to convert the files to UTF-8. I'm not much of a shell coder, and I tried something I found on the internet:
mkdir new
for a in `ls -R *`; do iconv -f iso-8859-1 -t utf-8 <"$a" >new/"$a" ; done
But that does not create the directory structure, and it gives me a load of errors when run. Can anyone come up with a neat solution?
The answers above are all fine, but if this is a "mixed" project, i.e. some files are already UTF-8, converting everything blindly causes trouble, because files that are already UTF-8 get encoded a second time. So here is my solution, which checks each file's encoding first.
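The script from this answer is not preserved here, so the following is only a minimal sketch of the "check first" idea. It assumes that a file which already decodes cleanly as UTF-8 should be left alone, and it uses iconv itself as the validator. The function name convert_if_latin1 is hypothetical. Pure ASCII files also pass the check and are skipped, which is harmless because ASCII bytes are identical in both encodings.

```shell
#!/bin/sh
# Hypothetical helper: convert one file from ISO-8859-1 to UTF-8 in place,
# but only if it is not already valid UTF-8.
convert_if_latin1() {
    f=$1
    # iconv exits non-zero when the input is not valid UTF-8.
    if iconv -f utf-8 -t utf-8 "$f" >/dev/null 2>&1; then
        return 0            # already UTF-8 (or plain ASCII): leave untouched
    fi
    # Write to a temporary name first so the input is never truncated mid-read.
    iconv -f iso-8859-1 -t utf-8 "$f" >"$f.utf8" && mv "$f.utf8" "$f"
}
```

For a single directory it could be called as: for f in *.php; do convert_if_latin1 "$f"; done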
On Windows Git Bash, I got these errors with several of the proposed solutions:
find: Only one instance of {} is supported with -exec ... +
find: In ‘-exec ... {} +’ the ‘{}’ must appear by itself, but you specified ‘source={}; ...’
But this (a mix of other proposed solutions) worked:
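The command that worked for this answerer is not preserved, so what follows is a sketch rather than their solution. Both error messages say the same thing: with -exec ... +, the {} may appear only once and must stand by itself directly before the +. One way to respect that is to hand the file names to a small inline sh script as positional parameters. The function name convert_batch and the *.php pattern are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert every *.php file under directory $1 in place.
# find passes the file names as positional arguments to the inline script,
# so {} appears exactly once, by itself, directly before the +.
convert_batch() {
    find "$1" -type f -name '*.php' -exec sh -c '
        for f in "$@"; do
            iconv -f iso-8859-1 -t utf-8 "$f" >"$f.utf8" && mv "$f.utf8" "$f"
        done
    ' sh {} +
}
```

Because the names arrive as "$@" and are always quoted, paths containing spaces should be handled correctly.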
Use
mkdir -p "${a%/*}";
before iconv. Note that you are using a potentially dangerous for construct when there are spaces in filenames, see http://porkmail.org/era/unix/award.html.
Using the answers of Dennis Williamson and Alberto Zaccagni, I came up with the following script, which converts all files of the specified file type from all subdirectories. The output is collected in one folder, given by /path/to/destination. The basename command returns the filename without the path of the file.
Alternative (user interactive):
I have now also created an interactive script that lets you decide whether to overwrite the old files or just rename them. Additional thanks go to tbsalling.
Have fun with this. I would be grateful for any comments to improve it, thanks!
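Neither of the two scripts from this answer is preserved. As a rough sketch of the first (non-interactive) one, assuming it combines find with basename as described: the function name collect_converted and its three arguments (source directory, file extension, destination folder) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert all files with extension $2 found under $1
# and collect the results, without their directory paths, in $3.
collect_converted() {
    src=$1 ext=$2 dest=$3
    mkdir -p "$dest"
    find "$src" -type f -name "*.$ext" -exec sh -c '
        dest=$1; shift
        for f in "$@"; do
            # basename strips the leading directories from the path.
            iconv -f iso-8859-1 -t utf-8 "$f" >"$dest/$(basename "$f")"
        done
    ' sh "$dest" {} +
}
```

A call might look like: collect_converted . php /path/to/destination. Since everything lands in one flat folder, files with the same name in different subdirectories would overwrite each other, and the destination should lie outside the source tree.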
You shouldn't use ls like that, and a for loop is not appropriate either. Also, the destination directory should be outside the source directory.
No need for a loop. The -type f option includes files and excludes directories.
Edit:
The OS X version of iconv doesn't have the -o option. Try this:
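The commands from this answer are not preserved. The sketch below follows its description (find with -type f, no ls, no outer for loop, shell redirection instead of -o) and additionally creates each target directory with the mkdir -p "${...%/*}" idiom suggested elsewhere in this thread. The function name convert_tree is hypothetical, and the destination passed as $1 is assumed to lie outside the source directory.

```shell
#!/bin/sh
# Hypothetical helper: convert every regular file under the current directory
# and write the result to the same relative path under $1, which should lie
# outside the source tree.
convert_tree() {
    find . -type f -exec sh -c '
        dest=$1; shift
        for f in "$@"; do
            out=$dest/${f#./}
            mkdir -p "${out%/*}"      # create the parent directory if needed
            iconv -f iso-8859-1 -t utf-8 "$f" >"$out"   # redirect: no -o needed
        done
    ' sh "$1" {} +
}
```

Run from the project root, for example: convert_tree ../newdir_utf8 (a hypothetical destination name). The source files are left untouched.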
This converts all files with the .php filename extension, in the current directory and its subdirectories, preserving the directory structure:
Notes:
To get a list of the files that will be targeted beforehand, just run the command without the -exec flags (like this: find . -name "*.php"). Making a backup is a good idea.
Using sh like this allows piping and redirecting with -exec, which is necessary because not all versions of iconv support the -o flag.
Adding .utf8 to the filename of the output and then removing it might seem strange, but it is necessary. Using the same name for output and input files can cause the following problems:
For large files (around 30 KB in my experience) it causes a core dump (or termination by signal 7).
Some versions of iconv seem to create the output file before they read the input file, which means that if the input and output files have the same name, the input file is overwritten with an empty file before it is read.
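The command these notes refer to is not preserved. A sketch consistent with them, assuming sh -c is used for the redirection and the .utf8 suffix serves as a temporary name, could look like this; the function name convert_php_in_place is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert all *.php files below the current directory
# in place, going through a temporary .utf8 file.
convert_php_in_place() {
    find . -type f -name '*.php' -exec sh -c '
        # $1 is the file name handed over by find.
        iconv -f iso-8859-1 -t utf-8 "$1" >"$1.utf8" && mv "$1.utf8" "$1"
    ' sh {} \;
}
```

The mv only runs when iconv succeeds, so a failed conversion leaves the original file as it was (plus a leftover .utf8 file to inspect).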
Some good answers, but I found this a lot easier in my case, with a nested directory of hundreds of files to convert:
WARNING: This will write the files in place, so make a backup first.
To convert a complete directory tree recursively from iso-8859-1 to utf-8, including the creation of subdirectories, none of the short solutions above worked for me, because the directory structure was not created in the target. Based on Dennis Williamson's answer I came up with the following solution:
It will create a clone of the current directory subtree in /tmp/dest (adjust to your needs), including all subdirectories, with all iso-8859-1 files converted to utf-8. Tested on macOS.
Btw: check your file encodings with:
to get the encoding information.
Hope this helps.
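The solution itself is not preserved here. One way to get the behaviour described, sketched under the assumption that the directories are mirrored first and the files converted afterwards, is a two-pass find. The function name clone_tree_utf8 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: recreate the current directory subtree under $1,
# converting every file from ISO-8859-1 to UTF-8 on the way.
clone_tree_utf8() {
    dest=$1
    mkdir -p "$dest"
    # Pass 1: mirror the directory structure.
    find . -type d -exec sh -c '
        dest=$1; shift
        for d in "$@"; do mkdir -p "$dest/$d"; done
    ' sh "$dest" {} +
    # Pass 2: convert the files into the mirrored directories.
    find . -type f -exec sh -c '
        dest=$1; shift
        for f in "$@"; do iconv -f iso-8859-1 -t utf-8 "$f" >"$dest/$f"; done
    ' sh "$dest" {} +
}
```

With the destination from the answer, the call would be: clone_tree_utf8 /tmp/dest. For the encoding check, the file utility can print a charset guess (file -I on macOS, file -i on Linux); whether that is the command the answer had in mind is an assumption.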
I created the following script, which (i) backs up all tex files in the directory "converted", (ii) checks the encoding of every tex file, and (iii) converts to UTF-8 only the tex files that are in ISO-8859-1 encoding.
If all the files you have to convert are .php you could use the following, which is recursive by default:
I believe your errors were due to the fact that ls -R also produces output that iconv might not recognize as a valid filename, something like ./my/dir/structure:
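To make the difference concrete, here is a small throwaway demonstration (the directory and file names are made up for the example). It builds a tiny tree and captures what ls -R and find each print for it.

```shell
#!/bin/sh
# Build a tiny throwaway tree to compare the two listings.
demo=$(mktemp -d)
mkdir -p "$demo/my/dir/structure"
printf 'x\n' >"$demo/my/dir/structure/a.php"

# ls -R mixes directory headers ("./my/dir/structure:") and bare names.
listing=$(cd "$demo" && ls -R)
# find prints one usable path per file.
found=$(cd "$demo" && find . -type f -name '*.php')

printf 'ls -R:\n%s\n\nfind:\n%s\n' "$listing" "$found"
```

The ls -R listing contains header lines ending in a colon and file names without their directories, neither of which iconv can open from the top-level directory, whereas every line from find is a path that can be passed on directly.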
On unix.stackexchange.com a similar question was asked, and user manatwork suggested recode, which does the trick very nicely.
I've been using it to convert ucs-2 to utf-8 in place.