Change file encoding to UTF-8 via vim in a script
I have been stuck ever since our server was upgraded from Debian 4 to 5.
We switched to a UTF-8 environment and now have problems getting text displayed correctly in the browser, because all our files are in non-UTF-8 encodings such as ISO-8859-1, ASCII, etc.
I tried many different scripts.
The first one I tried was "iconv". That didn't work: it changed the content, but the file's encoding was still non-UTF-8.
I had the same problem with enca, encamv, convmv and some other tools I installed via apt-get.
Then I found some Python code that uses the chardet UniversalDetector module to detect the encoding of a file (which works fine), but using the unicode class or the codecs module to save it as UTF-8 doesn't work, and it produces no errors.
The only way I found to get a file and its content converted to UTF-8 is vi.
These are the steps I use for one file:
vi filename.php
:set bomb
:set fileencoding=utf-8
:wq
That's it. That works perfectly. But how can I run this from a script?
I would like to write a (Linux shell) script that traverses a directory, picks up all PHP files, and converts them using vi with the commands above.
Since I need to start the vi application, I do not know how to do something like this:
"vi --run-command=':set bomb, :set fileencoding=utf-8' filename.php"
Hope someone can help me.
Comments (5)
This is the simplest way I know of to do this easily from the command line:
Or better yet if the number of files is expected to be pretty large:
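A minimal sketch of such a command line, assuming the argdo form quoted further down in this thread. The helper name convert_all_php, the use of Vim's silent Ex mode (-es) so that no terminal is needed, and the explicit encoding and fileencodings settings (with latin1 as the fallback guess for undetected files) are assumptions for illustration, not part of the original answer:

```shell
#!/bin/sh
# Hypothetical helper: convert every *.php file under a directory to UTF-8
# with a BOM, mirroring the :set bomb / :set fileencoding=utf-8 steps.
convert_all_php() {
    dir=${1:-.}
    # --cmd runs before any file is loaded: work in UTF-8 internally and
    #   try BOM, then UTF-8, then fall back to latin1 (an assumption).
    # argdo repeats the piped commands for every file in the argument list.
    # qa! exits once all files have been written.
    # find -exec ... {} + passes file names in batches, so large trees do
    # not overflow the command line.
    find "$dir" -type f -name '*.php' -exec \
        vim -es \
            --cmd 'set encoding=utf-8 fileencodings=ucs-bom,utf-8,latin1' \
            -c 'argdo set bomb | set fileencoding=utf-8 | w' \
            -c 'qa!' {} +
}
```

Calling convert_all_php /path/to/site would rewrite the files in place, so it is worth trying on a copy first. Replacing set bomb with set nobomb would write UTF-8 without the byte order mark.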
You could put your commands in a file, let's call it script.vim.
Then you invoke Vim with the -S (source) option to execute the script on the file you wish to fix. To do this on a bunch of files, you could run the same command for each file.
You could also put the Vim commands on the command line using the + option, but I think it may be more readable like this.
Note: I have not tested this.
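The script-file approach could be sketched as follows. The contents of script.vim are an assumption based on the steps in the question, convert_with_script is a hypothetical helper name, and -es (silent Ex mode) is added so the loop runs without a terminal. The sketch also assumes Vim detects the original encoding of each file correctly, which depends on the locale and the 'fileencodings' option:

```shell
#!/bin/sh
# Write the Ex commands once. The file name follows the answer; the
# contents are assumed from the manual steps in the question.
cat > script.vim <<'EOF'
set bomb
set fileencoding=utf-8
wq
EOF

# Hypothetical helper: source script.vim on each file given as an argument.
convert_with_script() {
    for f in "$@"; do
        # -S sources the script after the file is loaded; the wq inside
        # the script writes the file and quits Vim.
        vim -es -S script.vim -- "$f" </dev/null
    done
}
```

A call such as convert_with_script *.php would process the files in the current directory one at a time.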
You may actually want set nobomb (BOM = byte order mark), especially outside the Windows world.
For example, I had a script that didn't work because there was a byte order mark at the start. It isn't usually displayed in editors (even with set list in vi) or on the console, so it's difficult to spot.
The file looked like a normal script starting with #!, but trying to run it failed.
Not displayed, but present at the start of the file, is the 3-byte BOM. So, as far as Linux is concerned, the file doesn't start with #!
The solution is to set nobomb in vi and save the file.
This removes the BOM at the start of the file, making it correct UTF-8.
NB: Windows uses the BOM to identify a text file as being UTF-8 rather than ANSI. Linux (and the official spec) doesn't.
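Because the BOM is invisible in most editors, checking the first three bytes directly can help. The following is a rough sketch: has_bom and strip_bom are hypothetical helper names, and running Vim in silent Ex mode (-es) with encoding forced to UTF-8 is an assumption made so the example works from a script regardless of locale:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file starts with the UTF-8 BOM,
# i.e. the three bytes EF BB BF.
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Hypothetical helper: remove the BOM with Vim's nobomb option.
strip_bom() {
    # --cmd keeps the text as UTF-8 internally, whatever the locale is;
    # set nobomb followed by wq rewrites the file without the BOM.
    vim -es --cmd 'set encoding=utf-8' -c 'set nobomb' -c 'wq' -- "$1" </dev/null
}
```

For example, has_bom script.sh && strip_bom script.sh would only touch a file that actually carries the mark.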
The accepted answer will keep the last file open in Vim. This problem can be easily resolved using the -c option of Vim.
If you only need to process one file, a chain of -c options will also work.
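A single-file variant using only -c options might look like this. The helper name to_utf8 and its optional bomb/nobomb argument are hypothetical, -es is added so that no terminal is needed, and the sketch assumes Vim detects the file's original encoding correctly (this depends on the locale and the 'fileencodings' option):

```shell
#!/bin/sh
# Hypothetical helper: convert one file to UTF-8 and quit Vim afterwards.
# Usage: to_utf8 FILE [bomb|nobomb]   (default: nobomb)
to_utf8() {
    file=$1
    bom=${2:-nobomb}
    # Each -c is one Ex command, run in order after the file is loaded.
    # The final wq writes the file and exits, so nothing stays open.
    vim -es -c "set $bom" -c 'set fileencoding=utf-8' -c 'wq' -- "$file" </dev/null
}
```

Note that a file containing only ASCII bytes is byte-for-byte identical when saved as UTF-8 without a BOM, so tools such as file will still report it as ASCII after the conversion.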
Neither of those two commands worked for me.
vim +"argdo set bomb | set fileencoding=utf-8 | w" -c ":q" file1.txt file2.txt
vim -c ':set bomb' -c ':set fileencoding=utf-8' -c ':wq' file1.txt
They do convert the script file to UTF-8. However, the 'No such file or directory' error persists. Changing 'set bomb' to 'set nobomb' stops the conversion from happening.