Is there a way to check a file's encoding using JavaScript?
Here's my situation: I'm working on a very large project that contains a lot of files. Some of these files are encoded in UTF-8, others in ANSI. We need to convert all of them to UTF-8, because we decided this will be the default in our next projects.
This is a big concern because we're Brazilian, and many common words use characters like á, ç, ê, ü, etc. Having files in several different character encodings has caused serious problems.
Anyway, I've ended up with this JS file, which converts ANSI files to UTF-8, copying them to another folder and preserving the originals:
var indir = "in";
var outdir = "out";

// Reads fin as ANSI text and writes the same text to fout as UTF-8.
function ansiToUtf8(fin, fout) {
    var ansi = WScript.CreateObject("ADODB.Stream");
    ansi.Open();
    ansi.Charset = "x-ansi";
    ansi.LoadFromFile(fin);

    var utf8 = WScript.CreateObject("ADODB.Stream");
    utf8.Open();
    utf8.Charset = "UTF-8";
    utf8.WriteText(ansi.ReadText());
    utf8.SaveToFile(fout, 2 /*adSaveCreateOverWrite*/);

    ansi.Close();
    utf8.Close();
}

// Convert every file directly inside indir (subfolders are not visited).
var fso = WScript.CreateObject("Scripting.FileSystemObject");
var folder = fso.GetFolder(indir);
var fc = new Enumerator(folder.files);
for (; !fc.atEnd(); fc.moveNext()) {
    var file = fc.item();
    ansiToUtf8(indir + "\\" + file.name, outdir + "\\" + file.name);
}
which I run from the command line with:
cscript /Nologo ansi2utf8.js
The problem is that this script processes every file, even the ones that are already in UTF-8, and re-converting those breaks my special characters. So I need to check whether a file's encoding is already UTF-8, and run my code only if it is ANSI.
How can I do that?
Also, my script only processes the 'in' folder itself. I'm still looking for an easy way to make it descend into the subfolders of that folder and process them too.
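For the subfolder part, one possible shape is a small recursive walker, shown here only as a sketch. It assumes the standard `Files`, `SubFolders` and `Name` members of FileSystemObject folder objects; `eachItem` and `walk` are hypothetical helper names, and the plain-array branch is an addition so the function can also be tried outside Windows Script Host.

```javascript
// Calls fn(item) for each member of a collection.
// Under Windows Script Host, FileSystemObject collections need an Enumerator.
// The plain-array branch is an assumption added so the sketch runs elsewhere too.
function eachItem(collection, fn) {
    if (typeof Enumerator !== "undefined") {
        for (var e = new Enumerator(collection); !e.atEnd(); e.moveNext()) {
            fn(e.item());
        }
    } else {
        for (var i = 0; i < collection.length; i++) {
            fn(collection[i]);
        }
    }
}

// Visits every file under `folder`, files first, then each subfolder in turn.
// `relPath` is the current folder's path relative to the starting folder
// ("" at the top level); it is passed to `visit` along with each file.
function walk(folder, relPath, visit) {
    eachItem(folder.Files, function (file) {
        visit(file, relPath);
    });
    eachItem(folder.SubFolders, function (sub) {
        var subPath = relPath === "" ? sub.Name : relPath + "\\" + sub.Name;
        walk(sub, subPath, visit);
    });
}
```

Under cscript this could be started with `walk(fso.GetFolder(indir), "", ...)`, with the callback building the input and output paths from `relPath` and `file.Name`. The matching subfolder under the output folder would have to exist before `SaveToFile` is called, for example by creating it with `fso.CreateFolder`.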
Comments (1)
Do your UTF-8 files have a byte order mark (BOM)? If so, you can simply check the first 3 bytes (EF BB BF) to determine whether a file is UTF-8. Otherwise, the standard method is to check whether the file is valid UTF-8 all the way through; if it is, it is most likely meant to be read as UTF-8.
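Both checks can be sketched as plain functions over an array of byte values (numbers from 0 to 255). The names `hasUtf8Bom`, `isValidUtf8` and `looksLikeUtf8` are hypothetical, and how the raw bytes get read from disk under cscript is deliberately left out; this only illustrates the two tests described above.

```javascript
// True when the data starts with the UTF-8 byte order mark EF BB BF.
function hasUtf8Bom(bytes) {
    return bytes.length >= 3 &&
        bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF;
}

// True when the whole array is well-formed UTF-8. Overlong forms,
// surrogates (U+D800..U+DFFF) and values above U+10FFFF are rejected.
function isValidUtf8(bytes) {
    var i = 0;
    while (i < bytes.length) {
        var lead = bytes[i];
        var count;      // number of continuation bytes that must follow
        var lo = 0x80;  // allowed range for the first continuation byte
        var hi = 0xBF;
        if (lead <= 0x7F) {
            i++;        // plain ASCII byte
            continue;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            count = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            count = 2;
            if (lead === 0xE0) { lo = 0xA0; }  // excludes overlong forms
            if (lead === 0xED) { hi = 0x9F; }  // excludes surrogates
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            count = 3;
            if (lead === 0xF0) { lo = 0x90; }  // excludes overlong forms
            if (lead === 0xF4) { hi = 0x8F; }  // excludes values > U+10FFFF
        } else {
            return false;  // 80..C1 and F5..FF can never start a sequence
        }
        if (i + count >= bytes.length) {
            return false;  // sequence is cut off at the end of the data
        }
        for (var k = 1; k <= count; k++) {
            var c = bytes[i + k];
            if (c < lo || c > hi) { return false; }
            lo = 0x80;     // later continuation bytes use the full range
            hi = 0xBF;
        }
        i += count + 1;
    }
    return true;
}

// A file that passes either test is probably UTF-8 and can be skipped.
function looksLikeUtf8(bytes) {
    return hasUtf8Bom(bytes) || isValidUtf8(bytes);
}
```

For instance, "ção" saved as Windows-1252 is the bytes E7 E3 6F, which fails the validity test because E3 is not a continuation byte, while the UTF-8 form C3 A7 C3 A3 6F passes. A file containing only ASCII also passes, which is harmless here since its bytes are the same in both encodings.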