Check whether all files are encoded as UTF-8
Does anyone know of a Windows app that can scan through a directory and check which scripts are/aren't encoded as a specified charset (UTF-8 in this case)? I could do it manually, but that could take a while and is quite error prone!
Comments (2)
UTF-8 isn't a character set, it's an encoding for Unicode characters. And, since this is not programming related, I'm nudging it over to superuser.

If you do want to write a program for detecting those sequences, it's pretty easy:

Then, provided the first octet is legal, just remember that the number of octets forming a code point can be obtained by counting the number of `1` bits before the first `0` bit (an octet that starts with `0` is a complete single-octet code point on its own). For example, `11110xxx` is the start of a 4-octet sequence, so you should skip ahead 4 octets once you've established its legality. The other thing to do is ensure that all continuation octets start with `10`.
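
For what it's worth, the two checks described above (count the leading `1` bits of the first octet, then make sure every following octet starts with `10`) could be sketched roughly like this in Python. The names `sequence_length`, `looks_like_utf8` and `non_utf8_files` are made up for illustration, and this is only a structural check under those two rules: it does not reject overlong encodings, surrogates, or code points above U+10FFFF.

```python
import os

def sequence_length(first_octet: int) -> int:
    """Return the sequence length implied by a lead octet, or 0 if it cannot start a sequence."""
    if first_octet & 0x80 == 0x00:   # 0xxxxxxx: single octet (ASCII range)
        return 1
    if first_octet & 0xE0 == 0xC0:   # 110xxxxx: two leading 1 bits
        return 2
    if first_octet & 0xF0 == 0xE0:   # 1110xxxx: three leading 1 bits
        return 3
    if first_octet & 0xF8 == 0xF0:   # 11110xxx: four leading 1 bits
        return 4
    return 0                         # 10xxxxxx (stray continuation) or 11111xxx

def looks_like_utf8(data: bytes) -> bool:
    """Structural check only: lead octets plus 10xxxxxx continuation octets."""
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        if n == 0 or i + n > len(data):      # illegal lead octet or truncated sequence
            return False
        for octet in data[i + 1:i + n]:
            if octet & 0xC0 != 0x80:         # continuation octets must start with 10
                return False
        i += n                               # skip ahead over the whole sequence
    return True

def non_utf8_files(root: str) -> list:
    """Walk a directory tree and list files that fail the structural check."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:      # reads each file fully into memory
                if not looks_like_utf8(f.read()):
                    failures.append(path)
    return failures
```

If a strict answer is needed rather than a structural one, Python's built-in decoder can do the work instead: `data.decode("utf-8")` raises `UnicodeDecodeError` on any invalid input, including the cases this sketch lets through.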

Not sure if this is what you're looking for, but I use a command shell for-loop and dump the first few bytes of each file using my `hdump` utility, which displays the bytes of the file in hexadecimal form. I then look for the leading 3-byte UTF-8 signature (Byte Order Mark) at the start of each file.

My `hdump` utility is available at: http://david.tribble.com/programs.html
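
If `hdump` isn't at hand, the same signature check can be approximated in a few lines of Python. This is only a sketch of the idea, not a reimplementation of `hdump`, and the function names `has_utf8_bom` and `scan_for_bom` are hypothetical. Keep in mind that the signature is optional in UTF-8, so a file without it may still be perfectly valid UTF-8; this only tells you which files carry the marker.

```python
import codecs
import os

def has_utf8_bom(path: str) -> bool:
    """True if the file starts with the 3-byte UTF-8 signature EF BB BF."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8

def scan_for_bom(root: str) -> dict:
    """Map every file path under root to whether it starts with the signature."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            results[path] = has_utf8_bom(path)
    return results
```

Calling `scan_for_bom` on a directory and printing the entries whose value is `False` gives the list of files that lack the signature.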