Get encoding of a file in Windows
This isn't really a programming question: is there a command-line or Windows tool (Windows 7) to get the current encoding of a text file? Sure, I can write a little C# app, but I wanted to know if there is something already built in.
Here's my take on how to detect the Unicode family of text encodings via the BOM. The accuracy of this method is low: it only works on text files (specifically Unicode files), and it defaults to ascii when no BOM is present (like most text editors; the default would be UTF8 if you want to match the HTTP/web ecosystem).
Update 2018: I no longer recommend this method. I recommend using file.exe from GIT or the *nix tools as recommended by @Sybren, and I show how to do that via PowerShell in a later answer.
Recommendation: This can work reasonably well if the dir, ls, or Get-ChildItem only checks known text files, and when you're only looking for "bad encodings" from a known list of tools. (E.g. SQL Management Studio defaults to UTF16, which broke GIT auto-cr-lf for Windows, which was the default for many years.)
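As a rough illustration of the BOM approach described in that answer, here is a minimal Python sketch. It is not the answer's original script; the function names (sniff_bom, detect_file_bom) and the "ascii" fallback are assumptions chosen to mirror the behaviour described in the text. Like the method it illustrates, it says nothing reliable about files without a BOM.

```python
import codecs

# Longer signatures must be tested first: the UTF-32 LE BOM (FF FE 00 00)
# starts with the same two bytes as the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data, default="ascii"):
    """Return an encoding name based only on a leading BOM, else `default`."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return default


def detect_file_bom(path, default="ascii"):
    """Read the first four bytes of a file and sniff its BOM (hypothetical helper)."""
    with open(path, "rb") as handle:
        return sniff_bom(handle.read(4), default)
```

Passing default="utf-8" instead matches the web-ecosystem convention mentioned in the answer.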
A simple solution might be to open the file in Firefox; the text encoding will appear in the "Page Info" window.
Note: If the file is not in txt format, just rename it to txt and try again.
P.S. For more info see this article.
I wrote the #4 answer (at time of writing). But lately I have git installed on all my computers, so now I use @Sybren's solution. Here is a new answer that makes that solution handy from PowerShell (without putting all of git/usr/bin in the PATH, which is too much clutter for me).
Add this to your profile.ps1:
And use it like this: file.exe --mime-encoding *
You must include .exe in the command for the PS alias to work.
But if you don't customize your PowerShell profile.ps1, I suggest you start with mine: https://gist.github.com/yzorg/8215221/8e38fd722a3dfc526bbe4668d1f3b08eb7c08be0 and save it to ~\Documents\WindowsPowerShell. It's safe to use on a computer without git, but it will write warnings when git is not found.
The .exe in the command is also how I use C:\WINDOWS\system32\where.exe from PowerShell, along with many other OS CLI commands that are "hidden by default" by PowerShell, *shrug*.
Some C code here for reliable ASCII, BOM, and UTF-8 detection: https://unicodebook.readthedocs.io/guess_encoding.html
EDIT:
A PowerShell version of a C# answer from: Effective way to find any file's Encoding. Only works with signatures (BOMs).
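For comparison, the same ASCII / BOM / UTF-8 idea can be sketched in Python using the built-in strict decoders. This is not the C code from the linked page nor the PowerShell version mentioned above; guess_encoding is a hypothetical name, and "unknown" is an assumed placeholder for anything that is neither ASCII nor valid UTF-8 (for example a legacy code page).

```python
def guess_encoding(data: bytes) -> str:
    """Classify bytes as UTF-8 with BOM, ASCII, UTF-8, or unknown."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    try:
        data.decode("ascii")  # succeeds only if every byte is below 0x80
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"
```

The whole file should be passed in: a buffer cut off in the middle of a multi-byte character would be reported as "unknown" even if the file is valid UTF-8.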
You can simply check this by opening Git Bash in the file's location and then running the command: file -i file_name
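If the check needs to be scripted, the file tool can also be called from Python. The sketch below assumes a file executable is reachable on PATH (for example from Git Bash or Cygwin); the helper names are hypothetical. It uses --mime-encoding, which prints only the character set, whereas file -i prints the MIME type together with the charset.

```python
import shutil
import subprocess


def build_file_command(path, exe="file"):
    """Build the argument list for `file --brief --mime-encoding <path>`."""
    return [exe, "--brief", "--mime-encoding", path]


def mime_encoding(path, exe="file"):
    """Return the charset reported by `file`, or None if the tool is not found."""
    resolved = shutil.which(exe)
    if resolved is None:
        return None  # assumption: the caller falls back to another method
    result = subprocess.run(
        build_file_command(path, resolved),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```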
Looking for a Node.js/npm solution? Try encoding-checker:
Usage
Examples
Get encoding of all files in current directory:
Return encoding of all md files in current directory:
Get encoding of all files in current directory and its subfolders (will take quite some time for huge folders; seemingly unresponsive):
For more examples refer to the npm documentation or the official repository.
EncodingChecker
File Encoding Checker is a GUI tool that allows you to validate the text encoding of one or more files. The tool can display the encoding for all selected files, or only the files that do not have the encodings you specify.
File Encoding Checker requires .NET 4 or above to run.
Similar to the solution listed above with Notepad, you can also open the file in Visual Studio, if you're using that. In Visual Studio, you can select "File > Advanced Save Options...".
The "Encoding:" combo box will tell you specifically which encoding is currently being used for the file. It lists a lot more text encodings than Notepad does, so it's useful when dealing with various files from around the world.
Just like in Notepad, you can also change the encoding from the list of options there, and then save the file after hitting "OK". You can also select the encoding you want through the "Save with Encoding..." option in the Save As dialog (by clicking the arrow next to the Save button).
The only way that I have found to do this is VIM or Notepad++.
Using PowerShell
After many years of trying to get file encoding from native CMD/PowerShell methods, and always having to resort to using (and installing) 3rd-party software like Cygwin, git-bash and other external binaries, there is finally a native method.
Before people go on complaining about all the ways this can fail, please understand that this tool is primarily meant for identifying text, log, CSV and TAB types of files, not binary files. In addition, detecting a file's encoding is mostly a guessing game, so the provided script makes some rudimentary guesses that will certainly fail on large files. Feel free to test it and give feedback for improvement in the gist.
To test this, I dumped a bunch of weird garbage text into a string, and then exported it using the available Windows encodings:
ASCII, BigEndianUnicode, BigEndianUTF32, OEM, Unicode, UTF7, UTF8, UTF8BOM, UTF8NoBOM, UTF32
Here is the code; it can also be found at the gist URL.
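The test setup described above (one string exported in several encodings) can be approximated in Python as follows. This is only a sketch and not the answer's PowerShell script: the sample text, the file naming scheme, and the subset of encodings are assumptions, and Python's codec names differ from the PowerShell ones listed.

```python
import tempfile
from pathlib import Path

# Hypothetical sample text mixing ASCII with accented, CJK and symbol characters.
SAMPLE = "Gr\u00fc\u00dfe, \u4e16\u754c! \u00f1 \u20ac"
ENCODINGS = ["ascii", "utf-8", "utf-8-sig", "utf-16", "utf-16-be", "utf-32"]


def write_samples(directory, text=SAMPLE):
    """Write `text` once per encoding and return {encoding: path}."""
    written = {}
    for enc in ENCODINGS:
        path = Path(directory) / f"sample-{enc}.txt"
        # errors="replace" substitutes "?" for characters the codec cannot encode,
        # so the ASCII export still succeeds.
        path.write_bytes(text.encode(enc, errors="replace"))
        written[enc] = path
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for enc, path in write_samples(tmp).items():
            # Show the first bytes so any BOM is visible.
            print(f"{enc:10} {path.read_bytes()[:4].hex(' ')}")
```

Running a detection script over these files shows quickly which encodings it can tell apart; the BOM-less variants are the ones where guessing is needed.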
Open up your file using regular old vanilla Notepad that comes with Windows 7.
It will show you the encoding of the file when you click "Save As...".
Whatever the default-selected encoding is, that is the current encoding of the file.
If it is UTF-8, you can change it to ANSI and click save to change the encoding (or vice versa).
There are many different types of encodings, but this was all I needed when our export files were in UTF-8 and the 3rd party required ANSI. It was a one-time export, so Notepad fit the bill for me.
FYI: From my understanding, "Unicode" (as listed in Notepad) is a misnomer for UTF-16.
More here on Notepad's "Unicode" option: Windows 7 - UTF-8 and Unicode
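For a repeatable version of the UTF-8 to ANSI conversion described above, something like the following Python sketch could be used. It assumes "ANSI" means Windows-1252 (cp1252), which is only true on Western European Windows installations; convert_encoding is a hypothetical helper name.

```python
from pathlib import Path


def convert_encoding(src, dst, src_encoding="utf-8-sig", dst_encoding="cp1252"):
    """Re-encode a text file; raises UnicodeEncodeError if a character does not fit."""
    # "utf-8-sig" accepts UTF-8 with or without a BOM and strips the BOM if present.
    text = Path(src).read_bytes().decode(src_encoding)
    # Working on bytes keeps the original line endings untouched.
    Path(dst).write_bytes(text.encode(dst_encoding))
```

Failing loudly on characters that the target code page cannot represent is a deliberate choice here; silently replacing them would corrupt the export.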
Update (06/14/2023):
Updated for the newer Notepad and Notepad++.
Notepad (Windows 10 & 11): the encoding is shown in the bottom-right corner of the window and in the "Save As..." dialog box.
Notepad++: the encoding is shown in the bottom-right corner and under the "Encoding" menu item. Far more encoding options are available in Notepad++, should you need them.
Other (Mac/Linux/Win) options:
I hear Windows 11 improved performance so that large (100+ MB) files open much faster.
On the web I've read that Notepad++ is still the all-around large-file editor champion.
However, (for those on Mac or Linux) here are some other contenders I found:
1). Sublime Text
2). Visual Studio Code
If you have "git" or "Cygwin" on your Windows machine, go to the folder where your file is located and execute the command:
This will give you the encoding details of all the files in that folder.
The (Linux) command-line tool 'file' is available on Windows via GnuWin32:
http://gnuwin32.sourceforge.net/packages/file.htm
If you have git installed, it's located in C:\Program Files\git\usr\bin.
Example:
Install git (on Windows you have to use the git bash console). Type:
for all files in the current directory, or
for the files in all subdirectories.
Another tool that I found useful, EncodingCheckerEXE, can be found here: https://codeplexarchive.org/project/EncodingCheckerEXE