当前位置：文江博客话题详情

获取 Windows 中文件的编码

发布于 2024-09-19 03:37:43 字数 99 浏览 5 评论 0原文

这实际上不是一个编程问题，是否有命令行或 Windows 工具（Windows 7）来获取文本文件的当前编码？当然，我可以编写一个小 C# 应用程序，但我想知道是否已经内置了一些东西？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

貪欢 2024-09-26 03:37:44

以下是我如何通过 BOM 检测 Unicode 系列文本编码的方法。此方法的准确性较低，因为此方法仅适用于文本文件（特别是 Unicode 文件），并且当不存在 BOM 时默认为 ascii（与大多数文本编辑器一样，默认值为 ascii） >UTF8（如果您想匹配 HTTP/web 生态系统）。

2018 年更新：我不再推荐此方法。我建议使用来自 GIT 的 file.exe 或 @Sybren 推荐的 *nix 工具，以及我将在稍后的答案中展示如何通过 PowerShell 执行此操作。

# from https://gist.github.com/zommarin/1480974
function Get-FileEncoding($Path) {
    $bytes = [byte[]](Get-Content $Path -Encoding byte -ReadCount 4 -TotalCount 4)

    if(!$bytes) { return 'utf8' }

    switch -regex ('{0:x2}{1:x2}{2:x2}{3:x2}' -f $bytes[0],$bytes[1],$bytes[2],$bytes[3]) {
        '^efbbbf'   { return 'utf8' }
        '^2b2f76'   { return 'utf7' }
        '^fffe'     { return 'unicode' }
        '^feff'     { return 'bigendianunicode' }
        '^0000feff' { return 'utf32' }
        default     { return 'ascii' }
    }
}

dir ~\Documents\WindowsPowershell -File | 
    select Name,@{Name='Encoding';Expression={Get-FileEncoding $_.FullName}} | 
    ft -AutoSize

建议：如果 dir、ls 或 Get-ChildItem 仅检查已知文本文件，并且当您仅从已知的工具列表中寻找“错误的编码”。（即 SQL Management Studio 默认为 UTF16，这打破了 Windows 的 GIT auto-cr-lf，这是多年来的默认设置。）

Here's my take how to detect the Unicode family of text encodings via BOM. The accuracy of this method is low, as this method only works on text files (specifically Unicode files), and defaults to ascii when no BOM is present (like most text editors, the default would be UTF8 if you want to match the HTTP/web ecosystem).

Update 2018: I no longer recommend this method. I recommend using file.exe from GIT or *nix tools as recommended by @Sybren, and I show how to do that via PowerShell in a later answer.

# from https://gist.github.com/zommarin/1480974
function Get-FileEncoding($Path) {
    $bytes = [byte[]](Get-Content $Path -Encoding byte -ReadCount 4 -TotalCount 4)

    if(!$bytes) { return 'utf8' }

    switch -regex ('{0:x2}{1:x2}{2:x2}{3:x2}' -f $bytes[0],$bytes[1],$bytes[2],$bytes[3]) {
        '^efbbbf'   { return 'utf8' }
        '^2b2f76'   { return 'utf7' }
        '^fffe'     { return 'unicode' }
        '^feff'     { return 'bigendianunicode' }
        '^0000feff' { return 'utf32' }
        default     { return 'ascii' }
    }
}

dir ~\Documents\WindowsPowershell -File | 
    select Name,@{Name='Encoding';Expression={Get-FileEncoding $_.FullName}} | 
    ft -AutoSize

Recommendation: This can work reasonably well if the dir, ls, or Get-ChildItem only checks known text files, and when you're only looking for "bad encodings" from a known list of tools. (i.e. SQL Management Studio defaults to UTF16, which broke GIT auto-cr-lf for Windows, which was the default for many years.)

获取 Windows 中文件的编码

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（15）

用法

示例

Usage

Examples

使用 Powershell

Using Powershell

更新 (06/14/2023)：

其他 (Mac/Linux/Win) 选项：

Update (06/14/2023):

Other (Mac/Linux/Win) Options:

关于作者

相关话题

热门标签

推荐作者

淡笑忘祈一世凡恋

我们的影子

素年丶

南笙

18215568913

qq_xk7Ean

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。