Using UTF-8 encoding (CHCP 65001) in Command Prompt / Windows PowerShell (Windows 10)

Posted 2025-01-25 14:45:42


I've been forcing the usage of chcp 65001 in Command Prompt and Windows PowerShell for some time now, but judging by Q&A posts on SO and several other communities it seems like a dangerous and inefficient solution. Does Microsoft provide an improved / complete alternative to chcp 65001 that can be saved permanently without manual alteration of the Registry? And if there isn't, is there a publicly announced timeline or agenda to support UTF-8 in the Windows CLI in the future?

Personally, I've been using chcp 949 for Korean character support. Several things seem to have become more of a problem lately: the weird display of the backslash \, incorrect or incomprehensible output in several applications (like Neovim), and the fact that 949 doesn't support characters that aren't Korean.
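You can see the coverage gap the question describes by asking the codec itself. The following minimal sketch uses Python rather than PowerShell only because it runs on any platform. Its cp949 codec stands in for Windows code page 949, and the helper name encodable and the sample characters are arbitrary illustrations. UTF-8 can encode every character, whereas cp949 rejects anything outside its repertoire, such as an emoji. The backslash itself is encodable in cp949 (as byte 0x5C). That suggests the odd backslash display is a font-rendering matter rather than a missing character.

```python
# Check which characters a legacy code page can represent at all, versus UTF-8.
def encodable(text: str, encoding: str) -> bool:
    """Return True if every character in `text` can be encoded in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Arbitrary samples: a Hangul syllable, a backslash, and an emoji.
for ch in ["한", "\\", "\U0001F600"]:
    # Print code points instead of the characters themselves so the output is ASCII-only.
    print(f"U+{ord(ch):04X}: cp949={encodable(ch, 'cp949')}, utf-8={encodable(ch, 'utf-8')}")
```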


猫卆 2025-02-01 14:45:42


Note:

  • This answer shows how to switch the character encoding in Windows consoles (terminals) to (BOM-less) UTF-8 system-wide (code page 65001), so that shells such as cmd.exe and PowerShell properly encode and decode characters (text) when communicating with external (console) programs with full Unicode support, and in cmd.exe also for file I/O.[1]

  • If, by contrast, your concern is about the separate aspect of the limitations of Unicode character rendering in console windows, see the middle and bottom sections of this answer, where alternative console (terminal) applications are discussed too.


Does Microsoft provide an improved / complete alternative to chcp 65001 that can be saved permanently without manual alteration of the Registry?

As of (at least) Windows 10, version 1903, you have the option to set the system locale (language for non-Unicode programs) to UTF-8, but the feature is still in beta as of this writing and fundamentally has far-reaching consequences.

To activate it:

  • Run intl.cpl (which opens the regional settings in Control Panel)
  • On the Administrative tab, click Change system locale..., check Beta: Use Unicode UTF-8 for worldwide language support, and restart Windows for the change to take effect.

  • This sets both the system's active OEM and the ANSI code page to 65001, the UTF-8 code page, which therefore (a) makes all future console windows, which use the OEM code page, default to UTF-8 (as if chcp 65001 had been executed in a cmd.exe window) and (b) also makes legacy, non-Unicode GUI-subsystem applications, which use the ANSI code page, use UTF-8.

    • Caveats:

      • If you're using Windows PowerShell, this also affects Get-Content, Set-Content, and other contexts where Windows PowerShell defaults to the system's active ANSI code page, notably reading source code from BOM-less files. These then default to UTF-8, which is what PowerShell Core (v6+) always does. This means that, in the absence of an -Encoding argument, BOM-less files that are ANSI-encoded (which is historically common) will then be misread, and files created with Set-Content will be UTF-8 rather than ANSI-encoded.

        • Similarly, legacy (non-Unicode) non-console applications will then misinterpret ANSI-encoded files.
      • Pick a TT (TrueType) font. Even these usually support only a subset of all characters, so you may have to experiment with specific fonts to see if all characters you care about are represented - see this answer for details, which also discusses alternative console (terminal) applications that have better Unicode rendering support.

      • As eryksun points out, legacy console applications that do not "speak" UTF-8 will be limited to ASCII-only input and will produce incorrect output when trying to output characters outside the (7-bit) ASCII range. (In the obsolescent Windows 7 and below, programs may even crash.)
        If running legacy console applications is important to you, see eryksun's recommendations in the comments.

  • However, for Windows PowerShell, that is not enough:

    • You must additionally set the $OutputEncoding preference variable to UTF-8: $OutputEncoding = [System.Text.UTF8Encoding]::new() [2]; it's simplest to add that command to your $PROFILE (current user only) or $PROFILE.AllUsersCurrentHost (all users) file.
    • Fortunately, this is no longer necessary in PowerShell Core, which internally consistently defaults to BOM-less UTF-8.
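The caveat about BOM-less ANSI files can be reproduced with a simplified model of how Windows PowerShell chooses an encoding when no -Encoding argument is given. The model honors a BOM if there is one and otherwise falls back to the active ANSI code page. The Python sketch is an assumption-laden illustration, not PowerShell's actual logic. The function name read_like_windows_powershell is hypothetical, and cp1252 (the Western European ANSI code page) is just an example. Switching the fallback to UTF-8, which the beta system-locale setting effectively does, makes the same BOM-less bytes decode incorrectly. A file with a BOM is unaffected.

```python
import codecs


def read_like_windows_powershell(raw: bytes, fallback: str) -> str:
    """Hypothetical, simplified model: honor a UTF-8 or UTF-16 LE BOM if present,
    otherwise decode with `fallback` (the active ANSI code page)."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if raw.startswith(codecs.BOM_UTF16_LE):
        return raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    # No BOM: the bytes are interpreted according to the fallback code page.
    return raw.decode(fallback, errors="replace")


ansi_file = "hü".encode("cp1252")                  # BOM-less ANSI bytes: b"h\xfc"
bom_file = codecs.BOM_UTF8 + "hü".encode("utf-8")  # UTF-8 with BOM

print(read_like_windows_powershell(ansi_file, "cp1252") == "hü")  # True: legacy default
print(read_like_windows_powershell(ansi_file, "utf-8") == "hü")   # False: 0xFC is not valid UTF-8
print(read_like_windows_powershell(bom_file, "cp1252") == "hü")   # True: the BOM wins
```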

If setting the system locale to UTF-8 is not an option in your environment, use startup commands instead:

Note: The caveat re legacy console applications mentioned above equally applies here. If running legacy console applications is important to you, see eryksun's recommendations in the comments.

  • For PowerShell (both editions), add the following line to your $PROFILE (current user only) or $PROFILE.AllUsersCurrentHost (all users) file, which is the equivalent of chcp 65001, supplemented with setting preference variable $OutputEncoding to instruct PowerShell to send data to external programs via the pipeline in UTF-8:

    • Note that running chcp 65001 from inside a PowerShell session is not effective, because .NET caches the console's output encoding on startup and is unaware of later changes made with chcp; additionally, as stated, Windows PowerShell requires $OutputEncoding to be set - see this answer for details.
$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding
  • For example, here's a quick-and-dirty approach to add this line to $PROFILE programmatically:
'$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding' + [Environment]::Newline + (Get-Content -Raw $PROFILE -ErrorAction SilentlyContinue) | Set-Content -Encoding utf8 $PROFILE
  • For cmd.exe, define an auto-run command via the registry, in value AutoRun of key HKEY_CURRENT_USER\Software\Microsoft\Command Processor (current user only) or HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor (all users):

    • For instance, you can use PowerShell to create this value for you:
# Auto-execute `chcp 65001` whenever the current user opens a `cmd.exe` console
# window (including when running a batch file):
Set-ItemProperty 'HKCU:\Software\Microsoft\Command Processor' AutoRun 'chcp 65001 >NUL'
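The profile line sets several encodings because data crosses the process boundary in both directions. $OutputEncoding governs how PowerShell encodes what it pipes to an external program. [Console]::OutputEncoding governs how it decodes what the program writes back. Both must match what the external program actually uses. The Python sketch below mimics that exchange under stated assumptions. A child Python process stands in for an external console program that speaks UTF-8. The helper round_trip and its parameters are hypothetical names: send_enc plays the role of $OutputEncoding and recv_enc that of [Console]::OutputEncoding.

```python
import subprocess
import sys

# Stand-in for an external console program that "speaks" UTF-8: it decodes its
# stdin as UTF-8 (replacing invalid bytes with U+FFFD) and echoes it back as UTF-8.
CHILD = (
    "import sys; "
    "text = sys.stdin.buffer.read().decode('utf-8', errors='replace'); "
    "sys.stdout.buffer.write(text.encode('utf-8'))"
)


def round_trip(text: str, send_enc: str, recv_enc: str) -> str:
    """Pipe `text` to the child encoded with `send_enc` (cf. $OutputEncoding) and
    decode the child's output with `recv_enc` (cf. [Console]::OutputEncoding)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=text.encode(send_enc),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode(recv_enc, errors="replace")


print(round_trip("hü 한국어", "utf-8", "utf-8") == "hü 한국어")  # True: both sides agree
print(round_trip("hü", "cp437", "utf-8") == "hü")               # False: sent in the wrong code page
print(round_trip("hü", "utf-8", "cp437") == "hü")               # False: decoded in the wrong code page
```

Only the first call, where both directions use UTF-8, preserves the text. That is the situation the one-line profile setting aims for.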

Optional reading: Why using the Windows PowerShell ISE is ill-advised in general:

While the ISE does have better Unicode rendering support than the console, it is generally a poor choice:

  • First and foremost, the ISE is obsolescent: it doesn't support PowerShell (Core) 7+, where all future development will go, and it isn't cross-platform, unlike the new premier IDE for both PowerShell editions, Visual Studio Code, which already speaks UTF-8 by default for PowerShell Core and can be configured to do so for Windows PowerShell.

  • The ISE is generally an environment for developing scripts, not for running them in production (if you're writing scripts (also) for others, you should assume that they'll be run in the console / in Windows Terminal); notably, with respect to running code, the ISE's behavior is not the same as that of a regular console / Windows Terminal:

    • Poor support for running external programs, not only due to lack of supporting interactive ones (see next point), but also with respect to:

      • Character encoding:

        • The ISE mistakenly assumes that external programs use the ANSI code page by default, when in reality it is the OEM code page. E.g., by default this simple command, which tries to simply pass a string echoed from cmd.exe through, malfunctions (see below for a fix):
          cmd /c echo hü | Write-Output

        • The $OutputEncoding preference variable defaults to UTF-8 instead of to the legacy OEM code page (as in regular consoles) and inappropriately prepends a UTF-8 BOM to the (first) string piped to an external program - see this answer.

      • Inappropriate rendering of stderr output as PowerShell errors: see this answer.

    • The ISE dot-sources script-file invocations instead of running them in a child scope (the latter is what happens in a regular console window / in Windows Terminal); that is, in the ISE repeated invocations run in the very same scope. This can lead to subtle bugs, where definitions left behind by a previous run can affect subsequent ones.

    • The ISE preloads additional .NET assemblies that aren't preloaded in regular PowerShell console windows / WT (Windows Terminal) tabs, notably System.Windows.Forms. Therefore, a script that runs fine in the ISE may break in regular console windows / WT in the absence of an explicit assembly-loading command (Add-Type -AssemblyName or using assembly).

    • As eryksun points out, the ISE doesn't support running interactive external console programs, namely those that require user input:

      • The problem is that it hides the console and redirects the process output (but not input) to a pipe. Most console applications switch to full buffering when a file is a pipe. Also, interactive applications require reading from stdin, which isn't possible from a hidden console window. (It can be unhidden via ShowWindow, but a separate window for input is clunky.)

      • If you're willing to live with this limitation re interactivity, switching the active code page to 65001 (UTF-8) for proper communication with external programs requires an awkward workaround:

        • You must first force creation of the hidden console window by running any external program from the built-in console, e.g., chcp - you'll see a console window flash briefly.

        • Only then can you set [Console]::OutputEncoding (and $OutputEncoding) to UTF-8, as shown above (if the hidden console hasn't been created yet, you'll get a "The handle is invalid" error).
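You can illustrate two of the ISE's encoding problems byte by byte: the code-page mismatch behind cmd /c echo hü | Write-Output, and the BOM prepended to the first piped string. This Python sketch is not a reproduction of the ISE. It assumes a US-English system (OEM code page 437, ANSI code page 1252), and the garbage it shows comes from Python's codecs, so the exact characters the ISE displays may differ.

```python
import codecs

# Assumption: a US-English system, where the OEM code page (used by console
# programs such as cmd.exe) is 437 and the ANSI code page is 1252.
oem_bytes = "hü".encode("cp437")          # assumed output of `cmd /c echo hü`: b"h\x81"
print(oem_bytes.decode("cp437") == "hü")  # True: decoded with the matching OEM code page

# Decoding the same bytes as ANSI, as the ISE does, garbles the "ü".
# (In Python's cp1252 codec, byte 0x81 is unassigned, hence U+FFFD here.)
print(oem_bytes.decode("cp1252", errors="replace") == "h\ufffd")  # True

# A BOM-emitting UTF-8 encoder (like the ISE's default $OutputEncoding) adds
# three bytes in front of the first string it sends to an external program.
first_chunk = "hi".encode("utf-8-sig")
print(first_chunk == codecs.BOM_UTF8 + b"hi")     # True
print(first_chunk.decode("utf-8") == "\ufeffhi")  # True: a BOM-unaware reader sees U+FEFF
```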


[1] In PowerShell, if you never call external programs, you needn't worry about the system locale (active code pages): PowerShell-native commands and .NET calls always communicate via UTF-16 strings (native .NET strings) and on file I/O apply default encodings that are independent of the system locale. Similarly, because the Unicode versions of the Windows API functions are used to print to and read from the console, non-ASCII characters always print correctly (within the rendering limitations of the console).
In cmd.exe, by contrast, the system locale matters for file I/O (with < and > redirections, but notably including what encoding to assume for batch-file source code), not just for communicating with external programs in-memory (such as when reading program output in a for /f loop).

[2] In PowerShell v4-, where the static ::new() method isn't available, use $OutputEncoding = (New-Object System.Text.UTF8Encoding).psobject.BaseObject. See GitHub issue #5763 for why the .psobject.BaseObject part is needed.

一个人的旅程 2025-02-01 14:45:42


You can put the command chcp 65001 in your PowerShell profile, which will run it automatically when you open PowerShell. However, this won't do anything for cmd.exe.

Microsoft is currently working on an improved terminal that will have full Unicode support. It is open source, and if you're using Windows 10 Version 1903 or later, you can already download a preview version.

Alternatively, you can use a third-party terminal emulator such as Terminus.

絕版丫頭 2025-02-01 14:45:42


Running commands (chcp or anything else) automatically whenever Command Prompt starts can be done by editing the registry. This is the proper way, as documented in cmd /?:

If /D was NOT specified on the command line, then when CMD.EXE starts,
it looks for the following REG_SZ/REG_EXPAND_SZ registry variables,
and if either or both are present, they are executed first.

HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor\AutoRun

    and/or

HKEY_CURRENT_USER\Software\Microsoft\Command Processor\AutoRun

Now it's 2023, and there's good news: with Windows Terminal, you don't need to edit the registry or create an additional batch file. In Windows Terminal, go to Settings > Profiles, select Command Prompt, and change Command line from %SystemRoot%\System32\cmd.exe (the default) to %SystemRoot%\System32\cmd.exe /K "chcp 65001". It's that simple.

*Added: If you use PowerShell instead of Command Prompt, append this to the Command line setting: -NoExit -Command "chcp 65001". In my case, the default value was:

%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe

Then changed to:

%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe -NoExit -Command "chcp 65001"

暮凉 2025-02-01 14:45:42


The PowerShell ISE displays Korean perfectly fine. Here's a sample text file encoded in UTF-8 that works:

PS C:\Users\js> cat .\korean.txt

The Korean language (South Korean: 한국어/韓國語 Hangugeo; North 
Korean: 조선말/朝鮮말 Chosŏnmal) is an East Asian language
spoken by about 77 million people.[3]

Since the ISE comes with every version of Windows 10, I do not consider it obsolete. I disagree with whoever deleted my original answer.

The ISE has some limitations, but some scripting can be done with external commands:

echo 'list volume' | diskpart # as admin
cmd /c echo hi

EDIT:

If you have Windows 10 1903, you can download Windows Terminal from the Microsoft Store (see https://devblogs.microsoft.com/commandline/introducing-windows-terminal/), and Korean text works there. Windows PowerShell 5 needs the text file to be UTF-8 with BOM or UTF-16.
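Windows PowerShell 5 expects UTF-8 files such as korean.txt to carry a BOM. A small helper can add one in place, assuming the file is valid UTF-8 to begin with. This Python sketch is one way to do it, not a PowerShell-provided tool. The name add_utf8_bom is hypothetical, and the function leaves files that already start with a UTF-8 BOM untouched.

```python
import codecs
import tempfile
from pathlib import Path


def add_utf8_bom(path: Path) -> bool:
    """Prepend a UTF-8 BOM to a BOM-less UTF-8 file; return True if the file changed."""
    raw = path.read_bytes()
    if raw.startswith(codecs.BOM_UTF8):
        return False  # already UTF-8 with BOM
    raw.decode("utf-8")  # raises UnicodeDecodeError if the file is not valid UTF-8
    path.write_bytes(codecs.BOM_UTF8 + raw)
    return True


# Usage on a throwaway file with Korean sample text.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "korean.txt"
    sample.write_bytes("한국어/韓國語 Hangugeo".encode("utf-8"))
    print(add_utf8_bom(sample), add_utf8_bom(sample))  # True False
```

The second call returns False because the BOM is already there. That makes the helper safe to run repeatedly over the same files.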

EDIT2:

The ideal combinations seem to be Windows Terminal + PowerShell 7 or VS Code + PowerShell 7, for both pasting characters and output.

EDIT3:

Even in the EDIT2 setups, some Unicode characters cannot be pasted, such as ⇆ (U+21C6) or Unicode space characters. Only PowerShell 7 on macOS worked.
