I've been forcing the usage of `chcp 65001` in Command Prompt and Windows PowerShell for some time now, but judging by Q&A posts on SO and several other communities, it seems like a dangerous and inefficient solution. Does Microsoft provide an improved / complete alternative to `chcp 65001` that can be saved permanently without manual alteration of the registry? And if there isn't, is there a publicly announced timeline or agenda to support UTF-8 in the Windows CLI in the future?

Personally, I've been using `chcp 949` for Korean character support. But several problems seem to have become worse lately: the backslash `\` is displayed oddly, several applications (like Neovim) show incorrect or incomprehensible output, and characters that aren't Korean are not supported via 949.
Note:

This answer shows how to switch the character encoding in Windows consoles (terminals) to (BOM-less) UTF-8 system-wide (code page `65001`). As a result, shells such as `cmd.exe` and PowerShell properly encode and decode characters (text) when communicating with external (console) programs with full Unicode support, and, in `cmd.exe`, also for file I/O.[1]

If, by contrast, your concern is about the separate aspect of the limitations of Unicode character rendering in console windows, see the middle and bottom sections of this answer, where alternative console (terminal) applications are discussed too.
As of (at least) Windows 10, version 1903, you have the option to set the system locale (language for non-Unicode programs) to UTF-8. However, the feature is still in beta as of this writing and fundamentally has far-reaching consequences.

To activate it:

Run `intl.cpl`, which opens the regional settings in Control Panel. On the Administrative tab, click Change system locale..., check Beta: Use Unicode UTF-8 for worldwide language support, and restart when prompted.

This sets both the system's active OEM and ANSI code pages to `65001`, the UTF-8 code page. This has two effects. (a) All future console windows, which use the OEM code page, default to UTF-8 (as if `chcp 65001` had been executed in a `cmd.exe` window). (b) Legacy, non-Unicode GUI-subsystem applications, which use the ANSI code page, also use UTF-8.

Caveats:
If you're using Windows PowerShell, this also changes the default of every context where Windows PowerShell falls back to the system's active ANSI code page, such as `Get-Content`, `Set-Content`, and notably reading source code from BOM-less files. All of these then default to UTF-8, which is what PowerShell Core (v6+) always does. This means that, in the absence of an `-Encoding` argument, BOM-less files that are ANSI-encoded (which is historically common) will then be misread, and files created with `Set-Content` will be UTF-8 rather than ANSI-encoded.

For characters to render properly in console windows, pick a TT (TrueType) font. Even these usually support only a subset of all characters, so you may have to experiment with specific fonts to see if all characters you care about are represented. See this answer for details; it also discusses alternative console (terminal) applications that have better Unicode rendering support.
As eryksun points out, legacy console applications that do not "speak" UTF-8 will be limited to ASCII-only input and will produce incorrect output when trying to output characters outside the (7-bit) ASCII range. (In the obsolescent Windows 7 and below, programs may even crash.)
If running legacy console applications is important to you, see eryksun's recommendations in the comments.
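To check what a given console session actually uses after changing settings, a quick read-only diagnostic like the following sketch can help. It also runs in PowerShell 7 on other platforms. The property names in the output object are arbitrary labels chosen for this example. In Windows PowerShell you would typically still see `us-ascii` for `PipeToExternal`, even with the system locale set to UTF-8, because `$OutputEncoding` defaults to ASCII in that edition.

```powershell
# Read-only diagnostic: which encodings does *this* session use?
# (Property names are arbitrary labels for this sketch.)
$report = [pscustomobject]@{
    # Encoding PowerShell uses to decode output captured from external programs.
    ConsoleOutput  = [Console]::OutputEncoding.WebName
    # Encoding of console input.
    ConsoleInput   = [Console]::InputEncoding.WebName
    # Encoding PowerShell uses to send pipeline data *to* external programs.
    PipeToExternal = $OutputEncoding.WebName
    # Preamble (BOM) length of $OutputEncoding: 0 means BOM-less.
    PipeBomBytes   = $OutputEncoding.GetPreamble().Length
}
$report | Format-List
```

Run it in a *new* console window after making a change, since already-open sessions keep their previous settings.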
However, for Windows PowerShell, that is not enough: you must additionally set the `$OutputEncoding` preference variable to UTF-8 with `$OutputEncoding = [System.Text.UTF8Encoding]::new()` [2]. It's simplest to add that command to your `$PROFILE` (current user only) or `$PROFILE.AllUsersCurrentHost` (all users) file.

If setting the system locale to UTF-8 is not an option in your environment, use startup commands instead:
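A minimal sketch of such startup commands follows. It is for Windows only and assumes PowerShell v5+ (for `::new()`). The profile line illustrates the settings described in the next paragraphs and is not a verbatim quote of any official command. The `AutoRun` write overwrites any value already stored there.

```powershell
# Sketch: persist UTF-8 console settings (Windows only; assumes PowerShell v5+).

# 1) PowerShell: equivalent of `chcp 65001`, plus BOM-less UTF-8 for $OutputEncoding.
#    Single quotes keep the variable references literal in the profile line.
$line = '$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()'

# Create the current user's profile file (and its folder) if it doesn't exist yet.
if (-not (Test-Path -LiteralPath $PROFILE)) {
    $null = New-Item -ItemType File -Path $PROFILE -Force
}
# Append the line only if it isn't already there.
if (-not (Select-String -LiteralPath $PROFILE -SimpleMatch -Pattern $line -Quiet)) {
    Add-Content -LiteralPath $PROFILE -Value $line
}

# 2) cmd.exe: run `chcp 65001` silently whenever the current user starts cmd.exe.
#    Note: this overwrites any existing AutoRun value.
$key = 'HKCU:\Software\Microsoft\Command Processor'
if (-not (Test-Path -LiteralPath $key)) { $null = New-Item -Path $key -Force }
Set-ItemProperty -LiteralPath $key -Name AutoRun -Value 'chcp 65001 >NUL'
```

Open a new console window afterwards to see the effect. Keep in mind that the `AutoRun` value applies to every `cmd.exe` instance, including ones started by other programs via `cmd /c`, unless they pass `/D`.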
Note: The caveat re legacy console applications mentioned above equally applies here. If running legacy console applications is important to you, see eryksun's recommendations in the comments.
For PowerShell (both editions), add a line to your `$PROFILE` (current user only) or `$PROFILE.AllUsersCurrentHost` (all users) file that sets `[Console]::InputEncoding` and `[Console]::OutputEncoding` to UTF-8, which is the equivalent of `chcp 65001`. Supplement it by setting the preference variable `$OutputEncoding` to instruct PowerShell to send data to external programs via the pipeline in UTF-8.

Note that running `chcp 65001` from inside a PowerShell session is not effective, because .NET caches the console's output encoding on startup and is unaware of later changes made with `chcp`. Additionally, as stated, Windows PowerShell requires `$OutputEncoding` to be set; see this answer for details.

The line can also be added to `$PROFILE` programmatically.

For `cmd.exe`, define an auto-run command via the registry, in value `AutoRun` of key `HKEY_CURRENT_USER\Software\Microsoft\Command Processor` (current user only) or `HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor` (all users).

Optional reading: Why using the Windows PowerShell ISE is ill-advised in general:
While the ISE does have better Unicode rendering support than the console, it is generally a poor choice:
First and foremost, the ISE is obsolescent: it doesn't support PowerShell (Core) 7+, where all future development will go, and it isn't cross-platform, unlike the new premier IDE for both PowerShell editions, Visual Studio Code, which already speaks UTF-8 by default for PowerShell Core and can be configured to do so for Windows PowerShell.
The ISE is generally an environment for developing scripts, not for running them in production (if you're writing scripts (also) for others, you should assume that they'll be run in the console / in Windows Terminal); notably, with respect to running code, the ISE's behavior is not the same as that of a regular console / Windows Terminal:
Poor support for running external programs. This is not only due to the lack of support for interactive ones (see below), but also with respect to:

Character encoding: The ISE mistakenly assumes that external programs use the ANSI code page by default, when in reality it is the OEM code page. E.g., by default this simple command, which tries to pass through a string echoed from `cmd.exe`, malfunctions (see below for a fix): `cmd /c echo hü | Write-Output`

The `$OutputEncoding` preference variable defaults to UTF-8 instead of the legacy OEM code page (as in regular consoles). It also inappropriately prepends a UTF-8 BOM to the (first) string piped to an external program; see this answer.

Inappropriate rendering of stderr output as PowerShell errors: see this answer.
The ISE dot-sources script-file invocations instead of running them in a child scope (the latter is what happens in a regular console window / in Windows Terminal); that is, in the ISE repeated invocations run in the very same scope. This can lead to subtle bugs, where definitions left behind by a previous run can affect subsequent ones.
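A minimal sketch of that scope difference follows. It uses a hypothetical temporary script named `scope-demo.ps1` and assumes your execution policy allows running local scripts.

```powershell
# Hypothetical demo script whose only effect is to define a variable.
$script = Join-Path ([System.IO.Path]::GetTempPath()) 'scope-demo.ps1'
Set-Content -LiteralPath $script -Value '$leftover = 42'
Remove-Variable -Name leftover -ErrorAction SilentlyContinue

# Regular console / Windows Terminal behavior: the script runs in a child scope.
& $script
$afterCall = Get-Variable -Name leftover -ValueOnly -ErrorAction SilentlyContinue       # $null

# ISE behavior: the script is dot-sourced into the caller's scope.
. $script
$afterDotSource = Get-Variable -Name leftover -ValueOnly -ErrorAction SilentlyContinue  # 42

Remove-Item -LiteralPath $script
```

The variable only survives the dot-sourced run. That is how definitions left behind by one ISE run can silently affect the next.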
The ISE preloads additional .NET assemblies that aren't preloaded in regular PowerShell console windows / WT (Windows Terminal) tabs, notably `System.Windows.Forms`. Therefore, a script that runs fine in the ISE may break in regular console windows / WT in the absence of an explicit assembly-loading command (`Add-Type -AssemblyName` or `using assembly`).

As eryksun points out, the ISE doesn't support running interactive external console programs, namely those that require user input.

If you're willing to live with this limitation re interactivity, switching the active code page to `65001` (UTF-8) for proper communication with external programs requires an awkward workaround:

You must first force creation of the hidden console window by running any external program from the built-in console, e.g., `chcp`. You'll see a console window flash briefly.

Only then can you set `[Console]::OutputEncoding` (and `$OutputEncoding`) to UTF-8, as shown above. If the hidden console hasn't been created yet, you'll get a `handle is invalid` error.

[1] In PowerShell, if you never call external programs, you needn't worry about the system locale (active code pages). PowerShell-native commands and .NET calls always communicate via UTF-16 strings (native .NET strings), and for file I/O they apply default encodings that are independent of the system locale. Similarly, because the Unicode versions of the Windows API functions are used to print to and read from the console, non-ASCII characters always print correctly (within the rendering limitations of the console).
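A small sketch follows to illustrate that PowerShell-native code works with UTF-16 .NET strings regardless of the active code page. It assumes PowerShell v5+ for `::new()`. The characters are built from code points, so the sketch doesn't depend on how the script file itself is encoded. Byte counts only come into play once text is *encoded*, e.g. when it is sent to an external program.

```powershell
# .NET (and thus PowerShell) strings are sequences of UTF-16 code units.
$hangul = [string][char]0xD55C                   # Korean syllable U+D55C
$emoji  = [char]::ConvertFromUtf32(0x1F600)      # U+1F600, outside the BMP

$hangul.Length                                   # 1: one UTF-16 code unit
$emoji.Length                                    # 2: a surrogate pair

# Encoding to bytes is a separate, explicit step:
[System.Text.Encoding]::Unicode.GetByteCount($hangul)    # 2 (UTF-16LE)
[System.Text.UTF8Encoding]::new().GetByteCount($hangul)  # 3 (UTF-8)
```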
In `cmd.exe`, by contrast, the system locale matters for file I/O, not just for communicating with external programs in-memory (such as when reading program output in a `for /f` loop). File I/O here means `<` and `>` redirections, but notably also what encoding to assume for batch-file source code.

[2] In PowerShell v4-, where the static `::new()` method isn't available, use `$OutputEncoding = (New-Object System.Text.UTF8Encoding).psobject.BaseObject`. See GitHub issue #5763 for why the `.psobject.BaseObject` part is needed.
You can put the command `chcp 65001` in your PowerShell profile, which will run it automatically when you open PowerShell. However, this won't do anything for `cmd.exe`.

Microsoft is currently working on an improved terminal that will have full Unicode support. It is open source, and if you're using Windows 10 version 1903 or later, you can already download a preview version.
Alternatively, you can use a third-party terminal emulator such as Terminus.
Running some commands (`chcp` or whatever) automatically whenever Command Prompt starts can be done by editing the registry. It's the right way, as it's documented in `CMD /?` (see the `AutoRun` values under `HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor` and `HKEY_CURRENT_USER\Software\Microsoft\Command Processor`).

Now it's 2023, and there's good news. With Windows Terminal, editing the registry or creating an additional batch file is not needed. In Windows Terminal, go to Settings > Profiles, locate Command Prompt, and then change the Command line from `%SystemRoot%\System32\cmd.exe` (default) to `%SystemRoot%\System32\cmd.exe /K "chcp 65001"`. It's simple.

*Added: If you use PowerShell instead of Command Prompt, add this to the Command line setting: `-NoExit -Command "chcp 65001"`. In my case, the default value was:

`%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe`

Then I changed it to:

`%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe -NoExit -Command "chcp 65001"`
The PowerShell ISE displays Korean perfectly fine. Here's a sample text file encoded in UTF-8 that would work:
Since the ISE comes with every version of Windows 10, I do not consider it obsolete. I disagree with whoever deleted my original answer.
The ISE has some limitations, but some scripting can be done with external commands:
EDIT:
If you have Windows 10 1903, you can download Windows Terminal from the Microsoft Store (https://devblogs.microsoft.com/commandline/introducing-windows-terminal/), and Korean text will work there. PowerShell 5 needs text files to be UTF-8 with BOM or UTF-16.
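Since PowerShell 5 needs UTF-8 files with a BOM (or UTF-16), here is a minimal sketch that re-saves a BOM-less UTF-8 text file with a BOM. It uses .NET's `System.IO.File` methods so it behaves the same in Windows PowerShell 5.1 and PowerShell 7. The file name is a hypothetical placeholder, and the sketch assumes the file's current content really is UTF-8.

```powershell
# Hypothetical sample file in the temp folder (placeholder name).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'korean-sample.txt'

# For demonstration: create a BOM-less UTF-8 file containing Korean text.
# The text is built from code points (U+C548 U+B155) so this script's own encoding doesn't matter.
$text = -join ([char[]](0xC548, 0xB155))
[System.IO.File]::WriteAllText($path, $text, [System.Text.UTF8Encoding]::new($false))

# Read the file explicitly as UTF-8, then write it back *with* a BOM (EF BB BF),
# which lets Windows PowerShell 5.1 recognize it as UTF-8.
$content = [System.IO.File]::ReadAllText($path, [System.Text.UTF8Encoding]::new($false))
[System.IO.File]::WriteAllText($path, $content, [System.Text.UTF8Encoding]::new($true))

# Inspect the first three bytes.
$bytes = [System.IO.File]::ReadAllBytes($path)
'{0:X2} {1:X2} {2:X2}' -f $bytes[0], $bytes[1], $bytes[2]   # EF BB BF
```

To convert a real file, skip the demonstration step and point `$path` at your file.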
EDIT2:
It seems the ideal combinations are Windows Terminal + PowerShell 7 or VS Code + PowerShell 7, both for pasting characters and for output.
EDIT3:
Even in the EDIT2 setups, some Unicode characters cannot be pasted, such as ⇆ (U+21C6) or Unicode spaces. Only PS7 on macOS worked.