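Unicode/UTF-8 text file on Win7: gibberish on the Windows console (trying to display Hebrew)
I have a wide-character file (with Hebrew text) that looks fine in Notepad (saved with "UTF-8" encoding), reads fine in Notepad++, and also looks fine when I copy and paste it into MS Word. But when I open a "DOS box" (the Windows console) and run "type file.txt", it prints gibberish.
And yes, I have already followed all the usual advice about Unicode on the Windows console: I opened the console with "cmd /u", changed the font to Lucida, and ran "chcp 65001".
The problem is the same on a PC running Windows 7 and on another PC running Windows XP SP3.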
The font
Courier New supports Hebrew and can be added to the Command Prompt. The default fonts are Consolas, Lucida Console and Raster Fonts, and none of them supports Hebrew. So add Courier New to the Command Prompt. It takes a registry hack to do that:
http://www.howtogeek.com/howto/windows-vista/stupid-geek-tricks-enable-more-fonts-for-the-windows-command-prompt/
http://www.techrepublic.com/blog/windows-and-office/quick-tip-add-fonts-to-the-command-prompt/
This is a good example of how to install fonts, but I should remove a lot of these entries, because most of them never showed up in cmd; cmd doesn't support them.
Lucida Console and Consolas are defaults.
Raster Fonts is a default that is not listed here, maybe because it is not a TrueType font.
Of all the fonts I tried to add, only 3 were added (i.e. are supported by cmd):
Courier New, DejaVu Sans Mono, Droid Sans Mono
DejaVu Sans Mono and Droid Sans Mono are downloadable and supported by cmd, and may have good Unicode coverage, but they don't include Hebrew.
I have
Common Hebrew fonts are Miriam and David, but they can't be added to the Command Prompt.
For the record, BabelMap can list all fonts on your system that support Hebrew: in BabelMap, click Fonts, then Font Coverage, then enter 05D0 (that's aleph). I think all these fonts exist on a default Windows 7 installation.
But most or all of those fonts with Hebrew aren't supported in the Command Prompt, except Courier New. In fact most fonts, full stop, aren't supported in the Command Prompt, not even "Times New Roman" (because it is not monospaced / fixed width, which is one of a number of criteria for a font to be supported; the other criteria seem to be more obscure).
So now you can have Courier New added and selected for use in the Command Prompt.
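For anyone who wants to check what the console already has before hacking the registry, here is a rough, read-only Python sketch. It rests on assumptions that are not from this answer: that the console font list lives under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont, and that extra fonts are registered under value names made only of zeros ("0", "00", "000", ...), as the linked articles describe. The helper names are made up for illustration, and nothing here writes to the registry.

```python
def next_value_name(existing_names):
    """Return the next free all-zeros value name ("0", "00", "000", ...)."""
    # Only names made purely of zeros count; other names are ignored.
    zeros = [n for n in existing_names if n and set(n) == {"0"}]
    longest = max((len(n) for n in zeros), default=0)
    return "0" * (longest + 1)


def list_console_fonts():
    """Read-only: return {value name: font name} for the console (Windows only)."""
    import winreg  # imported here because the module only exists on Windows

    # Assumed key path, see the lead-in above.
    key_path = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont"
    fonts = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value_count = winreg.QueryInfoKey(key)[1]  # (subkeys, values, mtime)
        for index in range(value_count):
            name, data, _value_type = winreg.EnumValue(key, index)
            fonts[name] = data
    return fonts
```

On a Windows machine, next_value_name(list_console_fonts()) would suggest the value name to create for "Courier New"; the actual edit is still best done by hand in regedit as the articles describe.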
And so you can paste Unicode characters into cmd, provided the selected font supports them.
To copy/paste, click the Copy button in charmap.
Now it's in the clipboard.
To paste it into the Command Prompt: in Windows 7, pasting into the Command Prompt isn't Ctrl-V. You right-click and choose Paste (or, if in QuickEdit mode, just right-click).
That's the main thing.
Additionally
Often on Windows one might use Notepad and Character Map, but one should be aware of some of their limitations.
Character Map shows the first 65536 Unicode characters (those that the selected font supports), and it shows you the UTF-16 code. That's OK, you can still paste from Character Map into a cmd.exe window, but you should know that commands run in cmd.exe, and pipes, don't support UTF-16. So you can use Character Map to find a character, e.g. aleph, 05D0, but it's worth looking up the character on http://www.fileformat.info/info/unicode/char/05d0/index.htm and seeing that while the UTF-16 code is 05D0, the UTF-8 encoding is the two bytes D7 90. The xxd and file commands are useful for seeing the real contents of a file and determining the file's type.
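If you don't want to look the bytes up on a website, a few lines of Python show the same thing. This is only an illustrative sketch; the helper name hex_bytes is made up, and "cp862" is Python's name for code page 862.

```python
ALEPH = "\u05d0"  # HEBREW LETTER ALEF, the character discussed above


def hex_bytes(text, encoding):
    """Return the bytes of `text` in `encoding` as space-separated hex pairs."""
    return " ".join(f"{byte:02x}" for byte in text.encode(encoding))


for encoding in ("utf-16-be", "utf-16-le", "utf-8", "cp862"):
    print(encoding, hex_bytes(ALEPH, encoding))
# utf-16-be 05 d0
# utf-16-le d0 05
# utf-8 d7 90
# cp862 80
```

So the "05D0" that Character Map shows is the code point (and the UTF-16 big-endian form), while a UTF-8 file holds D7 90 and a code page 862 file holds the single byte 80.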
Notepad is a bit limited when it comes to Unicode, i.e. any character in the Unicode character set whose UTF-16 code is above FF. And cmd is a bit limited in regard to some commands like 'type', and in regard to pipes and redirection.
If you are using cmd.exe you really need pipes to work, because pipes are important.
Pipes are limited to the encodings that can be specified by the CHCP command.
(Note that if CHCP tells you that you are on a particular code page, e.g. 850, it is telling you the input encoding. If you run the command chcp 850, it changes both the input and output encodings. Usually they are the same, and it's simpler when they are. But if some other program changed the encoding of cmd, e.g. the C# compiler has a switch that changes it, then it's best to set it again with chcp so you know both encodings are set.)
There are code pages 1200 (UTF-16LE) and 1201 (UTF-16BE), but CHCP supports neither: if you try them it says invalid code page (tested on Windows 7). There is CHCP 65001 (that's UTF-8 without BOM). And there is CHCP 862 (the old-fashioned, MS-DOS-era way of encoding Hebrew).
The type command supports UTF-16LE, as does Notepad (what Notepad calls Unicode is UTF-16LE), but pipes and redirection don't support it. The type command also supports any code page specified/supported by CHCP, so type supports 862 and 65001.
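To see why the code page matters, here is a small Python sketch of what happens when UTF-8 bytes are read under the wrong code page. It only imitates the decoding step; it is not what cmd.exe literally runs, and decode_as is a made-up helper name.

```python
ALEPH_UTF8 = "\u05d0".encode("utf-8")  # the two bytes D7 90


def decode_as(raw, code_page):
    """Decode bytes the way a reader set to `code_page` would interpret them."""
    return raw.decode(code_page, errors="replace")


as_utf8 = decode_as(ALEPH_UTF8, "utf-8")  # one character: aleph
as_862 = decode_as(ALEPH_UTF8, "cp862")   # two unrelated characters
print(len(as_utf8), len(as_862))
# 1 2
```

The same two bytes give one aleph under 65001 and two wrong characters under 862, which is the kind of gibberish you get when the file's encoding and the active code page disagree.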
So you could use Notepad and save the file as UTF-8 (which Notepad writes with a BOM), then fiddle around to remove the BOM (that's a bit overkill). Or you could use Notepad and save it as Unicode, i.e. UTF-16LE, but then you can't use pipes (that's bad). The easiest thing to do is to use a text editor like Notepad2 or Notepad++ that supports UTF-8 without BOM.
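If you do end up with a Notepad-style UTF-8 file and want the BOM gone, the fiddling can be as small as this Python sketch. It is one possible way, not the xxd/sed route mentioned elsewhere in this answer, and the function names are made up.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 BOM (EF BB BF), if there is one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(path: str) -> None:
    """Rewrite the file at `path` in place without its UTF-8 BOM."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(strip_utf8_bom(data))
```

Everything is done on raw bytes, so line endings and the rest of the file are left exactly as they were.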
Or, if you are doing everything from cmd, you could use 862 or 65001, though many text editors might not support 862 well, so you might prefer 65001.
If you write a file in Notepad that contains a character above what UTF-16 calls \uFF, and you want to run commands in cmd.exe on that file, then some commands (e.g. the type command) will have problems unless you take into account what is supported by what.
Notepad supports UTF-16BE, UTF-16LE and UTF-8 with BOM, and that's not good. There's no need to fiddle around with xxd and sed or other commands to remove the BOM: if you have any file with a so-called Unicode character, i.e. a character outside the regular ASCII range (a character above UTF-16's \uFF, as shown by Character Map), just use Notepad2 or Notepad++.
type supports UTF-16LE, and any code page set by CHCP, e.g. 65001 or 862.
Pipes and redirection go by whatever is set by CHCP.
Code page 862 is old, so code page 65001 is a good way to go.
xxd and file are useful for seeing how a file is encoded, which can be helpful if you have issues, but they are not absolutely necessary.
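If xxd and file aren't at hand, a rough stand-in for the "how is this file encoded" check is to look at the first few bytes for a BOM. The Python sketch below does only that; it is far less thorough than file, and the names are made up for illustration.

```python
import codecs

# Checked in this order; note that a UTF-32LE BOM also starts with FF FE,
# so this simple check would report such a file as UTF-16LE.
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(data: bytes) -> str:
    """Guess the encoding from a leading BOM, if any."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return "no BOM"


def sniff_file(path: str) -> str:
    """Read the first bytes of the file at `path` and report its BOM."""
    with open(path, "rb") as handle:
        return sniff_bom(handle.read(4))
```

"no BOM" covers both UTF-8 without BOM and the legacy code pages such as 862; telling those apart needs a look at the actual bytes.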
So, if you want to write a file for use in CMD and it has some Unicode characters: while there are commands like xxd and sed that could be used to remove a BOM, the easiest way to make such a file is to use a text editor like Notepad2 or Notepad++, which supports UTF-8 without BOM.
Getting Hebrew to display might be the most important thing to do first, as described above. The next thing is being able to save files in a text editor that you can then display with e.g. 'type'.
And if you ever want to copy from the Command Prompt when not in QuickEdit mode, right-click, choose Mark, select the text, then hit ENTER. To paste, right-click and choose Paste.
A further additional point:
Apparently there are bugs in chcp 65001 where some batch files won't run, and maybe some C programs won't work either; see "How to use unicode characters in Windows command line?". I've even seen the C# compiler crash when cmd is in code page 65001 (though one may blame the C# compiler, one could also blame 65001); see "Why is csc.exe crashing when I last left the output encoding as UTF8?".
Note: an earlier revision of this answer had some command-line examples, but they were unnecessarily complex. I might at some point add some commands that demonstrate what I have been describing, but it's fairly trivial.
/u is for UTF-16LE, not UTF-8. This is why saving the file as UTF-16LE (what Windows/Notepad misleadingly calls "Unicode") and running with /u works, inasmuch as it does.
UTF-8 should be achievable with chcp 65001, but there are some nasty low-level bugs in the Microsoft C Runtime for this code page, which make some apps unreliable and some not run at all.
So yeah, I'm sorry, but UTF-8 is a second-class citizen under Windows. Anything that uses the 'ANSI' interfaces for IO, including anything that uses the C standard IO library, including the Command Prompt, won't be able to cope with it properly.
The only reliable way to get Unicode output in Command Prompt is to use the Windows-specific WriteConsoleW interface to push Unicode strings directly. Unfortunately, as this is not available cross-platform, many tools won't use it.
In any case, even when you've got the encoding right, you still have to have a font in the Command Prompt that contains the characters you want. I believe this is why you still aren't getting Hebrew in the /u + UTF-16LE route.
Summary: Command Prompt + non-ASCII == almost certain fail. Give up and find some other interface you can use that supports Unicode better.
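For what it's worth, calling WriteConsoleW does not take much code. Here is a minimal Python sketch using ctypes; it is Windows-only, untested on XP, the function names are made up, and it assumes standard output really is a console (WriteConsoleW fails when output is redirected to a file or pipe, in which case the function just returns False).

```python
import sys


def utf16_code_units(text: str) -> int:
    """Number of UTF-16 code units in `text`, which is what WriteConsoleW counts."""
    return len(text.encode("utf-16-le")) // 2


def write_console(text: str) -> bool:
    """Write `text` with WriteConsoleW; return False if that is not possible."""
    if sys.platform != "win32":
        return False  # the API only exists on Windows

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE,                 # hConsoleOutput
        wintypes.LPCWSTR,                # lpBuffer (UTF-16 text)
        wintypes.DWORD,                  # nNumberOfCharsToWrite
        ctypes.POINTER(wintypes.DWORD),  # lpNumberOfCharsWritten
        wintypes.LPVOID,                 # lpReserved, must be NULL
    ]
    kernel32.WriteConsoleW.restype = wintypes.BOOL

    STD_OUTPUT_HANDLE = 0xFFFFFFF5  # (DWORD)-11
    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    written = wintypes.DWORD(0)
    ok = kernel32.WriteConsoleW(
        handle, text, utf16_code_units(text), ctypes.byref(written), None
    )
    return bool(ok)
```

Calling write_console("\u05d0\n") in a console with a Hebrew-capable font should show an aleph regardless of chcp; with a font that lacks Hebrew you will still get a placeholder glyph, which is the font problem described above.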
You should convert file.txt to UTF-16 (Little Endian) before running type file.txt.
Reference: What encoding/code page is cmd.exe using?
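One way to do that conversion without opening an editor is a short Python sketch like the one below. It assumes the source file is UTF-8 (with or without a BOM) and that type recognizes UTF-16LE by its leading FF FE BOM; the function name and the output file name in the usage note are made up.

```python
import codecs


def utf8_to_utf16le(src_path: str, dst_path: str) -> None:
    """Re-encode a UTF-8 file (with or without BOM) as UTF-16LE with a BOM."""
    with open(src_path, "rb") as src:
        text = src.read().decode("utf-8-sig")  # "utf-8-sig" drops a leading BOM
    with open(dst_path, "wb") as dst:
        # Binary mode keeps the original line endings untouched.
        dst.write(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))
```

For example, utf8_to_utf16le("file.txt", "file16.txt") followed by type file16.txt in the console.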
I presume you mean "Lucida Console" when you say "Lucida".
Using the charmap application I couldn't find any Hebrew characters in the font. I don't know if the font was more capable in earlier versions of Windows, but in Windows 7 there appears to be nothing outside of the European characters.
My system also has Lucida Sans Typewriter, which does include the Hebrew characters. Unfortunately the Cmd window doesn't show it as a choice. You need to edit the registry to open up more choices, as shown in this question on SuperUser: https://superuser.com/questions/5035/how-to-change-the-windows-console-font
P.S. I have been unable to verify this solution because Windows is being difficult. See https://superuser.com/questions/390933/how-to-add-a-font-to-the-cmd-window-choices-in-windows-7-64-bit
How to get a Hebrew-enabled XP installation?
First of all, this is about XP Home SP3, Hebrew enabled. By that I mean a standard US XP installation, or so I believe, with Hebrew capabilities added for keyboard and display. I believe every XP CD can install such a system. In particular, I believe the following is all that such a system needs:
1) Click Details and add a Hebrew keyboard.
2) Mark with a V the "Install files for complex script and right-to-left languages (including Thai)" option.
Accept, then mark with a V 10004 (MAC - Arabic) and 10005 (Mac - Hebrew). I'm not sure whether Arabic is a must-have here.
Now to the cmd console.
One has to explicitly add the Courier New font to the console fonts registry, as described earlier. Otherwise, Hebrew characters will not be displayed.
Now, when the cmd console is open, all it takes to input Hebrew characters is to select the Courier New font and switch the keyboard to Hebrew mode. Having Windows cycle through its keyboard languages is easy: either press left Alt together with left Shift repeatedly, or use the mouse.
As an aside, a dir command will show file names that have Hebrew characters. However, one can't just issue a
and see the usual output if the file begins with a Hebrew letter. It must be
I assume the asterisk character adds the BOM unicode character.
One can also open Notepad, input Hebrew characters, save the file as UTF8, and run the following in the console commands:
Saving the file as UTF8 is done on Notepad save screen.