What is the process by which characters pass between a text application and the terminal?
The essential question is the one in the title. Here I describe a specific scenario with the necessary information, along with some speculation of my own.
How do keyboard input and text output work? That answer has already covered the topic, but here I mainly want to dig into one part of it.
The more specific question is:
In a GUI, with a Python interpreter opened in interactive mode in a terminal, if I press 'A' on the keyboard, an 'a' character is displayed on the screen. How exactly do the application and the terminal communicate?
Ubuntu 18.04
Python 3.6
The default encoding reported by locale is:
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=
The encoding in the terminal's own settings is UTF-8.
(NOTE: In the terminal, I can change the encoding under Preferences -> default -> compatibility -> Encoding, and I can also change it by setting LC_CTYPE. So there are two ways to change the encoding. After various experiments, I have reason to think that the locale applies to the application running in the terminal, while the Preferences setting applies to the terminal itself.)
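To see the locale side of this from inside the application, here is a minimal sketch, assuming Python 3. The function name describe_encodings is made up for illustration. It only reports what the Python process derives from the environment; it says nothing about the terminal emulator's own Encoding setting, which the process has no standard way to query.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings this Python process believes it should use.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        # Adopt whatever LANG / LC_CTYPE / LC_ALL say in the environment.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The requested locale is not installed on this system.
        pass
    return {
        "preferred": locale.getpreferredencoding(False),
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(name, "=", value)
```

Saving this as a script and running it once normally and once with LC_CTYPE=zh_CN.GB2312 in front of the command should show whether the values follow the locale, provided that locale is installed.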
In the terminal:
$ LC_CTYPE=zh_CN.GB2312 echo 你好
你好
$ LC_CTYPE=C echo 你好
你好
But in Python interactive mode:
$ LC_CTYPE=zh_CN.GB2312 python
>>> print('你好')
File "<stdin>", line 0
^
SyntaxError: 'gb2312' codec can't decode byte 0xa0 in position 9: illegal multibyte sequence
>>> s = u'\u4f60\u597d' # '\u4f60\u597d' is code point of '你好'
>>> print(s)
����
(note1: 你好 is Chinese; GB2312 is an encoding for Chinese.)
(note2: One thing I should mention: the above example works fine after I also change the encoding in the terminal's settings to GB2312.)
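The two symptoms in the session above can be reproduced without any terminal at all, purely with bytes. This is a minimal sketch, assuming the terminal really sends and expects UTF-8 while Python uses GB2312; the variable names are chosen only for illustration.

```python
text = "你好"
source_line = "print('你好')"

# Input direction: the bytes arrive UTF-8 encoded, the reader assumes GB2312.
typed_bytes = source_line.encode("utf-8")
try:
    typed_bytes.decode("gb2312")
    input_error = None
except UnicodeDecodeError as exc:
    input_error = exc

# Output direction: the writer encodes as GB2312, the reader assumes UTF-8.
printed_bytes = text.encode("gb2312")
shown = printed_bytes.decode("utf-8", errors="replace")

if __name__ == "__main__":
    if input_error is not None:
        bad = typed_bytes[input_error.start]
        print("decode failed at position", input_error.start, "byte", hex(bad))
    print(printed_bytes, "->", shown)
```

The failing position and byte line up with the SyntaxError message quoted above, and the second part yields only U+FFFD replacement characters, similar to what the terminal displayed. How many replacement characters appear can differ between decoders, so the count printed here need not match the terminal's.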
Also, when I change the locale encoding, some odd behavior appears as I type non-ASCII characters: garbled characters show up even though I am only typing on the command line.
Encoding is always a headache, so there may be more oddities here than these. I can't describe them one by one, but I have tried to strike a balance between brevity and completeness in the example. I would be glad to understand the internal mechanism, which I think would clear up some of my doubts.
Here is what I speculate:
For input:
              encoding it                 decode it
              using utf8 (by terminal)    using gb2312 (by Python)
          +--------+               +-----+               +----------+
--------->|terminal| ------------> | pty |-------------->|Python app|
          +--------+               +-----+               +----------+
                     utf8-encoded
                     byte sequence
For output:
  convert unicode to                decode it
  bytes using gb2312 (by Python)    using utf8 (by terminal)
+----------+               +-----+               +--------+
|Python app| ------------> | pty |-------------->|terminal|-------->
+----------+               +-----+               +--------+
            gb2312-encoded        certain kind of
            byte sequence         byte sequence
Am I right?
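One way to probe the pty part of these diagrams is to push bytes through a pty by hand and compare what comes out. The following is only a rough sketch, assuming Python 3 on a Linux-like system where os.openpty is available; read_exactly and pty_round_trip are hypothetical helper names. The slave side is put into raw mode so that line editing, echo, and newline translation stay out of the picture.

```python
import os
import tty


def read_exactly(fd, count):
    """Read from fd until count bytes have arrived (or EOF)."""
    data = b""
    while len(data) < count:
        chunk = os.read(fd, count - len(data))
        if not chunk:
            break
        data += chunk
    return data


def pty_round_trip(payload):
    """Send payload through a pty in both directions and return what arrives."""
    # master_fd plays the terminal emulator, slave_fd plays the application.
    master_fd, slave_fd = os.openpty()
    try:
        tty.setraw(slave_fd)  # no echo, no line buffering, no output processing
        os.write(master_fd, payload)  # "keyboard input" from the terminal
        seen_by_app = read_exactly(slave_fd, len(payload))
        os.write(slave_fd, payload)  # "printed output" from the application
        seen_by_terminal = read_exactly(master_fd, len(payload))
        return seen_by_app, seen_by_terminal
    finally:
        os.close(master_fd)
        os.close(slave_fd)


if __name__ == "__main__":
    payload = "你好".encode("utf-8") + "你好".encode("gb2312")
    print(pty_round_trip(payload))
```

If both returned values equal the payload, that would suggest the pty in raw mode forwards bytes without regard to their encoding, leaving any encoding or decoding to the two ends. In the default (cooked) mode the line discipline does more work, which this sketch deliberately avoids.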
This question isn't specifically about Python; I just think using Python as an example makes things clearer.
Since my own understanding is incomplete, the description above may still be imprecise. I hope it is understandable.