How does a character travel between a text application and the terminal?

Posted 2025-01-24 14:33:56 · 3117 characters · 2 views · 0 comments

The essential question is as the title describes. Here I provide a specific scenario with the necessary information, along with some of my own speculation.

How do keyboard input and text output work? This answer has already discussed the topic, but here I mainly want to dig deeper into one part of it.

The more specific question is:

In a GUI, with a Python interpreter running in interactive mode in a terminal, if I press 'A' on the keyboard, an 'a' character is displayed on the screen. How exactly do the application and the terminal communicate?

Ubuntu 18.04
Python 3.6
The default encoding in locale is:

$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=

The encoding in the terminal's own settings is UTF-8.

(NOTE: For the terminal, I can change the encoding under Preferences -> default -> Compatibility -> Encoding. I can also change the encoding by changing LC_CTYPE. So there are two ways to change the encoding. After various experiments, I have reason to think that the locale applies to applications running in the terminal, while the Preferences setting applies to the terminal itself.)

In the terminal:

$ LC_CTYPE=zh_CN.GB2312 echo 你好
你好

$ LC_CTYPE=C echo 你好
你好

but in Python interactive mode:

$ LC_CTYPE=zh_CN.GB2312 python  
>>> print('你好')
  File "<stdin>", line 0
    
    ^
SyntaxError: 'gb2312' codec can't decode byte 0xa0 in position 9: illegal multibyte sequence

>>> s = u'\u4f60\u597d'  # '\u4f60\u597d' is code point of '你好'
>>> print(s)
����

(note1: 你好 is Chinese, GB2312 is an encoding for Chinese.)

(note2: One thing I should mention: the example above works fine once I also change the encoding in the terminal's settings to GB2312.)
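
The decode failure in the session above can be reproduced without a terminal. This is only a minimal sketch, assuming the terminal sends the typed line as UTF-8 bytes and Python decodes standard input with the GB2312 codec selected by LC_CTYPE; the names `typed_line`, `wire_bytes` and `failure` are illustrative, not part of any API.

```python
# The line typed at the >>> prompt, as the terminal would send it in UTF-8.
typed_line = "print('你好')\n"
wire_bytes = typed_line.encode("utf-8")

# Python, told by LC_CTYPE to expect GB2312, tries to decode those bytes.
failure = None
try:
    wire_bytes.decode("gb2312")
except UnicodeDecodeError as exc:
    failure = exc
    print(exc)  # should name the same byte and position as the traceback above

# note2: once the terminal also sends GB2312, both sides agree again.
round_trip = typed_line.encode("gb2312").decode("gb2312")
print(round_trip == typed_line)
```

The bytes before the first Chinese character are plain ASCII, which is why the reported position counts from the start of the whole line rather than from the start of the string literal.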

And when I change the locale encoding, some odd behavior appears as I type non-ASCII characters: garbled characters show up even though I am only typing at the command line.

Encoding is a constant headache, so there may be more odd behavior here. I can't describe every case one by one, but I have tried to strike a balance between brevity and completeness in the example. I would be glad to learn the internal mechanism; I think it may clear up some of my doubts.

Here is what I speculate:

for input:

           encoding it                      decode it                           
           using utf8 (by terminal)         using gb2312 (by Python)
          +--------+               +-----+               +----------+
--------->|terminal| ------------> | pty |-------------->|Python app|
          +--------+               +-----+               +----------+
                      utf8-encoded
                     bytes sequence
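
The input path speculated in this diagram can be imitated with a pseudo-terminal pair. This is a rough sketch under stated assumptions: the master end stands in for the terminal emulator, the slave end stands in for the application's standard input, and the pty is put in raw mode so that echo and line editing do not interfere. It only shows that the pty carries bytes and attaches no encoding to them.

```python
import os
import pty
import tty

# master_fd plays the terminal emulator, slave_fd plays the application.
master_fd, slave_fd = pty.openpty()
tty.setraw(slave_fd)  # raw mode: no echo, no line buffering, bytes pass unchanged

sent = "你好".encode("utf-8")  # terminal side: encode with the terminal's setting
received = b""
try:
    os.write(master_fd, sent)
    # Application side: read until every byte that was sent has arrived.
    while len(received) < len(sent):
        received += os.read(slave_fd, 1024)
finally:
    os.close(master_fd)
    os.close(slave_fd)

# The application chooses how to interpret the bytes; the pty does not.
as_utf8 = received.decode("utf-8")
as_gb2312 = received.decode("gb2312", errors="replace")
print(received, as_utf8, as_gb2312)
```

Decoding the same bytes with two codecs gives two different strings, which is consistent with the idea that the mismatch arises at the two ends and not inside the pty.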

for output:

    convert unicode to                      decode it  
    bytes using gb2312 (by Python)          using utf8 (by terminal)
       +----------+               +-----+               +--------+
       |Python app| ------------> | pty |-------------->|terminal|-------->
       +----------+               +-----+               +--------+
                    gb2312-encoded                              certain kind of
                     bytes sequence                               bytes sequence
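
The output direction can be sketched the same way. The snippet assumes that Python encodes the string with GB2312 and that the terminal decodes with UTF-8, substituting a replacement character for invalid sequences. Using `errors="replace"` here is only a stand-in for what a terminal emulator does, so the exact number of placeholder glyphs on a real terminal may differ.

```python
text = "\u4f60\u597d"  # 你好

# Python side: str -> bytes using the encoding taken from the locale.
out_bytes = text.encode("gb2312")

# Terminal side: bytes -> characters using the terminal's own setting.
shown = out_bytes.decode("utf-8", errors="replace")
print(out_bytes, repr(shown))

# If the terminal used GB2312 as well, the text would survive the trip.
shown_matching = out_bytes.decode("gb2312")
print(shown_matching)
```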

Am I right?

This question is not specific to Python; I just think that using Python as an example makes things clearer.

Since I don't have a clear grasp of this area, the description above may still be incomplete. I hope you can understand it.
