What are 7-bit/8-bit environments for control functions according to ISO/IEC 6429:1992?
I am learning ECMA-48 and I see a lot of notes about 7-bit and 8-bit environments for control functions. For example:
NOTE LS0 is used in 8-bit environments only; in 7-bit environments SHIFT-IN (SI) is used instead.
As I understand it, all environments today are 8-bit. If I am wrong, could anyone give real examples where 7-bit environments are used?
Answer:
For example, character encodings.
The standard uses the values 0x00 to 0x1F and 0x80 to 0x9F for the C0 and C1 control codes, respectively. It also defines control functions, control sequences, etc., which start with either ESC (0x1B) or CSI (0x9B).
In an 8-bit environment, some encoding must be defined that specifies which character is represented by which value. The first 128 values follow ASCII (or some other compatible standard, i.e. one that does not use 0x00 to 0x1F for printable characters but reserves them for the C0 control codes), but what about the upper 128 values?
Here we enter the world of code pages, which define the upper 128 values. Some existing code pages (such as ISO 8859-2) reserve the values 0x80 to 0x9F for the C1 control codes, but others (such as CP1250) do not and use them for printable characters.
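Python's standard codecs make the difference concrete. The minimal sketch below decodes the single byte 0x9B (CSI in the 8-bit C1 set) under both code pages; `iso8859_2` and `cp1250` are Python's codec names, and the variable names are just for illustration.

```python
import unicodedata

raw = bytes([0x9B])  # the 8-bit CSI byte

# ISO 8859-2 leaves 0x80-0x9F to the C1 controls, so Python maps 0x9B to U+009B (CSI)
as_latin2 = raw.decode("iso8859_2")

# CP1250 (Windows-1250) assigns a printable character to the same byte instead
as_cp1250 = raw.decode("cp1250")

print(f"ISO 8859-2: U+{ord(as_latin2):04X}")  # U+009B, a C1 control
print(f"CP1250:     U+{ord(as_cp1250):04X} {unicodedata.name(as_cp1250)}")
```

The same byte is a control code under one code page and a quotation mark under the other, so a stream using CP1250 text cannot also carry 8-bit C1 controls.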
When such an encoding is used, the values 0x80 to 0x9F cannot serve both purposes (printable characters and control codes) at the same time. So even though there are 8 bits, they are not available for the purposes the standard defines.
So, from the point of view of this standard, we treat this as a 7-bit environment; for example, CSI (0x9B) then becomes the two-byte sequence 0x1B 0x5B (ESC [).
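The rule behind that example is general: each C1 control 0x80 + n is written in 7-bit form as ESC followed by 0x40 + n. The sketch below applies the rule in both directions. The function names are hypothetical, and it assumes the input is a pure 8-bit ECMA-48 byte stream in which 0x80 to 0x9F only ever mean C1 controls (no code-page or UTF-8 text mixed in).

```python
ESC = 0x1B

def c1_to_7bit(data: bytes) -> bytes:
    """Rewrite each 8-bit C1 control (0x80-0x9F) as ESC + (byte - 0x40)."""
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            out += bytes((ESC, b - 0x40))
        else:
            out.append(b)
    return bytes(out)

def c1_to_8bit(data: bytes) -> bytes:
    """Collapse each ESC Fe pair (Fe in 0x40-0x5F) back into a single C1 byte."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC and i + 1 < len(data) and 0x40 <= data[i + 1] <= 0x5F:
            out.append(data[i + 1] + 0x40)
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

seq_8bit = b"\x9b1m"  # CSI 1 m in 8-bit form
seq_7bit = c1_to_7bit(seq_8bit)
print(seq_7bit)  # b'\x1b[1m'
```

The round trip is only lossless under the stated assumption; if code-page text occupied 0x80 to 0x9F, `c1_to_7bit` would turn printable characters into escape sequences, which is the conflict described above.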
"OK, forget the code pages, we live in the future now. Unicode rules."
OK, but with UTF-8, the 8-bit encoding of Unicode, the story is the same.
In UTF-8, the values 0x80 to 0xBF (which include 0x80 to 0x9F) are continuation bytes: every byte after the first in a character (actually a code point, but that's irrelevant) that is encoded as multiple bytes. Again, a conflict.
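Encoding an ordinary letter shows the conflict. The Czech letter ě (U+011B) is encoded in UTF-8 as two bytes, and the second one is 0x9B, the 8-bit CSI. The scanner below is a hypothetical helper that only illustrates what a parser assuming 8-bit C1 controls would see.

```python
text = "\u011b"  # 'ě', LATIN SMALL LETTER E WITH CARON
encoded = text.encode("utf-8")
print(encoded.hex(" "))  # c4 9b

def naive_c1_positions(data: bytes) -> list[int]:
    """Offsets that a parser assuming 8-bit C1 controls would treat as control codes."""
    return [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]

print(naive_c1_positions(encoded))  # [1]: the continuation byte looks like CSI
```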
So if the control functions from the standard have to coexist with UTF-8, a 7-bit environment again has to be assumed for the purposes of this standard.
(Actually, Unicode, and therefore UTF-8, does allow the C1 control codes to be encoded as valid code points, U+0080 to U+009F, but then they only work when interpreted by a program that is aware of Unicode. Assuming 7 bits removes that requirement.)
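A minimal sketch of the Unicode route: CSI as the code point U+009B is encoded in UTF-8 as the two bytes C2 9B, not as the single 8-bit byte 0x9B, so only a program that decodes UTF-8 first gets back one CSI. The 7-bit form ESC [ consists only of ASCII bytes, so it is byte-for-byte identical in UTF-8 and ASCII.

```python
CSI = "\x9b"  # U+009B CONTROL SEQUENCE INTRODUCER, a C1 control

utf8_csi = CSI.encode("utf-8")
print(utf8_csi.hex(" "))  # c2 9b: two bytes, not the single 8-bit CSI byte

# A Unicode-aware program decodes first and recovers one CSI code point
print(utf8_csi.decode("utf-8") == CSI)  # True

# The 7-bit form uses only ASCII bytes, so no decoding step is needed to spot it
seven_bit_csi = "\x1b["
print(seven_bit_csi.encode("utf-8") == seven_bit_csi.encode("ascii"))  # True
```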
Your quote uses LS0 and SHIFT-IN (SI). These are defined in the ECMA-35 (ISO 2022) standard and are a way of encoding more characters into the 7 or 8 available bits.
You probably don't have to deal with this part unless you actually want to support these kinds of character encodings.
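One concrete encoding built on these shift functions is ISO-2022-KR (RFC 1557), a 7-bit encoding for Korean text in e-mail that switches between ASCII and Korean with SHIFT-OUT (0x0E) and SHIFT-IN (0x0F). The sketch relies on Python's built-in `iso2022_kr` codec. It only checks that both shifts appear and that every byte stays below 0x80; it does not claim an exact byte sequence.

```python
text = "한국어 ok"  # Korean followed by ASCII, which forces a shift back
encoded = text.encode("iso2022_kr")

SO, SI = 0x0E, 0x0F  # SHIFT-OUT and SHIFT-IN control bytes
print(encoded)
print("uses SO:", SO in encoded, "uses SI:", SI in encoded)
print("7-bit clean:", all(b < 0x80 for b in encoded))
print("round trip:", encoded.decode("iso2022_kr") == text)
```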