Can I determine whether the terminal interprets the C1 control codes?

Posted on 2024-09-24 11:33:38

ISO/IEC 2022 defines the C0 and C1 control codes. The C0 set comprises the familiar codes between 0x00 and 0x1f in ASCII, ISO-8859-1 and UTF-8 (e.g. ESC, CR, LF).

Some VT100 terminal emulators (e.g. screen(1), PuTTY) support the C1 set, too. These are the values between 0x80 and 0x9f (so, for example, 0x84 moves the cursor down a line).

I am displaying user-supplied input, and I do not want that input to be able to alter the terminal state (e.g. move the cursor). I currently filter out the character codes in the C0 set; however, I would also like to filter out the C1 set conditionally, if the terminal will interpret them as control codes.

Is there a way of getting this information from a database like termcap?

Comments (4)

你好,陌生人 2024-10-01 11:33:38

The only way to do it that I can think of is using C1 requests and testing the return value:

$ echo `echo -en "\x9bc"`
^[[?1;2c
$ echo `echo -e "\x9b5n"`
^[[0n
$ echo `echo -e "\x9b6n"`
^[[39;1R
$ echo `echo -e "\x9b0x" `
^[[2;1;1;112;112;1;0x

The requests above are:

CSI c      Primary DA; request Device Attributes
CSI 5 n    DSR; Device Status Report
CSI 6 n    CPR; Cursor Position Report
CSI 0 x    DECREQTPARM; Request Terminal Parameters

The terminfo/termcap database that ESR maintains has a couple of these requests in user strings 7 and 9 (user7/u7, user9/u9):

# INTERPRETATION OF USER CAPABILITIES
#
# The System V Release 4 and XPG4 terminfo format defines ten string
# capabilities for use by applications, ....   In this file, we use
# certain of these capabilities to describe functions which are not covered
# by terminfo.  The mapping is as follows:
#
#       u9      terminal enquire string (equiv. to ANSI/ECMA-48 DA)
#       u8      terminal answerback description
#       u7      cursor position request (equiv. to VT100/ANSI/ECMA-48 DSR 6)
#       u6      cursor position report (equiv. to ANSI/ECMA-48 CPR)
#
# The terminal enquire string <u9> should elicit an answerback response
# from the terminal.  Common values for <u9> will be ^E (on older ASCII
# terminals) or \E[c (on newer VT100/ANSI/ECMA-48-compatible terminals).
#
# The cursor position request (<u7>) string should elicit a cursor position
# report.  A typical value (for VT100 terminals) is \E[6n.
#
# The terminal answerback description (u8) must consist of an expected
# answerback string.  The string may contain the following scanf(3)-like
# escapes:
#
#       %c      Accept any character
#       %[...]  Accept any number of characters in the given set
#
# The cursor position report (<u6>) string must contain two scanf(3)-style
# %d format elements.  The first of these must correspond to the Y coordinate
# and the second to the %d.  If the string contains the sequence %i, it is
# taken as an instruction to decrement each value after reading it (this is
# the inverse sense from the cup string).  The typical CPR value is
# \E[%i%d;%dR (on VT100/ANSI/ECMA-48-compatible terminals).
#
# These capabilities are used by tac(1m), the terminfo action checker
# (distributed with ncurses 5.0).

Example:

$ echo `tput u7`
^[[39;1R
$ echo `tput u9`
^[[?1;2c
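The cursor position report shown above (^[[39;1R) has the form ESC [ row ; column R. A minimal Python sketch for picking such a reply apart follows. The name parse_cpr is hypothetical, and accepting the single-byte 0x9b introducer as well as ESC [ is an assumption about what an 8-bit terminal might send back:

```python
import re

# A cursor position report: CSI row ; column R, where CSI is either the
# 7-bit form ESC [ or the single 8-bit C1 byte 0x9b.
CPR_RE = re.compile(rb"(?:\x1b\[|\x9b)(\d+);(\d+)R")


def parse_cpr(reply):
    """Return (row, column) from a cursor position report, or None."""
    match = CPR_RE.search(reply)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(parse_cpr(b"\x1b[39;1R"))
```

A device attributes reply such as ^[[?1;2c does not match the pattern, so parse_cpr returns None for it.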

Of course, if you only want to prevent display corruption, you can take the approach that less uses and let the user choose whether control characters are displayed raw (the -r and -R options in less). Also, if you know your output charset: the ISO-8859 charsets reserve the C1 range for control codes, so they have no printable characters in that range.

风吹雨成花 2024-10-01 11:33:38

Actually, PuTTY does not appear to support C1 controls.

The usual way of testing this feature is with vttest, which provides menu entries for switching the input and the output separately to 8-bit controls. PuTTY fails the sanity check for each of those menu entries, and if the check is disabled, the result confirms that PuTTY does not honor those controls.

江南月 2024-10-01 11:33:38

I don't think there's a straightforward way to query whether the terminal supports them. You can try nasty hacky workarounds (like print them and then query the cursor position) but I really don't recommend anything along these lines.

I think you could just filter out these C1 codes unconditionally. Unicode declares the range U+0080..U+009F to be control characters anyway, so I don't think you should ever use them for anything else.

(Note: you used the example 0x84 for cursor down. It's in fact U+0084 encoded in whichever encoding the terminal uses, e.g. 0xC2 0x84 for UTF-8.)
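The unconditional filter could look roughly like the sketch below. It is written in Python only for illustration, since the question does not say what language the program uses. The function name strip_controls, the strip_c1 switch, and the choice to keep newline and tab are all assumptions. DEL (0x7f) is dropped along with C0 even though it is not formally part of that set. The filter works on decoded text, so a UTF-8 byte pair such as 0xC2 0x84 is seen as the single code point U+0084:

```python
def strip_controls(text, strip_c1=True, keep="\n\t"):
    """Remove C0 controls (and DEL), and optionally C1 controls, from text.

    Characters listed in `keep` are always passed through.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in keep:
            out.append(ch)
        elif cp < 0x20 or cp == 0x7F:
            continue  # C0 control or DEL
        elif strip_c1 and 0x80 <= cp <= 0x9F:
            continue  # C1 control (U+0080..U+009F)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    raw = b"a\x1b[2Jb\xc2\x84c\n"  # user input as UTF-8 bytes
    print(repr(strip_controls(raw.decode("utf-8", errors="replace"))))
```

Dropping ESC leaves the rest of an escape sequence (here "[2J") behind as harmless printable text. Passing strip_c1=False gives the conditional behaviour the question asks about.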

吖咩 2024-10-01 11:33:38

Doing it 100% automatically is challenging at best. Many, if not most, Unix interfaces are smart (xterms and whatnot), but you don't actually know whether you are connected to an ASR33 or a PC running MSDOS.

You could try some of the terminal interrogation escape sequences and timeout if there is no reply. But then you might have to fall back and maybe ask the user what kind of terminal they are using.
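An interrogate-and-time-out probe might be sketched as follows, in Python for a POSIX system. The names read_reply and terminal_answers_c1 are hypothetical, and so is the 0.5 second timeout. The probe sends the 8-bit form of the cursor position request (0x9b followed by 6n); a terminal in UTF-8 mode would presumably need U+009B encoded as 0xC2 0x9B instead. A terminal that does not honor C1 may print stray characters, so treat this as an experiment, not a dependable test:

```python
import os
import select
import termios
import tty

C1_CSI = b"\x9b"             # 8-bit CSI (assumption: not a UTF-8 terminal)
CPR_REQUEST = C1_CSI + b"6n"  # cursor position request, introduced via C1


def read_reply(fd, timeout, terminator=b"R"):
    """Read from fd until the reply ends with terminator, or until
    `timeout` seconds pass with no new data."""
    reply = b""
    while not reply.endswith(terminator):
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            break  # timed out waiting for (more of) a reply
        chunk = os.read(fd, 64)
        if not chunk:
            break  # end of file
        reply += chunk
    return reply


def terminal_answers_c1(fd, timeout=0.5):
    """Return True if the terminal on fd answers a C1-introduced request."""
    if not os.isatty(fd):
        return False
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)  # no echo, no line buffering while probing
        os.write(fd, CPR_REQUEST)
        reply = read_reply(fd, timeout)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
    return reply.endswith(b"R") and reply[:1] in (b"\x1b", b"\x9b")


if __name__ == "__main__":
    tty_fd = os.open("/dev/tty", os.O_RDWR)
    try:
        print("C1 request answered:", terminal_answers_c1(tty_fd))
    finally:
        os.close(tty_fd)
```

If no reply arrives before the timeout, the function returns False, and the program can fall back to asking the user or to filtering C1 unconditionally.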
