Python regex to match VT100 escape sequences

Posted 2024-12-11 10:42:51

I'm writing a Python program that logs terminal interaction (similar to the script program), and I'd like to filter out the VT100 escape sequences before writing to disk. I'd like to use a function like this:

def strip_escapes(buf):
    escape_regex = re.compile(???) # <--- this is what I'm looking for
    return escape_regex.sub('', buf)

What should go in escape_regex?

Comments (3)

冰魂雪魄 2024-12-18 10:42:51

The combined expression for escape sequences can be something generic like this:

(\x1b\[|\x9b)[^@-_]*[@-_]|\x1b[@-_]

It should be used with re.I, because the class [@-_] covers only uppercase letters and a few punctuation characters, while many sequences end in a lowercase letter (for example the m in \x1b[0m).

This incorporates:

  1. Two-byte sequences, i.e. \x1b followed by a character in the range from @ to _.
  2. One-byte CSI, i.e. \x9b as opposed to \x1b + "[".

However, this will not work for sequences that define key mappings or otherwise include strings wrapped in quotes.
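
As a rough sketch, the expression can be dropped into the strip_escapes function from the question. The module-level name ESCAPE_REGEX and the sample string are illustrative assumptions, not part of the original answer, and the caveat about key-mapping sequences still applies.

```python
import re

# Pattern copied from the answer; re.I lets [@-_] also match lowercase
# final characters such as "m" or "h". ESCAPE_REGEX is a name chosen
# for this sketch only.
ESCAPE_REGEX = re.compile(r"(\x1b\[|\x9b)[^@-_]*[@-_]|\x1b[@-_]", re.I)


def strip_escapes(buf):
    """Return buf with every sequence matched by ESCAPE_REGEX removed."""
    return ESCAPE_REGEX.sub("", buf)


if __name__ == "__main__":
    # Hypothetical sample: bold red text, a reset, and an erase-to-end-of-line.
    sample = "\x1b[1;31mError:\x1b[0m disk full\x1b[K"
    print(strip_escapes(sample))
```

Compiling the pattern once at module level avoids recompiling it on every call, which may matter when the function runs on each chunk of terminal output.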

凝望流年 2024-12-18 10:42:51

VT100 codes are already grouped (mostly) according to similar patterns here:

http://ascii-table.com/ansi-escape-sequences-vt-100.php

I think the simplest approach would be to use a tool like regexbuddy to define a regex for each group of VT100 codes.
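
A minimal sketch of the per-group idea might look like the following. The group names, the choice of groups, and the helper strip_known_groups are assumptions made for illustration; they are not a transcription of the linked table and cover only a few common sequences.

```python
import re

# Illustrative groups only: names and coverage are assumptions for this
# sketch, not taken from the linked table.
VT100_GROUPS = {
    "cursor_move": r"\x1b\[\d*[ABCD]",             # ESC [ n A/B/C/D
    "cursor_position": r"\x1b\[(?:\d*;\d*)?[Hf]",  # ESC [ row ; col H or f
    "erase": r"\x1b\[[0-2]?[JK]",                  # ESC [ n J and ESC [ n K
    "graphics": r"\x1b\[[\d;]*m",                  # ESC [ a ; b m (colors, bold)
}

# One combined pattern, built by joining the per-group patterns.
GROUP_REGEX = re.compile("|".join(f"(?:{p})" for p in VT100_GROUPS.values()))


def strip_known_groups(text):
    """Remove only the sequences covered by VT100_GROUPS."""
    return GROUP_REGEX.sub("", text)


if __name__ == "__main__":
    sample = "\x1b[2J\x1b[1;1HHello \x1b[1;32mworld\x1b[0m\x1b[3A"
    print(strip_known_groups(sample))
```

Keeping one pattern per group makes it easy to add or drop a group later, at the cost of leaving any sequence outside the listed groups untouched.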

七婞 2024-12-18 10:42:51

I found the following solution, which parses VT100 color codes and removes the non-printable escape sequences. The code snippet (found here) successfully removed all codes for me when running a telnet session using telnetlib:

    def __processReadLine(self, line_p):
        '''
        Remove non-printable characters from line <line_p> and
        return a printable string.
        '''
        line, i, imax = '', 0, len(line_p)
        while i < imax:
            ac = ord(line_p[i])
            if (32 <= ac < 127) or ac in (9, 10):  # printable, \t, \n
                line += line_p[i]
            elif ac == 27:  # ESC: skip the escape sequence
                i += 1
                # advance to the letter that terminates the sequence
                while i < imax and line_p[i].lower() not in 'abcdhsujkm':
                    i += 1
            elif ac == 8 or (ac == 13 and line and line[-1] == ' '):  # backspace or EOL spacing
                if line:
                    line = line[:-1]
            i += 1

        return line