Python regex escape characters

Posted on 2024-08-12 18:08:43

We have:

>>> str
'exit\r\ndrwxr-xr-x    2 root     root            0 Jan  1  2000 
\x1b[1;34mbin\x1b[0m\r\ndrwxr-xr-x    3 root     root           
0 Jan  1  2000 \x1b[1;34mlib\x1b[0m\r\ndrwxr-xr-x   10 root     
root            0 Jan  1  1970 \x1b[1;34mlocal\x1b[0m\r\ndrwxr-xr-x    
2 root     root            0 Jan  1  2000 \x1b[1;34msbin\x1b[0m\r\ndrwxr-xr-x    
5 root     root            0 Jan  1  2000 \x1b[1;34mshare\x1b[0m\r\n# exit\r\n'

>>> print str
exit
drwxr-xr-x    2 root     root            0 Jan  1  2000 bin
drwxr-xr-x    3 root     root            0 Jan  1  2000 lib
drwxr-xr-x   10 root     root            0 Jan  1  1970 local
drwxr-xr-x    2 root     root            0 Jan  1  2000 sbin
drwxr-xr-x    5 root     root            0 Jan  1  2000 share
# exit

I want to get rid of all the '\xblah[0m' nonsense using regexp. I've tried

re.sub(str, r'(\x.*m)', '')

But that hasn't done the trick. Any ideas?

蓝咒 2024-08-19 18:08:43

You have a few issues:

You're passing arguments to re.sub in the wrong order. It should be:
re.sub(regexp_pattern, replacement, source_string)
The string doesn't contain "\x". That "\x1b" is the escape character, and it's a single character.
As interjay pointed out, you want ".*?" rather than ".*", because otherwise it will match everything from the first escape through the last "m".
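A minimal Python 3 sketch of the points above. It uses a shortened copy of the question's string as assumed sample data. It shows that '\x1b' is one character. It also shows that the greedy .* runs to the last "m" on each line, so the file names are deleted too. The non-greedy .*? stops at the nearest "m".

```python
import re

# Shortened copy of the question's data (assumed sample: two directory lines).
s = ('drwxr-xr-x    2 root     root            0 Jan  1  2000 \x1b[1;34mbin\x1b[0m\r\n'
     'drwxr-xr-x    3 root     root            0 Jan  1  2000 \x1b[1;34mlib\x1b[0m\r\n')

# '\x1b' in a normal string literal is ONE character (ESC, code point 27),
# not a backslash followed by 'x'.
print(len('\x1b'), ord('\x1b'))  # 1 27

# Greedy: '.*' extends to the LAST 'm' on each line ('.' does not match '\n'),
# so "bin" and "lib" are swallowed together with the colour codes.
greedy = re.sub('\x1b.*m', '', s)

# Non-greedy: '.*?' stops at the FIRST 'm' after each ESC.
lazy = re.sub('\x1b.*?m', '', s)

print(repr(greedy))
print(repr(lazy))
```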

The correct call to re.sub is:

print(re.sub('\x1b.*?m', '', s))

Alternatively, you could use:

print(re.sub('\x1b[^m]*m', '', s))

痴意少年 2024-08-19 18:08:43

You need the following changes:

Escape the backslash
Switch to non-greedy matching. Otherwise, everything between the first \x and the last m will be removed, which will be a problem when there is more than one occurrence.
The order of arguments is incorrect

Result:

re.sub(r'(\\x.*?m)', '', str)
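In a raw string, r'\\x' matches a literal backslash followed by "x". As the first answer notes, the string in the question contains the ESC character itself. So this pattern only helps when the text holds the escape codes as printed characters, for example when the repr() of the string was saved. The small sketch below contrasts the two cases. Both sample strings are assumptions.

```python
import re

raw = 'x \x1b[1;34mbin\x1b[0m'   # real ESC characters, as in the question
printed = repr(raw)[1:-1]        # same text with the escapes spelled out as backslash-x-1-b

pattern = r'(\\x.*?m)'           # literal backslash, then 'x', lazily up to the next 'm'

print(re.sub(pattern, '', raw))      # unchanged: `raw` contains no backslash
print(re.sub(pattern, '', printed))  # x bin
```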

和我恋爱吧 2024-08-19 18:08:43

These are ANSI terminal codes. They're signalled by an ESC (byte 27, seen in Python as \x1B) followed by [, then some ;-separated parameters and finally a letter to specify which command it is. (m is a colour change.)
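The sketch below makes that structure concrete by splitting each sequence into its ;-separated parameters and its final command letter. It assumes numeric parameters only, which is the common case described here. describe_sequences is a hypothetical helper name.

```python
import re

# ESC, '[', optional ';'-separated numeric parameters, then one command letter.
CSI = re.compile(r'\x1b\[([0-9;]*)([A-Za-z])')

def describe_sequences(text):
    """Return (parameters, command) pairs for each ANSI sequence in text."""
    result = []
    for match in CSI.finditer(text):
        params = [int(p) for p in match.group(1).split(';') if p]
        result.append((params, match.group(2)))
    return result

# 1 = bold, 34 = blue foreground, 0 = reset; 'm' selects graphic rendition (colour).
print(describe_sequences('\x1b[1;34mbin\x1b[0m'))  # [([1, 34], 'm'), ([0], 'm')]
```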

The parameters are usually numbers, so for this simple case you could get rid of them with:

import re
ansisequence = re.compile(r'\x1B\[[^A-Za-z]*[A-Za-z]')
ansisequence.sub('', string)
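Here is a self-contained run of that pattern on part of the question's data. The extra \x1b[K (erase to end of line) is an assumed example of a non-colour command. It shows that the [A-Za-z] ending also catches letters other than "m".

```python
import re

ansisequence = re.compile(r'\x1B\[[^A-Za-z]*[A-Za-z]')

# Assumed sample: one listing line from the question plus an erase-line code.
s = ('drwxr-xr-x    2 root     root            0 Jan  1  2000 \x1b[1;34mbin\x1b[0m\r\n'
     '\x1b[K# exit\r\n')

clean = ansisequence.sub('', s)
print(clean)
```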

Technically, for some (non-colour-related) control codes the parameters can be general strings, which makes parsing annoying. You'd rarely meet these, but if you did, I guess you'd have to use something complicated like:

\x1B\[((\d+|"[^"]*")(;(\d+|"[^"]*"))*)?[A-Za-z]
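Below is a quick check of that longer expression, compiled as a raw string. The sequence with a quoted parameter is a made-up illustration of the string-parameter case and is not taken from the question. The simpler pattern is included to show how it stops at the first letter inside the quotes.

```python
import re

# Parameters are either digits or a double-quoted string, separated by ';'.
ANSI_WITH_STRINGS = re.compile(r'\x1B\[((\d+|"[^"]*")(;(\d+|"[^"]*"))*)?[A-Za-z]')
simple = re.compile(r'\x1B\[[^A-Za-z]*[A-Za-z]')

text = 'a\x1b[0;68;"dir";13pb\x1b[1;34mc\x1b[0m'  # hypothetical input

print(ANSI_WITH_STRINGS.sub('', text))  # abc
print(simple.sub('', text))             # air";13pbc  (stops at the 'd' inside the quotes)
```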

Best would be to persuade whatever's generating the string that you're not an ANSI terminal, so it shouldn't include colour codes in its output.

迷你仙 2024-08-19 18:08:43

Try running ls --color=never -l, and you won't get the ANSI escape codes in the first place.
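Here is a minimal sketch of passing that flag when the command is started from Python. It assumes a local ls that accepts --color=never. GNU coreutils does, and BusyBox does only when it is built with colour support. The question's output looks like it came from an interactive shell session; in that case, passing the same flag inside the session should have the same effect.

```python
import os
import subprocess
import tempfile

# Assumption: a local `ls` that understands --color=never.
with tempfile.TemporaryDirectory() as d:
    os.mkdir(os.path.join(d, 'bin'))  # a directory, which ls colours when colour output is on
    out = subprocess.run(['ls', '--color=never', '-l', d],
                         capture_output=True, text=True, check=True).stdout

print(out)  # plain text: no '\x1b[...m' sequences around "bin"
```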

橘亓 2024-08-19 18:08:43

Here is a pyparsing solution to your problem, with a general parsing expression for those pesky escape sequences. By transforming the initial string with a suppressed expression, this returns a string stripped of all matches of the expression.

s = \
'exit\r\ndrwxr-xr-x    2 root     root            0 Jan  1  2000 ' \
'\x1b[1;34mbin\x1b[0m\r\ndrwxr-xr-x    3 root     root           ' \
'0 Jan  1  2000 \x1b[1;34mlib\x1b[0m\r\ndrwxr-xr-x   10 root     ' \
'root            0 Jan  1  1970 \x1b[1;34mlocal\x1b[0m\r\ndrwxr-xr-x    ' \
'2 root     root            0 Jan  1  2000 \x1b[1;34msbin\x1b[0m\r\ndrwxr-xr-x    ' \
'5 root     root            0 Jan  1  2000 \x1b[1;34mshare\x1b[0m\r\n# exit\r\n'

from pyparsing import (Literal, Word, nums, Combine, 
    delimitedList, oneOf, alphas, Suppress)

ESC = Literal('\x1b')
integer = Word(nums)
escapeSeq = Combine(ESC + '[' + delimitedList(integer,';') + oneOf(list(alphas)))

s_prime = Suppress(escapeSeq).transformString(s)

print(s_prime)

This prints your desired output, as stored in s_prime.
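If installing pyparsing is not an option, the same grammar can be written as a plain re pattern: ESC, "[", one or more integers separated by ";", then a single letter. This is a hand translation of escapeSeq above. It is not something pyparsing generates.

```python
import re

# ESC '[' integer (';' integer)* letter, mirroring escapeSeq above.
escape_seq = re.compile(r'\x1b\[\d+(?:;\d+)*[A-Za-z]')

# Shortened copy of the question's data (assumed sample).
s = ('exit\r\n'
     'drwxr-xr-x    2 root     root            0 Jan  1  2000 \x1b[1;34mbin\x1b[0m\r\n'
     '# exit\r\n')

s_prime = escape_seq.sub('', s)
print(s_prime)
```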