Python regular expression escape characters
We have:
>>> str
'exit\r\ndrwxr-xr-x 2 root root 0 Jan 1 2000
\x1b[1;34mbin\x1b[0m\r\ndrwxr-xr-x 3 root root
0 Jan 1 2000 \x1b[1;34mlib\x1b[0m\r\ndrwxr-xr-x 10 root
root 0 Jan 1 1970 \x1b[1;34mlocal\x1b[0m\r\ndrwxr-xr-x
2 root root 0 Jan 1 2000 \x1b[1;34msbin\x1b[0m\r\ndrwxr-xr-x
5 root root 0 Jan 1 2000 \x1b[1;34mshare\x1b[0m\r\n# exit\r\n'
>>> print str
exit
drwxr-xr-x 2 root root 0 Jan 1 2000 bin
drwxr-xr-x 3 root root 0 Jan 1 2000 lib
drwxr-xr-x 10 root root 0 Jan 1 1970 local
drwxr-xr-x 2 root root 0 Jan 1 2000 sbin
drwxr-xr-x 5 root root 0 Jan 1 2000 share
# exit
I want to get rid of all the '\xblah[0m' nonsense using regexp. I've tried
re.sub(str, r'(\x.*m)', '')
But that hasn't done the trick. Any ideas?
You have a few issues:
You're passing arguments to re.sub in the wrong order. It should be:
re.sub(regexp_pattern, replacement, source_string)
The string doesn't contain "\x". That "\x1b" is the escape character, and it's a single character.
As interjay pointed out, you want ".*?" rather than ".*", because otherwise it will match everything from the first escape through the last "m".
The correct call to re.sub is:
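A minimal sketch of such a call, consistent with the three points above. It assumes Python 3 and a shortened sample string named s; neither is from the original post.

```python
import re

# Shortened sample of the listing from the question (assumed for illustration).
s = ('drwxr-xr-x 2 root root 0 Jan 1 2000 \x1b[1;34mbin\x1b[0m\r\n'
     'drwxr-xr-x 3 root root 0 Jan 1 2000 \x1b[1;34mlib\x1b[0m\r\n')

# In a normal (non-raw) string literal, '\x1b' is the single ESC character.
assert len('\x1b') == 1

# Argument order is pattern, replacement, source string.
# '.*?' is non-greedy, so each match stops at the first 'm' after an ESC.
cleaned = re.sub('\x1b.*?m', '', s)
print(cleaned)
```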
Alternatively, you could use:
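One possible alternative, sketched under the same assumptions (Python 3, an illustrative string s), replaces the non-greedy quantifier with a negated character class:

```python
import re

# One line of the listing, shortened (assumed for illustration).
s = 'Jan 1 2000 \x1b[1;34mbin\x1b[0m\r\n'

# '[^m]*' can never run past an 'm', so each match ends at the
# first 'm' that follows the ESC character.
cleaned = re.sub('\x1b[^m]*m', '', s)
print(repr(cleaned))
```

Both forms only handle sequences that end in "m"; other terminal commands would be left in place.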
You need the following changes, including a switch to non-greedy matching. Otherwise everything between "\x" and the last "m" will be removed, which will be a problem when there is more than one occurrence. Result:
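The difference between greedy and non-greedy matching can be seen on a single line that contains two escape sequences. This is a sketch assuming Python 3; the variable names are illustrative.

```python
import re

# One entry from the listing: a colour-on code, the name, a colour-off code.
line = 'Jan 1 2000 \x1b[1;34mbin\x1b[0m'

# Greedy: '.*' runs from the first ESC to the last 'm' on the line,
# so the file name between the two codes is lost as well.
greedy = re.sub('\x1b.*m', '', line)

# Non-greedy: '.*?' stops at the first 'm' after each ESC.
non_greedy = re.sub('\x1b.*?m', '', line)

print(repr(greedy))      # 'Jan 1 2000 '
print(repr(non_greedy))  # 'Jan 1 2000 bin'
```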
These are ANSI terminal codes. They're signalled by an ESC (byte 27, seen in Python as \x1B) followed by [, then some ;-separated parameters and finally a letter to specify which command it is. (m is a colour change.)

The parameters are usually numbers, so for this simple case you could get rid of them with:
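A minimal sketch of a pattern for this simple case, assuming Python 3. The names ansi_simple and s are illustrative, not from the original answer.

```python
import re

# One line of the listing, shortened (assumed for illustration).
s = 'Jan 1 2000 \x1b[1;34mbin\x1b[0m\r\n'

# ESC, a literal '[', any run of digits and semicolons,
# then a single letter naming the command.
ansi_simple = re.compile(r'\x1b\[[0-9;]*[A-Za-z]')

cleaned = ansi_simple.sub('', s)
print(repr(cleaned))
```

Unlike a pattern that ends in "m", this one also strips non-colour commands such as cursor movement, as long as their parameters are numeric.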
Technically for some (non-colour-related) control codes they could be general strings, which makes the parsing annoying. It's rare you'd meet these, but if you did I guess you'd have to use something complicated like:
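One way this might look is sketched below. It assumes Python 3 and assumes that string parameters are written in double quotes, which is a guess about the codes meant here. The sample sequence and the name ansi_general are hypothetical.

```python
import re

# Hypothetical input: a sequence with a quoted string parameter,
# followed by the usual colour codes.
s = '\x1b[0;68;"dir";13pprompt \x1b[1;34mbin\x1b[0m'

# ESC, '[', then any mix of digits, semicolons and double-quoted strings,
# then a single letter naming the command.
ansi_general = re.compile(r'\x1b\[(?:[0-9;]|"[^"]*")*[A-Za-z]')

cleaned = ansi_general.sub('', s)
print(repr(cleaned))
```

The quoted-string branch lets letters inside the quotes pass without being mistaken for the final command letter.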
Best would be to persuade whatever's generating the string that you're not an ANSI terminal, so it shouldn't include colour codes in its output.
Try running ls --color=never -l instead, and you won't get the ANSI escape codes in the first place.
Here is a pyparsing solution to your problem, with a general parsing expression for those pesky escape sequences. By transforming the initial string with a suppressed expression, this returns a string stripped of all matches of the expression.
This prints your desired output, as stored in s_prime.