Implementing string unescaping with Python regular expressions
I am trying to implement string unescaping with Python regex and backreferences, and it doesn't seem to want to work very well. I'm sure it's something I'm doing wrong but I can't figure out what...
>>> import re
>>> mystring = r"This is \n a test \r"
>>> p = re.compile( "\\\\(\\S)" )
>>> p.sub( "\\1", mystring )
'This is n a test r'
>>> p.sub( "\\\\\\1", mystring )
'This is \\n a test \\r'
>>> p.sub( "\\\\1", mystring )
'This is \\1 a test \\1'
I'd like to replace \\[char] with \[char], but backreferences in Python don't appear to follow the same rules they do in every other implementation I've ever used. Could someone shed some light?
The idea is that I'll read in an escaped string, and unescape it (a feature notably lacking from Python, which you shouldn't need to resort to regular expressions for in the first place). Unfortunately I'm not being tricked by the backslashes...
Another illustrative example:
What I'd like it to print is
Mark; his second example requires every escaped character to be thrown into an array up front, and it raises a KeyError if the escape sequence happens not to be in the array. It will die on anything but the three characters provided (give \v a try), and enumerating every possible escape sequence every time you want to unescape a string (or keeping a global array) is a really bad solution. In PHP terms, that's like using preg_replace_callback() with a lambda instead of preg_replace(), which is utterly unnecessary in this situation.
I'm sorry if I'm coming off as a dick about it, I'm just utterly frustrated with Python. This is supported by every other regular expression engine I've ever used, and I can't understand why this wouldn't work.
Thank you for responding; the string.decode('string-escape') function is precisely what I was looking for initially. If someone has a general solution to the regex backreference problem, feel free to post it and I'll accept that as an answer as well.
Well, I think you might have missed the r or miscounted the backslashes...
Which, if I understood correctly, is what was requested.
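The snippet this answer refers to is not preserved on this page. As a minimal sketch of the raw-string approach it describes, assuming the goal is to turn each doubled backslash plus character into a single backslash plus character, the pattern and the replacement template can both be written as raw strings so that only the regex-level escaping has to be counted. The variable names here are illustrative, not taken from the original answer.

```python
import re

# Input text containing doubled backslashes: This is \\n a test \\r
mystring = r"This is \\n a test \\r"

# Pattern: two literal backslashes, then capture one non-space character.
pattern = re.compile(r"\\\\(\S)")

# Template: one literal backslash (written \\), then group 1 (written \1).
collapsed = pattern.sub(r"\\\1", mystring)

print(collapsed)  # This is \n a test \r
```

With raw strings, every backslash typed is a backslash the regex engine sees, which makes it much harder to miscount.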
I suspect the more common request is this:
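The code for this second variant is also missing here. A rough sketch of the lookup-table idea, assuming the intent is to turn a backslash plus letter into the actual control character, might look like the following. The ESCAPES table and the unescape() helper are hypothetical names, and the table is deliberately incomplete; dict.get() with a fallback leaves unknown sequences untouched rather than raising KeyError.

```python
import re

# Hypothetical, deliberately incomplete table of escape letters.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t"}

def unescape(text):
    """Replace backslash+char with its control character when known.

    Sequences not present in ESCAPES are returned unchanged.
    """
    return re.sub(
        r"\\(.)",
        lambda m: ESCAPES.get(m.group(1), m.group(0)),
        text,
    )

print(repr(unescape(r"This is \n a test \r and \v")))
```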
The interested student should also read Ken Thompson's "Reflections on Trusting Trust", wherein our hero uses a similar example to explain the perils of trusting compilers you haven't bootstrapped from machine code yourself.
Isn't that what Anders' second example does?
In 2.5 there's also a string-escape encoding you can apply:
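The string-escape codec mentioned above belongs to Python 2; in Python 3, str objects have no decode() method. As a hedged sketch of a close equivalent on Python 3, assuming the input is plain ASCII, the text can be encoded to bytes and decoded with the unicode_escape codec. Non-ASCII input needs more care, because that codec interprets the bytes as Latin-1.

```python
# Escaped text as read from a file or user input: backslash + letter pairs.
escaped = r"This is \n a test \r"

# Encode to bytes first, then let the codec interpret the escape sequences.
unescaped = escaped.encode("ascii").decode("unicode_escape")

print(repr(unescaped))  # 'This is \n a test \r' with real control characters
```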
You are being tricked by Python's representation of the result string. The Python expression:
represents the string
which I think is what you wanted. Try adding 'print' in front of each of your p.sub() calls to print the actual string returned instead of a Python representation of the string.
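To illustrate the difference, here is a small sketch (written for Python 3, where print is a function) that reuses the question's input and compares the repr of the result with the string itself. The interactive prompt echoes repr(), which doubles every backslash; print() shows the characters the string actually contains.

```python
import re

mystring = r"This is \n a test \r"
p = re.compile(r"\\(\S)")

# Same substitution as the question's second attempt, written with a raw
# template: one literal backslash followed by group 1.
result = p.sub(r"\\\1", mystring)

print(repr(result))  # 'This is \\n a test \\r'  (what the prompt echoes)
print(result)        # This is \n a test \r      (the actual characters)
```

The result holds one backslash before each letter; the doubled backslashes exist only in the repr.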