Implementing string unescaping with Python regular expressions

Posted 2024-07-04 06:04:52

I am trying to implement string unescaping with Python regex and backreferences, and it doesn't seem to want to work very well. I'm sure it's something I'm doing wrong but I can't figure out what...

>>> import re
>>> mystring = r"This is \n a test \r"
>>> p = re.compile( "\\\\(\\S)" )
>>> p.sub( "\\1", mystring )
'This is n a test r'
>>> p.sub( "\\\\\\1", mystring )
'This is \\n a test \\r'
>>> p.sub( "\\\\1", mystring )
'This is \\1 a test \\1'

I'd like to replace \\[char] with \[char], but backreferences in Python don't appear to follow the same rules they do in every other implementation I've ever used. Could someone shed some light?
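One way to see what each replacement template in the session above actually contains is to write it next to its raw-string spelling. The following is a minimal sketch in Python 3 syntax (the session above is Python 2); the `templates` dictionary is only an illustrative name. The replacement argument is processed twice: once by Python as a string literal, and once by `re` as a template, where `\\` means a literal backslash and `\1` means group 1.

```python
import re

mystring = r"This is \n a test \r"   # backslash + letter, not real control characters
p = re.compile(r"\\(\S)")            # same pattern as "\\\\(\\S)" in the question

# Each non-raw literal from the question, paired with its raw-string spelling.
templates = {
    "\\1": r"\1",         # template: group 1 only
    "\\\\\\1": r"\\\1",   # template: escaped backslash, then group 1
    "\\\\1": r"\\1",      # template: escaped backslash, then a literal "1"
}

for plain, raw in templates.items():
    assert plain == raw              # both spellings are the same string
    print(raw, "->", p.sub(raw, mystring))
```

Printed with `print` rather than echoed at the prompt, the second template gives back `This is \n a test \r` with single backslashes, which matches the input unchanged.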

带上头具痛哭 2024-07-11 06:04:53

The idea is that I'll read in an escaped string, and unescape it (a feature notably lacking from Python, which you shouldn't need to resort to regular expressions for in the first place). Unfortunately I'm not being tricked by the backslashes...

Another illustrative example:

>>> mystring = r"This is \n ridiculous"
>>> print mystring
This is \n ridiculous
>>> p = re.compile( r"\\(\S)" )
>>> print p.sub( 'bloody', mystring )
This is bloody ridiculous
>>> print p.sub( r'\1', mystring )
This is n ridiculous
>>> print p.sub( r'\\1', mystring )
This is \1 ridiculous
>>> print p.sub( r'\\\1', mystring )
This is \n ridiculous

What I'd like it to print is

This is 
ridiculous

奢欲 2024-07-11 06:04:53

Mark; his second example requires every escaped character thrown into an array initially, which generates a KeyError if the escape sequence happens not to be in the array. It will die on anything but the three characters provided (give \v a try), and enumerating every possible escape sequence every time you want to unescape a string (or keeping a global array) is a really bad solution. Analogous to PHP, that's using preg_replace_callback() with a lambda instead of preg_replace(), which is utterly unnecessary in this situation.

I'm sorry if I'm coming off as a dick about it, I'm just utterly frustrated with Python. This is supported by every other regular expression engine I've ever used, and I can't understand why this wouldn't work.

Thank you for responding; the string.decode('string-escape') function is precisely what I was looking for initially. If someone has a general solution to the regex backreference problem, feel free to post it and I'll accept that as an answer as well.

任谁 2024-07-11 06:04:53

Well, I think you might have missed the r or miscounted the backslashes...

"\\n" == r"\n"

>>> import re
>>> mystring = r"This is \\n a test \\r"
>>> p = re.compile( r"[\\][\\](.)" )
>>> print p.sub( r"\\\1", mystring )
This is \n a test \r
>>>

Which, if I understood correctly, is what was requested.

I suspect the more common request is this:

>>> d = {'n':'\n', 'r':'\r', 'f':'\f'}
>>> p = re.compile(r"[\\]([nrfv])")
>>> print p.sub(lambda mo: d[mo.group(1)], mystring)
This is \
 a test \
>>>

The interested student should also read Ken Thompson's "Reflections on Trusting Trust", wherein our hero uses a similar example to explain the perils of trusting compilers you haven't bootstrapped from machine code yourself.
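The dictionary lookup in the second example above raises a KeyError for `v`, because the pattern accepts it but the dictionary has no entry for it. A possible variant, sketched here in Python 3 syntax, falls back to leaving unknown sequences untouched. The names `ESCAPES` and `unescape` are hypothetical, and the table is only a sample, not a complete list of Python's escape sequences.

```python
import re

# Hypothetical table of supported escapes; extend it as needed.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "f": "\f", "v": "\v", "\\": "\\"}

def unescape(text):
    """Replace backslash+char with its mapped character; leave unknown pairs alone."""
    def repl(match):
        # match.group(0) is the whole two-character sequence, used as the fallback.
        return ESCAPES.get(match.group(1), match.group(0))
    return re.sub(r"\\(.)", repl, text)

print(repr(unescape(r"This is \n a test \r")))
print(repr(unescape(r"unknown \q stays")))
```

A replacement function is needed here because a template such as `\1` can only copy the captured character into the output; it cannot turn the letter `n` into a newline.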

ゃ人海孤独症 2024-07-11 06:04:53

Isn't that what Anders' second example does?

In Python 2.5 there's also a string-escape encoding you can apply:

>>> mystring = r"This is \n a test \r"
>>> mystring.decode('string-escape')
'This is \n a test \r'
>>> print mystring.decode('string-escape')
This is 
 a test 
>>> 
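The `string-escape` codec is a Python 2 feature and is not available in Python 3. A rough Python 3 counterpart, sketched here on the assumption that the input is pure ASCII, is to decode bytes with the `unicode_escape` codec. The variable names are illustrative only.

```python
import codecs

escaped = r"This is \n a test \r"

# Assumption: the input is plain ASCII. unicode_escape treats the bytes as
# Latin-1, so text with other characters would need different handling.
unescaped = codecs.decode(escaped.encode("ascii"), "unicode_escape")

print(repr(unescaped))   # shows real \n and \r escapes, i.e. control characters
```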

薄情伤 2024-07-11 06:04:53

You are being tricked by Python's representation of the result string. The Python expression:

'This is \\n a test \\r'

represents the string

This is \n a test \r

which is, I think, what you wanted. Try adding 'print' in front of each of your p.sub() calls to print the actual string returned instead of a Python representation of the string.

>>> mystring = r"This is \n a test \r"
>>> mystring
'This is \\n a test \\r'
>>> print mystring
This is \n a test \r
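The same check can be applied to one of the p.sub() calls from the question. This is a small sketch in Python 3 syntax, where `print` is a function and `repr()` gives the form that the interactive prompt echoes:

```python
import re

mystring = r"This is \n a test \r"   # backslash + letter, no real newline
p = re.compile(r"\\(\S)")

result = p.sub(r"\\\1", mystring)    # template: literal backslash, then group 1

print(repr(result))   # the prompt's view: each backslash shown doubled
print(result)         # the actual characters: single backslashes
```

The doubled backslashes exist only in the representation; `result` itself contains one backslash before each letter and is equal to `mystring`.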