Decoding specific escape characters in a Python string

Posted 2025-01-17 13:21:17 · 851 words · 2 views · 0 comments

I have a Python variable (named var) containing a string with the following literal data:

day\r\n\\night

in hex, it is:

64  61  79  5C  72  5C  6E  5C  5C  6E  69  67  68  74  07
d   a   y   \   r   \   n   \   \   n   i   g   h   t   BEL

I need to decode \\, \r and \n only.

The desired output (in hex):

64  61  79  0D  0A  5C  6E  69  67  68  74  07
d   a   y   CR  LF  \   n   i   g   h   t   BEL

Using decode doesn't work:

>>> print(var.decode('ascii'))
AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'?

Using regex to find and replace \\, \r and \n with their escaped values is unsuccessful, as the \n in \night is treated as a 0x0A.
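To make the failure concrete, here is a minimal sketch. It assumes the replacements were applied one after another (the original attempt is not shown, so the exact order and the use of str.replace are assumptions); the names step1 and step2 are only for illustration.

```python
# Literal text: d a y \ r \ n \ \ n i g h t BEL
var = 'day\\r\\n\\\\night\x07'

# Step 1: collapse the doubled backslash into a single backslash.
step1 = var.replace('\\\\', '\\')

# Step 2: replace the remaining two-character escapes.
# The backslash produced in step 1 now sits in front of the "n" of
# "night", so it is wrongly picked up as another \n escape.
step2 = step1.replace('\\r', '\r').replace('\\n', '\n')

print(step2.encode('ascii').hex(' '))
# 64 61 79 0d 0a 0a 69 67 68 74 07
```

The second 0a is the problem: the output of one replacement is fed into the next, so the text has to be decoded in a single pass instead.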

Is it possible to specify which characters I want to decode, or is there a more appropriate module? I'm using Python 3.10.2.

Comments (3)

三生殊途 2025-01-24 13:21:17

Assuming var is a hex string like this:

6461795C725C6E5C5C6E6967687407 (without spaces)

you could try:

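i = 0
escaped = {'72': '0D', '6E': '0A', '5C': '5C'}
out = []                                 # str is immutable, so collect the pieces
while i < len(var):
   pair = var[i:i+2].upper()
   if pair == '5C':                      # the character is a '\'
      i += 2                             # move on to the next character's hex code
      nxt = var[i:i+2].upper()
      # replace '5Cxx' by its escaped value; leave unknown escapes untouched
      out.append(escaped.get(nxt, '5C' + nxt))
   else:
      out.append(pair)
   i += 2
var = ''.join(out)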

It replaces \r, \n and \\ with the corresponding characters (CR, LF and \).

I'll later add converters between day\r\n\\night and 6461795C725C6E5C5C6E69676874.

EDIT: Converters are here!
Each converter leaves its result in r.
They handle the result of input() as-is, but for a hard-coded string you'll have to write:
var = 'day\\r\\n\\\\night'
so that Python reads it as 'day', then '\', 'r', '\', 'n', '\', '\', then 'night', and not as 'day', then CR, then LF, then '\', then 'night'. That way,
print(var)
prints
day\r\n\\night
and not

day
\night

# convert string to hex
r = ''
for c in var:
   # two uppercase hex digits per character (assumes code points below 0x100)
   r += format(ord(c), '02X')
# convert hex to string
r = ''
c = 0
while c < len(var):
   # parse each two-digit hex code as an integer (no eval needed)
   # and append the corresponding character to `r`
   r += chr(int(var[c:c+2], 16))   # e.g. r += chr(0x5C)
   c += 2
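As an alternative to the hand-written loops, the two conversions can also be sketched with the built-in bytes.hex and bytes.fromhex. This is only a sketch: it assumes every character fits in one byte (latin-1), and the names hex_text and round_trip are hypothetical.

```python
# Literal text: d a y \ r \ n \ \ n i g h t BEL
var = 'day\\r\\n\\\\night\x07'

# String to hex: encode to bytes, then render two hex digits per byte.
hex_text = var.encode('latin-1').hex().upper()

# Hex back to string: parse the hex digits, then decode the bytes.
round_trip = bytes.fromhex(hex_text).decode('latin-1')

print(hex_text)
# 6461795C725C6E5C5C6E6967687407
```

bytes.fromhex accepts both upper- and lowercase digits, so the result of either converter can be fed back in without changing case.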
鸠魁 2025-01-24 13:21:17

A similar question has been asked before. Based on its answers, you can do the following:

var = r"day\r\n\\night"

# This is what you had before (hex() with a separator needs Python 3.8+)
var.encode('ascii').hex(' ')
# '64 61 79 5c 72 5c 6e 5c 5c 6e 69 67 68 74'

# To get the required output, do this
bytes(var, encoding='ascii').decode('unicode-escape').encode('ascii').hex(' ')
# '64 61 79 0d 0a 5c 6e 69 67 68 74'
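One caveat, shown as a small sketch: the unicode-escape codec decodes every escape sequence it recognises, not just \\, \r and \n, so it may not fit when only those three should be touched. The sample string 'a\\tb\\x41' is made up for illustration.

```python
var = 'day\\r\\n\\\\night'          # the literal text day\r\n\\night

# unicode-escape handles the three wanted escapes...
wanted = var.encode('ascii').decode('unicode-escape')

# ...but it also decodes any other escape it finds, such as \t and \x41.
other = 'a\\tb\\x41'.encode('ascii').decode('unicode-escape')

print(repr(wanted))
print(repr(other))
```

Note also that encoding to ASCII first raises UnicodeEncodeError as soon as the input contains a non-ASCII character.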
戴着白色围巾的女孩 2025-01-24 13:21:17

Many thanks to everyone who contributed an answer, but none of them seemed to solve my issue completely. After a long time researching, I found this solution from sahil Kothiya and modified it to resolve my specific issue:

import codecs
import re

ESCAPE_SEQUENCE_RE = re.compile(r'''
    ( \\[\\nr]  # Single-character escapes
    )''', re.UNICODE | re.VERBOSE)

def decode_escapes(s):
    def decode_match(match):
        # Decode only the escape sequence that was matched
        return codecs.decode(match.group(0), 'unicode-escape')
    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)

[Screenshots: demonstration in IDLE; special characters shown in Notepad++; hex dump of the output string]

It even works with Unicode characters (an important component of my script).

[Screenshots: second demonstration in IDLE; special characters shown in Notepad++; hex dump of the output string]
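The same single-pass idea can be sketched with a plain lookup table in place of codecs.decode. This is a variant for illustration, not the code from the answer above; ESCAPES, ESCAPE_RE and decode_selected are hypothetical names.

```python
import re

# Only these three escape sequences are decoded; everything else is kept.
ESCAPES = {'\\\\': '\\', '\\r': '\r', '\\n': '\n'}

# A backslash followed by a backslash, "n" or "r".
ESCAPE_RE = re.compile(r'\\[\\nr]')

def decode_selected(s):
    # re.sub scans left to right and never rescans its own output, so the
    # backslash produced from "\\" cannot pair up with the following "n".
    return ESCAPE_RE.sub(lambda m: ESCAPES[m.group(0)], s)

var = 'day\\r\\n\\\\night\x07'
out = decode_selected(var)
print(out.encode('ascii').hex(' '))
# 64 61 79 0d 0a 5c 6e 69 67 68 74 07
```

Because the substitution works on str rather than bytes, non-ASCII characters pass through unchanged, and escapes outside the table (for example \t) are left as written.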
