Decoding specific escape characters in a Python string

Posted on 2025-01-17 13:21:17

I have a Python variable (named var) containing a string with the following literal data:

day\r\n\\night

In hex, it is:

64  61  79  5C  72  5C  6E  5C  5C  6E  69  67  68  74  07
d   a   y   \   r   \   n   \   \   n   i   g   h   t   BEL

I need to decode only \\, \r and \n.

The desired output (in hex):

64  61  79  0D  0A  5C  6E  69  67  68  74  07
d   a   y   CR  LF  \   n   i   g   h   t   BEL

Using decode doesn't work:

>>> print(var.decode('ascii'))
AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'?

Using regex to find and replace \\, \r and \n with their escaped values is unsuccessful, as the \n in \night is treated as a 0x0A.
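The failing attempt is not shown in the question. One plausible way to run into this, assumed here purely for illustration, is to replace the escapes one after another: once \\ has been collapsed to \, the leftover backslash pairs up with the n of "night" and gets decoded a second time. A single left-to-right pass avoids that, because each backslash is consumed exactly once. The names `chained`, `single_pass` and `REPLACEMENTS` are made up for this sketch.

```python
import re

# The literal data from the question: day\r\n\\night followed by BEL (0x07).
var = 'day\\r\\n\\\\night\x07'

# Chained replacements (an assumed reconstruction of the failing approach):
# the backslash pair is collapsed first, so the remaining backslash plus the
# 'n' of "night" is later matched by the backslash-n replacement.
chained = var.replace('\\\\', '\\').replace('\\r', '\r').replace('\\n', '\n')

# Single pass: the regex consumes each escape once, left to right.
REPLACEMENTS = {'\\\\': '\\', '\\r': '\r', '\\n': '\n'}
single_pass = re.sub(r'\\[\\rn]', lambda m: REPLACEMENTS[m.group(0)], var)

print(chained.encode('ascii').hex(' '))      # 64 61 79 0d 0a 0a 69 67 68 74 07
print(single_pass.encode('ascii').hex(' '))  # 64 61 79 0d 0a 5c 6e 69 67 68 74 07
```

The second line of output matches the desired hex dump from the question.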

Is it possible to specify which characters I want to decode, or is there a more appropriate module? I'm using Python 3.10.2.

三生殊途 2025-01-24 13:21:17

Assuming var is a string of hex digits like this:

6461795C725C6E5C5C6E6967687407 (without spaces)

you should try:

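i = 0
out = []                                  # str is immutable, so collect the pieces here
escaped = {'72': '0D', '6E': '0A', '5C': '5C'}
while i < len(var):
   pair = var[i:i+2]                      # hex code of the current character
   nxt = var[i+2:i+4]                     # hex code of the character after it
   if pair == '5C' and nxt in escaped:    # a '\' followed by 'r', 'n' or '\'
      out.append(escaped[nxt])            # replaces the '5Cxx' by its escaped value
      i += 4
   else:
      out.append(pair)                    # any other character is copied unchanged
      i += 2
var = ''.join(out)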

This replaces \r, \n and \\ with the corresponding characters (CR, LF and \).

I'll later add converters between day\r\n\\night and 6461795C725C6E5C5C6E6967687407.

EDIT: Converters are here!
The converted string is r each time.
They handle the results of input(), but a hard-coded string has to be entered as:
var = 'day\\r\\n\\\\night'
so that Python reads it as 'day', then '\', 'r', '\', 'n', '\', '\' and 'night', and not as 'day', CR, LF, '\' and 'night'. With that,
print(var)
prints
day\r\n\\night
and not

day
\night

# convert string to hex
r = ''
for c in var:
   # two uppercase hex digits per character, matching the keys of `escaped`
   r += format(ord(c), '02X')

# convert hex to string
r = ''
c = 0
while c < len(var):
   # parses each pair of hex digits as a base-16 number
   # and adds the corresponding character to `r`.
   r += chr(int(var[c:c+2], 16)) # does like, `r += chr(0x5C)` for example.
   c += 2
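The two converters can also be written with the built-in `bytes` methods. This is only a sketch: it assumes every character fits in one byte (code points below 256), which is why it goes through `latin-1`, and the function names `to_hex` and `from_hex` are hypothetical.

```python
def to_hex(text):
    """Return two uppercase hex digits per character of `text`."""
    # latin-1 maps code points 0-255 to single bytes;
    # any character above that raises UnicodeEncodeError.
    return text.encode('latin-1').hex().upper()


def from_hex(hex_digits):
    """Inverse of to_hex: turn pairs of hex digits back into text."""
    return bytes.fromhex(hex_digits).decode('latin-1')


# The literal data from the question, including the trailing BEL.
var = 'day\\r\\n\\\\night\x07'
print(to_hex(var))                   # 6461795C725C6E5C5C6E6967687407
print(from_hex(to_hex(var)) == var)  # True
```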

鸠魁 2025-01-24 13:21:17

A similar question is answered here. Based on that, you can do the following:

var = r"day\r\n\\night"

# This is what you got previously
var.encode('ascii').hex(' ')   # the ' ' separator needs Python 3.8+
# '64 61 79 5c 72 5c 6e 5c 5c 6e 69 67 68 74'

# To get required output do this
bytes(var, encoding='ascii').decode('unicode-escape').encode('ascii').hex(' ')
# '64 61 79 0d 0a 5c 6e 69 67 68 74'
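One caveat, measured against the question's requirement to decode only \\, \r and \n: the `unicode-escape` codec decodes every escape sequence Python knows, and the `ascii` step rejects non-ASCII input. The sample strings below are made up for illustration.

```python
# Hypothetical input containing an escape the question wants left alone.
sample = r"tab\there"

decoded = bytes(sample, encoding='ascii').decode('unicode-escape')
print(decoded.encode('ascii').hex(' '))
# 74 61 62 09 68 65 72 65
# The backslash-t became a real TAB (0x09), although it is not one of the
# three escapes that should be decoded.

# Non-ASCII text cannot pass through the ascii step at all.
ascii_failed = False
try:
    bytes("caf\u00e9", encoding='ascii')
except UnicodeEncodeError:
    ascii_failed = True
print(ascii_failed)  # True
```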

戴着白色围巾的女孩 2025-01-24 13:21:17

Many thanks to everyone who contributed answers, but none of them seemed to solve my issue completely. After a long time researching, I found this solution from sahil Kothiya (mirror) -- I modified it to resolve my specific issue:

import re, codecs

# Matches only the three escapes that should be decoded: \\, \n and \r
ESCAPE_SEQUENCE_RE = re.compile(r'''
    ( \\[\\nr]  # Single-character escapes
    )''', re.UNICODE | re.VERBOSE)

def decode_escapes(s):
    def decode_match(match):
        # Decode just the matched two-character escape
        return codecs.decode(match.group(0), 'unicode-escape')
    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)

[Screenshots: demonstration in IDLE; special characters shown in Notepad++; hex dump of the output string]

It even works with Unicode characters (an important component of my script).

[Screenshots: demonstration in IDLE; special characters shown in Notepad++; hex dump of the output string]
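As the screenshots are not reproduced here, the following sketch repeats both checks in text form. It restates the function so the block runs on its own, with the pattern written on one line for brevity. The first input is the data from the question, including the BEL byte from its hex dump. The second input is a made-up sample that mixes non-ASCII text with an escape (backslash-t) that should be left alone.

```python
import codecs
import re

# Only backslash-backslash, backslash-n and backslash-r are matched.
ESCAPE_SEQUENCE_RE = re.compile(r'\\[\\nr]')


def decode_escapes(s):
    def decode_match(match):
        # Only the matched two-character escape is handed to the codec,
        # so the rest of the string is never encoded or decoded.
        return codecs.decode(match.group(0), 'unicode-escape')
    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)


# Data from the question: day\r\n\\night followed by BEL.
var = 'day\\r\\n\\\\night\x07'
result = decode_escapes(var)
print(result.encode('ascii').hex(' '))
# 64 61 79 0d 0a 5c 6e 69 67 68 74 07

# Hypothetical sample with non-ASCII characters and an untouched escape.
sample = 'caf\u00e9\\r\\n\u65e5\u672c\\\\night\\t'
unicode_result = decode_escapes(sample)
print(ascii(unicode_result))
```

Because the substitution runs in a single pass, the backslash produced from \\ is not paired with the following n, which is the failure described in the question.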
