How do I remove the backslash escaping from a string used in unicodedata.normalize()?

Posted 2025-02-01 08:41:09 | 997 characters | 2 views | 0 comments

Problem formulation/example

Consider the Latin character á, which can be represented as

  • \xe1 in hex

  • \u00e1 in 16-bit hex

  • \U000000e1 in 32-bit hex
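As a quick check of the list above: the three escape forms are only different spellings in source code, and once parsed they should all denote the same one-character string. A minimal sketch (the variable names are illustrative only):

```python
# Three spellings of the same code point, U+00E1 (á).
two_digit = '\xe1'
four_digit = '\u00e1'
eight_digit = '\U000000e1'

print(two_digit == four_digit == eight_digit)  # True
print(len(two_digit), hex(ord(two_digit)))     # 1 0xe1
```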

In the following code block, I decompose the Latin-1 character and drop the accent, leaving the equivalent base character (i.e. from á to a):

import unicodedata

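# NFD splits á into 'a' followed by a combining acute accent
decomposed = unicodedata.normalize('NFD', '\xe1')
encoded = decomposed.encode("utf-8")
# The first UTF-8 byte is the ASCII base letter
letter = chr(list(encoded)[0])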

print(letter)

(Any of the three bullet-pointed formats could have been used in the second argument of unicodedata.normalize().)
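For reference, a sketch of what the NFD step produces. The result is two code points, the base letter followed by a combining acute accent, which is why taking the first UTF-8 byte happens to yield a in this case. The `names` variable is illustrative only.

```python
import unicodedata

decomposed = unicodedata.normalize('NFD', '\xe1')

# NFD splits á into the base letter plus a combining mark.
names = [unicodedata.name(ch) for ch in decomposed]
print(names)  # ['LATIN SMALL LETTER A', 'COMBINING ACUTE ACCENT']

# UTF-8 encodes 'a' as one byte and U+0301 as two bytes.
print(list(decomposed.encode('utf-8')))  # [97, 204, 129]
```

Indexing the first byte only works when the base letter is ASCII; indexing the decomposed string itself (decomposed[0]) would avoid the byte round trip.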

My issue

My issue is in trying to generalise this so that the second argument to normalize() is a variable rather than a string literal.

I'm struggling to do this without explicitly typing the string literal into the call, because of the escaped backslash.

Example attempt

latin = "á"
a = ascii(latin) # print(a) gives '\xe1'
decomposed = unicodedata.normalize('NFD', a)  
encoded = decomposed.encode("utf-8")
letter = chr(list(encoded)[0]) 

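This won't work because ascii() returns the escape sequence as literal text, so the argument a behaves like '\\xe1' (a backslash followed by xe1, wrapped in quote characters) instead of '\xe1'.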
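A small sketch to make the failure concrete: inspecting the value that ascii() returns suggests it holds six ordinary ASCII characters (quotes included) rather than the single character á, so normalize() has nothing to decompose. Passing the original variable directly appears to avoid the problem.

```python
import unicodedata

latin = "á"
a = ascii(latin)

print(len(latin), len(a))  # 1 6
print(list(a))             # ["'", '\\', 'x', 'e', '1', "'"]

# Normalizing the escaped text changes nothing, since it is all ASCII.
print(unicodedata.normalize('NFD', a) == a)  # True

# Passing the original variable works; no escaping is involved.
print(unicodedata.normalize('NFD', latin)[0])  # a
```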

Other attempts to get the hex representation and construct a string by concatenating \x to it won't work either, for the same reason.
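Escape sequences such as \x are processed only when Python parses a string literal in source code, so gluing a backslash and x onto hex digits at runtime just yields those characters. A hedged sketch of one alternative, assuming the hex digits are available as a string (`hex_digits` is a hypothetical name, not something from the code above):

```python
import unicodedata

hex_digits = "e1"  # hypothetical input: the code point in hex

# Concatenation yields four literal characters, not one.
glued = "\\x" + hex_digits
print(len(glued))  # 4

# chr() with int(..., 16) builds the actual character at runtime.
char = chr(int(hex_digits, 16))
print(char == "\xe1")  # True
print(unicodedata.normalize("NFD", char)[0])  # a
```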
