How to encode/decode escape sequence characters in Python
How can I encode or decode an escape sequence character such as '\x13' in Python into a character that is valid in RSS or XML?
The use case: I am collecting data from arbitrary sources and building an RSS feed from that data. The source data sometimes contains escape sequence characters, which break my RSS feed.
So how can I sanitize input data that contains such characters?
Comments (1)
'\x13' (ASCII 19, "DC3") can't be escaped; it is invalid in XML 1.0, period. In XML 1.1 you can include it, encoded as &#19; or &#x13;, but then you have to include the <?xml version="1.1"?> declaration, and many tools won't like it.

I've no idea why that character would be included in your data, but the way forward is probably to remove control codes completely: that is, every character below U+0020 other than tab, line feed, and carriage return, which are the only ones in that range that XML 1.0 allows.
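A minimal sketch of that removal, assuming the input has already been decoded to a `str`. The name `strip_control_chars` is illustrative and not part of any library; `escape` is the standard-library helper from `xml.sax.saxutils` and is only there to show where the usual XML escaping would follow.

```python
import re
from xml.sax.saxutils import escape

# Characters below U+0020 that XML 1.0 forbids: everything except
# tab (\x09), line feed (\x0A) and carriage return (\x0D).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")


def strip_control_chars(text: str) -> str:
    """Return text with the XML-1.0-invalid control characters removed."""
    return _CONTROL_CHARS.sub("", text)


# Hypothetical feed item containing a stray DC3 character.
cleaned = strip_control_chars("Price\x13 update: a < b\n")

# Escape markup characters only after the control codes are gone.
print(repr(escape(cleaned)))  # 'Price update: a &lt; b\n'
```

Stripping happens before escaping because no amount of escaping makes '\x13' acceptable to an XML 1.0 parser.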
For some kinds of escape sequence (e.g. ANSI colour codes), removing the control codes can leave stray (non-control) characters behind, in which case you'd probably want a custom parser for that particular format.