Make Python replace unencodable characters with a string by default
I want Python to handle characters it can't encode by simply replacing them with the string "<could not encode>".
E.g., assuming the default encoding is ASCII, the command
'%s is the word'%'ébác'
would yield
'<could not encode>b<could not encode>c is the word'
Is there any way to make this the default behavior across my whole project?
Comments (3)
The str.encode function takes an optional argument, errors, that defines the error handling. According to the docs, it defaults to 'strict', which raises a UnicodeError on failure; other built-in values include 'ignore', 'replace', 'xmlcharrefreplace' and 'backslashreplace', and any name registered via codecs.register_error is also accepted.
In your case, the codecs.register_error function might be of interest.
[Note about bad chars]
By the way, note that when using register_error you will likely find yourself replacing not just individual bad characters but whole groups of consecutive bad characters with your string, unless you pay attention. You get one call to the error handler per run of bad chars, not per char.
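The per-run behaviour described above can be sketched as follows. This assumes Python 3 on CPython, where encoding is an explicit call; the handler names `placeholder_per_run` and `placeholder_per_char` are hypothetical and chosen only for this illustration. The second handler multiplies the placeholder by the length of the run to get one replacement per character.

```python
import codecs

PLACEHOLDER = "<could not encode>"

def replace_per_run(exc):
    # Called once per run of consecutive unencodable characters.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # Return the replacement and the position at which encoding resumes.
    return PLACEHOLDER, exc.end

def replace_per_char(exc):
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # exc.start and exc.end delimit the whole run, so repeat the placeholder.
    return PLACEHOLDER * (exc.end - exc.start), exc.end

# Hypothetical handler names, registered for this example only.
codecs.register_error("placeholder_per_run", replace_per_run)
codecs.register_error("placeholder_per_char", replace_per_char)

text = "\u00e9\u00e1bc"  # two consecutive non-ASCII characters, then 'bc'
per_run = text.encode("ascii", "placeholder_per_run")
per_char = text.encode("ascii", "placeholder_per_char")

print(per_run)   # b'<could not encode>bc'
print(per_char)  # b'<could not encode><could not encode>bc'
```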
So, for instance:
Add your own callback to codecs.register_error to replace with the string of your choice.
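For comparison, here is a small sketch (assuming Python 3) of what the built-in error handlers produce for the string from the question. None of them can emit an arbitrary placeholder such as "<could not encode>", which is why a custom callback is needed for that.

```python
text = "\u00e9b\u00e1c"  # the string 'ébác' from the question

# Encode with each built-in, non-raising error handler.
results = {
    name: text.encode("ascii", errors=name)
    for name in ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")
}
for name, encoded in results.items():
    print(name, encoded)
# ignore            b'bc'
# replace           b'?b?c'
# xmlcharrefreplace b'&#233;b&#225;c'
# backslashreplace  b'\\xe9b\\xe1c'

# The default handler, 'strict', raises instead of substituting.
strict_error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    strict_error = exc
    print("strict raised:", type(exc).__name__)
```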
A minimal example for codecs.register_error,
based on this answer (which is more verbose):
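The original code for this answer is not preserved here, so what follows is only a sketch of what such a minimal example might look like, assuming Python 3. The handler name `could_not_encode` is hypothetical. The second half shows one possible way to get closer to a "default" behaviour: wrapping a byte stream in an io.TextIOWrapper that uses the registered handler, so every write goes through it.

```python
import codecs
import io

def could_not_encode(exc):
    """Hypothetical handler: one placeholder per unencodable character."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    return "<could not encode>" * (exc.end - exc.start), exc.end

codecs.register_error("could_not_encode", could_not_encode)

# Direct use: pass the registered name as the errors argument.
message = "%s is the word" % "\u00e9b\u00e1c"  # 'ébác is the word'
encoded = message.encode("ascii", "could_not_encode")
print(encoded)  # b'<could not encode>b<could not encode>c is the word'

# Stream use: every write to this wrapper is encoded with the handler.
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding="ascii", errors="could_not_encode")
stream.write(message)
stream.flush()
written = buffer.getvalue()
print(written)  # same bytes as above
```

On Python 3.7 or later, sys.stdout.reconfigure(errors="could_not_encode") should apply the same handler to standard output, provided the handler is registered first. There does not appear to be a single switch that changes the errors argument for every encode call in a project, so registering the handler once at startup and passing its name explicitly is the more predictable route.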