Stripping non-printable characters from a string in Python

Posted on 2024-07-05 17:53:40

I used to run

$s =~ s/[^[:print:]]//g;

in Perl to get rid of non-printable characters.

In Python there are no POSIX regex classes, so I can't write [:print:] and have it mean what I want. I know of no way in Python to detect whether a character is printable.

What would you do?

EDIT: It has to support Unicode characters as well. The string.printable approach will happily strip them out of the output, and curses.ascii.isprint returns false for any non-ASCII character.

十二 2024-07-12 17:53:40

In Python 3,

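import itertools

# Code points of Unicode category Cc (control): U+0000-U+001F and U+007F-U+009F
NONPRINTABLE = itertools.chain(range(0x00, 0x20), range(0x7f, 0xa0))
# Mapping a code point to None makes str.translate() delete that character
DELETE_TABLE = {code_point: None for code_point in NONPRINTABLE}

def filter_nonprintable(text):
    # Use translate to remove all non-printable characters
    return text.translate(DELETE_TABLE)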

See this Stack Overflow post on removing punctuation for how .translate() compares to regex and .replace().
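
For comparison, here is a minimal sketch of the regex variant over the same two control ranges. The names CONTROL_CHARS and filter_nonprintable_re are made up for this illustration and do not come from the linked post; no claim is made here about which variant is faster.

```python
import re

# Same ranges as the translate() table: U+0000-U+001F and U+007F-U+009F
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f-\x9f]")

def filter_nonprintable_re(text):
    # Replace every control character with the empty string
    return CONTROL_CHARS.sub("", text)
```

Like the translate() version, this leaves non-ASCII text such as accented letters or CJK characters untouched.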

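The ranges can also be generated from the Unicode character database categories, as shown by @Ants Aasma: nonprintable = (i for i in range(sys.maxunicode + 1) if unicodedata.category(chr(i)) == 'Cc'). This needs import sys and import unicodedata. Note the + 1: range() excludes its end point, and sys.maxunicode is itself a valid code point.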
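
Put together, the category-based approach might look like the sketch below. The helper name build_delete_table and its categories parameter are hypothetical, added only so that other categories (for example 'Cf', format characters) could be stripped as well if desired.

```python
import sys
import unicodedata

def build_delete_table(categories=("Cc",)):
    """Map every code point in the given Unicode categories to None."""
    return {
        code_point: None
        for code_point in range(sys.maxunicode + 1)
        if unicodedata.category(chr(code_point)) in categories
    }

# Built once: scanning every code point is too slow to repeat per call.
DELETE_TABLE = build_delete_table()

def filter_nonprintable(text):
    # str.translate() drops every character whose code point maps to None
    return text.translate(DELETE_TABLE)

print(filter_nonprintable("caf\u00e9\x00 \u4f60\u597d\x07"))
```

Keep in mind that category Cc also contains tab, newline, and carriage return, so this removes line breaks too; filter those code points out of the table if they should survive.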
