Using string.translate() to change non-printable characters into dots
So I've done this before, and it's a surprisingly ugly bit of code for such a seemingly simple task.
The goal is to translate any non-printable character into a . (dot). For my purposes, "printable" excludes the last few characters of string.printable (newlines, tabs, and so on). This is for printing things like the old MS-DOS debug "hex dump" format, or anything similar, where additional whitespace would mangle the intended dump layout.
I know I can use string.translate() and, to use that, I need a translation table. So I use string.maketrans() for that. Here's the best I could come up with:
import string

filter = string.maketrans(
    string.translate(string.maketrans('',''),
        string.maketrans('',''), string.printable[:-5]),
    '.'*len(string.translate(string.maketrans('',''),
        string.maketrans('',''), string.printable[:-5])))
... which is an unreadable mess (though it does work).
From there you can use something like:
for each_line in sometext:
    print string.translate(each_line, filter)
... and be happy. (So long as you don't look under the hood).
Now it is more readable if I break that horrid expression into separate statements:
ascii = string.maketrans('','')  # Identity table: all 256 characters, not just 7-bit ASCII
nonprintable = string.translate(ascii, ascii, string.printable[:-5])  # Third argument is the optional deletechars
filter = string.maketrans(nonprintable, '.' * len(nonprintable))
And it's tempting to do that just for legibility.
However, I keep thinking there has to be a more elegant way to express this!
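For readers on Python 3, where string.maketrans() and string.translate() no longer exist, the same three steps might be sketched with bytes.maketrans() and bytes.translate(). This is only an illustration under that assumption; the names PRINTABLE, ALL_BYTES, NONPRINTABLE, DOT_TABLE, and hexdump_line are made up for the example.

```python
import string

# Printable set as bytes: string.printable minus its last five characters
# (tab, newline, carriage return, vertical tab, form feed). Space is kept.
PRINTABLE = string.printable[:-5].encode("ascii")

# All 256 byte values, the Python 3 counterpart of string.maketrans('', '').
ALL_BYTES = bytes(range(256))

# Passing None as the table deletes the listed bytes and changes nothing else,
# so what remains is exactly the non-printable set.
NONPRINTABLE = ALL_BYTES.translate(None, PRINTABLE)

# Map every non-printable byte to a dot.
DOT_TABLE = bytes.maketrans(NONPRINTABLE, b"." * len(NONPRINTABLE))


def hexdump_line(offset, chunk):
    """Format up to 16 bytes as one line of a debug-style dump (hypothetical layout)."""
    hex_part = " ".join("%02X" % b for b in chunk)
    text_part = chunk.translate(DOT_TABLE).decode("ascii")
    # 16 bytes need 16 * 3 - 1 = 47 columns of hex.
    return "%04X  %-47s  %s" % (offset, hex_part, text_part)


if __name__ == "__main__":
    data = b"Hello,\tworld!\r\n\x00\xff"
    for start in range(0, len(data), 16):
        print(hexdump_line(start, data[start:start + 16]))
```

Because the text column only ever contains bytes from 0x20 to 0x7E, decoding it as ASCII cannot fail.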
Comments (4)
Here's another approach using a list comprehension:
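The snippet for this reply is not present in this copy. As a rough illustration of what a list-comprehension approach could look like on Python 3 (an assumption, not the reply's own code), the 256-entry table can be built one byte value at a time; PRINTABLE and table are names chosen for the example.

```python
import string

# Characters to keep: string.printable without the trailing tab/newline group.
PRINTABLE = set(string.printable[:-5])

# One entry per byte value: the byte itself if printable, otherwise ord('.').
table = bytes([b if chr(b) in PRINTABLE else ord(".") for b in range(256)])

if __name__ == "__main__":
    print(b"tab\there\x00".translate(table))
```

A table built this way can be passed straight to bytes.translate() without going through maketrans at all.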
Broadest use of "ascii" here, but you get the idea
If I were golfing, I'd probably use something like this:
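The golfed snippet is likewise missing here. One compact possibility on Python 3, offered only as a sketch, relies on the fact that string.printable[:-5] is exactly the contiguous range 0x20 through 0x7E, so the table can be assembled from three slices; the name t is arbitrary.

```python
# 32 control bytes become dots, 95 printable bytes map to themselves,
# and the remaining 129 bytes (0x7F through 0xFF) become dots again.
t = b"." * 32 + bytes(range(32, 127)) + b"." * 129

if __name__ == "__main__":
    print(b"a\nb\x7f".translate(t))
```

The trade-off is that the magic numbers hide the intent that string.printable makes explicit.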
For actual code golf, I imagine you'd avoid string.maketrans entirely
or
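Neither of the two snippets from this reply survives in this copy. As a hedged sketch of the general idea of skipping the translation table altogether (written for Python 3 str values, with dotify as a made-up name), each character can simply be compared against the printable range:

```python
def dotify(text):
    """Replace every character outside ' ' through '~' with a dot."""
    return "".join(c if " " <= c <= "~" else "." for c in text)


if __name__ == "__main__":
    print(dotify("a\tb \u00e9~"))
```

This is short, but it does a Python-level comparison per character, so a precomputed table is likely to be faster on large inputs.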
I don't find this solution ugly. It is certainly more efficient than any regex-based solution. Here is a slightly shorter solution, but it only works in Python 2.6:
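The Python 2.6 snippet itself is not included in this copy. For Python 3 text (str rather than bytes), a different technique is available and is sketched below as an assumption-laden example, not as this reply's code: str.translate() accepts any mapping, so a dict subclass whose __missing__ returns a dot covers every code point without enumerating them. DotTable, table, and dotify are hypothetical names.

```python
import string


class DotTable(dict):
    """Mapping for str.translate(); code points not in the dict become '.'."""

    def __missing__(self, codepoint):
        return "."


# Printable characters map to themselves; everything else falls through
# to __missing__ and is replaced by a dot.
table = DotTable((ord(c), c) for c in string.printable[:-5])


def dotify(text):
    return text.translate(table)


if __name__ == "__main__":
    print(dotify("na\u00efve\ttext\n"))
```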