Converting text to 7-bit ASCII from the command line
I'm on OS X 10.5.5 (though I guess it does not matter much).
I have a set of text files with fancy characters such as double backquotes, single-character ellipses ("…"), etc.
I need to convert these files to good old plain 7-bit ASCII, preferably without losing the meaning of the characters (that is, convert those ellipses to three periods, the backquotes to the usual " characters, etc.).
Please suggest a smart command-line (bash) tool or script to do that.
Comments (5)
The Elinks web browser will convert Unicode entities to their ASCII equivalents, giving things like "--" for "—" and "..." for "…", etc. There is a Python module, python-elinks, which uses the same conversion table, and it would be trivial to turn it into a shell filter, like this:
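The filter code from this answer did not survive on this page. As a rough illustration of the same table-driven idea, here is a minimal Python sketch. It does not use python-elinks or the Elinks table: `ASCII_EQUIVALENTS`, `to_ascii`, and `filter_stream` are hypothetical names, and the table is a small hand-written sample that would need extending for real files.

```python
import io

# Hypothetical, hand-written sample table; the real Elinks table is far larger.
ASCII_EQUIVALENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
}


def to_ascii(text, fallback="?"):
    """Map known characters to ASCII and use `fallback` for any other non-ASCII character."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.append(ASCII_EQUIVALENTS.get(ch, fallback))
    return "".join(out)


def filter_stream(src, dst):
    """Read UTF-8 bytes from `src` and write the ASCII-only text to `dst`."""
    dst.write(to_ascii(src.read().decode("utf-8")))


# Small demonstration with in-memory streams.
demo_out = io.StringIO()
filter_stream(io.BytesIO("Wait\u2026 \u201cquoted\u201d".encode("utf-8")), demo_out)
print(demo_out.getvalue())
```

Calling `filter_stream(sys.stdin.buffer, sys.stdout)` at the end of such a script (after `import sys`) would let it sit in a shell pipeline. Unmapped characters become `?` here so that nothing disappears silently.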
iconv should do it, as far as I know. Not 100% certain about how it handles conversions where one input character should/could become several output characters, such as with the ellipsis example ... Something to try!
Update: I did try it, and it seems it doesn't work. It fails, possibly because it doesn't know how to express an ellipsis (the test character I used) in a "smaller" encoding. Converting from UTF-8 to UTF-16 went fine. :/ Still, iconv might be worth investigating further.
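The same behavior can be reproduced with any strict converter. The sketch below uses Python's built-in codecs rather than iconv, so it only illustrates the likely cause; `try_encode` is a hypothetical helper name. Some iconv builds accept a `//TRANSLIT` suffix on the target encoding, but whether that yields three periods for an ellipsis depends on the platform, so treat that as something to test rather than rely on.

```python
ELLIPSIS = "\u2026"


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the target encoding cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# UTF-16 can represent U+2026, so this conversion succeeds.
print(try_encode(ELLIPSIS, "utf-16-le"))
# ASCII has no code for U+2026, so a strict conversion fails and None is returned.
print(try_encode(ELLIPSIS, "ascii"))
```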
Have a look at transliteration tools; I like Unidecode (in Perl), and it's not too hard to port to other languages.
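For a rough idea of what transliteration does, the following sketch uses only Python's standard library. It is not Unidecode and covers far less: it relies on Unicode compatibility decomposition (NFKD), and `fold_to_ascii` is a hypothetical name. A Python port of Unidecode exists on PyPI as `unidecode` if a fuller table is wanted.

```python
import unicodedata


def fold_to_ascii(text):
    """Apply compatibility decomposition, then drop whatever is still not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# The ellipsis decomposes into three periods and the accent is stripped from the e.
print(fold_to_ascii("caf\u00e9\u2026"))
```

Note the limitation: characters with no decomposition, such as curly quotes, are silently dropped by this approach, so a mapping table is still needed for them.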
I have used iconv to convert a file from UTF-16LE (little-endian as I found out by trial and error) that was created by TextPad in Windows into ASCII on OSX like this:
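The command itself is missing from this page; given the description it was presumably along the lines of `iconv -f UTF-16LE -t ASCII` with an input file, but that is an assumption. The equivalent conversion can be sketched in Python as follows, where `utf16le_to_ascii` is a hypothetical helper name.

```python
def utf16le_to_ascii(raw):
    """Decode UTF-16LE bytes and re-encode as ASCII.

    Raises UnicodeEncodeError if the text contains non-ASCII characters.
    """
    text = raw.decode("utf-16-le")
    # Drop a leading byte order mark if the editor wrote one.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text.encode("ascii")


# Simulate a small file saved as UTF-16LE with a byte order mark.
sample = b"\xff\xfe" + "hello\n".encode("utf-16-le")
print(utf16le_to_ascii(sample))
```

Like a strict iconv conversion, this fails on the first character that has no ASCII equivalent, so it only suits files whose content is already plain ASCII in a wider encoding.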
You can also pipe the output through hexdump to view the characters and make sure you're getting the right output. The terminal knows how to interpret UTF-16 and displays it properly, so you can't tell just by running 'cat' on the file:
This shows the layout with the hex char codes and the ASCII characters to the right-hand side, and you can try different encodings in the -f "from" parameter to figure out what you're dealing with.
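To see why such a dump helps identify the encoding, here is a minimal Python sketch that produces a similar layout, loosely modeled on `hexdump -C`. The function name `hex_view` is hypothetical and the exact spacing differs from the real tool.

```python
def hex_view(data, width=16):
    """Return lines of offset, hex byte values, and printable ASCII for `data`."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}} |{text_part}|")
    return lines


# In UTF-16LE every ASCII letter is followed by a 00 byte, which is easy to spot.
for line in hex_view("hi".encode("utf-16-le")):
    print(line)
```

A `00` after each letter suggests UTF-16LE, a `00` before each letter suggests UTF-16BE, and no `00` bytes at all suggests a single-byte encoding or UTF-8.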
Use 'iconv -l' to list the character sets iconv can use on your system.
There was a question yesterday or the day before about file renaming, and I showed a Perl script, rename.pl, that would be usable for the task. The problem area is knowing how the odd characters are encoded, and devising the correct sequence of transliterations. I'd probably do it with an adaptation of that script that did all the mappings sequentially. Doing it one character at a time would be unduly fiddly.
The question was: How to rename with prefix/suffix