Python: gathering ASCII and UTF-8 text

Posted on 2024-12-14 14:49:37

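I have a text file "words.txt" containing English words. Let's say it contains only three words: "one", "two" and "three". I also have three files: one.dat, two.dat and three.dat. Each of these files contains binary data representing the transcription of the corresponding word. The format is UTF-8. What I want: I'd like to consolidate "words.txt" and all of these .dat files into a single document that I can print. So I need something like this (let's name it "final.dat"):

one [wan] two [tu:] three [?ri:]

But with the correct "th" symbol instead of "?" :)

The most important thing is that I must be able to load "final.dat" into MS Word or Writer and print it out.

I am going to do it in Python, but I am really stuck with all these "codecs", "encode", "decode" and so on...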

你的呼吸 2024-12-21 14:49:37

In Python 2.x, reading a UTF-8 file can be accomplished using either

open('one.dat').read().decode('utf-8')

or

import codecs
codecs.open('one.dat', encoding='utf-8').read()

both of which return a Python unicode object. If you want to turn a str (a byte string, ASCII or otherwise) s into a unicode object, use s.decode('utf-8').

In Python 3.x, just do

open('one.dat').read()

or, to avoid relying on the platform's default encoding, pass it explicitly:

open('one.dat', encoding='utf-8').read()
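As a rough illustration of the two ways to read such a file in Python 3, the sketch below first writes a small sample file so that it runs on its own. The temporary directory and the sample transcription "θri:" are assumptions made for the example, not part of the question's actual data.

```python
import os
import tempfile

# Hypothetical sample data: create a small UTF-8 file to read back.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "three.dat")
with open(path, "wb") as f:
    f.write("θri:".encode("utf-8"))

# Text mode with an explicit encoding: read() returns a str.
with open(path, encoding="utf-8") as f:
    text = f.read()

# Binary mode: read() returns bytes, which have to be decoded by hand.
with open(path, "rb") as f:
    raw = f.read()

# Both routes should yield the same Unicode string.
print(text == raw.decode("utf-8"))
```

Passing encoding explicitly keeps the result independent of the locale the script happens to run under.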

The idea is that a str (Py2.x) or bytes (Py3.x) object contains just the binary representation of a string in some encoding without specifying which encoding that is; the decode method turns this into a proper Unicode string (unicode in 2.x, str in 3.x).
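A minimal Python 3 sketch of that distinction, using "θri:" as an assumed sample transcription: the same text has a str form (code points) and a bytes form (one particular encoding), and decoding with the wrong codec silently produces different text.

```python
text = "θri:"               # str: a sequence of Unicode code points
raw = text.encode("utf-8")  # bytes: the UTF-8 representation of that text

# The theta needs two bytes in UTF-8, so the lengths differ.
print(len(text), len(raw))

# Decoding with the codec that produced the bytes round-trips cleanly.
print(raw.decode("utf-8") == text)

# Decoding the same bytes as Latin-1 raises no error but gives mojibake.
wrong = raw.decode("latin-1")
print(wrong == text)
```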

(By the way, UTF-8 is not "binary data"; it's just text in a non-ASCII encoding.)
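Putting the pieces together for the task in the question, one possible Python 3 sketch follows. The function name build_final, its parameters, and the one-entry-per-line output layout are assumptions made for illustration; the question only fixes the input file names and the "word [transcription]" shape.

```python
import os


def build_final(words_path, data_dir, out_path):
    """Write one 'word [transcription]' line per word listed in words_path."""
    # The word list is plain English, so ASCII is assumed here.
    with open(words_path, encoding="ascii") as f:
        words = f.read().split()

    lines = []
    for word in words:
        # Assumed naming scheme from the question: <word>.dat holds UTF-8 text.
        dat_path = os.path.join(data_dir, word + ".dat")
        with open(dat_path, encoding="utf-8") as f:
            transcription = f.read().strip()
        lines.append("{} [{}]".format(word, transcription))

    # Encode once, on output. If the word processor misdetects the encoding,
    # "utf-8-sig" (UTF-8 with a byte order mark) may be worth trying instead.
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return lines
```

A call such as build_final('words.txt', '.', 'final.txt') would then produce a UTF-8 text file that can be opened in a word processor; the output file name is likewise only an example.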
