Binary file IO in Python, where to start?
As a self-taught Python hobbyist, how would I go about learning to import and export binary files using standard formats?
I'd like to implement a script that takes an ePub ebook (XHTML + CSS in a zip) and converts it to Mobipocket (PalmDOC) format so that the Amazon Kindle can read it (as part of a larger project that I'm working on).
There is already an awesome open-source project for managing ebook libraries: Calibre. I wanted to try implementing this on my own as a learning/self-teaching exercise. I started looking at their Python source code and realized that I have no idea what is going on. Of course, the big danger in being self-taught at anything is not knowing what you don't know.
In this case, I know that I don't know much about these binary files and how to work with them in Python code (struct?). But I think I'm probably missing a lot of knowledge about binary files in general, and I'd like some help understanding how to work with them. Here is a detailed overview of the mobi/palmdoc headers. Thanks!
Edit: No question, good point! Do you have any tips on how to gain a basic knowledge of working with binary files? Python-specific would be helpful but other approaches could also be useful.
TOM: Edited as question, added intro / better title
Answers (3)
You should probably start with the struct module, as you pointed to in your question, and of course, open the file in binary mode.
Basically you just start at the beginning of the file and pick it apart piece by piece. It's a hassle, but not a huge problem. If the files are compressed or encrypted, things can get more difficult. It's helpful if you start with a file that you know the contents of so you're not guessing all the time.
Try it a bit, and you'll probably come up with more specific questions.
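As a rough illustration of the approach described above (binary mode, struct, and a file whose contents you already know), here is a minimal sketch. The header layout, the `DEMO` magic value, and the names `HEADER_FORMAT`, `write_sample` and `read_header` are all invented for this example; they are not the real mobi/palmdoc layout, which you would take from the format documentation instead.

```python
import os
import struct
import tempfile

# Hypothetical layout, invented for this example (big-endian, no padding):
#   4-byte magic, 2-byte version, 4-byte record count, 16-byte title
HEADER_FORMAT = ">4sHI16s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4 + 2 + 4 + 16 bytes


def write_sample(path):
    """Write a file whose contents we know, so parsing is not guesswork."""
    packed = struct.pack(HEADER_FORMAT, b"DEMO", 1, 3, b"My Book")
    with open(path, "wb") as f:  # "b" = binary mode, no text decoding
        f.write(packed)
        f.write(b"payload bytes follow the header")


def read_header(path):
    """Read the fixed-size header from the start of the file and unpack it."""
    with open(path, "rb") as f:
        raw = f.read(HEADER_SIZE)
    magic, version, count, title = struct.unpack(HEADER_FORMAT, raw)
    return {
        "magic": magic,
        "version": version,
        "record_count": count,
        # "16s" pads short values with NUL bytes; strip them before decoding
        "title": title.rstrip(b"\x00").decode("ascii"),
    }


with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.bin")
    write_sample(sample_path)
    header = read_header(sample_path)

print(header)
```

The `>` prefix asks struct for big-endian byte order with no alignment padding; whether a given real format is big- or little-endian is something its specification has to tell you.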
If you want to construct and analyse binary files the struct module will give you the basic tools, but it isn't very friendly, especially if you want to look at things that aren't a whole number of bytes.
There are a few modules that can help, such as BitVector, bitarray and bitstring. (I favour bitstring, but I wrote it and so may be biased).
For parsing binary formats the hachoir module is very good, but I suspect it's too high-level for your current needs.
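To show what "not a whole number of bytes" means in practice, here is a small sketch that uses only built-in integer operations, without any of the modules named above. The 16-bit layout (a 4-bit type, an 11-bit length and a 1-bit flag) and the function names are made up for illustration.

```python
# Hypothetical 16-bit layout, invented for this example:
#   bits 15-12: type (4 bits) | bits 11-1: length (11 bits) | bit 0: flag


def pack_bits(type_, length, flag):
    """Combine three sub-byte fields into two big-endian bytes."""
    value = ((type_ & 0xF) << 12) | ((length & 0x7FF) << 1) | (flag & 0x1)
    return value.to_bytes(2, "big")


def unpack_bits(raw):
    """Split two big-endian bytes back into the three fields."""
    value = int.from_bytes(raw, "big")
    return {
        "type": (value >> 12) & 0xF,     # top 4 bits
        "length": (value >> 1) & 0x7FF,  # middle 11 bits
        "flag": value & 0x1,             # lowest bit
    }


raw = pack_bits(5, 1000, 1)
fields = unpack_bits(raw)
print(raw.hex(), fields)
```

Shifting and masking like this works, but it gets tedious and error-prone as layouts grow, which is the gap the bit-level modules aim to fill.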
For teaching yourself Python tools that work with binary files, this will get you going. It's fun, too: exercises with binaries, zips, images, and lots more.