Python EPUB ibooks

Python library to extract "epub" information

Posted 2024-09-07 03:22:28 · 1539 characters · 9 views · 0 comments

Closed. This question is seeking recommendations for software libraries, tutorials, tools, books, or other off-site resources. It does not meet Stack Overflow guidelines. It is not currently accepting answers.

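We don't allow questions seeking recommendations for software libraries, tutorials, tools, books, or other off-site resources. You can edit the question so it can be answered with facts and citations.

Closed 9 years ago.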

故事↓在人 2024-09-14 03:22:28

An .epub file is a zip archive containing a META-INF directory. That directory holds a file named container.xml, which points to another file, usually named Content.opf, that indexes all the other files making up the e-book (summary based on http://www.jedisaber.com/eBooks/tutorial.asp ; full spec at http://www.idpf.org/2007/opf/opf2.0/download/ )

The following Python code will extract the basic meta-information from an .epub file and return it as a dict.

import zipfile
from lxml import etree

# XML namespaces used by container.xml and the OPF package file
NAMESPACES = {
    "n": "urn:oasis:names:tc:opendocument:xmlns:container",
    "pkg": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def epub_info(fname):
    def xpath(element, path):
        # return the first match for path (raises IndexError if there is none)
        return element.xpath(path, namespaces=NAMESPACES)[0]

    # read both metafiles, closing the .epub archive when done
    with zipfile.ZipFile(fname) as zip_content:
        # find the contents metafile
        cfname = xpath(
            etree.fromstring(zip_content.read("META-INF/container.xml")),
            "n:rootfiles/n:rootfile/@full-path",
        )

        # grab the metadata block from the contents metafile
        metadata = xpath(
            etree.fromstring(zip_content.read(cfname)), "/pkg:package/pkg:metadata"
        )

    # repackage the data
    return {
        s: xpath(metadata, f"dc:{s}/text()")
        for s in ("title", "language", "creator", "date", "identifier")
    }

Sample output:

{
    'date': '2009-12-26T17:03:31',
    'identifier': '25f96ff0-7004-4bb0-b1f2-d511ca4b2756',
    'creator': 'John Grisham',
    'language': 'UND',
    'title': 'Ford County'
}
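
If you would rather not depend on lxml, the same container.xml to OPF lookup can probably be done with the standard library alone. What follows is only a sketch under a few assumptions: the names epub_info_stdlib and make_minimal_epub are made up for illustration and are not part of the answer above, the book is assumed to keep its Dublin Core elements directly under the package's metadata block, and fields the book does not declare come back as None instead of raising IndexError.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Same namespace URIs as the lxml version above.
NS = {
    "n": "urn:oasis:names:tc:opendocument:xmlns:container",
    "pkg": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}
FIELDS = ("title", "language", "creator", "date", "identifier")


def epub_info_stdlib(source):
    """Return basic metadata from an .epub path or file object.

    Hypothetical helper: fields missing from the book map to None.
    """
    with zipfile.ZipFile(source) as archive:
        # container.xml names the package (OPF) file in rootfile/@full-path
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("n:rootfiles/n:rootfile", NS)
        if rootfile is None:
            raise ValueError("container.xml has no rootfile entry")
        package = ET.fromstring(archive.read(rootfile.get("full-path")))

    metadata = package.find("pkg:metadata", NS)
    if metadata is None:
        raise ValueError("package file has no metadata block")
    return {
        name: metadata.findtext(f"dc:{name}", default=None, namespaces=NS)
        for name in FIELDS
    }


def make_minimal_epub():
    """Build a tiny in-memory archive holding only the two files read above.

    Demo data only; this is not a complete, valid e-book.
    """
    container = (
        '<?xml version="1.0"?>'
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        "</container>"
    )
    package = (
        '<?xml version="1.0"?>'
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0">'
        '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:title>Ford County</dc:title>"
        "<dc:creator>John Grisham</dc:creator>"
        "<dc:language>UND</dc:language>"
        "</metadata></package>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("META-INF/container.xml", container)
        archive.writestr("OEBPS/content.opf", package)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(epub_info_stdlib(make_minimal_epub()))
```

Because zipfile.ZipFile accepts either a path or a file-like object, the same function should work on a real book by passing its filename. The demo archive leaves out date and identifier on purpose, so those two keys come back as None.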

梦初启 2024-09-14 03:22:28

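Something like epub-tools, for example? But that is mostly about writing the epub format (from various possible sources), as is epubtools (similar spelling, different project). For reading it, I would try the companion project Threepress, a Django app for showing epub books in a browser. I haven't looked at that code, but I imagine that in order to show the book it must first be able to read it ;-).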

南汐寒笙箫 2024-09-14 03:22:28

Check out the epub module. It looks like a simple option.

攀登最高峰 2024-09-14 03:22:28

I ended up here while looking for something similar and, inspired by Mr. Bothwell's code snippet, started my own project. If anyone is interested... http://epubzilla.odeegan.com/
