Retain PDF styling in the EPUB
I'm currently working on a project to convert PDF to EPUB using Python. When converting, styling such as font family and font size needs to be exactly the same in the EPUB as in the PDF. Is there a way to achieve this using Python? I don't want to depend on any external software. I used Aspose.
Code I used:
import aspose.words as aw
doc = aw.Document("Input.pdf")
doc.save("Output.epub")
The input is a simple text-only PDF.
Comments (2)
You are going to get a variety of answers and comments asking you to show the code you tried, post sample documents, and so on.
Let me save you the trouble. Your question seems straightforward: you want to convert a PDF to EPUB and retain the style information.
Good luck.
It will all depend on your PDF file. Does it have embedded fonts, or does it rely on system fonts? Is the layout complicated? Are there headers and footers? What about images? Dingbats characters? What if there is no text in the PDF, just PostScript drawings of text characters? What if the PDF consists only of scanned pages in a PDF container? Is everything in English? Are there any Unicode characters? Are you looking to get the styles right at the page level? Paragraph? Sentence? Word? Or character level?
Basically, this is a hard problem. PDF was designed as an end-use format, not an interchange format. Most things get converted to PDF because someone wanted to control how the final product looks. You can look at text extraction tools for PDF, but there is no easy solution with either open-source or commercial tools.
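As a rough first pass at the questions above (embedded fonts, or scanned pages with no fonts at all), the raw bytes of the file can be scanned for a few PDF name tokens. This is only a sketch, using nothing but the standard library. The function name `rough_pdf_font_report` is made up for illustration, and the check is a heuristic: PDFs that store their objects in compressed object streams will hide these tokens, so a zero count does not prove anything on its own.

```python
import re


def rough_pdf_font_report(pdf_bytes: bytes) -> dict:
    """Count a few telltale name tokens in raw PDF bytes.

    Heuristic only: objects inside compressed object streams are not visible
    to a plain byte scan, so counts may be lower than the real numbers.
    """
    return {
        # Font dictionaries: "/Type /Font" (but not "/Type /FontDescriptor")
        "font_objects": len(re.findall(rb"/Type\s*/Font\b", pdf_bytes)),
        # Embedded font programs are referenced as /FontFile, /FontFile2 or /FontFile3
        "embedded_font_files": len(re.findall(rb"/FontFile[23]?\b", pdf_bytes)),
        # Image XObjects; many images and no fonts suggests scanned pages
        "image_objects": len(re.findall(rb"/Subtype\s*/Image\b", pdf_bytes)),
    }
```

To try it on a real file, read the file in binary mode (for example `open("Input.pdf", "rb").read()`) and pass the bytes in. Fonts present but no embedded font files would suggest the PDF relies on system fonts, which makes an exact match in the EPUB harder.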
You can easily convert PDF to EPUB using Aspose.Words for Python. The code is pretty simple: essentially the load-and-save calls already shown in the question.
However, when a PDF is loaded into the Aspose.Words Document Object Model, it is converted from a fixed-page layout to a flow document, and when the document is saved to EPUB, it is saved as a flow document. I am afraid this may lead to layout and formatting losses during conversion.
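To illustrate what "flow document" means for font family and size, here is a minimal sketch of how per-span font information could be carried into the XHTML that an EPUB contains. It is not what Aspose.Words does internally. The input format, a list of `(text, font_family, size_pt)` tuples, and the function name `spans_to_xhtml` are assumptions made for this example; in practice the spans would have to come from whatever PDF text extraction step you use.

```python
from html import escape


def spans_to_xhtml(spans):
    """Render (text, font_family, size_pt) spans as one XHTML paragraph.

    Each span keeps its own font family and size through inline CSS, while
    line breaks and page positions from the PDF are deliberately dropped.
    """
    parts = []
    for text, family, size_pt in spans:
        # Escape the family name so it cannot break out of the style attribute
        style = f"font-family: '{escape(family)}'; font-size: {size_pt:g}pt"
        parts.append(f'<span style="{style}">{escape(text)}</span>')
    return "<p>" + "".join(parts) + "</p>"
```

For example, `spans_to_xhtml([("Hello", "Times New Roman", 12.0)])` yields a paragraph whose span declares Times New Roman at 12pt. Even then, the result only matches the PDF if the font itself is packaged in the EPUB and the reading system does not override publisher styles, which is part of why an exact match is hard to guarantee.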