Converting PDF, DOC, and PPT to HTML5

Posted on 2024-09-08 10:16:03

Comments (6)

亢潮 2024-09-15 10:16:03

You're unlikely to find a single offering that does all this, especially in the open source world. It's more likely that you'll end up relying on a mishmash of tools, and you may even need to chain some converters in order to get to HTML (e.g. PDF -> PS -> HTML).

OpenOffice supports conversion to HTML, and can be called from the command line.
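For what it's worth, here is a rough sketch of what that command-line call could look like when wrapped in Python. It assumes a LibreOffice-style `soffice` binary on the PATH that accepts `--headless --convert-to html --outdir`; older OpenOffice builds may not accept these options and could need a macro or a helper tool instead. The function names and file names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_soffice_command(source: str, out_dir: str, binary: str = "soffice") -> List[str]:
    """Build the headless conversion command without running it."""
    return [
        binary,
        "--headless",            # no GUI
        "--convert-to", "html",  # target format (assumes a LibreOffice-style CLI)
        "--outdir", out_dir,
        source,
    ]


def convert_to_html(source: str, out_dir: str) -> Path:
    """Run the conversion and return the path where the HTML is expected."""
    if shutil.which("soffice") is None:
        raise FileNotFoundError("soffice not found on PATH")
    subprocess.run(build_soffice_command(source, out_dir), check=True)
    return Path(out_dir) / (Path(source).stem + ".html")
```

Something like `convert_to_html("slides.ppt", "out")` would then be the entry point for doc and ppt files alike, since the office suite picks the import filter from the input file.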

http://pdftohtml.sourceforge.net/ looks reasonably good at converting PDF to HTML.

For a Doc in WordML or OpenXML format, it's conceivable that you could use XSLT transforms, since both the input and output formats are XML. I've seen some stylesheets floating around the net that do this, but YMMV.
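To illustrate the "it's all XML" point: a .docx file is a zip archive whose main body lives in `word/document.xml`. I don't have a particular stylesheet to recommend, so instead of XSLT this sketch uses plain Python (`zipfile` plus `ElementTree`) to pull the paragraph text out and wrap it in `<p>` tags. It ignores formatting, images, and tables, and the function name `docx_to_html` is just a placeholder.

```python
import html
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_html(docx_file) -> str:
    """Return a bare-bones HTML document with one <p> per non-empty Word paragraph."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across <w:t> elements inside its runs.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        if text:
            paragraphs.append(f"<p>{html.escape(text)}</p>")

    body = "\n".join(paragraphs)
    return f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>"
```

A real XSLT stylesheet would do the same walk declaratively and could also map styles, but the structure it has to deal with is the one shown here.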

Incidentally, why is there a specific requirement for open source? MS PowerPoint already supports save-as-HTML, for example.

戏剧牡丹亭 2024-09-15 10:16:03

OpenOffice will convert PDF to HTML, but you'll take a hit to design quality.

I suggest either Crocodoc, a paid service (it comes in different flavours for different platforms such as Python, Ruby, Java, and PHP, and developers can work with its API), or waiting for an official Adobe tool (it's in the works).

拥抱影子 2024-09-15 10:16:03

For PDF to HTML conversion, pdf2htmlEX seems like a pretty good tool (looking at all the examples/samples):

https://github.com/coolwanglu/pdf2htmlEX
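If you want to script it, a minimal sketch might look like the following. It assumes the `pdf2htmlEX` binary is installed and on the PATH, and that your version accepts the `--dest-dir` and `--zoom` options; the Python function names are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_pdf2htmlex_command(
    pdf_path: str, dest_dir: str = ".", zoom: Optional[float] = None
) -> List[str]:
    """Build the pdf2htmlEX command line without running it."""
    cmd = ["pdf2htmlEX", "--dest-dir", dest_dir]
    if zoom is not None:
        cmd += ["--zoom", str(zoom)]  # scale factor for the rendered pages
    cmd.append(pdf_path)
    return cmd


def pdf_to_html(pdf_path: str, dest_dir: str = ".") -> None:
    """Run pdf2htmlEX; raises CalledProcessError if the tool fails."""
    subprocess.run(build_pdf2htmlex_command(pdf_path, dest_dir), check=True)
```

Keeping the command builder separate from the `subprocess` call makes it easy to check the arguments before running the tool over a whole directory of PDFs.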

胡渣熟男 2024-09-15 10:16:03

For PDF there is an open source project started by Mozilla, and it's very good: https://github.com/mozilla/pdf.js/

You can see a hello world example: https://github.com/mozilla/pdf.js/tree/master/examples/helloworld

For the rest of the document types, I think LibreOffice said they are planning to build something in HTML5, but so far nothing has been done.

摇划花蜜的午后 2024-09-15 10:16:03

http://wvware.sourceforge.net/

wvHtml: convert your Word document into HTML 4.0.

Possibly:
http://www.abisource.com/
but in this case it looks like you have to open the document and export it to HTML manually; maybe plugins help. I'm not sure what you mean by "source software that can convert".

Or this:
http://www.zope.org/Members/sf/NuxDocument

Also, pdftohtml will give you an HTML page as output, but you will have to work with its graphical interface, since it doesn't seem to be very interactive.

贪恋 2024-09-15 10:16:03

I know the question is a bit old, but I have found a new open source tool called FlexPaper: http://flexpaper.devaldi.com/
