Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 10 years ago.
You're unlikely to find a single offering that does all this, especially in the open source world. It's more likely that you'll end up relying on a mishmash of tools, and you may even need to chain some converters to get to HTML (e.g. PDF -> PS -> HTML).
OpenOffice supports conversion to HTML, and can be called from the command line.
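As a rough sketch of what calling it from the command line could look like, the snippet below builds a headless conversion command. It assumes a LibreOffice-style `soffice` binary on the PATH that accepts `--headless`, `--convert-to` and `--outdir`; older OpenOffice builds may need a different invocation. The function names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_soffice_command(source: Path, out_dir: Path, binary: str = "soffice") -> list[str]:
    """Build a headless command that asks the office suite to export `source` as HTML.

    Assumption: `binary` is a LibreOffice-style executable; adjust for your install.
    """
    return [
        binary,
        "--headless",
        "--convert-to", "html",
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_to_html(source: Path, out_dir: Path) -> Path:
    """Run the conversion and return the path where the HTML file is expected."""
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the converter exits with an error.
    subprocess.run(build_soffice_command(source, out_dir), check=True)
    return out_dir / (source.stem + ".html")
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to inspect or log the exact command before running it on a batch of files.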
http://pdftohtml.sourceforge.net/ looks reasonably good at converting PDF to HTML.
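For scripting it, something along these lines might work. This is only a sketch: it assumes the `pdftohtml` executable shipped with Poppler (a descendant of the project above) and its `-noframes`, `-f` and `-l` options; the helper name and parameters are hypothetical.

```python
from pathlib import Path
from typing import Optional


def build_pdftohtml_command(
    pdf: Path,
    out_base: Path,
    first_page: Optional[int] = None,
    last_page: Optional[int] = None,
) -> list[str]:
    """Build a pdftohtml command line (assumed Poppler-style options)."""
    cmd = ["pdftohtml", "-noframes"]  # -noframes: one HTML file instead of a frameset
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]  # last page to convert
    cmd += [str(pdf), str(out_base)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the tool is installed.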
For Word documents in WordML or OpenXML format, it's conceivable that you could use XSLT transforms, since both the input and output formats are XML. I've seen some stylesheets floating around the net that do this, but YMMV.
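To illustrate the "it's all XML" point without XSLT, here is a minimal sketch that swaps in a plain ElementTree traversal instead of a stylesheet. It assumes an OpenXML `.docx` file (a zip archive containing `word/document.xml`) and only extracts paragraph text; styles, tables and images are ignored. The function name is made up for this example.

```python
import html
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs_to_html(docx_path) -> str:
    """Return one <p> element per non-empty paragraph in the .docx body.

    `docx_path` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    parts = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join every <w:t> inside it.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        if text:
            parts.append(f"<p>{html.escape(text)}</p>")
    return "\n".join(parts)
```

A real XSLT stylesheet would map far more of the markup (headings, lists, formatting), so treat this as a starting point rather than a converter.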
Incidentally, why is there a specific requirement for open source? MS PowerPoint already supports save-as-HTML, for example.
OpenOffice will convert PDF to HTML, but you'll take a hit to design quality.
I suggest either Crocodoc as a paid service (it provides API flavours for different platforms, such as Python, Ruby, Java and PHP, so developers can work against its APIs) or waiting for an official Adobe tool (it's in the works).
For PDF to HTML conversion, pdf2htmlEX seems like a pretty good tool (looking at all the examples/samples):
https://github.com/coolwanglu/pdf2htmlEX
For PDF there is an open source project started by Mozilla, and it's very good: https://github.com/mozilla/pdf.js/
You can see a hello world example: https://github.com/mozilla/pdf.js/tree/master/examples/helloworld
For the other document types, I think LibreOffice said they are planning to build something in HTML5, but so far nothing has been done.
http://wvware.sourceforge.net/
Possibly:
http://www.abisource.com/
but in this case it looks like a manual "open doc" > "export html" step; maybe plugins help. I'm not sure what you mean by "source software that can convert".
Or this:
http://www.zope.org/Members/sf/NuxDocument
Also, pdftohtml will give you an HTML page as output, but you will have to work through its graphical interface, since it doesn't seem to be very interactive.
I know the question is a bit old; however, I have found a new open source tool called FlexPaper: http://flexpaper.devaldi.com/