Converting XML or HTML to wiki markup: which approach would you choose?
I need to convert HTML documents (generated from DocBook XML documents) to wiki markup, specifically PmWiki markup. The goal is to include the company's application operations guides in our newly created wiki. This means that I actually have two options:
- Convert the HTML files (generated from DocBook XML) to wiki markup
- Convert the DocBook XML files directly to wiki markup
Since the HTML files are generated by a DocBook-to-HTML converter, the way the tags are used within them does not vary much; only the contents of the documents differ.
I am looking for a solution that I could implement quickly myself. I will have to do this conversion once now, and then again every time a new version of the application operations guides is created.
Solutions that I've thought of so far:
- Convert HTML to wiki markup with a Perl or PHP script based on regular expressions.
- Convert the DocBook XML files directly to wiki markup. Since it is XML, I could use Java for XML parsing. The risk here is that I am not as familiar with the DocBook XML format as I am with HTML, so this may take some time to learn.
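For a sense of how much DocBook knowledge the second option needs, here is a minimal Python sketch using only the standard library. It assumes DocBook 4 style input without XML namespaces and handles only a handful of elements (`title`, `para`, `itemizedlist`, `orderedlist`, `emphasis`, `literal`, `command`, `filename`, `ulink`); the function names are hypothetical, and real guides will likely need more elements (tables, notes, screens) added in the same way.

```python
import xml.etree.ElementTree as ET


def inline(elem):
    """Render the mixed text content of elem, mapping inline DocBook tags."""
    parts = [elem.text or ""]
    for child in elem:
        body = inline(child)
        if child.tag == "emphasis":
            # Assumption: role="bold" or "strong" marks bold, anything else italic.
            mark = "'''" if child.get("role") in ("bold", "strong") else "''"
            parts.append(mark + body + mark)
        elif child.tag in ("literal", "command", "filename"):
            parts.append("@@" + body + "@@")  # PmWiki monospace
        elif child.tag == "ulink":
            parts.append("[[" + child.get("url", "") + " | " + body + "]]")
        else:
            parts.append(body)  # unknown inline tag: keep its text only
        parts.append(child.tail or "")
    # Collapse the line breaks and indentation of the XML source.
    return " ".join("".join(parts).split())


def block(elem, depth, lines):
    """Append wiki blocks for the children of elem; depth sets heading level."""
    for child in elem:
        if child.tag == "title":
            lines.append("!" * depth + " " + inline(child))
        elif child.tag == "para":
            lines.append(inline(child))
        elif child.tag in ("itemizedlist", "orderedlist"):
            marker = "*" if child.tag == "itemizedlist" else "#"
            items = [
                marker + " " + " ".join(inline(p) for p in item.findall("para"))
                for item in child.findall("listitem")
            ]
            lines.append("\n".join(items))
        elif child.tag in ("chapter", "section", "sect1", "sect2", "sect3"):
            block(child, depth + 1, lines)
        # Any other block element is silently skipped in this sketch.


def docbook_to_pmwiki(xml_text):
    """Convert a DocBook XML string to PmWiki markup (partial coverage)."""
    lines = []
    block(ET.fromstring(xml_text), 1, lines)
    return "\n\n".join(lines)
```

Calling `docbook_to_pmwiki(open("guide.xml", encoding="utf-8").read())` (the file name is a placeholder) returns the wiki text as one string. Because the conversion has to be repeated for every new version of the guides, a script like this can be rerun unchanged as long as the guides keep using the same set of elements.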
What approach would you choose for this work?
Update:
I just tried a PmWiki extension called ConvertHTML. It did not work well, because it does not convert every HTML tag (some tags are left unchanged in the wiki text), as its documentation says:
PmWiki markup does not support all of the HTML markup so a 100% conversion is not possible. However, PmWiki can make replacements to the text as it is being edited or saved. ConvertHTML implements a relatively comprehensive set of rules for converting HTML tags to wiki markup.
Comments (3)
DocBook to Wiki might be useful, though it converts from DocBook to MediaWiki, not PmWiki.
There is also a Perl module that converts HTML to various wiki dialects: HTML::WikiConverter. So if you can get your DocBook into HTML, that might work as well.
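The HTML-to-wiki route can also be sketched with an HTML parser instead of regular expressions. The Python example below is not HTML::WikiConverter and does not use its API; it is a small standard-library illustration with a hypothetical `html_to_pmwiki` function that covers only headings, paragraphs, lists, links, and a few inline tags. Repeating the list marker for nested lists is an assumption about how deeper levels should be written.

```python
import re
from html.parser import HTMLParser


class PmWikiConverter(HTMLParser):
    """Translate a small subset of HTML tags into PmWiki markup."""

    INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''",
              "code": "@@", "tt": "@@"}
    HEADINGS = {"h1": "!", "h2": "!!", "h3": "!!!", "h4": "!!!!"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []     # output fragments, joined at the end
        self.lists = []   # stack of "*" / "#" for the open lists
        self.links = []   # stack of href values (None for <a name=...>)

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.out.append("\n\n" + self.HEADINGS[tag] + " ")
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in ("ul", "ol"):
            self.lists.append("*" if tag == "ul" else "#")
        elif tag == "li" and self.lists:
            # Assumption: nesting is written by repeating the marker.
            self.out.append("\n" + self.lists[-1] * len(self.lists) + " ")
        elif tag == "p" and not self.lists:
            self.out.append("\n\n")
        elif tag == "a":
            href = dict(attrs).get("href")
            self.links.append(href)
            if href:
                self.out.append("[[" + href + " | ")

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self.out.append("\n\n")
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in ("ul", "ol") and self.lists:
            self.lists.pop()
            if not self.lists:
                self.out.append("\n\n")
        elif tag == "p" and not self.lists:
            self.out.append("\n\n")
        elif tag == "a" and self.links and self.links.pop():
            self.out.append("]]")

    def handle_data(self, data):
        # Collapse source line breaks and indentation into single spaces.
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_pmwiki(html):
    """Convert an HTML string to PmWiki markup (partial tag coverage)."""
    parser = PmWikiConverter()
    parser.feed(html)
    parser.close()
    text = "".join(parser.out)
    text = re.sub(r" *\n *", "\n", text)    # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line
    return text.strip()
```

Tags the converter does not know are dropped while their text is kept, so unsupported markup degrades to plain text rather than leaving raw HTML in the wiki page.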
I used Digester to generate Java objects from a simple XML file and then modified them for my needs in Java. It is a very simple tool to use. Maybe you want to give it a try. It worked for me.
Try HTML2Mediawiki
(Updated link 10Mar2020)