What is a good document standard to use programmatically?

Posted on 2024-09-06 05:50:51

I'm writing a program that takes a document as input, replaces a few values, inserts a table, and converts the result to PDF. It's written in Python + Qt (PyQt). Is there a well-known document standard that is easy to work with programmatically? It must be cross-platform, and preferably open.

  1. I have looked into Microsoft's DOC and DOCX, but as far as I can tell they are binary formats that I can't edit. Python has bindings for them, but only on Windows.

  2. OpenOffice's ODT/ODF is a zipped set of XML files, so I can edit it, but there are no command-line utilities or any other way to convert the file to PDF programmatically. OpenOffice provides bindings, but you need to run OpenOffice from the command line, start a server, etc., and my clients may not have OpenOffice installed.

  3. RTF is readable from Python, but I couldn't find any way or library to convert RTF documents to PDF.

At the moment I'm exporting from Microsoft Word to HTML, replacing the values, and using PyQt to convert the result to PDF. However, it loses formatting and looks awful. I'm surprised there isn't a well-known library that lets you edit a variety of document formats and convert between them. Am I missing something?

Update: Thanks for the advice, I'll have a look at using LaTeX.

Thanks,
Jackson

Comments (5)

情感失落者 2024-09-13 05:50:51

Have you looked into using LaTeX documents?

They are perfect to use programmatically (compiling documents? You gotta love that...), and there are several Python frameworks you can use, such as plasTeX and PyTex.

Exporting a LaTeX document to PDF is almost immediate.
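
To make the LaTeX route concrete, here is a minimal sketch of the "replace values, insert a table" step using only the Python standard library. The template text, the placeholder names (`customer`, `rows`), the file name `invoice.tex`, and the helper names are all made up for illustration. It also assumes a `pdflatex` binary is on the PATH; if none is found, the compile step is skipped and only the `.tex` file is written.

```python
import shutil
import subprocess
from pathlib import Path
from string import Template


class TexTemplate(Template):
    # LaTeX uses "$" for math, so use "@" as the placeholder marker instead.
    delimiter = "@"


# Hypothetical template; in practice this would live in a .tex file.
TEMPLATE = TexTemplate(r"""\documentclass{article}
\begin{document}
Invoice for @customer

\begin{tabular}{lr}
Item & Price \\
\hline
@rows
\end{tabular}
\end{document}
""")

# Characters that have a special meaning in LaTeX body text.
_ESCAPES = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def tex_escape(text):
    """Escape user-supplied text, one character at a time."""
    return "".join(_ESCAPES.get(ch, ch) for ch in str(text))


def render(customer, items):
    """Fill the template; items is a list of (name, price) pairs."""
    rows = "\n".join(
        f"{tex_escape(name)} & {tex_escape(price)} \\\\"
        for name, price in items
    )
    return TEMPLATE.substitute(customer=tex_escape(customer), rows=rows)


def build_pdf(tex_source, out_dir="."):
    """Write the source and compile it if pdflatex is installed."""
    tex_path = Path(out_dir) / "invoice.tex"
    tex_path.write_text(tex_source, encoding="utf-8")
    if shutil.which("pdflatex") is None:
        return None  # no TeX installation found; only the .tex file exists
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode",
         f"-output-directory={out_dir}", str(tex_path)],
        check=True,
    )
    return tex_path.with_suffix(".pdf")
```

Something like `build_pdf(render("ACME", [("Widget", "9.99")]))` would then produce the PDF. Escaping the substituted values matters here, because characters such as `&` and `%` would otherwise break the table or comment out the rest of a line.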

故笙诉离歌 2024-09-13 05:50:51

Since you're already using PyQt anyway, it might be worth looking at Qt's built-in rich text processing module, which looks decent. Its documentation covers detailed content manipulation, including inserting tables. Also, the QPrinter class's default print-to-file format happens to be PDF.

Without knowing more about your particular needs it's hard to say if these would do what you want, but since your application already has PyQt as a dependency, it seems silly to introduce any more without evaluating the functionality you've already got available.

The non-GUI parts of the Qt framework are often overlooked though.

edit: included more links.
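
A rough sketch of what that could look like, assuming PyQt5 (the question doesn't say which PyQt version is in use, and module paths differ between versions). The function names `table_shape` and `write_pdf` are invented for this example; the Qt calls used are `QTextCursor.insertTable`, `QTextTable.cellAt`, and `QTextDocument.print_`.

```python
def table_shape(rows):
    """Return (row_count, column_count) for a non-empty list of rows."""
    if not rows or not rows[0]:
        raise ValueError("need at least one row and one column")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of cells")
    return len(rows), width


def write_pdf(title, rows, path):
    """Build a document with a heading and a table, then print it to a PDF."""
    # Imported here so the helper above stays usable without PyQt5 installed.
    from PyQt5.QtGui import QTextCursor, QTextDocument
    from PyQt5.QtPrintSupport import QPrinter
    from PyQt5.QtWidgets import QApplication

    # Qt's text and printing classes need an application object to exist;
    # keep a reference so it is not garbage-collected while we work.
    app = QApplication.instance() or QApplication([])

    doc = QTextDocument()
    cursor = QTextCursor(doc)
    cursor.insertText(title)

    # Insert an empty table at the cursor, then fill it cell by cell.
    n_rows, n_cols = table_shape(rows)
    table = cursor.insertTable(n_rows, n_cols)
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            table.cellAt(r, c).firstCursorPosition().insertText(str(value))

    # Ask for PDF explicitly rather than relying on the default format.
    printer = QPrinter()
    printer.setOutputFormat(QPrinter.PdfFormat)
    printer.setOutputFileName(path)
    doc.print_(printer)
```

Calling `write_pdf("Report", [["Item", "Price"], ["Widget", 9.99]], "report.pdf")` should write a one-page PDF. An existing template could be loaded with `QTextDocument.setHtml` instead of starting from an empty document, though Qt only supports a subset of HTML, which may be part of why the Word export looks bad.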

陈独秀 2024-09-13 05:50:51

You might want to try ReportLab. The open source version can write PDFs, and the commercial version has a lot of really nice abstractions to allow output to a variety of different formats from a single input.

似最初 2024-09-13 05:50:51

I don't know what kind of audience your program has. TeX is good and I would go with it.
Another possible choice is the Excel format, parsing it with xlrd.
I've used it a couple of times and it's pretty straightforward.
An Excel file is a good choice for the following reasons:

  1. It's a well-known format that is easy to edit
  2. You could prepare a predefined template with constraints and tables

梦与时光遇 2024-09-13 05:50:51

Create XML documents, transform them to XSL-FO, and render with FOP or RenderX. If you use DocBook as the primary input, there are freely available toolchains for converting it to PDF, RTF, HTML, and so forth.

It is rather quirky to use and not my idea of fun, but it does deliver and can be embedded in an application, AFAICT.

Creating DocBook is very straightforward, as it has a wide range of semantic tags, table support, etc., to give "meaningful" markup that can be reliably formatted. The XSL stylesheets are modular and allow parts to be customized or replaced to generate your own look and feel.

It works well for relatively free-flowing documents with lots of text.

For fill-in-the-blanks kinds of documents, a regular reporting engine may be a better fit, or some straightforward XSL stylesheets that emit the XSL-FO directly.
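
For the fill-in-the-blanks case, the XSL-FO itself is simple enough to show. The sketch below does not use an XSL stylesheet: Python's standard library has no XSLT engine, so it builds the FO tree directly with `xml.etree.ElementTree`. The helper names (`fo`, `build_fo`), the master name `A4`, and the page dimensions are arbitrary choices for illustration, not something the toolchains above require.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag, parent=None, text=None, **attrs):
    """Create an element in the XSL-FO namespace; '_' in names becomes '-'."""
    attrs = {name.replace("_", "-"): value for name, value in attrs.items()}
    qname = f"{{{FO_NS}}}{tag}"
    if parent is None:
        elem = ET.Element(qname, attrs)
    else:
        elem = ET.SubElement(parent, qname, attrs)
    elem.text = text
    return elem


def build_fo(title, rows):
    """Return an XSL-FO document (as a string) with a heading and a table."""
    root = fo("root")

    # Page geometry: one page master with a body region.
    masters = fo("layout-master-set", root)
    page = fo("simple-page-master", masters, master_name="A4",
              page_height="29.7cm", page_width="21cm", margin="2cm")
    fo("region-body", page)

    # Content flows into the body region of that page master.
    sequence = fo("page-sequence", root, master_reference="A4")
    flow = fo("flow", sequence, flow_name="xsl-region-body")
    fo("block", flow, text=title, font_size="16pt", space_after="12pt")

    # One equal-width column per cell in the first row.
    table = fo("table", flow, table_layout="fixed", width="100%")
    for _ in rows[0]:
        fo("table-column", table, column_width="proportional-column-width(1)")
    body = fo("table-body", table)
    for row in rows:
        table_row = fo("table-row", body)
        for value in row:
            cell = fo("table-cell", table_row, padding="2pt")
            fo("block", cell, text=str(value))

    return ET.tostring(root, encoding="unicode")
```

The string returned by `build_fo("Report", [["Item", "Price"], ["Widget", 9.99]])` can be written to a file and handed to an FO processor such as FOP or RenderX to get the PDF. Building the tree with ElementTree rather than string concatenation means values containing `&` or `<` are escaped automatically.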
