How do I convert a Word document to PDF?

Posted 2024-09-05 05:53:12

Comments (9)

安穩 2024-09-12 05:53:12

This is quite a hard task, and even harder if you want perfect results (impossible without using Word). As such, I believe the number of APIs that do it all for you in pure Java and are open source is zero (Update: I was wrong, see below).

Your basic options are as follows:

  1. Script MS Office using JNI, a C# web service, etc. (the only option for 100% perfect results)
  2. Script Open Office using the available APIs (90+% perfect)
  3. Use Apache POI and iText (a very large job that will never be perfect).

Update - 2016-02-11
Here is a cut-down copy of my blog post on this subject, which outlines existing products that support Word-to-PDF conversion in Java.

Converting Microsoft Office (Word, Excel) documents to PDFs in Java

The products I know of that can render Office documents:

yeokm1/docs-to-pdf-converter
Irregularly maintained, Pure Java, Open Source
Ties together a number of libraries to perform the conversion.

xdocreport
Actively developed, Pure Java, Open Source
It is a Java API that merges an XML document created with MS Office (docx), OpenOffice (odt) or LibreOffice (odt) with a Java model to generate a report and, if needed, convert it to another format (PDF, XHTML...).

Snowbound Imaging SDK
Closed Source, Pure Java
Snowbound appears to be a 100% Java solution and costs over $2,500. It contains samples describing how to convert documents in the evaluation download.

OpenOffice API
Open Source, Not Pure Java - Requires Open Office installed
OpenOffice is a native Office suite that exposes a Java API. It supports reading Office documents and writing PDF documents. The SDK contains a document conversion example (examples/java/DocumentHandling/DocumentConverter.java). To write PDFs you need to pass the "writer_pdf_Export" filter rather than the "MS Word 97" one.
Alternatively, you can use the wrapper API JODConverter.

JDocToPdf - Dead as of 2016-02-11
Uses Apache POI to read the Word document and iText to write the PDF. Completely free, 100% Java but has some limitations.

甜尕妞 2024-09-12 05:53:12

Docx4j is open source and, in my experience, the best API for converting DOCX to PDF without alignment or font issues.

Maven dependencies (note that the docx4j documentation says to pick only one of the three docx4j-JAXB-* artifacts):

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-Internal</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-MOXy</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-export-fo</artifactId>
    <version>8.0.0</version>
</dependency>

Code:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.docx4j.Docx4J;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class DocToPDF {

    public static void main(String[] args) {
        String inputFilePath = "D:\\Workspace\\New\\Sample.docx";
        String outputFilePath = "D:\\Workspace\\New\\Sample.pdf";

        // try-with-resources closes both streams even if the conversion fails
        try (InputStream in = new FileInputStream(inputFilePath);
             OutputStream out = new FileOutputStream(outputFilePath)) {
            // Load the .docx into docx4j's in-memory package model
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(in);
            // Render the package as PDF into the output stream
            Docx4J.toPDF(wordMLPackage, out);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}
很酷又爱笑 2024-09-12 05:53:12

You can use JODConverter for this purpose. It converts documents between different office formats, for example:

  1. Microsoft Office to OpenDocument, and vice versa
  2. Any format to PDF
  3. Many more conversions besides
  4. MS Office 2007 documents to PDF and to almost all other formats

More details about it can be found here:
http://www.artofsolving.com/opensource/jodconverter

自由如风 2024-09-12 05:53:12

It's already 2019, and I can't believe there is still no easy, convenient way in the Java world to convert the most popular document format, Micro$oft Word, to Adobe PDF.

I tried almost every method mentioned in the answers above, and the best one, and the only one that satisfies my requirements, is to use OpenOffice or LibreOffice. I don't know exactly how the two differ; both seem to provide the soffice command line.

My requirements are:

  1. It must run on Linux, more specifically CentOS, not on Windows, so we cannot install Microsoft Office on it;
  2. It must support Chinese characters, so ISO-8859-1 character encoding is not an option; it must support Unicode.

The first thing that came to mind was docs-to-pdf-converter, but it lacks maintenance: the last update was 4 years ago, and I will not use a solution nobody maintains. Xdocreport seems a promising choice, but it can only convert docx, not the binary doc format, which is mandatory for me. Calling the OpenOffice API from Java seems good, but it is too complicated for such a simple requirement.

Finally I found the best solution: use the OpenOffice command line to do the job:

Runtime.getRuntime().exec("soffice --convert-to pdf --outdir . /path/some.doc");

I always believe the shortest code is the best code (as long as it is understandable). That's it.
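
Note that Runtime.exec returns as soon as the process is started, so the one-liner does not tell you whether the PDF was actually produced. A minimal sketch of a more defensive call follows. The class and method names (SofficeConverter, buildCommand, expectedOutput, convert) are made up for illustration, and --headless is added on the assumption that the machine has no display. The sketch also assumes soffice is on the PATH and names the output after the input file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenOffice or LibreOffice.
class SofficeConverter {

    // Passing arguments as a list avoids quoting problems with spaces in paths.
    static List<String> buildCommand(Path input, Path outputDir) {
        return Arrays.asList("soffice", "--headless", "--convert-to", "pdf",
                "--outdir", outputDir.toString(), input.toString());
    }

    // Assumption: the result keeps the input's base name with a .pdf extension.
    static Path expectedOutput(Path input, Path outputDir) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outputDir.resolve(base + ".pdf");
    }

    // Runs soffice, waits for it to exit, and checks that the PDF exists.
    static Path convert(Path input, Path outputDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outputDir))
                .inheritIO()   // show soffice's own messages on our console
                .start();
        int exitCode = process.waitFor();
        Path pdf = expectedOutput(input, outputDir);
        if (exitCode != 0 || !Files.exists(pdf)) {
            throw new IOException("soffice conversion failed, exit code " + exitCode);
        }
        return pdf;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(convert(Paths.get("/path/some.doc"), Paths.get(".")));
    }
}
```

Checking for the output file as well as the exit code is deliberate: the exit code alone may not be a reliable signal of a successful conversion.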

孤单情人 2024-09-12 05:53:12

Check out docs-to-pdf-converter on GitHub. It's a lightweight solution designed specifically for converting documents to PDF.

Why?

I wanted a simple program that can convert Microsoft Office documents
to PDF but without dependencies like LibreOffice or expensive
proprietary solutions. Seeing as how code and libraries to convert
each individual format is scattered around the web, I decided to
combine all those solutions into one single program. Along the way, I
decided to add ODT support as well since I encountered the code too.

窗影残 2024-09-12 05:53:12

You can use Cloudmersive's native Java library. It is free for up to 50,000 conversions/month and, in my experience, gives much higher fidelity than other approaches such as iText or Apache POI-based methods. The documents actually look the same as they do in Microsoft Word, which for me is the key. Incidentally, it can also convert XLSX, PPTX, and the legacy DOC, XLS and PPT formats to PDF.

Here is what the code looks like. First add your imports:

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

import com.cloudmersive.client.invoker.ApiClient;
import com.cloudmersive.client.invoker.ApiException;
import com.cloudmersive.client.invoker.Configuration;
import com.cloudmersive.client.invoker.auth.*;
import com.cloudmersive.client.ConvertDocumentApi;

Then convert a file:

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
apikey.setApiKey("YOUR API KEY");

ConvertDocumentApi apiInstance = new ConvertDocumentApi();
File inputFile = new File("/path/to/input.docx"); // File to perform the operation on.
try {
  byte[] result = apiInstance.convertDocumentDocxToPdf(inputFile);
  // result holds the PDF bytes; write them to disk (example path) instead of printing the array
  Files.write(Paths.get("/path/to/output.pdf"), result);
} catch (ApiException | IOException e) {
  System.err.println("Exception when calling ConvertDocumentApi#convertDocumentDocxToPdf");
  e.printStackTrace();
}

You can get a document conversion API key for free from the portal.

小矜持 2024-09-12 05:53:12

I agree with the posters who list OpenOffice as a high-fidelity import/export facility for Word and PDF documents with a Java API; it also works across platforms. The OpenOffice import/export filters are pretty powerful and preserve most formatting during conversion to various formats, including PDF. Docmosis and JODReports add value by making life easier than learning the OpenOffice API directly, which can be challenging because of the style of the UNO API and its crash-related bugs.

肩上的翅膀 2024-09-12 05:53:12

Using JACOB to call Office Word is a 100% perfect solution, but it only works on Windows because Office Word must be installed.

  1. Download the JACOB archive (the latest version is 1.19 at the time of writing);

  2. Add jacob.jar to your project classpath;

  3. Add jacob-1.19-x86.dll or jacob-1.19-x64.dll (depending on whether your JDK is 32-bit or 64-bit) to ...\Java\jdk1.x.x_xxx\jre\bin

  4. Use the JACOB API to call Office Word and convert doc/docx to pdf.

    import java.io.File;

    import com.jacob.activeX.ActiveXComponent;
    import com.jacob.com.ComThread;
    import com.jacob.com.Dispatch;
    import com.jacob.com.Variant;

    // "logger" is assumed to be a logger field of the enclosing class.
    public void convertDocx2pdf(String docxFilePath) {
        File docxFile = new File(docxFilePath);
        // Replace the last extension so that both .doc and .docx inputs work
        int dot = docxFilePath.lastIndexOf('.');
        String pdfFile = (dot > 0 ? docxFilePath.substring(0, dot) : docxFilePath) + ".pdf";

        if (docxFile.exists()) {
            if (!docxFile.isDirectory()) {
                ActiveXComponent app = null;

                long start = System.currentTimeMillis();
                try {
                    ComThread.InitMTA(true);
                    app = new ActiveXComponent("Word.Application");
                    Dispatch documents = app.getProperty("Documents").toDispatch();
                    // Open(FileName, ConfirmConversions=false, ReadOnly=true)
                    Dispatch document = Dispatch.call(documents, "Open", docxFilePath, false, true).toDispatch();
                    File target = new File(pdfFile);
                    if (target.exists()) {
                        target.delete();
                    }
                    // 17 is Word's wdFormatPDF constant
                    Dispatch.call(document, "SaveAs", pdfFile, 17);
                    Dispatch.call(document, "Close", false);
                    long end = System.currentTimeMillis();
                    logger.info("============Convert Finished:" + (end - start) + "ms");
                } catch (Exception e) {
                    logger.error(e.getLocalizedMessage(), e);
                    // Keep the original exception as the cause
                    throw new RuntimeException("pdf convert failed.", e);
                } finally {
                    if (app != null) {
                        app.invoke("Quit", new Variant[] {});
                    }
                    ComThread.Release();
                }
            }
        }
    }
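
Deriving the PDF path is the one part of this method that can be checked without Word installed. Here is a small sketch of that step in isolation. PdfPaths and toPdfPath are hypothetical names, not part of JACOB. It swaps the last extension for ".pdf", so both .doc and .docx inputs work, and it ignores dots that appear in directory names.

```java
// Hypothetical helper for illustration; not part of JACOB.
class PdfPaths {

    // Replaces the file extension with ".pdf", or appends ".pdf" if there is none.
    static String toPdfPath(String sourcePath) {
        // Position of the last path separator (either style), or -1 if none
        int sep = Math.max(sourcePath.lastIndexOf('/'), sourcePath.lastIndexOf('\\'));
        int dot = sourcePath.lastIndexOf('.');
        // Only treat the dot as an extension marker if it is inside the file name
        String base = dot > sep ? sourcePath.substring(0, dot) : sourcePath;
        return base + ".pdf";
    }

    public static void main(String[] args) {
        System.out.println(toPdfPath("C:\\docs\\report.docx")); // C:\docs\report.pdf
        System.out.println(toPdfPath("C:\\docs\\legacy.doc"));  // C:\docs\legacy.pdf
    }
}
```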
    
鯉魚旗 2024-09-12 05:53:12

unoconv is a Python tool that runs on UNIX.
I use Java to invoke it through the shell on UNIX, and it works perfectly for me. My source code: UnoconvTool.java. Both JODConverter and unoconv are said to use OpenOffice/LibreOffice.

docx4j/docxreport, POI and PDFBox are good, but they lose some formatting during conversion.
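
The UnoconvTool.java mentioned above is not reproduced here. As a rough idea of what invoking unoconv from Java can look like, the following is a hypothetical sketch: UnoconvRunner and its methods are my own names, and unoconv is assumed to be on the PATH. It uses unoconv's -f (format) and -o (output) options, sends the tool's output to a log file, and gives up after a timeout so that a hung office process cannot block the caller.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper; this is not the UnoconvTool.java referenced in the answer.
class UnoconvRunner {

    // "-f pdf" selects the output format, "-o" names the output file.
    static List<String> buildCommand(String input, String outputPdf) {
        return Arrays.asList("unoconv", "-f", "pdf", "-o", outputPdf, input);
    }

    // Returns true only if unoconv finished within the timeout with exit code 0.
    static boolean convert(String input, String outputPdf, File log, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outputPdf))
                .redirectErrorStream(true)                              // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.appendTo(log))  // keep output for diagnosis
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();  // do not let a hung conversion block the caller
            return false;
        }
        return process.exitValue() == 0;
    }

    public static void main(String[] args) throws Exception {
        boolean ok = convert("/path/some.doc", "/path/some.pdf", new File("unoconv.log"), 120);
        System.out.println(ok ? "converted" : "conversion failed or timed out");
    }
}
```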
