Converting DOC files to DOCX using Java

Posted on 2024-11-23 21:55:07

I need to use DOCX files (actually the XML contained in them) in a Java application I'm currently developing, but some people in my company still use the DOC format.

Do you know if there is a way to convert a DOC file to the DOCX format using Java? I know it's possible using C#, but that's not an option.

I googled it, but nothing came up...

Thanks

Comments (7)

梦中的蝴蝶 2024-11-30 21:55:07

You may try Aspose.Words for Java. It allows you to load a DOC file and save it in DOCX format. The code is very simple, as shown below:

// Document here is com.aspose.words.Document.
// Open the DOC file.
Document doc = new Document("input.doc");
// Save it as DOCX; the output format is taken from the file extension.
doc.save("output.docx");

Please see if this helps in your scenario.

Disclosure: I work as a developer evangelist at Aspose.

诺曦 2024-11-30 21:55:07

Check out JODConverter to see if it fits the bill. I haven't personally used it.

离旧人 2024-11-30 21:55:07

Use the newer JODConverter jars, jodconverter-core-4.2.2.jar and jodconverter-local-4.2.2.jar:

// Placeholders: replace with real file paths (wildcards are not expanded).
String inputFile = "*.doc";
String outputFile = "*.docx";

LocalOfficeManager localOfficeManager = LocalOfficeManager.builder()
        .install()
        .officeHome(getDefaultOfficeHome()) // your path to OpenOffice/LibreOffice
        .build();

try {
    localOfficeManager.start();

    // Target format: a copy of the registry's DOCX format.
    final DocumentFormat format = DocumentFormat.builder()
            .from(DefaultDocumentFormatRegistry.DOCX)
            .build();

    // Pass the source as a File so that no input stream is left open.
    LocalConverter
            .make()
            .convert(new File(inputFile))
            .as(DefaultDocumentFormatRegistry.getFormatByMediaType("application/msword"))
            .to(new File(outputFile))
            .as(format)
            .execute();

} catch (OfficeException ex) {
    Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
} finally {
    // Always shut the office process down, even if the conversion failed.
    OfficeUtils.stopQuietly(localOfficeManager);
}

嗳卜坏 2024-11-30 21:55:07

JODConverter calls OpenOffice/LibreOffice via a network protocol, so it can "do anything you can do in OpenOffice", including converting formats. However, it only does as good a job as whatever version of OpenOffice you are running. I have some artwork in one of my documents, and it was not converted as I had hoped.

According to the Google Code website for v3, JODConverter is no longer supported.

To get JOD to do the job, you need to do something like this:

private static void transformBinaryWordDocToDocX(File in, File out)
{
    OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
    DocumentFormat docx = converter.getFormatRegistry().getFormatByExtension("docx");
    docx.setStoreProperties(DocumentFamily.TEXT,
    Collections.singletonMap("FilterName", "MS Word 2007 XML"));

    converter.convert(in, out, docx);
}


private static void transformBinaryWordDocToW2003Xml(File in, File out)
{
    OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
    DocumentFormat w2003xml = new DocumentFormat("Microsoft Word 2003 XML", "xml", "text/xml");
    w2003xml.setInputFamily(DocumentFamily.TEXT);
    w2003xml.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "MS Word 2003 XML"));
    converter.convert(in, out, w2003xml);
}



private static OfficeManager officeManager;

@BeforeClass
public static void setupStatic() throws IOException {

          /*officeManager = new DefaultOfficeManagerConfiguration()
      .setOfficeHome("C:/Program Files/LibreOffice 3.6")
      .buildOfficeManager();
      */

    officeManager = new ExternalOfficeManagerConfiguration().setConnectOnStart(true).setPortNumber(8100).buildOfficeManager();


    officeManager.start();
}

@AfterClass
public static void shutdownStatic() throws IOException {

    officeManager.stop();
}

For this to work, you need to be running LibreOffice as a networked server (I could not get the "run on demand" part of JODConverter to work very well under Windows with LO 3.6).
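
As a rough sketch of what "running LibreOffice as a networked server" can look like, the snippet below builds and launches a headless soffice process that listens on port 8100, the same port passed to setPortNumber(8100) above. The class name SofficeServer, its method names, the "soffice" executable path, and the 127.0.0.1 host are assumptions made for illustration. The double-dash switches are the form used by recent LibreOffice releases; older OpenOffice builds use single-dash options, so check them against your installation.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of JODConverter.
final class SofficeServer {

    // Builds the command line for a headless office instance that accepts
    // UNO (urp) connections on the given local TCP port.
    static List<String> buildCommand(String sofficePath, int port) {
        return Arrays.asList(
                sofficePath,
                "--headless",
                "--norestore",
                "--accept=socket,host=127.0.0.1,port=" + port + ";urp;");
    }

    // Starts the office process; the caller is responsible for destroying it.
    static Process start(String sofficePath, int port) throws IOException {
        return new ProcessBuilder(buildCommand(sofficePath, port))
                .inheritIO()
                .start();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: "soffice" is on the PATH unless a path is given.
        String soffice = args.length > 0 ? args[0] : "soffice";
        Process office = start(soffice, 8100);
        System.out.println("Office process started, alive = " + office.isAlive());
    }
}
```

Once the process is listening, the ExternalOfficeManagerConfiguration shown above should be able to connect to it on the same port.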

半透明的墙 2024-11-30 21:55:07

I needed the same conversion. After a lot of research, I found that JODConverter can be useful here. You can download the jar from
https://code.google.com/p/jodconverter/downloads/list

Add the jodconverter-core-3.0-beta-4-sources.jar file to your project's libraries.

// 1) Create the OfficeManager object
OfficeManager officeManager = new DefaultOfficeManagerConfiguration()
        .setOfficeHome(new File("/opt/libreoffice4.4"))
        .buildOfficeManager();
officeManager.start();

// 2) Create the JODConverter converter
OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);

// 3) Create the DocumentFormat for DOCX
DocumentFormat docx = converter.getFormatRegistry().getFormatByExtension("docx");
docx.setStoreProperties(DocumentFamily.TEXT,
        Collections.singletonMap("FilterName", "MS Word 2007 XML"));

// 4) Call the convert function on the converter object
converter.convert(new File("doc/AdvancedTable.doc"),
        new File("docx/AdvancedTable.docx"), docx);

// 5) Stop the office process once the conversion is done
officeManager.stop();

治碍 2024-11-30 21:55:07

To convert a DOC file to HTML, see "Convert Word doc to HTML programmatically in Java".

You can use Apache POI: http://poi.apache.org/

Or use this to read the text of a DOCX file:

XWPFDocument docx = new XWPFDocument(OPCPackage.openOrCreate(new File("hello.docx")));
XWPFWordExtractor wx = new XWPFWordExtractor(docx);
String text = wx.getText();
System.out.println("text = " + text);
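
The extractor above returns plain text. Since the question asks for the XML inside the DOCX, it may help to note that a DOCX file is a ZIP package whose main body is normally stored in the entry word/document.xml. The following is a minimal sketch that reads that entry with java.util.zip only. The class name DocxXml and its method names are made up for illustration, hello.docx is the file name from the snippet above, and the entry is assumed to be UTF-8 encoded.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Apache POI.
final class DocxXml {

    // Returns the content of the named ZIP entry as a string,
    // or null if the archive has no such entry.
    static String readEntry(InputStream docx, String entryName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = zip.read(chunk)) != -1) {
                        buffer.write(chunk, 0, n);
                    }
                    // Assumption: the part is UTF-8 encoded.
                    return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    // Reads the main document part of a DOCX file.
    static String readDocumentXml(Path docxFile) throws IOException {
        try (InputStream in = Files.newInputStream(docxFile)) {
            return readEntry(in, "word/document.xml");
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "hello.docx");
        String xml = readDocumentXml(file);
        System.out.println(xml == null ? "word/document.xml not found" : xml);
    }
}
```

The returned string can then be handed to any XML parser. This only reads the package; it does not convert a DOC file.
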
南冥有猫 2024-11-30 21:55:07

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

// Requires the Apache POI poi-scratchpad (HWPF) and poi-ooxml (XWPF) jars.
public class TestCon {

    /**
     * Copies the paragraph text of a DOC file into a new DOCX file.
     * Only the text is carried over: formatting, tables, images,
     * headers and footers are lost.
     *
     * @param args not used
     */
    public static void main(String[] args) {
        System.out.println("Starting the test");

        try (InputStream in = new FileInputStream("C:/Users/312845/Desktop/a.doc");
             HWPFDocument doc = new HWPFDocument(in);
             XWPFDocument docx = new XWPFDocument();
             OutputStream out = new FileOutputStream("C:/Users/312845/Desktop/test.docx")) {

            // Read the DOC paragraph by paragraph.
            WordExtractor we = new WordExtractor(doc);
            for (String paragraph : we.getParagraphText()) {
                // Each entry ends with a line break; trim it before writing.
                docx.createParagraph().createRun().setText(paragraph.trim());
            }

            // Write the DOCX package to disk.
            docx.write(out);
            System.out.println("Document testing completed");
        } catch (Exception e) {
            System.out.println("Exception during test");
            e.printStackTrace();
        }
    }
}