用 Java 打开 Microsoft Word

发布于 2024-07-19 14:59:38 字数 3607 浏览 2 评论 0原文

我正在尝试用 java 打开 MS Word 2003 文档,搜索指定的字符串并将其替换为新的字符串。 我使用 APACHE POI 来做到这一点。 我的代码如下所示:

public void searchAndReplace(String inputFilename, String outputFilename,
            HashMap<String, String> replacements) {
    File outputFile = null;
    File inputFile = null;
    FileInputStream fileIStream = null;
    FileOutputStream fileOStream = null;
    BufferedInputStream bufIStream = null;
    BufferedOutputStream bufOStream = null;
    POIFSFileSystem fileSystem = null;
    HWPFDocument document = null;
    Range docRange = null;
    Paragraph paragraph = null;
    CharacterRun charRun = null;
    Set<String> keySet = null;
    Iterator<String> keySetIterator = null;
    int numParagraphs = 0;
    int numCharRuns = 0;
    String text = null;
    String key = null;
    String value = null;
        try {
            // Create an instance of the POIFSFileSystem class and
            // attach it to the Word document using an InputStream.
            inputFile = new File(inputFilename);
            fileIStream = new FileInputStream(inputFile);
            bufIStream = new BufferedInputStream(fileIStream);
            fileSystem = new POIFSFileSystem(bufIStream);
            document = new HWPFDocument(fileSystem);
            docRange = document.getRange();
            numParagraphs = docRange.numParagraphs();
            keySet = replacements.keySet();
            for (int i = 0; i < numParagraphs; i++) {
                paragraph = docRange.getParagraph(i);
                text = paragraph.text();
                numCharRuns = paragraph.numCharacterRuns();
                for (int j = 0; j < numCharRuns; j++) {
                    charRun = paragraph.getCharacterRun(j);
                    text = charRun.text();
                    System.out.println("Character Run text: " + text);
                    keySetIterator = keySet.iterator();
                    while (keySetIterator.hasNext()) {
                        key = keySetIterator.next();
                        if (text.contains(key)) {
                            value = replacements.get(key);
                            charRun.replaceText(key, value);
                            docRange = document.getRange();
                            paragraph = docRange.getParagraph(i);
                            charRun = paragraph.getCharacterRun(j);
                            text = charRun.text();
                        }
                    }
                }
            }
            bufIStream.close();
            bufIStream = null;
            outputFile = new File(outputFilename);
            fileOStream = new FileOutputStream(outputFile);
            bufOStream = new BufferedOutputStream(fileOStream);
            document.write(bufOStream);
        } catch (Exception ex) {
            System.out.println("Caught an: " + ex.getClass().getName());
            System.out.println("Message: " + ex.getMessage());
            System.out.println("Stacktrace follows.............");
            ex.printStackTrace(System.out);
        }
}

我使用以下参数调用此函数:

HashMap<String, String> replacements = new HashMap<String, String>();
replacements.put("AAA", "BBB");
searchAndReplace("C:/Test.doc", "C:/Test1.doc", replacements);

当 Test.doc 文件包含这样的简单行:“AAA EEE”时,它会成功运行,但是当我使用复杂的文件,它将成功读取内容并生成 Test1.doc 文件,但当我尝试打开它时,它会给出以下错误:

Word 无法读取此文档。 它可能已腐败。 尝试以下一项或多项操作: * 打开并修复文件。 * 使用文本恢复转换器打开文件。 (C:\Test1.doc)

请告诉我该怎么做,因为我是 POI 的初学者,我还没有找到好的教程。

I'm trying to open MS Word 2003 document in java, search for a specified String and replace it with a new String. I use APACHE POI to do that. My code is like the following one:

public void searchAndReplace(String inputFilename, String outputFilename,
            HashMap<String, String> replacements) {
    File outputFile = null;
    File inputFile = null;
    FileInputStream fileIStream = null;
    FileOutputStream fileOStream = null;
    BufferedInputStream bufIStream = null;
    BufferedOutputStream bufOStream = null;
    POIFSFileSystem fileSystem = null;
    HWPFDocument document = null;
    Range docRange = null;
    Paragraph paragraph = null;
    CharacterRun charRun = null;
    Set<String> keySet = null;
    Iterator<String> keySetIterator = null;
    int numParagraphs = 0;
    int numCharRuns = 0;
    String text = null;
    String key = null;
    String value = null;
        try {
            // Create an instance of the POIFSFileSystem class and
            // attach it to the Word document using an InputStream.
            inputFile = new File(inputFilename);
            fileIStream = new FileInputStream(inputFile);
            bufIStream = new BufferedInputStream(fileIStream);
            fileSystem = new POIFSFileSystem(bufIStream);
            document = new HWPFDocument(fileSystem);
            docRange = document.getRange();
            numParagraphs = docRange.numParagraphs();
            keySet = replacements.keySet();
            for (int i = 0; i < numParagraphs; i++) {
                paragraph = docRange.getParagraph(i);
                text = paragraph.text();
                numCharRuns = paragraph.numCharacterRuns();
                for (int j = 0; j < numCharRuns; j++) {
                    charRun = paragraph.getCharacterRun(j);
                    text = charRun.text();
                    System.out.println("Character Run text: " + text);
                    keySetIterator = keySet.iterator();
                    while (keySetIterator.hasNext()) {
                        key = keySetIterator.next();
                        if (text.contains(key)) {
                            value = replacements.get(key);
                            charRun.replaceText(key, value);
                            docRange = document.getRange();
                            paragraph = docRange.getParagraph(i);
                            charRun = paragraph.getCharacterRun(j);
                            text = charRun.text();
                        }
                    }
                }
            }
            bufIStream.close();
            bufIStream = null;
            outputFile = new File(outputFilename);
            fileOStream = new FileOutputStream(outputFile);
            bufOStream = new BufferedOutputStream(fileOStream);
            document.write(bufOStream);
        } catch (Exception ex) {
            System.out.println("Caught an: " + ex.getClass().getName());
            System.out.println("Message: " + ex.getMessage());
            System.out.println("Stacktrace follows.............");
            ex.printStackTrace(System.out);
        }
}

I call this function with following arguments:

HashMap<String, String> replacements = new HashMap<String, String>();
replacements.put("AAA", "BBB");
searchAndReplace("C:/Test.doc", "C:/Test1.doc", replacements);

When the Test.doc file contains a simple line like this : "AAA EEE", it works successfully, but when i use a complicated file it will read the content successfully and generate the Test1.doc file but when I try to open it, it will give me the following error:

Word unable to read this document. It may be corrupt.
Try one or more of the following:
* Open and repair the file.
* Open the file with Text Recovery converter.
(C:\Test1.doc)

Please tell me what to do, because I'm a beginner in POI and I have not found a good tutorial for it.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

傲世九天 2024-07-26 14:59:38

首先,您应该关闭文档。

除此之外,我建议将原始 Word 文档重新保存为 Word XML 文档,然后手动将扩展名从 .XML 更改为 .doc 。 然后查看您正在使用的实际文档的 XML 并跟踪内容,以确保您没有意外编辑十六进制值(AAA 和 EEE 可能是其他字段中的十六进制值)。

如果没有看到实际的 Word 文档,很难说到底发生了什么。

关于 POI 的文档并不多,尤其是 Word 文档。

First of all you should be closing your document.

Besides that, what I suggest doing is resaving your original Word document as a Word XML document, then changing the extension manually from .XML to .doc . Then look at the XML of the actual document you're working with and trace the content to make sure you're not accidentally editing hexadecimal values (AAA and EEE could be hex values in other fields).

Without seeing the actual Word document it's hard to say what's going on.

There is not much documentation about POI at all, especially for Word document unfortunately.

遇到 2024-07-26 14:59:38

看起来可能是问题所在。

Looks like this could be the issue.

蓝眼睛不忧郁 2024-07-26 14:59:38

你可以尝试OpenOffice API,但是没有很多那里有资源告诉您如何使用它。

You could try OpenOffice API, but there arent many resources out there to tell you how to use it.

假扮的天使 2024-07-26 14:59:38

我不知道:可以自己回答吗,但只是为了分享知识,我会回答自己。

在网上浏览后,我找到的最终解决方案是:
名为 docx4j 的库非常适合处理 MS docx 文件,虽然它的文档到目前为止还不够,而且它的论坛仍处于起步阶段,但总的来说,它帮助我做我需要的事情。

感谢 4 所有帮助我的人。

I don't know : is its OK to answer myself, but Just to share the knowledge, I'll answer myself.

After navigating the web, the final solution i found is :
The Library called docx4j is very good for dealing with MS docx file, although its documentation is not enough till now and its forum is still in a beginning steps, but overall it help me to do what i need..

Thanks 4 all who help me..

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文