
How do I convert a PDF to a text file while preserving the PDF's formatting?

Posted 2024-12-24 18:40:54

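Hi, I want to convert PDF files to text files. I am already doing the conversion, but the resulting text file does not keep the exact text formatting of the PDF.

Please help.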


胡渣熟男 2024-12-31 18:40:54

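A text file cannot contain formatting by itself.

You cannot preserve formatting in a plain text file, because it contains nothing but text. A text file could contain HTML markup, but then I would call it an HTML file. Otherwise, you should try converting to Rich Text Format (RTF), Microsoft Word, OpenOffice, or some other document format.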
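To illustrate the HTML option: if the goal is only to keep the line breaks and runs of spaces of already-extracted text visible in a browser, one simple approach is to HTML-escape the text and wrap it in a `<pre>` element. This is a minimal JDK-only sketch; the class and method names (`PreHtml`, `escapeHtml`, `toPreHtml`) are hypothetical, and it preserves whitespace layout only, not fonts, colours or images.

```java
// Hypothetical helper: wraps already-extracted text in an HTML <pre> block
// so that a browser shows the original line breaks and runs of spaces.
public final class PreHtml {
    private PreHtml() {}

    // Escapes the characters that are significant in HTML text content.
    public static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Produces a minimal standalone HTML document; <pre> keeps whitespace as-is.
    public static String toPreHtml(String text) {
        return "<!DOCTYPE html>\n<html><body><pre>"
                + escapeHtml(text)
                + "</pre></body></html>\n";
    }

    public static void main(String[] args) {
        String text = "Name    Qty\nA & B     2\n";
        System.out.print(toPreHtml(text));
    }
}
```

The escaping step matters because extracted text may contain `<` or `&`, which would otherwise be interpreted as markup.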


荒芜了季节 2024-12-31 18:40:54

This may help.

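    import java.io.File;
    import java.io.IOException;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;

    // Written against the Apache PDFBox 2.x API (PDFTextStripper lives in
    // org.apache.pdfbox.text there). Class and method names are illustrative.
    public class PdfToText {

        // Returns the PDF's plain text, or null if the file is missing
        // or cannot be parsed.
        public static String extractText(String fileName) {
            File f = new File(fileName);
            if (!f.isFile()) {
                return null;
            }

            // try-with-resources closes the document on success and on failure.
            try (PDDocument pdDoc = PDDocument.load(f)) {
                PDFTextStripper pdfStripper = new PDFTextStripper();
                // Optionally restrict extraction to a page range:
                // pdfStripper.setStartPage(2);
                // pdfStripper.setEndPage(3);
                return pdfStripper.getText(pdDoc);
            } catch (IOException e) {
                System.err.println("An exception occurred in parsing the PDF document.");
                e.printStackTrace();
                return null;
            }
        }
    }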
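Text extraction like this keeps the words and line breaks but not fonts or exact positions. If you only want to approximate the page layout with spaces, one possible approach is to place each piece of text on a fixed character grid according to its coordinates. The sketch below is JDK-only and assumes the positioned text chunks have already been extracted; the `GridLayout` class, the `Chunk` type, the `layout` method and the `charWidth`/`lineHeight` scale factors are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: approximates a page layout in plain text by placing
// positioned text chunks on a monospaced character grid.
public final class GridLayout {
    private GridLayout() {}

    // One run of text with its page coordinates (assumed: y grows downward).
    public static final class Chunk {
        final double x;
        final double y;
        final String text;

        public Chunk(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // charWidth and lineHeight are assumed sizes of one character column and
    // one text row, in the same units as the coordinates.
    public static String layout(List<Chunk> chunks, double charWidth, double lineHeight) {
        // Row index -> characters on that row, iterated in ascending order.
        TreeMap<Integer, StringBuilder> rows = new TreeMap<>();
        for (Chunk c : chunks) {
            int row = (int) Math.round(c.y / lineHeight);
            int col = Math.max(0, (int) Math.round(c.x / charWidth));
            StringBuilder line = rows.computeIfAbsent(row, k -> new StringBuilder());
            // Pad with spaces up to the target column.
            while (line.length() < col) {
                line.append(' ');
            }
            // Write the text; on overlap, later chunks overwrite earlier ones.
            for (int i = 0; i < c.text.length(); i++) {
                int pos = col + i;
                if (pos < line.length()) {
                    line.setCharAt(pos, c.text.charAt(i));
                } else {
                    line.append(c.text.charAt(i));
                }
            }
        }
        StringBuilder out = new StringBuilder();
        Integer prevRow = null;
        for (Map.Entry<Integer, StringBuilder> e : rows.entrySet()) {
            // Keep vertical gaps by emitting blank lines for empty rows.
            if (prevRow != null) {
                for (int r = prevRow + 1; r < e.getKey(); r++) {
                    out.append('\n');
                }
            }
            out.append(e.getValue()).append('\n');
            prevRow = e.getKey();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Chunk> chunks = new ArrayList<>();
        chunks.add(new Chunk(0, 0, "Name"));
        chunks.add(new Chunk(60, 0, "Qty"));
        chunks.add(new Chunk(0, 12, "Apple"));
        chunks.add(new Chunk(60, 12, "3"));
        System.out.print(layout(chunks, 6, 12));
    }
}
```

Rounding to the nearest column keeps simple tables aligned, but proportional fonts mean the result is only an approximation. With PDFBox 2.x, such coordinates could be collected by overriding `PDFTextStripper.writeString(String, List<TextPosition>)` and reading positions such as `getXDirAdj()`/`getYDirAdj()` from each `TextPosition`; that wiring is left out of this sketch.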
