How to insert text into a scanned PDF document using Java

Posted 2025-01-04 06:43:06

I have to add text to PDF documents, many of which consist of scanned pages. The text I insert ends up behind the scanned image instead of on top of it. How can I add text over the scanned image inside the PDF?

package editExistingPDF;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

import jxl.Cell;
import jxl.Sheet;
import jxl.Workbook;
import jxl.read.biff.BiffException;

import org.apache.commons.io.FilenameUtils;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;

public class AddPragraphToPdf {

    public static void main(String[] args) throws IOException, DocumentException, BiffException {

        String tan = "no tan";
        File inputWorkbook = new File("lars.xls");
        Workbook w = Workbook.getWorkbook(inputWorkbook);
        // Get the first sheet
        Sheet sheet = w.getSheet(0);
        Cell[] tnas = sheet.getColumn(0);

        File ArticleFolder = new File("C:\\Documents and Settings\\sathishkumarkk\\My Documents\\article");
        File[] listOfArticles = ArticleFolder.listFiles();

        for (int ArticleInList = 0; ArticleInList < listOfArticles.length; ArticleInList++) {
            Document document = new Document(PageSize.A4);

            // System.out.println(listOfArticles[ArticleInList].toString());
            PdfReader pdfArticle = new PdfReader(listOfArticles[ArticleInList].toString());
            if (listOfArticles[ArticleInList].getName().contains(".si.")) {
                continue;
            }
            int noPgs = pdfArticle.getNumberOfPages();
            String ArticleNoWithOutExt = FilenameUtils.removeExtension(listOfArticles[ArticleInList].getName());
            String TanNo = ArticleNoWithOutExt.substring(0, ArticleNoWithOutExt.indexOf('.'));

            // Create output PDF
            PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("C:\\Documents and Settings\\sathishkumarkk\\My Documents\\toPrint\\" + ArticleNoWithOutExt + ".pdf"));
            document.open();
            PdfContentByte cb = writer.getDirectContent();
            // Get the TAN from the Excel sheet
            System.out.println(TanNo);
            for (Cell content : tnas) {
                if (content.getContents().contains(TanNo)) {
                    tan = content.getContents();
                    System.out.println(tan);
                }
            }
            // Load existing PDF
            // PdfReader reader = new PdfReader(new FileInputStream("1.pdf"));

            for (int i = 1; i <= noPgs; i++) {
                PdfImportedPage page = writer.getImportedPage(pdfArticle, i);

                // Copy page i of the existing PDF into the output PDF
                document.newPage();
                cb.addTemplate(page, 0, 0);
                // Add your TAN here
                Paragraph p = new Paragraph(tan);
                Font font = new Font();
                font.setSize(1.0f);
                p.setLeading(12.0f, 1.0f);
                p.setFont(font);

                document.add(p);
            }
            document.close();
        }
    }

}

NOTE: When the PDF contains only text, I have no problem. But when the PDF consists entirely of scanned pages and I try to add text, the text gets added behind the scanned image, so it does not appear when I print the PDF.

遥远的她 2025-01-11 06:43:06

From this iText example (which does the reverse of what you want, but swap getUnderContent for getOverContent and you'll be fine):

Each PDF page has two extra layers: one that sits on top of all text and graphics, and one that sits at the bottom. All user-added content goes in between these two. If we get at this bottommost layer, we can write anything we want underneath. To reach it, we can use the "getUnderContent" method of the PdfStamper object.
This is documented in the iText API reference as shown below:

public PdfContentByte getUnderContent(int pageNum)
    Gets a PdfContentByte to write under the page of the original document.
    Parameters:
       pageNum - the page number where the extra content is written
    Returns:
         a PdfContentByte to write under the page of the original document
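
To see why the order of the layers matters, here is a toy model in plain Java. It is not iText code: LayerOrderDemo, render, and paint are hypothetical names made up for this illustration, and a page is reduced to a row of characters in which a space means "transparent". The layers are painted bottom-up, in the same order as the under content, the original page, and the over content described above.

```java
import java.util.Arrays;

/** Toy model of paint order on one page. Illustration only, not iText code. */
public class LayerOrderDemo {

    /** Paints the three layers bottom-up and returns what stays visible. */
    static String render(String under, String original, String over) {
        char[] cells = new char[original.length()];
        Arrays.fill(cells, ' ');
        paint(cells, under);     // plays the role of getUnderContent
        paint(cells, original);  // the existing page, e.g. an opaque scan
        paint(cells, over);      // plays the role of getOverContent
        return new String(cells);
    }

    /** Copies every non-space character of the layer over what is already there. */
    static void paint(char[] cells, String layer) {
        for (int i = 0; i < layer.length() && i < cells.length; i++) {
            if (layer.charAt(i) != ' ') {
                cells[i] = layer.charAt(i);
            }
        }
    }

    public static void main(String[] args) {
        String scan = "########";     // opaque scanned image covering the page
        String textPage = "ab      "; // text-only page, mostly transparent

        // Text under an opaque scan is covered completely.
        System.out.println("[" + render("  TAN", scan, "") + "]");     // [########]
        // Text under a text-only page shows through the empty areas.
        System.out.println("[" + render("  TAN", textPage, "") + "]"); // [abTAN   ]
        // Text over the scan stays visible.
        System.out.println("[" + render("", scan, "  TAN") + "]");     // [##TAN###]
    }
}
```

The first two lines of output match the situation in the question: the added text survives on a text-only page but disappears beneath a full-page scan. In iText 5 terms, that suggests either stamping the text through PdfStamper.getOverContent(pageNum), or, if you keep the PdfWriter approach from the question, adding the imported page through writer.getDirectContentUnder() so that the paragraph ends up above it. Either way, it is worth checking the result against one of your scanned files.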
