PDFBox PDFTextStripperByArea region coordinates

Posted on 2024-12-21 19:24:30

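What are the dimensions and orientation of the rectangle in the PDFTextStripperByArea function addRegion(String regionName, Rectangle2D rect)?

In other words, if new Rectangle(10,10,100,100) is given as the second parameter, where does the rectangle R start, how big is it (the values of its origin and its dimensions), and in which direction does it extend (the direction of the blue arrows in the figure)?

[Figure: PdfBox rectangle]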


度的依靠╰つ 2024-12-28 19:24:31

new Rectangle(10,10,100,100)

means that the rectangle has its upper-left corner at position (10, 10), that is, 10 units from the left edge and 10 units from the top edge of the PDF page. Here a "unit" is 1 pt = 1/72 inch.

The first 100 is the width of the rectangle and the second 100 is its height.
To sum up, the correct picture is the first one.

I wrote this code to extract some areas of a page, with the coordinates given as arguments to the function:

Rectangle2D region = new Rectangle2D.Double(x, y, width, height);
String regionName = "region";
PDFTextStripperByArea stripper;

stripper = new PDFTextStripperByArea();
stripper.addRegion(regionName, region);
stripper.extractRegions(page);

So, x and y are the absolute coordinates of the upper-left corner of the rectangle, followed by its width and height. page is a PDPage variable passed as an argument to this function.
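To make the geometry concrete, here is a minimal sketch in plain Java (no PDFBox needed) that prints the corners of the rectangle from the question and converts a length from points to inches. The class name RegionCorners and the helper toInches are made up for this illustration. It assumes the top-left origin described above, so a larger y value is further down the page.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper class for illustration; not part of PDFBox.
class RegionCorners {
    // 1 pt = 1/72 inch, as stated in the answer above.
    static final double POINTS_PER_INCH = 72.0;

    /** Converts a length in points to inches. */
    static double toInches(double points) {
        return points / POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        // Same values as new Rectangle(10,10,100,100): x, y, width, height.
        Rectangle2D region = new Rectangle2D.Double(10, 10, 100, 100);

        // With a top-left origin, (minX, minY) is the upper-left corner
        // and (maxX, maxY) is the lower-right corner.
        System.out.println("upper-left:  (" + region.getMinX() + ", " + region.getMinY() + ")");
        System.out.println("lower-right: (" + region.getMaxX() + ", " + region.getMaxY() + ")");
        System.out.println("width in inches: " + toInches(region.getWidth()));
    }
}
```

The region therefore covers x from 10 to 110 and y from 10 to 110, measured from the left and top edges of the page.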


一抹淡然 2024-12-28 19:24:31

I was looking into doing something like this, so I thought I'd pass along what I found.

Here's the code for creating my original PDF using iText.

import com.lowagie.text.Document
import com.lowagie.text.Paragraph
import com.lowagie.text.pdf.PdfWriter

class SimplePdfCreator {
    void createFrom(String path) {
        Document d = new Document()
        try {
            PdfWriter writer = PdfWriter.getInstance(d, new FileOutputStream(path))
            d.open()
            d.add(new Paragraph("This is a test."))
            d.close()
        } catch (Exception e) {
            e.printStackTrace()
        }
    }
}

If you crack open the pdf, you'll see the text in the upper left hand corner. Here's the test showing what you are looking for.

@Test
void createFrom_using_pdf_box_to_extract_text_targeted_extraction() {
    new SimplePdfCreator().createFrom("myFileLocation")
    def doc = PDDocument.load("myFileLocation")
    Rectangle2D.Double d = new Rectangle2D.Double(0, 0, 120, 100)
    def stripper = new PDFTextStripperByArea()
    def pages = doc.getDocumentCatalog().allPages
    stripper.addRegion("myRegion", d)
    stripper.extractRegions(pages[0])
    assert stripper.getTextForRegion("myRegion").contains("This is a test.")
}

Position (0, 0) is the upper left hand corner of the page. The width extends to the right and the height extends downward. I was able to trim the region down to (35, 52, 120, 3) and still get the test to pass.

All code is written in Groovy.
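Coordinates read from a PDF viewer or from the PDF content stream are often measured from the bottom-left corner of the page instead. The sketch below, in plain Java rather than Groovy, shows one way to convert such a rectangle into the top-left-origin form used above. The class RegionConverter and its method are hypothetical names, the page height of 842 pt (A4) is an assumption, and the conversion assumes an unrotated page whose visible area starts at (0, 0).

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper class for illustration; not part of PDFBox.
class RegionConverter {
    /**
     * Converts a rectangle whose lower-left corner is (x, yFromBottom),
     * measured from the bottom-left of the page, into a rectangle whose
     * (x, y) is the upper-left corner measured from the top-left of the page.
     */
    static Rectangle2D fromBottomLeftOrigin(double x, double yFromBottom,
                                            double width, double height,
                                            double pageHeight) {
        // The top edge of the rectangle sits at yFromBottom + height above the
        // bottom of the page, so its distance from the top is the remainder.
        double yFromTop = pageHeight - yFromBottom - height;
        return new Rectangle2D.Double(x, yFromTop, width, height);
    }

    public static void main(String[] args) {
        double pageHeight = 842; // assumed A4 page height in points
        Rectangle2D region = fromBottomLeftOrigin(35, 787, 120, 3, pageHeight);
        System.out.println(region.getX() + ", " + region.getY() + ", "
                + region.getWidth() + ", " + region.getHeight());
    }
}
```

With these assumed inputs the result is the region (35, 52, 120, 3), the same shape as the trimmed region mentioned above.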


叹梦 2024-12-28 19:24:31

Code in Java using PDFBox.

 public String fetchTextByRegion(String path, String filename, int pageNumber) throws IOException {
        File file = new File(path + filename);
        // try-with-resources closes the document even if extraction fails
        try (PDDocument document = PDDocument.load(file)) {
            // Rectangle2D.Double(x, y, width, height); (x, y) is the upper-left corner in points
            Rectangle2D region = new Rectangle2D.Double(0, 100, 550, 700);
            String regionName = "region";
            // getPage is zero-based; pageNumber is assumed here to be 1-based
            PDPage page = document.getPage(pageNumber - 1);
            PDFTextStripperByArea stripper = new PDFTextStripperByArea();
            stripper.addRegion(regionName, region);
            stripper.extractRegions(page);
            return stripper.getTextForRegion(regionName);
        }
    }
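The hard-coded region above reaches down to y = 100 + 700 = 800, which is past the bottom edge of a US Letter page (612 x 792 pt). As a rough sanity check, the sketch below computes the part of a region that actually lies on the page. PageClip and clipToPage are hypothetical names, and the page size is passed in as an assumption; in PDFBox it would typically be read from the page's media box.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper class for illustration; not part of PDFBox.
class PageClip {
    /**
     * Returns the part of the region that lies inside a page of the given
     * size, with the page's upper-left corner at (0, 0). If the region does
     * not overlap the page, the returned rectangle is empty (isEmpty() is true).
     */
    static Rectangle2D clipToPage(Rectangle2D region, double pageWidth, double pageHeight) {
        Rectangle2D page = new Rectangle2D.Double(0, 0, pageWidth, pageHeight);
        return region.createIntersection(page);
    }

    public static void main(String[] args) {
        Rectangle2D region = new Rectangle2D.Double(0, 100, 550, 700);
        // Assumed US Letter page: 612 x 792 points.
        Rectangle2D clipped = clipToPage(region, 612, 792);
        System.out.println("clipped height: " + clipped.getHeight());
    }
}
```

On an A4 page (about 595 x 842 pt) the same region fits entirely, so the clipped rectangle equals the original.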
