PDFBox PDFTextStripperByArea region coordinates

Posted on 2024-12-21 19:24:30

What are the dimensions and orientation of the rectangle passed to PDFTextStripperByArea's addRegion(String regionName, Rectangle2D rect) method?

In other words, if new Rectangle(10,10,100,100) is given as the second parameter, where does the rectangle R start, how big is it (units of the origin values and of the rectangle's size), and in which direction does it extend (the direction of the blue arrows in the illustration)?

[Illustration: PdfBox rectangle]

Comments (3)

度的依靠╰つ 2024-12-28 19:24:31
new Rectangle(10,10,100,100)

means that the rectangle's upper-left corner is at position (10, 10), that is, 10 units from the left edge and 10 units from the top edge of the PDF page. Here a "unit" is 1 pt = 1/72 inch.

The first 100 is the width of the rectangle and the second is its height, so the rectangle extends to the right and downward from that corner.
To sum up, the correct picture is the first one.

I wrote this code to extract areas of a page; the coordinates are passed in as arguments to the function:

// x, y: upper-left corner in points; width, height: size in points
Rectangle2D region = new Rectangle2D.Double(x, y, width, height);
String regionName = "region";

PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.addRegion(regionName, region);
stripper.extractRegions(page);
// The extracted text is then available under the region's name
String text = stripper.getTextForRegion(regionName);

So x and y are the absolute coordinates of the rectangle's upper-left corner, followed by its width and height. page is a PDPage variable passed to this function as an argument.
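Because the region is expressed in points (1 pt = 1/72 inch) measured from the top-left corner, it can help to convert from physical units before building the rectangle. The following is only a sketch: RegionUnits and its methods are hypothetical helpers, not part of PDFBox, and it uses nothing beyond java.awt.geom.Rectangle2D.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper (not a PDFBox class): builds region rectangles in points.
final class RegionUnits {
    static final double POINTS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    static double inchesToPoints(double inches) {
        return inches * POINTS_PER_INCH;
    }

    static double mmToPoints(double mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    // (xIn, yIn) is the upper-left corner, measured from the page's left and top edges.
    static Rectangle2D regionFromInches(double xIn, double yIn, double wIn, double hIn) {
        return new Rectangle2D.Double(
                inchesToPoints(xIn), inchesToPoints(yIn),
                inchesToPoints(wIn), inchesToPoints(hIn));
    }

    public static void main(String[] args) {
        // The rectangle from the question: spans x 10..110 and y 10..110 (y grows downward).
        Rectangle2D r = new Rectangle2D.Double(10, 10, 100, 100);
        System.out.println("x: " + r.getMinX() + " to " + r.getMaxX());
        System.out.println("y: " + r.getMinY() + " to " + r.getMaxY());

        // A region one inch from the left and top edges, two inches wide, half an inch tall.
        System.out.println(regionFromInches(1, 1, 2, 0.5));
    }
}
```

The resulting Rectangle2D could then be handed to addRegion exactly as in the snippet above.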

一抹淡然 2024-12-28 19:24:31

Was looking into doing something like this, so I thought I'd pass what I found along.

Here's the code I used to create the original PDF with iText.

import com.lowagie.text.Document
import com.lowagie.text.Paragraph
import com.lowagie.text.pdf.PdfWriter

class SimplePdfCreator {
    void createFrom(String path) {
        Document d = new Document()
        try {
            PdfWriter writer = PdfWriter.getInstance(d, new FileOutputStream(path))
            d.open()
            d.add(new Paragraph("This is a test."))
            d.close()
        } catch (Exception e) {
            e.printStackTrace()
        }
    }
}

If you crack open the pdf, you'll see the text in the upper left hand corner. Here's the test showing what you are looking for.

@Test
void createFrom_using_pdf_box_to_extract_text_targeted_extraction() {
    new SimplePdfCreator().createFrom("myFileLocation")
    def doc = PDDocument.load("myFileLocation")
    Rectangle2D.Double d = new Rectangle2D.Double(0, 0, 120, 100)
    def stripper = new PDFTextStripperByArea()
    def pages = doc.getDocumentCatalog().allPages
    stripper.addRegion("myRegion", d)
    stripper.extractRegions(pages[0])
    assert stripper.getTextForRegion("myRegion").contains("This is a test.")
}

Position (0, 0) is the upper-left corner of the page. The width extends to the right and the height extends downward. I was able to narrow the region to (35, 52, 120, 3) and the test still passed.

The iText and test code above is written in Groovy.
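The top-left origin described in this answer differs from PDF's native user space, where the origin is normally the lower-left corner and y grows upward. If a rectangle was measured in that native system, its y value has to be flipped before it is used as a region. A minimal sketch, assuming an unrotated page whose lower-left corner is at (0, 0); RegionFlip and toTopLeft are hypothetical names, and the 842 pt page height (roughly A4) is an illustrative assumption, not a value taken from the test above:

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper (not a PDFBox class): converts between the two coordinate conventions.
final class RegionFlip {
    // pdfRect uses a lower-left origin with y growing upward.
    // The result covers the same area with an upper-left origin and y growing downward.
    static Rectangle2D toTopLeft(Rectangle2D pdfRect, double pageHeight) {
        // The rectangle's top edge (max y in PDF space) becomes its y in top-left space.
        double top = pageHeight - pdfRect.getMaxY();
        return new Rectangle2D.Double(
                pdfRect.getX(), top, pdfRect.getWidth(), pdfRect.getHeight());
    }

    public static void main(String[] args) {
        double pageHeight = 842; // assumed page height in points
        Rectangle2D inPdfSpace = new Rectangle2D.Double(35, 787, 120, 3);
        System.out.println(toTopLeft(inPdfSpace, pageHeight));
    }
}
```

Only y changes; x, width, and height carry over unchanged because both systems measure x from the left edge.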

叹梦 2024-12-28 19:24:31
Code in Java using PDFBox (pageNumber is assumed to be 1-based here).

    public String fetchTextByRegion(String path, String filename, int pageNumber) throws IOException {
        File file = new File(path + filename);
        // try-with-resources closes the document even if extraction fails
        try (PDDocument document = PDDocument.load(file)) {
            // Rectangle2D.Double(x, y, width, height): upper-left origin, units in points
            Rectangle2D region = new Rectangle2D.Double(0, 100, 550, 700);
            String regionName = "region";
            // getPage() is zero-based, so convert the 1-based page number
            PDPage page = document.getPage(pageNumber - 1);
            PDFTextStripperByArea stripper = new PDFTextStripperByArea();
            stripper.addRegion(regionName, region);
            stripper.extractRegions(page);
            return stripper.getTextForRegion(regionName);
        }
    }