Incorrect x and y positions of text
I have a huge PDF document (1600 pages) from which I kept only a part by printing some of its pages in Acrobat Reader.
I am reading the text using iText with the following code:
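```java
import java.io.File;
import java.io.IOException;
import com.itextpdf.text.pdf.*;
import com.itextpdf.text.pdf.parser.*;

public void decode(File file) throws IOException {
    PdfReader reader = new PdfReader(file.toURI().toURL());
    int numberOfPages = reader.getNumberOfPages();
    // The listener's only constructor takes the reader
    ProcessorListener listener = new ProcessorListener(reader);
    PdfContentStreamProcessor processor = new PdfContentStreamProcessor(listener);
    for (int pageNumber = 1; pageNumber <= numberOfPages; pageNumber++) {
        // pageDic was never declared in the original snippet
        PdfDictionary pageDic = reader.getPageN(pageNumber);
        PdfDictionary resourcesDic = pageDic.getAsDict(PdfName.RESOURCES);
        // Start each page from a clean graphics state and text matrices
        processor.reset();
        processor.processContent(ContentByteUtils.getContentBytesForPage(reader, pageNumber), resourcesDic);
    }
    reader.close();
}
```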
and the render listener:
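```java
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;

public class ProcessorListener implements RenderListener {
    private final PdfReader reader;

    public ProcessorListener(PdfReader reader) {
        this.reader = reader;
    }

    @Override
    public void beginTextBlock() {
    }

    @Override
    public void renderText(TextRenderInfo tri) {
        String text = tri.getText();
        // getBoundingRectange (sic) is the actual iText 5 method name;
        // coordinates are reported in user space
        double x = tri.getDescentLine().getBoundingRectange().getX();
        double y = tri.getDescentLine().getBoundingRectange().getY();
        System.out.println(text + " => x:" + x + " y:" + y);
    }

    @Override
    public void endTextBlock() {
    }

    // Required by the RenderListener interface; images are ignored here
    @Override
    public void renderImage(ImageRenderInfo renderInfo) {
    }
}
```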
What I observe is that in my initial (huge) document the x and y coordinates change with the position of the text, but in my smaller, printed document the coordinates of almost all text are the same: x is very small and y is large. For example, x is invariably smaller than 2, and y is almost always about 480.
What is the reason for this behavior? I only have this problem when I print the PDF document from Acrobat Reader, not when I print it from, for example, Edge. Also, the resulting document displays correctly in both Acrobat Reader and Edge, which is why I think the problem is in my code.
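Background that may help when comparing the two files: the x and y values the listener prints are user-space coordinates. Per the PDF specification (ISO 32000-1, section 9.4.4), a glyph's origin is found by mapping the text-space origin through the text matrix (Tm, set by operators such as `Td` and `Tm`) and then through the current transformation matrix (CTM, changed by `cm`). Two producers can therefore put text at the same visible spot in quite different ways. The following is a minimal, self-contained sketch of that composition only, assuming no iText at all; the class and method names are hypothetical, and text rise (which shifts the origin vertically) is ignored.

```java
/**
 * Hypothetical sketch: maps the text-space origin to user space as
 * origin = [0 0 1] x Tm x CTM (ISO 32000-1, 9.4.4), ignoring text rise.
 * Matrices use the PDF form [a b 0; c d 0; e f 1], stored as {a, b, c, d, e, f}.
 */
public final class TextSpaceSketch {

    /** Returns m1 x m2, i.e. apply m1 first, then m2. */
    public static double[] multiply(double[] m1, double[] m2) {
        return new double[] {
            m1[0] * m2[0] + m1[1] * m2[2],
            m1[0] * m2[1] + m1[1] * m2[3],
            m1[2] * m2[0] + m1[3] * m2[2],
            m1[2] * m2[1] + m1[3] * m2[3],
            m1[4] * m2[0] + m1[5] * m2[2] + m2[4],
            m1[4] * m2[1] + m1[5] * m2[3] + m2[5]
        };
    }

    /** Maps (0, 0) through Tm and then the CTM; the row vector [0 0 1] picks out e and f. */
    public static double[] glyphOrigin(double[] tm, double[] ctm) {
        double[] m = multiply(tm, ctm);
        return new double[] { m[4], m[5] };
    }

    public static void main(String[] args) {
        double[] identity = {1, 0, 0, 1, 0, 0};
        // Producer A: text positioned by "100 700 Td" (text matrix), identity CTM
        double[] a = glyphOrigin(new double[] {1, 0, 0, 1, 100, 700}, identity);
        // Producer B: same spot via "1 0 0 1 100 700 cm" (CTM), identity text matrix
        double[] b = glyphOrigin(identity, new double[] {1, 0, 0, 1, 100, 700});
        // A scaling CTM also scales the translation stored in the text matrix
        double[] c = glyphOrigin(new double[] {1, 0, 0, 1, 200, 400},
                                 new double[] {0.5, 0, 0, 0.5, 0, 0});
        System.out.println(a[0] + "," + a[1] + " vs " + b[0] + "," + b[1]);
        System.out.println(c[0] + "," + c[1]);
    }
}
```

Running `main` prints `100.0,700.0 vs 100.0,700.0` and then `100.0,200.0`: the first two producers place the glyph identically even though their content streams differ, and in the third case the CTM halves the text-matrix offset. This only illustrates the matrix arithmetic; it is not a diagnosis of the Acrobat-printed file, whose `Tm` and `cm` operators would have to be inspected directly.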