How to add a black-and-white image to a PDF as a JBIG2DECODE stream via iText
I am working on a utility that replaces the images in a PDF with smaller, monochrome (2-color B&W) versions in order to shrink scanned PDFs. The program below (which is the whole thing) currently exports all images as large .png files into the `in` directory. The user then takes these files, does any necessary image manipulation, and copies the results into the `out` directory under the same names but with the `.jb2` extension. Running the program again should copy the modified files back into the streams, replacing the original images.
Needless to say, it doesn't work. The stream headers are all correct, but I don't think the stream data is properly encoded to conform to the JBIG2DECODE format, so none of the modified images show up in a reader. Since I'm replacing an existing stream, I can't use `document.add(Image)`, so I have to do all this stream manipulation manually. I may be missing an iText facility for doing this, but how am I supposed to get these images into the stream?
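One possible cause, offered here as an assumption rather than something confirmed in this thread: the PDF `JBIG2Decode` filter expects JBIG2 data in its embedded organization, without the file header that a standalone .jb2 file starts with. The sketch below only checks whether a byte array begins with the 8-byte JBIG2 file header ID, which would suggest that the file cannot be copied into the stream verbatim. The class name `Jbig2Sniffer` and the method `hasFileHeader` are hypothetical, and only the standard library is used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

final class Jbig2Sniffer {
    // ID string that opens a standalone JBIG2 file: 0x97 'J' 'B' '2' CR LF 0x1A LF.
    private static final byte[] FILE_HEADER_ID = {
        (byte) 0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A
    };

    // Returns true if the data starts with the standalone JBIG2 file header ID.
    static boolean hasFileHeader(byte[] data) {
        if (data.length < FILE_HEADER_ID.length) {
            return false;
        }
        byte[] prefix = Arrays.copyOf(data, FILE_HEADER_ID.length);
        return Arrays.equals(prefix, FILE_HEADER_ID);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Jbig2Sniffer <file.jb2>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(hasFileHeader(data)
                ? "standalone JBIG2 file (file header present)"
                : "no JBIG2 file header found");
    }
}
```

If the check reports a header, the file is in a standalone organization. Under the assumption above, it would need to be converted to embedded form (page data in the image stream, with any global segments in a separate stream referenced through `JBIG2Globals` in the decode parameters) before a reader can decode it.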
The use of the `.jb2` format was dictated by iText, but I can just as easily use a more common format like `.gif`. The important part is that I want an image with a 2-color B&W palette to be placed in the PDF, with a compression format suitable for monochrome text images (I'd prefer JBIG2, but CCITT Group 3 or 4 or RLE will work for me too). The goal is maximum space saving; I have no processing time requirements.
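To make the target data layout concrete: an image stream with `BitsPerComponent` 1 in `DeviceGray` holds one bit per pixel (0 is black, 1 is white), packed most significant bit first, with each row padded to a byte boundary. The sketch below thresholds a `BufferedImage` into that layout using only the standard library. It is a minimal illustration, not part of the original program; the class name `OneBitPacker`, the method `pack`, and the fixed luma threshold are assumptions, and the resulting bytes would still need a filter (Flate, CCITT, or similar) applied before or when they are stored.

```java
import java.awt.image.BufferedImage;

final class OneBitPacker {
    // Packs an image into 1-bit-per-pixel rows, MSB first, each row padded
    // to a whole byte. Pixels with luma >= threshold become 1 (white).
    static byte[] pack(BufferedImage src, int threshold) {
        int width = src.getWidth();
        int height = src.getHeight();
        int stride = (width + 7) / 8; // bytes per padded row
        byte[] out = new byte[stride * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceptual luminance.
                int luma = (r * 299 + g * 587 + b * 114) / 1000;
                if (luma >= threshold) {
                    out[y * stride + (x >> 3)] |= (byte) (0x80 >> (x & 7));
                }
            }
        }
        return out;
    }
}
```

A call such as `OneBitPacker.pack(image, 128)` returns `((width + 7) / 8) * height` bytes. A plain threshold is the simplest choice; scanned text often benefits from adaptive thresholding, which is outside the scope of this sketch.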
Alternatively, if anyone knows any good utility programs to do what I'm trying to do, that would be just as well. I want to replace all the existing images in a PDF file with alternates (they need to be made available to be processed by an external application), and I need control over how the replacements are being compressed. It also has to be done in a manner suitable for batch mode processing, because I'm dealing with PDFs with hundreds of pages and one image per page, generally. I'm trying to reduce the size of my PDFs, but I need complete control over the compression, and I want to do all lossy compression myself. Acrobat's Reduce Size PDF function always mangles my images.
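    // Imports assume iText 5.x (com.itextpdf packages).
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.file.Files;
    import javax.imageio.ImageIO;
    import com.itextpdf.text.DocumentException;
    import com.itextpdf.text.Image;
    import com.itextpdf.text.pdf.PRStream;
    import com.itextpdf.text.pdf.PdfName;
    import com.itextpdf.text.pdf.PdfNumber;
    import com.itextpdf.text.pdf.PdfObject;
    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.PdfStamper;
    import com.itextpdf.text.pdf.parser.PdfImageObject;

    public class Test {
        public static void main(String[] args) throws IOException, DocumentException {
            PdfReader pdf = new PdfReader("data\\in.pdf");
            int n = pdf.getXrefSize();
            for (int i = 0; i < n; i++) {
                // Skip everything that is not an image stream.
                PdfObject object = pdf.getPdfObject(i);
                if (object == null || !object.isStream()) continue;
                PRStream stream = (PRStream) object;
                if (!stream.contains(PdfName.WIDTH)) continue;
                PdfImageObject image = new PdfImageObject(stream);
                BufferedImage bi = image.getBufferedImage();
                if (bi == null) continue;

                // First pass: export the image for external processing.
                File in = new File("data\\in\\" + i + ".png");
                if (!in.exists()) {
                    ImageIO.write(bi, "png", in);
                }

                // Second pass: pick up the processed replacement, if present.
                File out = new File("data\\out\\" + i + ".jb2");
                if (!out.exists()) continue;
                Image img = Image.getInstance(out.getPath());
                byte[] data = Files.readAllBytes(out.toPath());

                // Replace the stream contents with the .jb2 file bytes, verbatim,
                // and rebuild the image dictionary.
                stream.clear();
                stream.setData(data, false, PRStream.NO_COMPRESSION);
                stream.put(PdfName.TYPE, PdfName.XOBJECT);
                stream.put(PdfName.SUBTYPE, PdfName.IMAGE);
                stream.put(PdfName.FILTER, PdfName.JBIG2DECODE);
                stream.put(PdfName.WIDTH, new PdfNumber((int) img.getWidth()));
                stream.put(PdfName.HEIGHT, new PdfNumber((int) img.getHeight()));
                stream.put(PdfName.BITSPERCOMPONENT, new PdfNumber(1));
                stream.put(PdfName.COLORSPACE, PdfName.DEVICEGRAY);
            }
            // Write the modified document.
            new PdfStamper(pdf, new FileOutputStream("data\\out.pdf")).close();
            pdf.close();
        }
    }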
Comments (1)
I've written a library on CodePlex that may help you out.
It's used for OCRing and compressing scanned PDFs with JBIG2, and it has a delegate that lets you process each image before it's added to the PDF.