Detecting a PDF Portfolio or PDF Package in code

Posted on 2024-11-05 20:38:58

Does anyone know of a way to detect whether a given PDF file is a PDF Portfolio or a PDF Package, rather than a "regular" PDF? I'd prefer a Java solution, although since I haven't yet found any information on detecting the specific type of PDF, I'll take what I can get and then try to figure out the Java solution afterwards.

(In searching past questions, it appears that a lot of people don't know that PDF Portfolios and PDF Packages exist. Both are ways that Adobe allows multiple, discrete PDFs to be packaged into a single PDF file. Opening a PDF Package in Reader shows the user a list of the embedded PDFs and allows further viewing from there. PDF Portfolios appear to be a bit more complicated -- they also include a Flash-based browser for the embedded files, and then allow users to extract the discrete PDFs from there. My issue with them, and the reason I'd like to be able to detect them in code, is that OS X's built-in Preview.app can't read these files -- so I'd like to at least warn users of a web app of mine that uploading them can lead to diminished compatibility across platforms.)

Comments (2)

墨小墨 2024-11-12 20:38:59
I ran into the same problem while extracting data through Kofax. The solution below works fine for me; it needs an extra jar (Aspose.PDF) on the classpath for the Document class.

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class PDFPortfolio {

    public static void main(String[] args) {
        // Open the PDF (requires the Aspose.PDF jar on the classpath)
        com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document("e:/pqr1.pdf");
        // Get the collection of embedded files
        com.aspose.pdf.EmbeddedFileCollection embeddedFiles = pdfDocument.getEmbeddedFiles();
        // Iterate through the individual files of the portfolio (index starts at 1)
        for (int counter = 1; counter <= embeddedFiles.size(); counter++) {
            com.aspose.pdf.FileSpecification fileSpecification = embeddedFiles.get_Item(counter);
            // try-with-resources closes both streams even if the copy fails
            try (InputStream input = fileSpecification.getContents();
                 OutputStream output = new FileOutputStream("e:/" + fileSpecification.getName())) {
                // Copy the embedded file out of the PDF in 4 KB chunks
                byte[] buffer = new byte[4096];
                int n;
                while ((n = input.read(buffer)) != -1) {
                    output.write(buffer, 0, n);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
怪异←思 2024-11-12 20:38:58

This question is old, but in case someone wants to know: it is possible. It can be done with Acrobat and JavaScript by checking the following property.

 // `this` is the Doc object when the script runs in the document's context
 if (this.collection != null)
 {
     // It is a portfolio
 }

The Acrobat JavaScript API says, "A collection object is obtained from the Doc.collection property. Doc.collection returns a null value when there is no PDF collection (also called PDF package and PDF portfolio). The collection object is used to set the initial document in the collection, set the initial view of the collection, and to get, add, and remove collection fields (or categories)."
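
Since the question asked for Java: in the PDF file format, a collection is declared by a /Collection entry in the document catalog, so a rough check is possible without Acrobat. The following is only a minimal sketch using the standard library; the class and method names (PortfolioSniffer, looksLikePortfolio) are made up for illustration. It scans the raw bytes for a /Collection name token, which is a heuristic: it will miss catalogs stored in compressed object streams or encrypted files, and it can be fooled by a stray /Collection elsewhere in the file. A real PDF parser would be more reliable.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of any library.
final class PortfolioSniffer {

    private static final String KEY = "/Collection";

    /** Heuristic: true if the raw bytes contain a standalone /Collection name. */
    static boolean looksLikePortfolio(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int from = 0;
        while (true) {
            int at = text.indexOf(KEY, from);
            if (at < 0) {
                return false;
            }
            int end = at + KEY.length();
            // Skip longer names such as /CollectionItem or /CollectionField.
            if (end == text.length() || endsName(text.charAt(end))) {
                return true;
            }
            from = end;
        }
    }

    /** Whitespace and PDF delimiter characters terminate a name token. */
    private static boolean endsName(char c) {
        return c == 0 || Character.isWhitespace(c) || "()<>[]{}/%".indexOf(c) >= 0;
    }

    /** Convenience overload that reads the whole file into memory first. */
    static boolean looksLikePortfolio(Path file) throws IOException {
        return looksLikePortfolio(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java PortfolioSniffer <file.pdf>");
            return;
        }
        boolean hit = looksLikePortfolio(Paths.get(args[0]));
        System.out.println(hit ? "possible portfolio/package" : "no /Collection found");
    }
}
```

For the upload-warning use case in the question, a positive result here is probably enough to show the user a compatibility notice, while a negative result should not be treated as proof that the file is a regular PDF.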
