Custom schema for XMP metadata

Posted 2024-12-12 22:31:10 | Words: 4426 | Views: 3 | Comments: 0

I want to write custom metadata to a PDF file. The properties I need are not covered by the standard XMP schemas, so I wrote my own schema containing my own properties. I can successfully write this custom metadata to my PDF file using either the PDFBox or the iTextPDF library. However, I am unable to read the custom metadata back on the client side without parsing the XMP XML myself.

I guess there should be some API, which I am not aware of, for getting a custom schema back as a Java class.

Am I thinking in the right direction, or do I actually need to parse the XML to get my custom data back on the client side?

Here is the code I wrote using the PDFBox library.

Custom Metadata File.

package com.ecomail.emx.core.xmp;

import java.io.IOException;

import org.apache.jempbox.xmp.XMPMetadata;

public class EMXMetadata extends XMPMetadata {

    public EMXMetadata() throws IOException {
        super();
    }

    public EMXSchema addEMXSchema() {
        EMXSchema schema = new EMXSchema(this);
        return (EMXSchema) basicAddSchema(schema);
    }

    public EMXSchema getEMXSchema() throws IOException {
        return (EMXSchema) getSchemaByClass(EMXSchema.class);
    }
}

Custom Schema File.

package com.ecomail.emx.core.xmp;

import java.util.List;

import org.apache.jempbox.xmp.XMPMetadata;
import org.apache.jempbox.xmp.XMPSchema;
import org.w3c.dom.Element;

public class EMXSchema extends XMPSchema {

    public static final String NAMESPACE = "http://www.test.com/emx/elements/1.1/";

    public EMXSchema(XMPMetadata parent) {
        super(parent, "test", NAMESPACE);
    }

    public EMXSchema(Element element, String prefix) {
        super(element, prefix);
    }

    public String getMetaDataType() {
        return getTextProperty(prefix + ":metaDataType");
    }

    public void setMetaDataType(String metaDataType) {
        setTextProperty(prefix + ":metaDataType", metaDataType);
    }

    public void removeRecipient(String recipient) {
        removeBagValue(prefix + ":recipient", recipient);
    }

    public void addRecipient(String recipient) {
        addBagValue(prefix + ":recipient", recipient);
    }

    public List<String> getRecipients() {
        return getBagList(prefix + ":recipient");
    }
}

XMP Client File.

package com.ecomail.emx.core.xmp;

import java.util.GregorianCalendar;

import org.apache.jempbox.xmp.XMPMetadata;
import org.apache.jempbox.xmp.XMPSchemaDublinCore;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.common.PDMetadata;

public class XMPClient {

    private XMPClient() {
    }

    public static void main(String[] args) throws Exception {
        PDDocument document = null;

        try {
            document = PDDocument.load("/home/silver/SVNRoot/ecomail/trunk/sample.pdf");
            PDDocumentCatalog catalog = document.getDocumentCatalog();
            PDDocumentInformation info = document.getDocumentInformation();

            EMXMetadata metadata = new EMXMetadata();

            // Standard Dublin Core properties.
            XMPSchemaDublinCore dcSchema = metadata.addDublinCoreSchema();
            dcSchema.setTitle(info.getTitle());
            dcSchema.addContributor("Contributor");
            dcSchema.setCoverage("coverage");
            dcSchema.addCreator("PDFBox");
            dcSchema.addDate(new GregorianCalendar());
            dcSchema.setDescription("description");
            dcSchema.addLanguage("language");
            dcSchema.setFormat("format");

            // Custom EMX properties.
            EMXSchema emxSchema = metadata.addEMXSchema();
            emxSchema.addRecipient("Recipient 1");
            emxSchema.addRecipient("Recipient 2");

            // Write the XMP packet into the PDF and save a copy.
            PDMetadata metadataStream = new PDMetadata(document);
            metadataStream.importXMPMetadata(metadata);
            catalog.setMetadata(metadataStream);

            document.save("/home/silver/SVNRoot/ecomail/trunk/sample1.pdf");
            document.close();

            // Reload the saved copy and try to read the custom schema back.
            document = PDDocument.load("/home/silver/SVNRoot/ecomail/trunk/sample1.pdf");

            PDDocumentCatalog catalog2 = document.getDocumentCatalog();
            PDMetadata metadataStream2 = catalog2.getMetadata();

            XMPMetadata metadata2 = metadataStream2.exportXMPMetadata();
            EMXSchema emxSchema2 = (EMXSchema) metadata2.getSchemaByClass(EMXSchema.class);
            // emxSchema2 is null here, so the next line throws a NullPointerException.
            System.out.println("recipients : " + emxSchema2.getRecipients());
        } finally {
            if (document != null) {
                document.close();
            }
        }
    }
}

In the XMPClient file I expect to get the EMXSchema object back from the resulting metadata by querying for it by its class:

XMPMetadata metadata2 = metadataStream2.exportXMPMetadata();
EMXSchema emxSchema2 = (EMXSchema) metadata2.getSchemaByClass(EMXSchema.class);
System.out.println("recipients : " + emxSchema2.getRecipients());

But I get a NullPointerException, which indicates that the schema was not found.
Can anybody tell me whether I am doing this the right way, or whether I need to parse the XMP myself to get my recipient values?
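For comparison, this is roughly the manual parsing I would like to avoid. It is only a sketch using the JDK DOM parser. XmpBagReader, readBag and the SAMPLE packet are made-up names and data for illustration, and I am assuming the bag is serialized as rdf:Bag / rdf:li items inside a test:recipient element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper; not part of PDFBox or JempBox. */
final class XmpBagReader {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String EMX_NS = "http://www.test.com/emx/elements/1.1/";

    /** Made-up XMP packet, shaped like what I assume gets written for the recipients. */
    static final String SAMPLE =
          "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
        + "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\">"
        + "<rdf:Description rdf:about=\"\" xmlns:test=\"" + EMX_NS + "\">"
        + "<test:recipient><rdf:Bag>"
        + "<rdf:li>Recipient 1</rdf:li><rdf:li>Recipient 2</rdf:li>"
        + "</rdf:Bag></test:recipient>"
        + "</rdf:Description></rdf:RDF></x:xmpmeta>";

    /** Collects the rdf:li values below every element named {namespace}property. */
    static List<String> readBag(String xmp, String namespace, String property) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setNamespaceAware(true); // needed for the getElementsByTagNameNS lookups
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));

        List<String> values = new ArrayList<>();
        NodeList properties = doc.getElementsByTagNameNS(namespace, property);
        for (int i = 0; i < properties.getLength(); i++) {
            NodeList items = ((Element) properties.item(i)).getElementsByTagNameNS(RDF_NS, "li");
            for (int j = 0; j < items.getLength(); j++) {
                values.add(items.item(j).getTextContent().trim());
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // prints: recipients : [Recipient 1, Recipient 2]
        System.out.println("recipients : " + readBag(SAMPLE, EMX_NS, "recipient"));
    }
}
```

In the real client the packet would come from the PDMetadata stream instead of a string constant. This works, but it bypasses the schema classes entirely, which is why I am hoping for an API-level solution.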

Thanks

Comments (1)

_蜘蛛 2024-12-19 22:31:10

I finally got it working myself.
The solution is to use the other constructor of the XMPMetadata class, the one that accepts an already parsed org.w3c.dom.Document.

// Additional imports needed by this fragment of XMPClient.main:
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

        document = PDDocument.load("/home/silver/SVNRoot/ecomail/trunk/sample1.pdf");

        PDDocumentCatalog catalog2 = document.getDocumentCatalog();
        PDMetadata metadataStream2 = catalog2.getMetadata();
        // Dump the raw XMP packet for debugging.
        System.out.println(metadataStream2.getInputStreamAsString());
        InputStream xmpIn = metadataStream2.createInputStream();

        // Parse the XMP packet into a DOM document ourselves.
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setExpandEntityReferences(true);
        f.setIgnoringComments(true);
        f.setIgnoringElementContentWhitespace(true);
        f.setValidating(false);
        f.setCoalescing(true);
        f.setNamespaceAware(true);
        DocumentBuilder builder = f.newDocumentBuilder();
        Document xmpDoc = builder.parse(xmpIn);

        // Build the custom metadata class from the parsed document.
        EMXMetadata emxMetadata = new EMXMetadata(xmpDoc);
        EMXSchema emxSchema2 = emxMetadata.getEMXSchema();
        System.out.println("recipients : " + emxSchema2.getRecipients());
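A side note on the parser setup above: setNamespaceAware(true) changes what the JDK DOM reports for prefixed elements. The sketch below uses only the JDK; NamespaceAwareDemo and its one-element XML string are made up for illustration. I have not checked whether JempBox itself depends on this flag, so treat it as background on the setting rather than as an explanation of the fix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

/** Hypothetical demo class; JDK only. */
final class NamespaceAwareDemo {

    /** Made-up one-element document using the custom prefix and namespace. */
    static final String XML =
        "<test:recipient xmlns:test=\"http://www.test.com/emx/elements/1.1/\"/>";

    /** Parses XML and returns the namespace URI the DOM reports for the root element. */
    static String rootNamespace(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(namespaceAware);
        Element root = f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        return root.getNamespaceURI();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootNamespace(false)); // null
        System.out.println(rootNamespace(true));  // http://www.test.com/emx/elements/1.1/
    }
}
```

Without the flag the element is still parsed, but only its qualified name (test:recipient) is available, so any lookup keyed on the namespace URI through the DOM API finds nothing.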

Now my custom emxMetadata returns a non-null emxSchema2 object, and I can read my recipients back from it. However, to make this work I had to modify EMXMetadata so that it registers an XML namespace mapping for the schema class:

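package com.ecomail.emx.core.xmp;

import java.io.IOException;

import org.apache.jempbox.xmp.XMPMetadata;
import org.w3c.dom.Document;

public class EMXMetadata extends XMPMetadata {

    public EMXMetadata() throws IOException {
        super();
        // Register EMXSchema as the schema class for the EMX namespace.
        addXMLNSMapping(EMXSchema.NAMESPACE, EMXSchema.class);
    }

    public EMXMetadata(Document xmpDoc) {
        super(xmpDoc);
        // Same registration when wrapping an already parsed XMP document.
        addXMLNSMapping(EMXSchema.NAMESPACE, EMXSchema.class);
    }

    public EMXSchema addEMXSchema() {
        EMXSchema schema = new EMXSchema(this);
        return (EMXSchema) basicAddSchema(schema);
    }

    public EMXSchema getEMXSchema() throws IOException {
        return (EMXSchema) getSchemaByClass(EMXSchema.class);
    }
}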
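My understanding of why the mapping matters, as a toy model: if the metadata object picks the schema class by looking up the namespace URI in a registry, then a namespace nobody registered falls back to a generic schema object, and a lookup by the custom class comes back null. The sketch below is plain Java and is not JempBox code. SchemaRegistryDemo, GenericSchema, EmxSchema, load and byClass are invented names, and the fallback behaviour is my assumption about what happens inside the library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy model of a namespace-to-class registry; not JempBox code. */
final class SchemaRegistryDemo {

    static final String EMX_NS = "http://www.test.com/emx/elements/1.1/";

    /** Stand-in for a generic schema object. */
    static class GenericSchema {
        final String namespace;

        GenericSchema(String namespace) {
            this.namespace = namespace;
        }
    }

    /** Stand-in for the custom schema subclass. */
    static class EmxSchema extends GenericSchema {
        EmxSchema(String namespace) {
            super(namespace);
        }
    }

    private final Map<String, Function<String, GenericSchema>> mappings = new HashMap<>();

    /** Registers the factory to use for schemas in the given namespace. */
    void addMapping(String namespace, Function<String, GenericSchema> factory) {
        mappings.put(namespace, factory);
    }

    /** Builds the mapped class for a namespace, or the generic one if none is registered. */
    GenericSchema load(String namespace) {
        Function<String, GenericSchema> factory = mappings.get(namespace);
        return factory != null ? factory.apply(namespace) : new GenericSchema(namespace);
    }

    /** Returns the schema only if it has exactly the requested class, otherwise null. */
    static <T extends GenericSchema> T byClass(GenericSchema schema, Class<T> type) {
        return schema.getClass() == type ? type.cast(schema) : null;
    }

    public static void main(String[] args) {
        SchemaRegistryDemo plain = new SchemaRegistryDemo();
        System.out.println(byClass(plain.load(EMX_NS), EmxSchema.class) != null);  // false

        SchemaRegistryDemo mapped = new SchemaRegistryDemo();
        mapped.addMapping(EMX_NS, EmxSchema::new);
        System.out.println(byClass(mapped.load(EMX_NS), EmxSchema.class) != null); // true
    }
}
```

The unmapped case corresponds to the plain XMPMetadata returned by exportXMPMetadata() in the question, and the mapped case to EMXMetadata with the addXMLNSMapping call in its constructors.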
