Is there a generic way to read complex XML using SAXParser?

Posted on 2025-02-05 20:07:58

I am using SAXParser to read a large, complex XML file. I do not want to create model classes because I do not know in advance exactly what data the XML will contain, so I am trying to find out whether there is a generic way of reading the XML data using some sort of context object.

I have used a similar approach for JSON with Jackson, which worked very well for me. Since I am new to the SAX parser, I do not fully understand how to achieve the same thing. For complex nested values, I am unable to establish a parent-child relationship, and I am unable to associate attributes with the tags they belong to.

Following is the code I have so far:

ContextNode is my generic class for storing all XML information using parent-child relationships:

@Getter
@Setter
@ToString
@NoArgsConstructor
public class ContextNode {
    protected String name;
    protected String value;
    protected ArrayList<ContextNode> children = new ArrayList<>();
    protected ContextNode parent;

    //Constructor 1: To store the simple field information.
    public ContextNode(final String name, final String value) {
        this.name = name;
        this.value = value;
    }

    //Constructor 2: To store the complex field which has inner elements.
    public ContextNode(final ContextNode parent, final String name, final String value) {
        this(name, value);
        this.parent = parent;
    }
}

Following is my method to parse XML using SAX within EventReader.class:

import org.xml.sax.SAXException;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.io.InputStream;

public class EventReader {

    //Method to read XML events and create pre-hash string from it.
    public static void xmlParser(final InputStream xmlStream) {
        final SAXParserFactory factory = SAXParserFactory.newInstance();

        try {
            final SAXParser saxParser = factory.newSAXParser();
            final SaxHandler handler = new SaxHandler();
            saxParser.parse(xmlStream, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}

Following is my SaxHandler:

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SaxHandler extends DefaultHandler {

    private final List<String> XML_IGNORE_FIELDS = Arrays.asList("person:personDocument","DocumentBody","DocumentList");
    //Element names are case-sensitive, so this must match the <Person> tag in the XML.
    private final List<String> EVENT_TYPES = Arrays.asList("Person");
    private Map<String, String> XML_NAMESPACES = null;
    private ContextNode contextNode = null;
    private StringBuilder currentValue = new StringBuilder();

    @Override
    public void startDocument() {
        //Initialise the Map used by startElement to collect namespaces and header attributes.
        XML_NAMESPACES = new HashMap<>();
    }

    @Override
    public void startElement(final String uri, final String localName, final String qName, final Attributes attributes) {
        //For every new element in XML reset the StringBuilder.
        currentValue.setLength(0);

        if (qName.equalsIgnoreCase("person:personDocument")) {
            // Add the attributes and name-spaces to Map
            for (int att = 0; att < attributes.getLength(); att++) {

                if (attributes.getQName(att).contains(":")) {
                    //Find all Namespaces within the XML Header information and save it to the Map for future use.
                    XML_NAMESPACES.put(attributes.getQName(att).substring(attributes.getQName(att).indexOf(":") + 1), attributes.getValue(att));
                } else {
                    //Find all other attributes within XML and store this information within Map.
                    XML_NAMESPACES.put(attributes.getQName(att), attributes.getValue(att));
                }
            }
        } else if (EVENT_TYPES.contains(qName)) {
            contextNode = new ContextNode("type", qName);
        }
    }

    @Override
    public void characters(char ch[], int start, int length) {
        currentValue.append(ch, start, length);
    }

    @Override
    public void endElement(final String uri, final String localName, final String qName) {
        if (!XML_IGNORE_FIELDS.contains(qName)) {
            if (!EVENT_TYPES.contains(qName)) {
                System.out.println("QName : " + qName + " Value : " + currentValue);
                contextNode.children.add(new ContextNode(qName, currentValue.toString()));
            }
        }
    }

    @Override
    public void endDocument() {
        System.out.println(contextNode.getChildren().toString());
        System.out.println("End of Document");
    }
}

Following is my test case, which calls the method xmlParser:

@Test
public void xmlReader() throws Exception {
    final InputStream xmlStream = getClass().getResourceAsStream("/xmlFileContents.xml");
    EventReader.xmlParser(xmlStream);
}

Following is the XML I need to read using a generic approach:

<?xml version="1.0" ?>
<person:personDocument xmlns:person="https://example.com" schemaVersion="1.2" creationDate="2020-03-03T13:07:51.709Z">
<DocumentBody>
    <DocumentList>
        <Person>
            <bithTime>2020-03-04T11:00:30.000+01:00</bithTime>
            <name>Batman</name>
            <Place>London</Place>
            <hobbies>
                <hobby>painting</hobby>
                <hobby>football</hobby>
            </hobbies>
            <jogging distance="10.3">daily</jogging>
            <purpose2>
                <id>1</id>
                <purpose>Dont know</purpose>
            </purpose2>
        </Person>
    </DocumentList>
</DocumentBody>
</person:personDocument>

擦肩而过的背影 2025-02-12 20:08:00

Providing an answer, as it may be helpful to someone in the future:

First we need to create a class ContextNode which can hold the information:

import lombok.Getter;
import lombok.Setter;
import lombok.ToString;

import java.util.ArrayList;
import java.util.Map;

@Getter
@Setter
//Exclude the parent, otherwise toString() would recurse endlessly between parent and children.
@ToString(exclude = "parent")
public class ContextNode {
    protected String name;
    protected String value;
    protected ArrayList<ContextNode> attributes = new ArrayList<>();
    protected ArrayList<ContextNode> children = new ArrayList<>();
    protected ContextNode parent;
    protected Map<String, String> namespaces;

    //Constructor for every element and attribute below the root; namespaces are shared with the parent.
    public ContextNode(final ContextNode parent, final String name, final String value) {
        this.parent = parent;
        this.name = name;
        this.value = value;
        this.namespaces = parent.namespaces;
    }

    //Constructor for the root node, which owns the namespaces Map.
    public ContextNode(final Map<String, String> namespaces) {
        this.namespaces = namespaces;
    }
}

Then we can read the XML and store the information in the context node:

import lombok.Getter;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

import java.util.*;

public class SaxHandler extends DefaultHandler {

    //Variables needed to store the required information during the parsing of the XML document.
    private final Deque<String> path = new ArrayDeque<>();
    private final StringBuilder currentValue = new StringBuilder();
    private ContextNode currentNode = null;
    @Getter
    private ContextNode rootNode = null;
    private final HashMap<String, String> contextHeader = new HashMap<>();

    @Override
    public void startElement(final String uri, final String localName, final String qName, final Attributes attributes) {
        //Add every XML tag to the end of the path when the tag opens.
        path.addLast(qName);

        //Discard any text (usually whitespace) collected before this element opened.
        currentValue.setLength(0);

        //Create the node for this element and link it to its parent.
        if (rootNode == null) {
            rootNode = new ContextNode(contextHeader);
            currentNode = rootNode;
            rootNode.children.add(new ContextNode(rootNode, "type", qName));
        } else {
            final ContextNode n = new ContextNode(currentNode, qName, null);
            currentNode.children.add(n);
            currentNode = n;
        }

        //Loop over every attribute and store it either as a namespace or as an attribute of the current node.
        for (int att = 0; att < attributes.getLength(); att++) {
            final String attName = attributes.getQName(att);
            if (attName.startsWith("xmlns:")) {
                //Attributes starting with xmlns: are namespace declarations.
                contextHeader.put(attName.substring(attName.indexOf(":") + 1), attributes.getValue(att));
            } else {
                currentNode.attributes.add(new ContextNode(currentNode, attName, attributes.getValue(att).trim()));
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        currentValue.append(ch, start, length);
    }

    @Override
    public void endElement(final String uri, final String localName, final String qName) {
        //Store the text of the element that just closed, then step back up to its parent.
        if (currentNode != null) {
            final String text = currentValue.toString().trim();
            if (!text.isEmpty()) {
                currentNode.value = text;
            }
            currentNode = currentNode.parent;
        }

        //At the end of the XML element tag reset the value for the next element.
        currentValue.setLength(0);

        //After completing the particular element reading, remove that element from the path.
        path.removeLast();
    }

    @Override
    public void endDocument() {
        System.out.println("completed reading");
        System.out.println(rootNode);
    }

    //Get the path of the currently open element as / separated values, root first.
    private String path() {
        return String.join("/", this.path);
    }
}


You may need to make some additional changes based on your particular requirement. This is just a sample that gives some idea.
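
To make the parent-child idea concrete, here is a minimal, self-contained sketch of the same stack-based technique without Lombok. The names GenericSaxDemo, Node, TreeHandler, parse and flatten are made up for this illustration and are not part of the code above. It assumes each element holds either text or child elements; text that appears in a parent before a child element is discarded.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GenericSaxDemo {

    // Hypothetical generic node: element name, text, attributes and child elements.
    static class Node {
        final String name;
        String value;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        Node(final String name) {
            this.name = name;
        }
    }

    // Builds the tree with a stack: the top of the stack is always the currently open element.
    static class TreeHandler extends DefaultHandler {
        private final Deque<Node> stack = new ArrayDeque<>();
        private final StringBuilder text = new StringBuilder();
        Node root;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            final Node node = new Node(qName);
            // Attributes arrive together with the start tag, so they attach to this node directly.
            for (int i = 0; i < atts.getLength(); i++) {
                node.attributes.put(atts.getQName(i), atts.getValue(i));
            }
            if (stack.isEmpty()) {
                root = node;
            } else {
                stack.peek().children.add(node);
            }
            stack.push(node);
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per element, so append instead of assign.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            final Node node = stack.pop();
            final String value = text.toString().trim();
            if (!value.isEmpty()) {
                node.value = value;
            }
            text.setLength(0);
        }
    }

    static Node parse(final String xml) throws Exception {
        final TreeHandler handler = new TreeHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.root;
    }

    // Flattens the tree into "path = value" lines; attributes are written as path/@name.
    static void flatten(final Node node, final String prefix, final List<String> out) {
        final String path = prefix.isEmpty() ? node.name : prefix + "/" + node.name;
        node.attributes.forEach((k, v) -> out.add(path + "/@" + k + " = " + v));
        if (node.value != null) {
            out.add(path + " = " + node.value);
        }
        for (final Node child : node.children) {
            flatten(child, path, out);
        }
    }

    public static void main(String[] args) throws Exception {
        final String xml = "<Person><name>Batman</name>"
                + "<hobbies><hobby>painting</hobby><hobby>football</hobby></hobbies>"
                + "<jogging distance=\"10.3\">daily</jogging></Person>";
        final List<String> lines = new ArrayList<>();
        flatten(parse(xml), "", lines);
        lines.forEach(System.out::println);
    }
}
```

Because the handler never refers to specific tag names, the same code should work for XML whose structure is not known in advance. The flatten step is only one possible way to consume the tree; walking the children lists directly works just as well.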
