How to parse advanced XML files in Java

Posted on 2024-10-04 18:51:22 · 1840 characters · 5 views · 0 comments

I've seen numerous examples of how to read XML files in Java, but they only show simple XML files. For example, they show how to extract first and last names from an XML file. However, I need to extract data from a COLLADA XML file, like this:

<library_visual_scenes>
    <visual_scene id="ID1">
        <node name="SketchUp">
            <instance_geometry url="#ID2">
                <bind_material>
                    <technique_common>
                        <instance_material symbol="Material2" target="#ID3">
                            <bind_vertex_input semantic="UVSET0" input_semantic="TEXCOORD" input_set="0" />
                        </instance_material>
                    </technique_common>
                </bind_material>
            </instance_geometry>
        </node>
    </visual_scene>
</library_visual_scenes>

This is only a small part of a COLLADA file. Here I need to extract the id of visual_scene, then the url of instance_geometry, and finally the target of instance_material. Of course I need to extract much more, but I don't really understand how to do it yet, so this is a place to start.

I have this code so far:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;
try {
    builder = factory.newDocumentBuilder();
}
catch( ParserConfigurationException error ) {
    Log.e( "Collada", error.getMessage() ); return;
}
Document document = null;
try {
    document = builder.parse( string );
}
catch( IOException error ) {
    Log.e( "Collada", error.getMessage() ); return;
}
catch( SAXException error ) {
    Log.e( "Collada", error.getMessage() ); return;
}
NodeList library_visual_scenes = document.getElementsByTagName( "library_visual_scenes" );
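One thing worth checking in the code above: DocumentBuilder.parse(String) interprets its argument as a URI (a file path or URL), not as XML text. If string actually holds the XML content itself (the question does not say which), it can be wrapped in an InputSource backed by a StringReader. A minimal sketch under that assumption; the class name ColladaLoader and the method parseXmlText are hypothetical:

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ColladaLoader {

    // Parses XML *text* held in a String. Calling DocumentBuilder.parse(String)
    // directly would instead treat the argument as a URI and try to open it.
    public static Document parseXmlText(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library_visual_scenes><visual_scene id=\"ID1\"/></library_visual_scenes>";
        Document document = parseXmlText(xml);
        // Prints the root element's name: library_visual_scenes
        System.out.println(document.getDocumentElement().getTagName());
    }
}
```

If string really is a file path or URL, the original builder.parse(string) call is already correct and this wrapper is not needed.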

It seems like most examples on the web are similar to this one: http://www.easywayserver.com/blog/java-how-to-read-xml-file/

I need help figuring out what to do when I want to extract deeper tags, or finding a good tutorial on reading/parsing XML files.

Comments (4)

楠木可依 2024-10-11 18:51:22

Really, your parsing per se is already done when you call builder.parse(string). What you need to know now is how to select/query information from the parsed XML document.

I would agree with @khachik regarding how to do that. Elaborating a little (since no one else has posted an answer):

XPath is the most convenient way to extract information, and if your input document is not huge, XPath is fast enough. Here is a good starting tutorial on XPath in Java. XPath is also recommended if you need random access to the XML data (i.e. if you have to go back and forth extracting data from the tree in a different order than it appears in the source document), since SAX is designed for linear access.

Some sample XPath expressions:

  • extract the id of visual_scene: /*/visual_scene/@id
  • the url of instance_geometry: /*/visual_scene/node/instance_geometry/@url
  • the url of instance_geometry for the node whose name is SketchUp: /*/visual_scene/node[@name = 'SketchUp']/instance_geometry/@url
  • the target of instance_material: /*/visual_scene/node/instance_geometry/bind_material/technique_common/instance_material/@target
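The expressions above can be tried with the JDK's built-in javax.xml.xpath API. The sketch below parses the document once and then runs each query against the same in-memory tree, which is what makes the random access mentioned earlier cheap. It assumes the question's snippet is the whole document; in a full COLLADA file the root element is COLLADA (usually with a default namespace declared), so a path such as /*/visual_scene would need adjusting, for example to /*/library_visual_scenes/visual_scene, and namespace handling may need attention. The class and method names are hypothetical:

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ColladaXPath {

    // The snippet from the question, treated as a complete document (assumption)
    static final String XML =
          "<library_visual_scenes>"
        + "<visual_scene id=\"ID1\"><node name=\"SketchUp\"><instance_geometry url=\"#ID2\">"
        + "<bind_material><technique_common>"
        + "<instance_material symbol=\"Material2\" target=\"#ID3\">"
        + "<bind_vertex_input semantic=\"UVSET0\" input_semantic=\"TEXCOORD\" input_set=\"0\"/>"
        + "</instance_material></technique_common></bind_material>"
        + "</instance_geometry></node></visual_scene></library_visual_scenes>";

    // Parses the XML once so several XPath queries can run against the same tree
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the string value of the first node the expression selects, or "" if none
    static String query(Document document, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, document);
    }

    public static void main(String[] args) throws Exception {
        Document document = parse(XML);
        System.out.println(query(document, "/*/visual_scene/@id"));                       // ID1
        System.out.println(query(document, "/*/visual_scene/node/instance_geometry/@url")); // #ID2
        System.out.println(query(document,
                "/*/visual_scene/node[@name = 'SketchUp']/instance_geometry/@url"));        // #ID2
        System.out.println(query(document, "/*/visual_scene/node/instance_geometry"
                + "/bind_material/technique_common/instance_material/@target"));          // #ID3
        // Attribute comparisons are case-sensitive: 'Sketchup' selects nothing,
        // so this prints an empty line
        System.out.println(query(document,
                "/*/visual_scene/node[@name = 'Sketchup']/instance_geometry/@url"));
    }
}
```

Parsing once and reusing the Document is a deliberate choice here; XPath.evaluate also accepts an InputSource directly, but that would re-parse the file on every query.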

Since COLLADA models can be really large, you might need to do a SAX-based filter, which will allow you to process the document in stream mode without having to keep it all in memory at once. But if your existing code to parse the XML is already performing well enough, you may not need SAX. Extracting specific data is more complicated with SAX than with XPath.
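A SAX-based filter for the same three values might look like the minimal sketch below, using the JDK's javax.xml.parsers.SAXParser with a DefaultHandler subclass. The parser calls startElement once per opening tag as it streams through the input, and the handler keeps only the attributes it cares about. As a simplifying assumption it matches elements by name alone, not by their full path, so in a real COLLADA file an instance_material under some other parent would also be collected; a more careful filter would track the current element path. The class name and field names are hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ColladaSaxFilter extends DefaultHandler {

    // Collected attribute values, in document order
    final List<String> sceneIds = new ArrayList<>();
    final List<String> geometryUrls = new ArrayList<>();
    final List<String> materialTargets = new ArrayList<>();

    // Called for every opening tag while the parser streams through the input;
    // nothing else from the document is retained
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        switch (qName) {
            case "visual_scene":
                sceneIds.add(attributes.getValue("id"));
                break;
            case "instance_geometry":
                geometryUrls.add(attributes.getValue("url"));
                break;
            case "instance_material":
                materialTargets.add(attributes.getValue("target"));
                break;
            default:
                break; // every other element is ignored
        }
    }

    // Runs the filter over XML text; for a file, pass a File or InputStream to parse() instead
    static ColladaSaxFilter parse(String xml) throws Exception {
        ColladaSaxFilter handler = new ColladaSaxFilter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library_visual_scenes><visual_scene id=\"ID1\">"
                + "<node name=\"SketchUp\"><instance_geometry url=\"#ID2\">"
                + "<bind_material><technique_common>"
                + "<instance_material symbol=\"Material2\" target=\"#ID3\"/>"
                + "</technique_common></bind_material>"
                + "</instance_geometry></node></visual_scene></library_visual_scenes>";
        ColladaSaxFilter result = parse(xml);
        System.out.println(result.sceneIds);        // [ID1]
        System.out.println(result.geometryUrls);    // [#ID2]
        System.out.println(result.materialTargets); // [#ID3]
    }
}
```

The default SAXParserFactory is not namespace-aware, so element names arrive in qName; with namespace awareness switched on, localName would be the field to switch on instead.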

笨死的猪 2024-10-11 18:51:22

You are using DOM in your code.
DOM creates a tree structure from the XML file it parsed, and you have to traverse the tree to get the information in the various nodes.
So far, all your code does is create the tree representation, i.e.:

document = builder.parse( string );//document is loaded in memory as tree  

Now you should consult the DOM API to see how to get the information you need.

NodeList library_visual_scenes = document.getElementsByTagName( "library_visual_scenes" );

For instance, this method returns a NodeList of all elements with the specified name.
Now you should loop over the NodeList:

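 for (int i = 0; i < library_visual_scenes.getLength(); i++) {
   Element element = (Element) library_visual_scenes.item(i);
   // getFirstChild() usually returns the whitespace text node before <visual_scene>,
   // so look up the visual_scene elements by tag name instead
   NodeList visual_scenes = element.getElementsByTagName("visual_scene");
   for (int j = 0; j < visual_scenes.getLength(); j++) {
      Element visual_scene = (Element) visual_scenes.item(j);
      String id = visual_scene.getAttribute("id");
      System.out.println("id=" + id);
   }
 }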

DISCLAIMER: This is sample code and I have not compiled it; it shows you the concept. You should look into the DOM API.
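To reach the deeper tags the question asks about with plain DOM, one common pattern is a small helper that returns only the direct child elements with a given name, skipping the whitespace text nodes that make getFirstChild() unreliable on indented XML. The sketch below walks from library_visual_scenes down to instance_material one level at a time. The helper name children, the class name, and the output format are hypothetical, and the input is the question's snippet with its indentation kept:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ColladaDomWalk {

    // The question's snippet, indented, so whitespace text nodes are present
    static final String SAMPLE =
          "<library_visual_scenes>\n"
        + "  <visual_scene id=\"ID1\">\n"
        + "    <node name=\"SketchUp\">\n"
        + "      <instance_geometry url=\"#ID2\">\n"
        + "        <bind_material>\n"
        + "          <technique_common>\n"
        + "            <instance_material symbol=\"Material2\" target=\"#ID3\"/>\n"
        + "          </technique_common>\n"
        + "        </bind_material>\n"
        + "      </instance_geometry>\n"
        + "    </node>\n"
        + "  </visual_scene>\n"
        + "</library_visual_scenes>\n";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Direct child elements of parent named tagName; text nodes (whitespace) are skipped
    static List<Element> children(Element parent, String tagName) {
        List<Element> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node child = nodes.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && child.getNodeName().equals(tagName)) {
                result.add((Element) child);
            }
        }
        return result;
    }

    // Walks library_visual_scenes > visual_scene > node > instance_geometry
    // > bind_material > technique_common > instance_material
    static List<String> describe(Document document) {
        List<String> lines = new ArrayList<>();
        NodeList libraries = document.getElementsByTagName("library_visual_scenes");
        for (int i = 0; i < libraries.getLength(); i++) {
            for (Element scene : children((Element) libraries.item(i), "visual_scene")) {
                lines.add("visual_scene id=" + scene.getAttribute("id"));
                for (Element sceneNode : children(scene, "node")) {
                    for (Element geometry : children(sceneNode, "instance_geometry")) {
                        lines.add("instance_geometry url=" + geometry.getAttribute("url"));
                        for (Element bind : children(geometry, "bind_material")) {
                            for (Element technique : children(bind, "technique_common")) {
                                for (Element material : children(technique, "instance_material")) {
                                    lines.add("instance_material target=" + material.getAttribute("target"));
                                }
                            }
                        }
                    }
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Prints: visual_scene id=ID1, instance_geometry url=#ID2, instance_material target=#ID3
        describe(parse(SAMPLE)).forEach(System.out::println);
    }
}
```

The nesting mirrors the document's structure, which keeps each step explicit; for many such paths, the XPath approach from the first answer is usually shorter.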

清风无影 2024-10-11 18:51:22

EclipseLink JAXB (MOXy) has a useful @XmlPath extension for leveraging XPath to populate an object. It may be what you are looking for. Note: I am the MOXy tech lead.

The following example maps a simple address object to Google's representation of geocode information:

package blog.geocode;

import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;

import org.eclipse.persistence.oxm.annotations.XmlPath;

@XmlRootElement(name="kml")
@XmlType(propOrder={"country", "state", "city", "street", "postalCode"})
public class Address {

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:SubAdministrativeArea/ns:Locality/ns:Thoroughfare/ns:ThoroughfareName/text()")
    private String street;

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:SubAdministrativeArea/ns:Locality/ns:LocalityName/text()")
    private String city;

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:AdministrativeAreaName/text()")
    private String state;

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:CountryNameCode/text()")
    private String country;

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:SubAdministrativeArea/ns:Locality/ns:PostalCode/ns:PostalCodeNumber/text()")
    private String postalCode;

}

For the rest of the example see:

戏蝶舞 2024-10-11 18:51:22

Nowadays, several Java RAD tools can generate Java code from a given DTD, so you can use them.
