Parsing XML from a WordPress feed in Java

Posted on 2024-12-08 16:03:52

private void parseXml(String urlPath) throws Exception {
    URL url = new URL(urlPath);
    URLConnection connection = url.openConnection();
    DocumentBuilder db = DOCUMENT_BUILDER_FACTORY.newDocumentBuilder();

    final Document document = db.parse(connection.getInputStream());
    XPath xPathEvaluator = XPATH_FACTORY.newXPath();
    XPathExpression nameExpr = xPathEvaluator.compile("rss/channel/item/title");
    NodeList trackNameNodes = (NodeList) nameExpr.evaluate(document, XPathConstants.NODESET);
    for (int i = 0; i < trackNameNodes.getLength(); i++) {
        Node trackNameNode = trackNameNodes.item(i);
        System.out.println(String.format("Blog Entry Title: %s", trackNameNode.getTextContent()));
        XPathExpression artistNameExpr = xPathEvaluator.compile("rss/channel/item/content:encoded");
        NodeList artistNameNodes = (NodeList) artistNameExpr.evaluate(trackNameNode, XPathConstants.NODESET);
        for (int j=0; j < artistNameNodes.getLength(); j++) {
            System.out.println(String.format(" - Artist Name: %s", artistNameNodes.item(j).getTextContent()));
        }
    }
}

I have this code for parsing the title and content from the default WordPress XML feed. The only problem is that when I try to get the content of a blog entry, the XML tag is <content:encoded>, and I do not understand how to retrieve this data.

Comments (2)

假面具 2024-12-15 16:03:52

The tag <content:encoded> denotes an element with the local name encoded, in the XML namespace bound to the prefix content. The XPath evaluator is probably unable to resolve the content prefix to its namespace, which I think is http://purl.org/rss/1.0/modules/content/ from a quick Google search.

To get it to resolve, you'll need to do the following:

  1. Ensure your DocumentBuilderFactory has setNamespaceAware(true) called on it after construction; otherwise all namespace information is discarded during parsing.
  2. Write an implementation of javax.xml.namespace.NamespaceContext that resolves the prefix to its namespace.
  3. Call XPath#setNamespaceContext() with your implementation.
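
The three steps above can be sketched roughly as follows. This is only an illustration, not a drop-in replacement for the code in the question: the class name FeedContentExample, the method extractEntries, and the inline sample feed are made up for the example, and the namespace URI is the one guessed above, so check it against the xmlns:content declaration in your actual feed. The sketch also selects each item first and evaluates title and content:encoded relative to that item, which differs from the loop in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FeedContentExample {

    // Assumed URI for the "content" prefix; verify it against the
    // xmlns:content attribute on the <rss> element of the real feed.
    static final String CONTENT_NS = "http://purl.org/rss/1.0/modules/content/";

    // Step 2: resolve the "content" prefix used in XPath expressions to its URI.
    static final NamespaceContext FEED_NAMESPACES = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "content".equals(prefix) ? CONTENT_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return CONTENT_NS.equals(namespaceURI) ? "content" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            List<String> prefixes = new ArrayList<>();
            String prefix = getPrefix(namespaceURI);
            if (prefix != null) {
                prefixes.add(prefix);
            }
            return prefixes.iterator();
        }
    };

    // Returns one "title: content" string per <item> in the feed.
    public static List<String> extractEntries(String rssXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // Step 1: keep namespace information
        Document document = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(rssXml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(FEED_NAMESPACES); // Step 3

        List<String> entries = new ArrayList<>();
        NodeList items = (NodeList) xpath.evaluate(
                "/rss/channel/item", document, XPathConstants.NODESET);
        for (int i = 0; i < items.getLength(); i++) {
            Node item = items.item(i);
            // Relative paths are evaluated against the current <item>.
            String title = xpath.evaluate("title", item);
            String content = xpath.evaluate("content:encoded", item);
            entries.add(title + ": " + content);
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        // Tiny hand-written feed standing in for the real WordPress one.
        String sample = "<rss xmlns:content=\"" + CONTENT_NS + "\"><channel>"
                + "<item><title>Hello</title>"
                + "<content:encoded><![CDATA[<p>First post</p>]]></content:encoded>"
                + "</item></channel></rss>";
        for (String entry : extractEntries(sample)) {
            System.out.println(entry);
        }
    }
}
```

To use this with a live feed, you would pass connection.getInputStream() to the parser instead of the sample string, as in the question's code.
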
世态炎凉 2024-12-15 16:03:52

You could also try XStream, which is a good, easy-to-use XML parser. It leaves you with almost no work when parsing known XML structures.

PS: Their site is currently offline, use Google Cache to see it =P
