Parsing XML from a WordPress feed in Java
private void parseXml(String urlPath) throws Exception {
    URL url = new URL(urlPath);
    URLConnection connection = url.openConnection();
    DocumentBuilder db = DOCUMENT_BUILDER_FACTORY.newDocumentBuilder();
    final Document document = db.parse(connection.getInputStream());
    XPath xPathEvaluator = XPATH_FACTORY.newXPath();
    XPathExpression nameExpr = xPathEvaluator.compile("rss/channel/item/title");
    NodeList trackNameNodes = (NodeList) nameExpr.evaluate(document, XPathConstants.NODESET);
    for (int i = 0; i < trackNameNodes.getLength(); i++) {
        Node trackNameNode = trackNameNodes.item(i);
        System.out.println(String.format("Blog Entry Title: %s", trackNameNode.getTextContent()));
        XPathExpression artistNameExpr = xPathEvaluator.compile("rss/channel/item/content:encoded");
        NodeList artistNameNodes = (NodeList) artistNameExpr.evaluate(trackNameNode, XPathConstants.NODESET);
        for (int j = 0; j < artistNameNodes.getLength(); j++) {
            System.out.println(String.format(" - Artist Name: %s", artistNameNodes.item(j).getTextContent()));
        }
    }
}
I have this code for parsing the title and content of each entry in the default WordPress XML feed. The only problem is that the content of a blog entry sits in a tag named <content:encoded>, and I do not understand how to retrieve that data.
The tag <content:encoded> means an element with the name encoded in the XML namespace bound to the prefix content. The XPath evaluator is probably unable to resolve the content prefix to its namespace, which I think is http://purl.org/rss/1.0/modules/content/ from a quick Google.

To get it to resolve, you'll need to do the following:

1. Make sure your DocumentBuilderFactory has setNamespaceAware( true ) called on it after construction, otherwise all namespaces are discarded during parsing.
2. Write an implementation of javax.xml.namespace.NamespaceContext to resolve the prefix to its namespace.
3. Call XPath#setNamespaceContext() with your implementation.

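The three namespace steps from the first answer can be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the class name FeedContentExample, the helper extractEntries, the constant RSS_NAMESPACES, and the inline sample feed are all hypothetical, and the namespace URI is the one the answer guessed, so check it against the xmlns:content declaration in the real feed. The sketch also evaluates title and content:encoded relative to each item node, because the question's code evaluates a path starting at rss relative to a title node, which would likely match nothing even once the prefix resolves.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class FeedContentExample {

    // Assumed namespace URI for the "content" prefix; verify it against the
    // xmlns:content attribute on the feed's <rss> element.
    static final String CONTENT_NS = "http://purl.org/rss/1.0/modules/content/";

    // Step 2: map the prefix used in the XPath expressions to its namespace URI.
    static final NamespaceContext RSS_NAMESPACES = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            return "content".equals(prefix) ? CONTENT_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            if (namespaceURI == null) {
                throw new IllegalArgumentException("namespaceURI must not be null");
            }
            return CONTENT_NS.equals(namespaceURI) ? "content" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singletonList(prefix).iterator();
        }
    };

    // Returns one "title => content" string per <item> in the feed XML.
    static List<String> extractEntries(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // Step 1: keep namespaces while parsing
        Document document = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(RSS_NAMESPACES); // Step 3: register the mapping

        List<String> entries = new ArrayList<>();
        NodeList items = (NodeList) xpath.evaluate(
                "/rss/channel/item", document, XPathConstants.NODESET);
        for (int i = 0; i < items.getLength(); i++) {
            Node item = items.item(i);
            // Both paths are relative to the current <item> element.
            String title = xpath.evaluate("title", item);
            String content = xpath.evaluate("content:encoded", item);
            entries.add(title + " => " + content);
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for a WordPress feed, used only for this illustration.
        String sample = "<rss version=\"2.0\" xmlns:content=\"" + CONTENT_NS + "\">"
                + "<channel><item><title>Hello</title>"
                + "<content:encoded><![CDATA[<p>First post</p>]]></content:encoded>"
                + "</item></channel></rss>";
        for (String entry : extractEntries(sample)) {
            System.out.println(entry);
        }
    }
}
```

To use it against a live feed, the input stream from URLConnection in the question's code could be passed to the parser in place of the in-memory sample string.
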
You could also try XStream, which is a good, easy-to-use XML parser. It leaves you with almost no work when parsing known XML structures.
PS: Their site is currently offline; use Google Cache to see it =P