Determine if a feed is Atom or RSS
I'm trying to determine whether a given feed is Atom based or RSS based.
Here's my code:
    public boolean isRSS(String URL) throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(URL);
        return doc.getDocumentElement().getNodeName().equalsIgnoreCase() == "rss";
    }
Is there a better way to do it? Would it be better if I used a SAX parser instead?
Comments (3)
The root element is the easiest way to determine the type of a feed:
- RSS feeds have the root element rss (see specification).
- Atom feeds have the root element feed (see specification).
Different parsers offer different ways to get the root element, and none is inferior to the others. Enough has been written about StAX vs. SAX vs. DOM etc. to serve as a basis for a specific decision.
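As a minimal DOM sketch of the root-element check, the following keeps the shape of the question's isRSS() but compares the name with equals() instead of ==. The class name DomFeedCheck and the InputSource overload are illustrative additions, not part of the original code. The sketch uses plain, case-sensitive equals(). The answer itself recommends equalsIgnoreCase(), and either call avoids the reference comparison.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomFeedCheck {
    // Same idea as the question's isRSS(String URL): parse, then inspect the root element.
    static boolean isRSS(InputSource source)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(source);
        // Compare string contents with equals(), not object identity with ==.
        return "rss".equals(doc.getDocumentElement().getNodeName());
    }

    // Convenience overload taking a feed URL (used as the InputSource system ID), as in the question.
    static boolean isRSS(String url)
            throws ParserConfigurationException, SAXException, IOException {
        return isRSS(new InputSource(url));
    }

    public static void main(String[] args) throws Exception {
        String rss = "<rss version=\"2.0\"><channel/></rss>";
        String atom = "<feed xmlns=\"http://www.w3.org/2005/Atom\"/>";
        System.out.println(isRSS(new InputSource(new StringReader(rss))));  // true
        System.out.println(isRSS(new InputSource(new StringReader(atom)))); // false
    }
}
```

This factory is not namespace-aware, so getNodeName() returns the qualified name as written in the document. A prefixed Atom root would come back as something like a:feed rather than feed.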
There is nothing wrong with your first two statements, which create the DocumentBuilder and parse the URL.
In your return statement, however, you make a mistake in Java String comparison.
When you use the comparison operator == with Strings, it compares references, not values (i.e. it checks whether both are exactly the same object). You should use the equals() method here. Just to be sure, I would recommend equalsIgnoreCase(), passing the expected name as its argument, i.e. getNodeName().equalsIgnoreCase("rss"), rather than comparing the result with ==.
Hint: if you check for "rss" instead of "feed" (as you would for Atom) in your isRss() method, you don't have to use the ternary operator.
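The reference-vs-value point can be seen in isolation. new String(...) always creates a distinct object, so == and equals() give different answers for it. Whether == happens to work on a name returned by a parser depends on whether the parser hands back the very same object as your literal, and nothing guarantees that. The class name below is illustrative.

```java
public class StringComparisonDemo {
    public static void main(String[] args) {
        String literal = "rss";
        // new String(...) always allocates a separate object with the same characters.
        String built = new String("rss");

        System.out.println(built == literal);                // false: different objects
        System.out.println(built.equals(literal));           // true: same characters
        System.out.println("RSS".equalsIgnoreCase(literal)); // true: case is ignored
        System.out.println("RSS".equals(literal));           // false: case matters
    }
}
```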
Sniffing the content is one method. Note, however, that Atom uses namespaces, and you are creating a parser that is not namespace-aware.
Note also that you cannot reliably compare using equalsIgnoreCase(), since XML element names are case-sensitive.
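The namespace and case-sensitivity points can be illustrated with a namespace-aware DOM parse. DocumentBuilderFactory is not namespace-aware by default, so the sketch turns that on. It then identifies Atom by local name plus the Atom namespace URI (http://www.w3.org/2005/Atom) rather than by the literal tag text. The class and method names are illustrative, and declaring throws Exception is a shortcut for brevity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceAwareRoot {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    static Element root(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Atom = local name "feed" in the Atom namespace, whatever prefix the document uses.
    // Plain equals() keeps the comparison case-sensitive, like XML itself.
    static boolean isAtom(String xml) throws Exception {
        Element root = root(xml);
        return "feed".equals(root.getLocalName()) && ATOM_NS.equals(root.getNamespaceURI());
    }

    public static void main(String[] args) throws Exception {
        String prefixed = "<a:feed xmlns:a=\"http://www.w3.org/2005/Atom\"/>";
        Element r = root(prefixed);
        System.out.println(r.getNodeName());  // a:feed (qualified name)
        System.out.println(r.getLocalName()); // feed
        System.out.println(isAtom(prefixed)); // true
        System.out.println(isAtom("<FEED xmlns=\"http://www.w3.org/2005/Atom\"/>")); // false
    }
}
```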
Another method is to react to the Content-Type header of the response to an HTTP GET request, if the server sends one. The Content-Type for Atom is application/atom+xml and for RSS it is application/rss+xml. I suspect, though, that not all RSS feeds can be trusted to set this header correctly.
A third option is to look at the URL suffix, e.g. .atom and .rss.
The last two methods are easily configurable if you are using Spring or JAX-RS.
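Without Spring or JAX-RS, the Content-Type and URL-suffix checks can be sketched in plain Java. Both helpers return Optional.empty() when the hint is inconclusive, so a caller can fall back to sniffing the root element. The class, the enum and the method names are illustrative. fetchContentType() assumes the server answers HEAD requests, and some servers don't.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Locale;
import java.util.Optional;

public class FeedTypeHints {
    enum FeedType { ATOM, RSS }

    // Maps a value such as "application/atom+xml; charset=utf-8" to a feed type.
    static Optional<FeedType> fromContentType(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        // Drop parameters after ';' and normalise case (media types are case-insensitive).
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        switch (mediaType) {
            case "application/atom+xml":
                return Optional.of(FeedType.ATOM);
            case "application/rss+xml":
                return Optional.of(FeedType.RSS);
            default:
                return Optional.empty(); // e.g. text/xml: inconclusive
        }
    }

    // Looks only at the path suffix, ignoring any query string or fragment.
    static Optional<FeedType> fromUrlSuffix(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return Optional.empty();
        }
        path = path.toLowerCase(Locale.ROOT);
        if (path.endsWith(".atom")) {
            return Optional.of(FeedType.ATOM);
        }
        if (path.endsWith(".rss")) {
            return Optional.of(FeedType.RSS);
        }
        return Optional.empty();
    }

    // Reads the Content-Type of a HEAD response (assumes the server supports HEAD).
    static String fetchContentType(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("HEAD");
        try {
            return conn.getContentType();
        } finally {
            conn.disconnect();
        }
    }
}
```

A caller might try fromContentType(fetchContentType(url)) first, then fromUrlSuffix(url), and parse the document only if both come back empty.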
You could use a StAX parser to avoid parsing the entire XML document into memory, since you only need to read as far as the root element.
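A minimal StAX sketch of that idea uses the javax.xml.stream API that ships with the JDK. It pulls events until the first START_ELEMENT, decides from that root element, and stops. The class name, the string return values and the Atom namespace check are illustrative choices, not code from the original answer.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxFeedSniffer {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Returns "atom", "rss" or "unknown", looking only at the root element.
    static String detect(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance(); // namespace-aware by default
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // First START_ELEMENT is the root; no further events are pulled.
                    String local = reader.getLocalName();
                    if ("rss".equals(local)) {
                        return "rss";
                    }
                    if ("feed".equals(local) && ATOM_NS.equals(reader.getNamespaceURI())) {
                        return "atom";
                    }
                    return "unknown";
                }
            }
            return "unknown";
        } finally {
            reader.close(); // does not close the underlying InputStream
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String atom = "<?xml version=\"1.0\"?>\n"
                + "<feed xmlns=\"http://www.w3.org/2005/Atom\"><title>t</title></feed>";
        InputStream in = new ByteArrayInputStream(atom.getBytes(StandardCharsets.UTF_8));
        System.out.println(detect(in)); // atom
    }
}
```

With a real feed you would pass the stream from the HTTP connection and close it yourself afterwards, because XMLStreamReader.close() leaves the underlying stream open.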