Querying an HTML page with XPath in Java

Posted on 2024-09-11 14:01:47

Can anyone recommend a Java library that lets me run an XPath query against an HTML page?

I tried using JAXP but it keeps giving me a strange error that I cannot seem to fix (thread "main" java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd).

Thank you very much.

EDIT

I found this:

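import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Create a new SAX parser factory
SAXParserFactory saxFactory = SAXParserFactory.newInstance();

// Turn on validation
saxFactory.setValidating(true);

// Create a validating SAX parser instance
SAXParser parser = saxFactory.newSAXParser();

// Create a new DOM document builder factory
// (named differently from the SAX factory so both snippets compile in one scope)
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();

// Turn on validation
domFactory.setValidating(true);

// Create a validating DOM parser
DocumentBuilder builder = domFactory.newDocumentBuilder();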

from http://www.ibm.com/developerworks/xml/library/x-jaxpval.html, but changing the argument to false did not change anything.

Comments (2)

微暖i 2024-09-18 14:01:47

Setting the parser to "non validating" only turns off validation; it does not stop the parser from fetching DTDs. As far as I recall, the DTD is needed not just for validation but also for entity expansion.

If you want to suppress fetching of DTDs, you need to register an EntityResolver on the DocumentBuilder (via setEntityResolver). Implement the EntityResolver's resolveEntity method to always return an InputSource that wraps an empty string.
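
A minimal sketch of that EntityResolver approach, in case it helps. The class name XhtmlXPathSketch, the evaluate helper and the sample page are made up for illustration; only the JAXP calls come from the standard library. It assumes the input is well-formed XHTML and uses no named entities such as &nbsp;, since those would no longer be defined once the DTD is replaced with empty content.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JAXP.
class XhtmlXPathSketch {

    // Parses XHTML text without fetching any external DTD, then
    // evaluates an XPath expression and returns its string value.
    static String evaluate(String xhtml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            DocumentBuilder builder = factory.newDocumentBuilder();

            // Answer every external entity request (including the DTD named
            // in the DOCTYPE) with empty content instead of a network fetch.
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));

            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            // Wrap checked parser/XPath exceptions to keep the sketch short.
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample page that references the DTD from the question.
        String page =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
            + "<html><head><title>Hello</title></head>"
            + "<body><p>Hi</p></body></html>";
        System.out.println(evaluate(page, "/html/head/title"));
    }
}
```

The factory is left non-namespace-aware (the default), so plain element names like html and title can be used in the expression.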

方圜几里 2024-09-18 14:01:47

Take a look at this:

http://www.w3.org/2005/06/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic

You probably have the parser set to perform DTD validation, and it is trying to retrieve the DTD. JAXP should have a way to disable DTD validation and just run XPath against a document assumed to be valid. I haven't used JAXP in many years, so I'm sorry I can't be more helpful.
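
One way this might look, as a rough sketch: the parser bundled with the JDK is derived from Apache Xerces, which has a feature for skipping the external DTD when not validating. This assumes a Xerces-derived parser; other implementations may reject the feature with a ParserConfigurationException. The class name NoExternalDtdSketch and the evaluate helper are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JAXP.
class NoExternalDtdSketch {

    // Xerces-specific feature; assumed to be supported by the parser in use.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Parses XHTML text with external DTD loading switched off, then
    // evaluates an XPath expression and returns its string value.
    static String evaluate(String xhtml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            factory.setFeature(LOAD_EXTERNAL_DTD, false);

            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            // Includes the case where the parser does not know the feature.
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample page that references the DTD from the question.
        String page =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
            + "<html><head><title>Demo</title></head>"
            + "<body><p>first</p><p>second</p></body></html>";
        System.out.println(evaluate(page, "/html/body/p[2]"));
    }
}
```

As with any approach that skips the DTD, entities defined only in the DTD (for example &nbsp;) will not resolve, so this suits pages that are already well-formed XML.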
