Java HTML parser does not read the whole page
I'm parsing HTML pages to get specific information, but for some pages I can't get everything that is displayed in the browser. For example, on this page I can't get the review information.
By the way, if you look at the page's source code, there are a lot of empty lines, and the review information doesn't appear there.
Do you know why? Is there a library that can read this type of page?
Thanks
Comments (1)
I'm willing to bet they are using some sort of JavaScript to load the review information, which would explain why it is missing from the page source. To access that information, you will need to either mimic the request or evaluate the JavaScript and then parse the resulting page. I would suggest examining their JavaScript and mimicking the request it uses to download the review information, as that will be much easier than attempting to evaluate the JavaScript in your code.
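To make the "mimic the request" option concrete, here is a minimal sketch using the JDK's own `java.net.http.HttpClient` (Java 11+). Everything specific in it is an assumption: the endpoint URL is whatever you find in the browser's network tab, and the `X-Requested-With` header and the `"text"` JSON field are hypothetical stand-ins for what the real site sends and returns. The regex is only there to keep the example dependency-free; a real JSON or HTML parser would be the better choice.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReviewFetcher {

    // Hypothetical JSON field holding the review text; adjust to the real payload.
    // Matches "text": "..." and tolerates backslash-escaped characters inside the value.
    private static final Pattern REVIEW_TEXT =
            Pattern.compile("\"text\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Builds a GET request resembling the one the page's script would send. */
    static HttpRequest buildRequest(String endpoint) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Accept", "application/json")
                // Assumption: some sites expect this header on script-driven requests.
                .header("X-Requested-With", "XMLHttpRequest")
                .GET()
                .build();
    }

    /** Sends the request and returns the raw response body. */
    static String fetch(String endpoint) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(endpoint), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }

    /** Collects every "text" value found in the body (escape sequences are left as-is). */
    static List<String> extractReviews(String body) {
        List<String> reviews = new ArrayList<>();
        Matcher m = REVIEW_TEXT.matcher(body);
        while (m.find()) {
            reviews.add(m.group(1));
        }
        return reviews;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No network call without an explicit endpoint.
            System.out.println("Usage: java ReviewFetcher <review-endpoint-url>");
            return;
        }
        for (String review : extractReviews(fetch(args[0]))) {
            System.out.println(review);
        }
    }
}
```

Run it with the URL you see the page request when the reviews load. If that endpoint returns an HTML fragment rather than JSON, you can feed the body to the same HTML parser you already use for the rest of the page.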