PhantomJS and modifying the DOM
I'm developing a tool that needs to download a web page from a third-party server, execute it as a browser would, and then parse the HTML. My difficulty is that the tool needs to parse the HTML after all JavaScript has executed and the DOM has been modified. I'm trying to use PhantomJS for this. It works on small snippets (a tiny HTML document with external JavaScript that adds some nodes to the DOM), but when I do the same with a real site (http://www.dba.dk/) I don't get the final HTML with all the modifications made by the JS code.
I really need help with this, as I have been stuck on it for more than a week.
My PhantomJS code is simple:
if (phantom.state.length === 0) {
    if (phantom.args.length === 0) {
        console.log('Usage: test.js <some URL>');
        phantom.exit();
    } else {
        var address = phantom.args[0];
        phantom.state = Date.now().toString();
        phantom.viewportSize = { width: 1280, height: 800 };
        phantom.open(address);
    }
} else {
    var elapsed = Date.now() - new Date().setTime(phantom.state);
    if (phantom.loadStatus === 'success') {
        if (!first_time) {
            var first_time = true;
            if (!document.addEventListener) {
                console.log('Not SUPPORTED!');
            }
            phantom.render('result.png');
            var markup = document.documentElement.innerHTML;
            console.log(markup);
            phantom.exit();
        }
    } else {
        console.log('FAIL to load the address');
        phantom.exit();
    }
}
The HTML dumped to the console doesn't contain the dynamically generated content.
Comments (1)
The problem was the Flash plugin. The pages were detecting its absence. Once the plugin was loaded correctly, the problem was gone.
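For anyone hitting the same symptom, here is a rough sketch of the kind of check a page might use to detect Flash. It is not the code dba.dk actually runs, which I have not seen. The helper name hasFlashPlugin is made up for illustration; the only things assumed to be real are the standard navigator.plugins and navigator.mimeTypes collections and the usual 'Shockwave Flash' plugin name and 'application/x-shockwave-flash' MIME type.

```javascript
// Hypothetical helper (not from the original post): reports whether a
// navigator-like object advertises the Flash plugin.
// `nav` stands in for the browser's `navigator`; passing it in as an
// argument keeps the function usable outside a browser as well.
function hasFlashPlugin(nav) {
  if (!nav) {
    return false;
  }
  var plugins = nav.plugins || {};
  var mimeTypes = nav.mimeTypes || {};
  var flashMime = mimeTypes['application/x-shockwave-flash'];
  // Either the plugin is listed by name, or the Flash MIME type is
  // mapped to an enabled plugin.
  return Boolean(plugins['Shockwave Flash']) ||
         Boolean(flashMime && flashMime.enabledPlugin);
}
```

Logging hasFlashPlugin(navigator) from inside the loaded page is one way to see whether the headless browser looks Flash-less to the site, which would explain a page skipping part of its dynamic content.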