PhantomJS page dump script issue
Digikey has changed their website and now uses JavaScript that is called on load via POST. This killed my former simple Java HTML code retriever. I am trying to use PhantomJS to allow the JavaScript to execute before saving the HTML/text.
var page = new WebPage(),
    address;
var fs = require('fs');

if (phantom.args.length === 0) {
    console.log('Usage: save.js <some URL>');
    phantom.exit();
} else {
    address = encodeURI(phantom.args[0]);
    page.open(address, function (status) {
        if (status !== 'success') {
            console.log('FAIL to load the address');
        } else {
            // page.content holds the document markup at the moment it is read
            var markup = page.content;
            var f = null; // declared locally to avoid an implicit global
            console.log(markup);
            try {
                f = fs.open('htmlcode.txt', 'w');
                f.write(markup);
                f.close();
            } catch (e) {
                console.log(e);
            }
        }
        phantom.exit();
    });
}
This code works with most webpages but fails on:
http://search.digikey.com/scripts/dksearch/dksus.dll?keywords=S7072-ND
This is my test case. PhantomJS fails to open the URL and then crashes. I am using the win32 static build 1.3.
Any tips?
Basically, what I am after is a wget that completes the page rendering and runs the scripts that modify the document before saving the file.
Comments (1)
A quick and dirty solution, though one that is posted on the PhantomJS site, is to use a timeout. I have modified your code to include a 2-second wait. This allows the page to load for 2 seconds before dumping the contents to a file. If you need exact timing, or if the load time will vary greatly, this solution probably won't work for you.
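The modified script itself is not included in this copy of the thread. What follows is only a minimal sketch of what a 2-second wait might look like, not the answerer's actual code. It reuses the calls from the question (`new WebPage()`, `phantom.args`, `fs.open`) and assumes the standard `setTimeout` timer is available in the PhantomJS script context. The helper names `saveMarkup` and `dumpAfterDelay` are hypothetical, introduced here for illustration.

```javascript
// Hypothetical helper: write markup to a file using a PhantomJS-style fs
// object. Returns true on success, false if any file call throws.
function saveMarkup(fs, path, markup) {
    var f = null;
    try {
        f = fs.open(path, 'w');
        f.write(markup);
        f.close();
        return true;
    } catch (e) {
        console.log(e);
        return false;
    }
}

// Hypothetical helper: wait delayMs so page scripts can run, then read
// page.content, save it, and call done().
function dumpAfterDelay(page, fs, path, delayMs, done) {
    setTimeout(function () {
        saveMarkup(fs, path, page.content);
        done();
    }, delayMs);
}

// PhantomJS wiring, mirroring the script in the question. The guard only
// keeps the helpers loadable outside PhantomJS.
if (typeof phantom !== 'undefined') {
    if (phantom.args.length === 0) {
        console.log('Usage: save.js <some URL>');
        phantom.exit();
    } else {
        var page = new WebPage();
        var fs = require('fs');
        page.open(encodeURI(phantom.args[0]), function (status) {
            if (status !== 'success') {
                console.log('FAIL to load the address');
                phantom.exit();
            } else {
                // Assumption: 2000 ms is enough for the onload script to finish.
                dumpAfterDelay(page, fs, 'htmlcode.txt', 2000, function () {
                    phantom.exit();
                });
            }
        });
    }
}
```

The key difference from the question's script is that `phantom.exit()` is called inside the timer callback on the success path, so PhantomJS does not quit before the delayed dump runs. Whether a fixed delay is long enough depends on the page, which is the limitation noted above.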