PhantomJS page dump script problem

Posted on 2024-12-23 10:11:57


Digikey has changed their website, and it now uses JavaScript that is called on load via POST. This broke my former simple Java HTML retriever. I am trying to use PhantomJS so that the JavaScript executes before I save the HTML/text.

var page = new WebPage(),
    address;
var fs = require('fs');

if (phantom.args.length === 0) {
    console.log('Usage: save.js <some URL>');
    phantom.exit();
} else {
    address = encodeURI(phantom.args[0]);
    page.open(address, function (status) {
        if (status !== 'success') {
            console.log('FAIL to load the address');
        } else {
            // Grab the serialized DOM as soon as the page reports it has loaded
            var markup = page.content;
            console.log(markup);
            try {
                // Write the markup to htmlcode.txt in the working directory
                var f = fs.open('htmlcode.txt', 'w');
                f.write(markup);
                f.close();
            } catch (e) {
                console.log(e);
            }
        }
        phantom.exit();
    });
}

This code works with most webpages but fails on:

http://search.digikey.com/scripts/dksearch/dksus.dll?keywords=S7072-ND

That is my test case. PhantomJS fails to open the URL and then crashes. I am using the Win32 static build of PhantomJS 1.3.

Any tips?

Basically, what I am after is a wget-like tool that completes the page rendering and runs the scripts that modify the document before saving the file.
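The script passes its argument through `encodeURI` before calling `page.open`. As a quick sanity check (plain JavaScript that runs in Node or any JS engine; it is not from the original thread), the test URL contains only characters that `encodeURI` leaves alone, so the encoding step should hand PhantomJS the URL unchanged. That suggests the failure lies in the page load itself rather than in the URL handling. The second URL, containing a space, is a hypothetical input added only for contrast.

```javascript
// The Digikey test URL from the question
var testUrl = 'http://search.digikey.com/scripts/dksearch/dksus.dll?keywords=S7072-ND';

// encodeURI preserves letters, digits, reserved URL characters such as : / ? = &
// and the marks - _ . ! ~ * ' ( ), so this URL passes through unchanged
var encoded = encodeURI(testUrl);
console.log(encoded === testUrl); // true

// Hypothetical contrast: a stray space would be percent-encoded, changing the
// path that PhantomJS actually requests
var withSpace = 'http://search.digikey.com/ scripts/dksearch/dksus.dll?keywords=S7072-ND';
console.log(encodeURI(withSpace));
// http://search.digikey.com/%20scripts/dksearch/dksus.dll?keywords=S7072-ND
```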


Comments (1)

初相遇 2024-12-30 10:11:58


A quick and dirty solution (yet one that is posted on the PhantomJS site) is to use a timeout. I have modified your code to include a 2-second wait, which gives the page 2 seconds to finish loading before its contents are dumped to a file. If you need precise timing, or if the time the page needs varies greatly, this solution probably won't work for you.

var page = new WebPage(),
    address;
var fs = require('fs');

if (phantom.args.length === 0) {
    console.log('Usage: save.js <some URL>');
    phantom.exit();
} else {
    address = encodeURI(phantom.args[0]);
    page.open(address, function (status) {
        if (status !== 'success') {
            console.log('FAIL to load the address');
            phantom.exit();
        } else {
            // Give the onload/POST scripts 2 seconds to modify the document
            window.setTimeout(function () {
                var markup = page.content;
                console.log(markup);
                try {
                    var f = fs.open('htmlcode.txt', 'w');
                    f.write(markup);
                    f.close();
                } catch (e) {
                    console.log(e);
                }
                // Exit only after the delayed dump has run
                phantom.exit();
            }, 2000);
        }
    });
}
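The fixed delay is the weak point the answer itself flags: too short and the POST-driven changes are missed, too long and every dump waits needlessly. One alternative, loosely modeled on the `waitFor` helper in PhantomJS's bundled examples, is to poll a readiness check and proceed as soon as it passes, or give up after a timeout. The sketch below is plain ES5 so it runs in PhantomJS 1.x or Node; the `waitFor` name, its parameters, and the selector in the usage comment are hypothetical.

```javascript
// Hypothetical helper: poll isReady() every intervalMs; call onReady(true) as
// soon as it returns true, or onReady(false) once timeoutMs has elapsed.
function waitFor(isReady, onReady, timeoutMs, intervalMs) {
    timeoutMs = timeoutMs || 10000;
    intervalMs = intervalMs || 250;
    var start = Date.now();
    var timer = setInterval(function () {
        if (isReady()) {
            clearInterval(timer); // stop polling before reporting, so onReady runs once
            onReady(true);
        } else if (Date.now() - start >= timeoutMs) {
            clearInterval(timer);
            onReady(false);
        }
    }, intervalMs);
}

// In a PhantomJS script the check could inspect the live DOM, for example:
//   waitFor(function () {
//       return page.evaluate(function () {
//           return document.querySelector('#results') !== null; // hypothetical selector
//       });
//   }, function (ok) { /* dump page.content, then phantom.exit() */ }, 10000);

// Standalone demo: the "page" becomes ready after about 600 ms
var readyAt = Date.now() + 600;
waitFor(function () { return Date.now() >= readyAt; }, function (ok) {
    console.log(ok ? 'ready, dump content now' : 'timed out'); // prints "ready, dump content now"
}, 5000, 100);
```

Unlike a fixed 2-second sleep, this waits only as long as the page actually needs, up to the timeout, and the `ok` flag lets the script distinguish a complete page from a timed-out one.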