Copying data from one page to another with PhantomJS

Posted on 2024-12-25 04:59:14

I am trying to copy some data from one processed web page into a new one that I want to export. The background is that I need to scrape parts of a page and build a new page from those parts.
The problem seems to be that PhantomJS's includeJs() and evaluate() methods are sandboxed, and I can't see a proper way to import DOM from one page into another.

I have some test code that looks like this, with page being the original and out the new page:

    ....
    var title = page.evaluate(function() {
        return title = document.getElementById('fooo').innerHTML;
    });
    console.log('page title:' + title);
    //fs.write('c:/Temp/title.js', "var title = '" + title + "';", 'w');

    var out = new WebPage;
    out.viewportSize = page.viewportSize;
    out.content = '<html><head></head><body><div id="wrapper"></div><p>done</p></body></html>';
    out.includeJs('c:/Temp/title.js', function() {
        var p = document.createElement('p');
        p.appendChild(document.createTextNode(title));
        document.getElementById('wrapper').appendChild(p);
    });
    ...

Comments (1)

我的奇迹 2025-01-01 04:59:14

The function in your last includeJs call won't work: as you note, it's sandboxed, which means closures won't work, so title won't be defined. A way of passing variables to page.evaluate is noted as a feature request, but isn't available as of PhantomJS v.1.4.1.

The general way I get around this is by using the Function constructor, which allows you to create a function using a string:

    var myVar = {some: "values", I: "want to pass into my page"},
        test = new Function("window.myVar = " + JSON.stringify(myVar));
    page.evaluate(test);

Now you can evaluate a function like the one you have, referencing myVar in the sandbox, and your data will be available in the client scope.
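
Applied to the title from the question, the same idea might look roughly like the sketch below. It runs under plain Node rather than PhantomJS. `createFakePage` is a hypothetical stand-in that mimics the sandbox by rebuilding each function from its source text, and `makeInjector` is an illustrative helper name, not part of any PhantomJS API. It assumes the data is JSON-serializable.

```javascript
// Hypothetical stand-in for a PhantomJS page. Like a sandboxed evaluate(),
// it only uses the function's source text, so captured variables are lost.
function createFakePage() {
    var fakeWindow = {};
    return {
        evaluate: function (fn) {
            // Rebuild the function from its source with `window` bound to
            // the fake page's global object, then call it.
            var run = new Function('window', 'return (' + fn.toString() + ')();');
            return run(fakeWindow);
        }
    };
}

// Illustrative helper: build a closure-free function whose body assigns
// JSON-serializable data to a named property of window.
function makeInjector(name, value) {
    return new Function(
        'window[' + JSON.stringify(name) + '] = ' + JSON.stringify(value) + ';'
    );
}

var out = createFakePage();
var title = 'Hello <b>world</b>';

// Step 1: push the data into the page as window.title.
out.evaluate(makeInjector('title', title));

// Step 2: a function evaluated in the page can now read it without
// relying on a closure over the outer `title` variable.
var text = out.evaluate(function () {
    return 'title is: ' + window.title;
});
console.log(text);
```

With a real PhantomJS page, the two `out.evaluate(...)` calls would be made on the actual `out` page object, and the second function could build DOM nodes from `window.title` instead of returning a string.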
