352-phantom documentation and tutorial


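PhantomJS bridge for NodeJS

"It sure would be neat if PhantomJS was a NodeJS module", I hear you say. Well, wait no longer! This Node module implements a nauseatingly clever bridge between Phantom and Node, so that you can use all your favourite PhantomJS functions without leaving NPM behind and living in a cave.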

Installation

First, make sure PhantomJS is installed. This module expects the phantomjs binary to be somewhere in PATH. In other words, type this:

$ phantomjs

If that works, so will phantomjs-node. It has only been tested with PhantomJS 1.3, and almost certainly doesn't work with anything older.
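The same PATH check can also be done from Node before the bridge is started. The following is a minimal sketch using only the Node standard library; `findOnPath` is a hypothetical helper written for this illustration and is not part of the phantom module.

```javascript
const fs = require('fs');
const path = require('path');

// Return the full path of the first executable file called `name` found in
// the given PATH string, or null if there is none.
// (Hypothetical helper for illustration; not part of the phantom module.)
function findOnPath(name, pathEnv = process.env.PATH || '') {
  // On Windows, executables usually carry an extension listed in PATHEXT.
  const exts = process.platform === 'win32'
    ? (process.env.PATHEXT || '.EXE;.CMD;.BAT').split(';')
    : [''];
  for (const dir of pathEnv.split(path.delimiter)) {
    if (!dir) continue;
    for (const ext of exts) {
      const candidate = path.join(dir, name + ext);
      try {
        fs.accessSync(candidate, fs.constants.X_OK);
        if (fs.statSync(candidate).isFile()) return candidate;
      } catch (err) {
        // Not present or not executable in this directory; keep looking.
      }
    }
  }
  return null;
}

const found = findOnPath('phantomjs');
console.log(found ? 'phantomjs found at ' + found : 'phantomjs is not on PATH');
```

If this reports that phantomjs is not on PATH, typing `phantomjs` in the same shell will most likely fail as well.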

Install it like this:

npm install phantom

For a brief introduction, continue reading; otherwise go to the Wiki page for more information.

How do I use it?

Use it like this in CoffeeScript:

phantom = require 'phantom'

phantom.create (ph) ->
  ph.createPage (page) ->
    page.open "http://www.google.com", (status) ->
      console.log "opened google? ", status
      page.evaluate (-> document.title), (result) ->
        console.log 'Page title is ' + result
        ph.exit()

In JavaScript:

var phantom = require('phantom');

phantom.create(function (ph) {
  ph.createPage(function (page) {
    page.open("http://www.google.com", function (status) {
      console.log("opened google? ", status);
      page.evaluate(function () { return document.title; }, function (result) {
        console.log('Page title is ' + result);
        ph.exit();
      });
    });
  });
});
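The nesting in the example above comes from every bridge call handing its result to a callback. One way to flatten it is a small Promise wrapper. The sketch below assumes only what the example shows: the callback is the last argument and receives a single result value (not a Node-style `(err, result)` pair). `toPromise` and `fakePage` are made up for illustration; `fakePage` is a stand-in so the snippet runs without PhantomJS.

```javascript
// Wrap a method whose LAST argument is a callback receiving one result value
// so that it returns a Promise instead. (Hypothetical helper.)
function toPromise(target, method) {
  return function (...args) {
    return new Promise(function (resolve) {
      target[method](...args, resolve);
    });
  };
}

// Hypothetical stand-in with the same callback shape as `page` above.
// It does not talk to PhantomJS; it just calls back with canned values.
const fakePage = {
  open: function (url, cb) { setImmediate(cb, 'success'); },
  evaluate: function (fn, cb) { setImmediate(cb, 'Example title'); }
};

async function main() {
  const open = toPromise(fakePage, 'open');
  const evaluate = toPromise(fakePage, 'evaluate');
  const status = await open('http://www.google.com');
  const title = await evaluate(function () { return document.title; });
  return { status: status, title: title };
}

main().then(function (r) {
  console.log('opened google?', r.status);
  console.log('Page title is ' + r.title);
});
```

With the real module, the same wrapper would be applied to the `page` object received from ph.createPage(), assuming its methods keep the callback-last shape shown above.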

Use it in Windows

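By default, it uses dnode with the weak module. This means that you need to set up node-gyp with Microsoft VS2010 or VS2012, which is a huge installation on Windows.

The dnodeOpts property lets you control the dnode settings, so you can disable weak by setting it to false and avoid that complicated installation.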

var phantom = require('phantom');

phantom.create(function (ph) {
  ph.createPage(function (page) {
    /* the page actions */
  });
}, {
  dnodeOpts: {
    weak: false
  }
});

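Use it in restricted environments

Some environments (e.g. OpenShift) have special requirements that are difficult or impossible to change, specifically: hostname/IP and port restrictions for the internal communication server, and the path to the phantomjs binary.

By default, the hostname/IP used for this is localhost, the port is 0, and the phantomjs binary is assumed to be in the PATH environment variable, but you can supply a specific configuration with an options object like this: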

var options = {
  port: 16000,
  hostname: "192.168.1.3",
  path: "/phantom_path/"
};

// The first argument is the usual callback that receives the phantom object.
phantom.create(function (ph) { /* ... */ }, options);
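A note on the default port of 0: in Node's `net` module, listening on port 0 asks the operating system to assign any currently unused port. Whether the bridge relies on exactly this behaviour is an assumption here; the sketch below only demonstrates what port 0 means for a plain Node server. `listenOnFreePort` is a hypothetical helper name.

```javascript
const net = require('net');

// Listen on port 0 so the operating system picks a free port, report the
// port that was assigned, then shut the server down again.
// (Hypothetical helper for illustration.)
function listenOnFreePort(hostname) {
  return new Promise(function (resolve, reject) {
    const server = net.createServer();
    server.once('error', reject);
    server.listen(0, hostname, function () {
      const assigned = server.address().port;
      server.close(function () { resolve(assigned); });
    });
  });
}

listenOnFreePort('127.0.0.1').then(function (port) {
  console.log('requested port 0, the OS assigned port', port);
});
```

In an environment that only allows specific ports, this kind of automatic assignment is what a fixed `port` value in the options object avoids.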

Functionality details

You can use all the methods listed on the PhantomJS API page.

Due to the async nature of the bridge, some things have changed, though:

  • Return values (e.g. of page.evaluate) are passed to a callback instead
  • page.render() takes a callback so you can tell when it has finished writing the file
  • Properties can't be read or set directly; instead use page.get('version', callback) or page.set('viewportSize', {width:640,height:480}), etc. Nested objects can be accessed by including dots in keys, such as page.set('settings.loadImages', false)
  • Callbacks can't be set directly; instead use page.set('callbackName', callback), e.g. page.set('onLoadFinished', function(success) {})
  • onResourceRequested takes a function that executes in the scope of phantom, which has access to request.abort(), request.changeUrl(url), and request.setHeader(key,value). The second argument is a callback that executes in the scope of your code, with access to just the requestData, e.g.
page.onResourceRequested(
    function(requestData, request) { request.abort(); },
    function(requestData) { console.log(requestData.url) }
);
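The dotted keys mentioned in the list, such as 'settings.loadImages', address a property inside a nested object. The sketch below shows one way such keys can be resolved against a plain object. It is an illustration of the idea only, not the module's implementation; `setByPath`, `getByPath` and `pageState` are made-up names.

```javascript
// Assign `value` at a dot-separated `key` inside `target`, creating
// intermediate objects where they are missing. (Hypothetical helper.)
function setByPath(target, key, value) {
  const parts = key.split('.');
  let node = target;
  for (const part of parts.slice(0, -1)) {
    if (typeof node[part] !== 'object' || node[part] === null) {
      node[part] = {};
    }
    node = node[part];
  }
  node[parts[parts.length - 1]] = value;
  return target;
}

// Read the value at a dot-separated `key`; undefined if any step is missing.
function getByPath(target, key) {
  return key.split('.').reduce(function (node, part) {
    return node == null ? undefined : node[part];
  }, target);
}

// Hypothetical stand-in for the state held on the PhantomJS side.
const pageState = { settings: { loadImages: true } };
setByPath(pageState, 'settings.loadImages', false);
setByPath(pageState, 'viewportSize', { width: 640, height: 480 });

console.log(getByPath(pageState, 'settings.loadImages')); // false
console.log(getByPath(pageState, 'viewportSize.width'));  // 640
```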

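ph.createPage() makes new PhantomJS WebPage objects, so use that if you want to open lots of web pages. You can also start multiple phantomjs processes by calling phantom.create('flags', { port: someDiffNumber}) multiple times, so if you need that for some crazy reason, knock yourself out!

You can also set an exit callback, which is invoked after phantom.exit() or after the phantom process crashes: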

phantom.create('flags', { port: 8080, onExit: exitCallback})

You can also pass command-line switches to the phantomjs process by specifying additional arguments to phantom.create(), e.g.:

phantom.create '--load-images=no', '--local-to-remote-url-access=yes', (ph) ->

or by specifying them in the options object:

phantom.create {parameters: {'load-images': 'no', 'local-to-remote-url-access': 'yes'}}, (ph) ->
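The two forms above describe the same switches. The correspondence between a `parameters` object and `--name=value` strings can be sketched as follows; `toSwitches` is a hypothetical helper written for this illustration, not a function exported by the module.

```javascript
// Turn a parameters object into the equivalent list of command-line
// switches, one '--name=value' string per key. (Hypothetical helper.)
function toSwitches(parameters) {
  return Object.keys(parameters).map(function (name) {
    return '--' + name + '=' + parameters[name];
  });
}

const switches = toSwitches({
  'load-images': 'no',
  'local-to-remote-url-access': 'yes'
});
console.log(switches);
// [ '--load-images=no', '--local-to-remote-url-access=yes' ]
```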

If you need to access the ChildProcess of the phantom process, for instance to get its PID, you can reach it through the process property like this:

phantom.create(function (ph) {
  console.log('phantom process pid:', ph.process.pid);
});

Note for Mac users

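Phantom requires the Xcode Command Line Tools to be installed on your machine; otherwise you will get some nasty errors (xcode not found or make not found). If you haven't already, simply install Xcode through the App Store, then install the command line tools.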

How does it work?

Don't ask. The things these eyes have seen.

No really, how does it work?

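I will answer that question with a question. How do you communicate with a process that doesn't support shared memory, sockets, FIFOs, or standard input?

Well, there's one thing PhantomJS does support, and that's opening web pages. In fact, it's really good at opening web pages. So we communicate with PhantomJS by spinning up an instance of ExpressJS, opening Phantom in a subprocess, and pointing it at a special web page that turns socket.io messages into alert() calls. Those alert() calls are picked up by Phantom and there you go!

The communication itself happens via James Halliday's fantastic dnode library, which fortunately works well enough when combined with browserify to run straight out of PhantomJS's pidgin JavaScript environment.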
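The general idea of tunnelling structured messages through a call that only accepts a string, as alert() does, can be shown without PhantomJS. The following is a toy model under stated assumptions: it is not the bridge's real wire format, and `makeChannel`, `onAlert` and the message shape are invented for illustration.

```javascript
// The sending side can only call a function that takes one string, which
// stands in for alert(). Messages are serialised to JSON before sending.
// (Toy model; names and message shape are hypothetical.)
function makeChannel(onAlert) {
  return {
    send: function (message) {
      onAlert(JSON.stringify(message));
    }
  };
}

// The receiving side sees only strings and parses them back into objects.
const received = [];
const channel = makeChannel(function (text) {
  received.push(JSON.parse(text));
});

channel.send({ method: 'open', args: ['http://www.google.com'] });
channel.send({ method: 'exit', args: [] });

console.log(received.map(function (m) { return m.method; }).join(', '));
// open, exit
```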

If you'd like to hack on phantom, please do! You can run the tests with cake test or npm test, and rebuild the CoffeeScript/browserified code with cake build. You might need to npm install -g coffee-script for cake to work.

