jsdom form submission?

Posted 2024-12-25 19:41:38


I'm trying to use the Node.js packages request and jsdom to scrape web pages, and I want to know how I can submit forms and get their responses. I'm not sure if this is possible with jsdom or another module, but I do know that request supports cookies.

The following code demonstrates how I'm using jsdom (along with request and jQuery) to retrieve and parse a web page (in this case, the Wikipedia home page). (Note that this code is adapted from the jquery-request.js code from this tutorial http://blog.nodejitsu.com/jsdom-jquery-in-5-lines-on-nodejs)

var request = require('request'),
    jsdom = require('jsdom'),
    url = 'http://www.wikipedia.org';

request({ uri: url }, function (error, response, body) {
  // Bail out on a network error OR a non-200 status (not only when both happen)
  if (error || response.statusCode !== 200) {
    console.log('Error when contacting ' + url);
    return;
  }

  // Legacy jsdom.env API: parse the page and inject jQuery into the window
  jsdom.env({
    html: body,
    scripts: [
      'http://code.jquery.com/jquery-1.5.min.js'
    ]
  }, function (err, window) {
    if (err) {
      console.log('jsdom failed to load the page: ' + err);
      return;
    }

    var $ = window.jQuery,
        // jQuery is now loaded on the jsdom window created from 'body'
        $searchform = $('#searchform'); // search form jQuery object

    $('#searchInput').val('Wood');

    console.log('form HTML is ' + $searchform.html(),
      'search value is ' + $('#searchInput').val());

    // how I'd like to submit the search form
    $('#searchform .searchButton').click();
  });
});

The above code prints the HTML from Wikipedia's search form, then "Wood", the value I set the searchInput field to contain. Of course, here the click() method doesn't really do anything, because jQuery isn't operating in a browser; I don't even know if jsdom supports any kind of event handling.

Is there any module that can help me to interact with web pages in this way, or in a similar non-jQuery way? Can this be done in jsdom?

Thanks in advance!


Answers (2)

倚栏听风 2025-01-01 19:41:38

If you don't want to build the form request yourself as in the other answer, you can use an alternative to jsdom that supports more browser functionality:

http://www.phantomjs.org/

对你再特殊 2025-01-01 19:41:38

I'm not familiar with a Node.js library that will give you a fully interactive client-side view of a web page, but you can get the results of a form submission without too much trouble.

HTML forms are essentially just a way of sending HTTP requests to a specific URL (which can be found as the action attribute of the form tag). With access to the DOM, you can just pull out these values and create your own request for the specified URL.

Something like this, used as the callback after requesting the Wikipedia home page, will get you the results of searching for "keyboard cat" in English:

// Runs inside the jsdom.env callback, where window and request are in scope.
var $ = window.jQuery;

var search_term = "keyboard cat";
// encodeURIComponent turns spaces into %20; form encoding uses + instead.
// The /g flag replaces every %20, not just the first one.
var search_term_safe = encodeURIComponent(search_term).replace(/%20/g, "+");

var lang = "en";
var lang_safe = encodeURIComponent(lang).replace(/%20/g, "+");

var search_submit_url = $("#searchform").attr("action");
var search_input_name = $("#searchInput").attr("name");
var search_language_name = $("#language").attr("name");

var search_string = search_input_name + "=" + search_term_safe + "&" + search_language_name + "=" + lang_safe;

// Note the Wikipedia-specific hack of prepending "http:",
// since the form's action is a protocol-relative URL ("//...").
var full_search_uri = "http:" + search_submit_url + "?" + search_string;

request({ uri: full_search_uri }, function (error, response, body) {
    // Treat a network error OR a non-200 status as a failure.
    if (error || response.statusCode !== 200) {
        console.log("Got an error from the search page: " + (error || response.statusCode));
    } else {
        // Do some stuff with the response page (body) here.
    }
});
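The same URL can also be built with the WHATWG `URL` and `URLSearchParams` classes built into modern Node.js, which avoids both the hand-rolled `+` replacement and the "http:" prefix hack. This is a minimal sketch, not the answer's method: the action `//www.wikipedia.org/search-redirect.php` and the field names `search` and `language` are assumed, hypothetical stand-ins for what the jQuery `attr()` calls above would read from the live page.

```javascript
// Hypothetical values standing in for what the answer reads from the DOM
// with $("#searchform").attr("action") and the inputs' name attributes.
const formAction = '//www.wikipedia.org/search-redirect.php'; // assumed protocol-relative action
const pageUrl = 'http://www.wikipedia.org/'; // URL of the page the form came from

function buildSearchUrl(action, baseUrl, fields) {
  // Resolve the action against the page URL; this handles both
  // protocol-relative ("//host/path") and relative ("/path") actions.
  const url = new URL(action, baseUrl);
  // URLSearchParams applies form encoding: spaces become "+",
  // and reserved characters such as "&" are percent-encoded.
  for (const [name, value] of Object.entries(fields)) {
    url.searchParams.append(name, value);
  }
  return url.toString();
}

const fullSearchUri = buildSearchUrl(formAction, pageUrl, {
  search: 'keyboard cat', // assumed name attribute of #searchInput
  language: 'en',         // assumed name attribute of #language
});
console.log(fullSearchUri);
```

Resolving against the page URL means the code keeps working if the site switches the action to a relative path or the page is fetched over https.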

The key points are:

  1. "Submitting a search" really just means sending an HTTP GET or POST request to the URL given in the form tag's action attribute (the form's method attribute says which; it defaults to GET).
  2. Create the string to use for form submission using the name attributes of each of the form's input tags, combined with the value that they are actually submitting, in this format: name1=value1&name2=value2
  3. For GET requests, just append that string to the URL as a query string (URL?query-string)
  4. For POST requests, post that string as the body of the request.
  5. Note that the string used for form submission must be URL-encoded, with spaces represented as +.
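The steps above can be sketched as a small function that turns a form's action, method and fields into request options. This is an illustration under stated assumptions: `buildFormRequest` and the `{ method, url, headers, body }` shape are hypothetical and not the API of `request` or any other HTTP library, and the example.com URLs are made up. It also shows why step 5 needs care: `String.prototype.replace` with a string pattern only replaces the first match.

```javascript
// Hypothetical helper: maps a form's action/method/fields to plain
// request options ({ method, url, headers, body }).
function buildFormRequest(action, method, fields, pageUrl) {
  // Step 2: name1=value1&name2=value2, form-encoded (spaces become "+").
  const formString = new URLSearchParams(fields).toString();
  // Resolve relative or protocol-relative actions against the page URL.
  const url = new URL(action, pageUrl);

  if (method.toUpperCase() === 'POST') {
    // Step 4: the encoded string is sent as the request body.
    return {
      method: 'POST',
      url: url.toString(),
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: formString,
    };
  }
  // Step 3: for GET, the encoded string becomes the URL's query string.
  url.search = formString;
  return { method: 'GET', url: url.toString(), headers: {}, body: null };
}

// Step 5 pitfall: a string pattern only replaces the first "%20".
const once = encodeURIComponent('keyboard cat video').replace('%20', '+');
const all = encodeURIComponent('keyboard cat video').replace(/%20/g, '+');
console.log(once); // keyboard+cat%20video
console.log(all);  // keyboard+cat+video

const getReq = buildFormRequest('/search', 'get', { q: 'keyboard cat' }, 'http://example.com/page');
const postReq = buildFormRequest('/login', 'POST', { user: 'a b', pass: 'x&y' }, 'http://example.com/');
console.log(getReq.url);   // http://example.com/search?q=keyboard+cat
console.log(postReq.body); // user=a+b&pass=x%26y
```

The returned object can then be handed to whatever HTTP client you use; for POST, remember to send the Content-Type header, or many servers will not parse the body as form data.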