jsdom form submission?
I'm trying to use the Node.js packages request and jsdom to scrape web pages, and I want to know how I can submit forms and get their responses. I'm not sure if this is possible with jsdom or another module, but I do know that request supports cookies.
The following code demonstrates how I'm using jsdom (along with request and jQuery) to retrieve and parse a web page (in this case, the Wikipedia home page). (Note that this code is adapted from the jquery-request.js code from this tutorial http://blog.nodejitsu.com/jsdom-jquery-in-5-lines-on-nodejs)
var request = require('request'),
    jsdom = require('jsdom'),
    url = 'http://www.wikipedia.org';

request({ uri: url }, function (error, response, body) {
  // Stop if the request failed or the server did not answer 200 OK
  if (error || response.statusCode !== 200) {
    console.log('Error when contacting ' + url);
    return;
  }

  // jsdom.env is the legacy jsdom API (removed in later jsdom versions)
  jsdom.env({
    html: body,
    scripts: [
      'http://code.jquery.com/jquery-1.5.min.js'
    ]
  }, function (err, window) {
    if (err) {
      console.log(err);
      return;
    }
    // jQuery is now loaded on the jsdom window created from 'body'
    var $ = window.jQuery,
        $searchform = $('#searchform'); // search form jQuery object

    $('#searchInput').val('Wood');
    console.log('form HTML is ' + $searchform.html(),
                'search value is ' + $('#searchInput').val());

    // how I'd like to submit the search form
    $('#searchform .searchButton').click();
  });
});
The above code prints the HTML from Wikipedia's search form, then "Wood", the value I set the searchInput field to contain. Of course, here the click() method doesn't really do anything, because jQuery isn't operating in a browser; I don't even know if jsdom supports any kind of event handling.
Is there any module that can help me to interact with web pages in this way, or in a similar non-jQuery way? Can this be done in jsdom?
Thanks in advance!
If you don't want to handle the POST request yourself, as in the other answer, you can use an alternative to jsdom that supports more of what a real browser does, such as the headless browser PhantomJS:
http://www.phantomjs.org/
I'm not familiar with a Node.js library that will give you a fully interactive client-side view of a web page, but you can get the results of a form submission without too much trouble.
HTML forms are essentially just a way of sending HTTP requests to a specific URL (found in the action attribute of the form tag). With access to the DOM, you can pull out these values and create your own request to the specified URL. Something like this as the callback for the request to the Wikipedia home page will get you the results of a search for "keyboard cat" in English:
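A minimal sketch of the DOM-reading part of such a callback, assuming you already have the form element (for example from jsdom's window.document). collectFormFields is a hypothetical helper, not a jsdom API; it only looks at input elements and ignores select and textarea for brevity, so treat it as an illustration rather than a full implementation of how browsers build the submitted data.

```javascript
// Hypothetical helper (illustration only): reads the action, method and
// named input values from a DOM-like form element.
function collectFormFields(form, pageUrl) {
  // Resolve the action attribute against the page URL; a missing or empty
  // action means the form submits back to the page itself.
  const action = new URL(form.getAttribute('action') || pageUrl, pageUrl).href;

  // The method attribute defaults to GET when it is absent.
  const method = (form.getAttribute('method') || 'GET').toUpperCase();

  const fields = [];
  for (const input of form.querySelectorAll('input[name]')) {
    const type = (input.type || 'text').toLowerCase();
    // Skip button-type inputs (a submit button's value is only sent
    // when it is the button that was clicked).
    if (['submit', 'button', 'image', 'reset'].includes(type)) continue;
    // Unchecked checkboxes and radio buttons are not submitted.
    if ((type === 'checkbox' || type === 'radio') && !input.checked) continue;
    fields.push([input.name, input.value]);
  }
  return { action, method, fields };
}
```

In the question's jsdom.env callback this might be called as collectFormFields(window.document.querySelector('#searchform'), url) after setting the search input's value; the returned action and fields then feed into the request described below.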
Basically, the important steps are:
- Make a GET or POST request to the URL specified in the action attribute of the form tag.
- Build the form-submission string from the name attributes of each of the form's input tags, combined with the values they are actually submitting, in this format: name1=value1&name2=value2
- For GET requests, just append that string to the URL as a query string (URL?query-string).
- For POST requests, post that string as the body of the request.
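The steps above can be sketched as a small function that takes the action URL, the method and a list of [name, value] pairs and returns a description of the request to send. buildSubmission is a hypothetical name; the encoding uses Node's built-in URLSearchParams, which produces application/x-www-form-urlencoded text (spaces become +, while & and = inside values are percent-encoded). It is a sketch, not a full implementation of browser form submission (file uploads and multipart/form-data are ignored).

```javascript
// Hypothetical helper (illustration only): turns form data into a GET or POST
// request description. Assumes `action` is an absolute URL.
function buildSubmission(action, method, fields) {
  // Encode the pairs as name1=value1&name2=value2, escaping special characters.
  const query = new URLSearchParams(fields).toString();

  if (method.toUpperCase() === 'GET') {
    // GET: the encoded string becomes the URL's query string
    // (replacing any query string already present in the action URL).
    const url = new URL(action);
    url.search = query;
    return { method: 'GET', url: url.href, headers: {}, body: null };
  }

  // POST: the encoded string is sent as the request body.
  return {
    method: 'POST',
    url: action,
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: query,
  };
}
```

In Node 18 or later the result could be passed to the built-in fetch, e.g. fetch(req.url, { method: req.method, headers: req.headers, body: req.body }); with the request package from the question, the same values map onto its uri, method, headers and body options.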