Scraping ASU courses with Node.js
I'm pretty new to Node.js, so apologies in advance if I don't know what I'm talking about.
I'm trying to scrape some courses from ASU's course catalog (https://webapp4.asu.edu/catalog/) and have made numerous attempts using Zombie, Node.IO, and the built-in https API. In every case I've run into a redirect loop.
I'm wondering if it's because I'm not setting my headers properly?
Below is a sample of the code I used (the plain https version, not Zombie/Node.IO):
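var https = require('https');

var options = {
  host: 'webapp4.asu.edu',
  path: '/catalog',
  method: 'GET',
  headers: {
    // Clients send cookies in the 'Cookie' header; 'Set-Cookie' is a response header.
    'Cookie': 'onlineCampusSelection=C'
  }
};

var req = https.request(options, function(res) {
  console.log("statusCode: ", res.statusCode);
  console.log("headers: ", res.headers);
  res.on('data', function(d) {
    process.stdout.write(d);
  });
});

req.on('error', function(e) {
  console.error(e);
});

// https.request() does not send anything until end() is called.
req.end();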
Just to clarify, I'm not having trouble scraping with Node.js in general. It's specifically ASU's course catalog that is giving me trouble.
Appreciate any ideas you guys could give me, thanks!
Update: My request went through successfully when I sent a cookie containing a JSESSIONID I got from Chrome/FF. Is there a way for me to request/create a JSESSIONID?
Comments (2)
It looks like the server sets the JSESSIONID cookie and then redirects away, so you need to tell Node.js not to follow redirects if you want to grab the cookie. I don't know how to do this with the http or https packages, but there is another package you can get via npm, request, which lets you do it. Here's a sample that should get you started:
Output should look something like this:
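As a rough illustration of the idea of grabbing the cookie before any redirect is followed, here is a minimal sketch. It uses Node's built-in https module, which does not follow redirects on its own, rather than the request package mentioned above. The helper names extractCookie and fetchSessionCookie are made up for this example, and it assumes (as observed above, not verified) that the server sends JSESSIONID in a Set-Cookie header on the first response.

```javascript
const https = require('https');

// Pull one cookie's "name=value" pair out of an array of Set-Cookie headers.
// (Hypothetical helper for this sketch.)
function extractCookie(setCookieHeaders, name) {
  for (const header of setCookieHeaders || []) {
    // Keep only the part before the first ';' (drops Path, Secure, etc.).
    const pair = header.split(';')[0].trim();
    if (pair.startsWith(name + '=')) {
      return pair;
    }
  }
  return null;
}

// Request a URL once, without following any redirect, and report the status,
// the redirect target (if any) and the JSESSIONID cookie (if any).
// (Hypothetical helper for this sketch.)
function fetchSessionCookie(url, callback) {
  const req = https.get(url, function (res) {
    res.resume(); // discard the body; only the headers matter here
    callback(null, {
      statusCode: res.statusCode,
      location: res.headers.location,
      cookie: extractCookie(res.headers['set-cookie'], 'JSESSIONID')
    });
  });
  req.on('error', callback);
}

// Example call (performs a real network request, so it is left commented out):
// fetchSessionCookie('https://webapp4.asu.edu/catalog/', function (err, info) {
//   if (err) { return console.error(err); }
//   console.log(info);
// });
```

If a cookie comes back, it could be sent in the Cookie header of a follow-up request to the URL given in location, alongside any other cookies the site expects. Whether that is enough to break the redirect loop on this particular site is an assumption.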
I'd highly recommend using jsdom in conjunction with jQuery (for Node). I've used it many, many times for scraping, as it makes it super easy.
Here's the example from jsdom's readme:
Hope that helps. jsdom has made it really easy to hack together scraping experiments (for me at least).