Node.js proxy, dealing with gzip decompression
I'm currently working on a proxy server that has to modify the data passing through it (using regular expressions).
In most cases it works fine, except (I think) for websites that use gzip as their Content-Encoding. I came across a module called compress and tried to push the chunks I receive through a decompress/gunzip stream, but it isn't turning out as I expected (see the code below).
I figured I'd post some code to support my problem. This is the proxy, which gets loaded with MVC (Express):
module.exports = {
  index: function(request, response){
    var iframe_url = "www.nu.nl"; // site with gzip encoding
    var http = require('http');
    var httpClient = http.createClient(80, iframe_url);
    var headers = request.headers;
    headers.host = iframe_url;
    var remoteRequest = httpClient.request(request.method, request.url, headers);
    request.on('data', function(chunk) {
      remoteRequest.write(chunk);
    });
    request.on('end', function() {
      remoteRequest.end();
    });
    remoteRequest.on('response', function (remoteResponse){
      var body_regexp = new RegExp("<head>"); // regex to find first head tag
      var href_regexp = new RegExp('\<a href="(.*)"', 'g'); // regex to find hrefs
      response.writeHead(remoteResponse.statusCode, remoteResponse.headers);
      remoteResponse.on('data', function (chunk) {
        var body = doDecompress(new compress.GunzipStream(), chunk);
        body = body.replace(body_regexp, "<head><base href=\"http://"+ iframe_url +"/\">");
        body = body.replace(href_regexp, '<a href="#" onclick="javascript:return false;"');
        response.write(body, 'binary');
      });
      remoteResponse.on('end', function() {
        response.end();
      });
    });
  }
};
In the var body part I want to read the body and, in this example, neutralize all hrefs by replacing them with #. The problem, of course, is that when a site is gzip-encoded/compressed the body is all gibberish and the regexps can't be applied.
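For reference, on an uncompressed page the intended rewrite would behave roughly like the sketch below. The sample HTML string and the helper name rewriteBody are made up for illustration, and the href pattern is a non-greedy variant of the one in the proxy code (an assumption, so that each link is matched separately rather than one match spanning the whole line):

```javascript
// Hypothetical helper (not part of the proxy above): applies the two
// rewrites to an already-decoded HTML string.
function rewriteBody(html, host) {
  const headRegexp = /<head>/;            // first <head> tag only
  const hrefRegexp = /<a href="[^"]*"/g;  // every <a href="..."> (assumed non-greedy variant)
  return html
    .replace(headRegexp, '<head><base href="http://' + host + '/">')
    .replace(hrefRegexp, '<a href="#" onclick="javascript:return false;"');
}

// Made-up sample page with two links.
const sample = '<html><head></head><body><a href="/a">A</a> <a href="/b">B</a></body></html>';
console.log(rewriteBody(sample, 'www.nu.nl'));
```

Both links end up pointing at # and a base tag is added after the opening head tag. None of this can work while the body is still gzip-compressed bytes.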
I've already tried to mess around with the node-compress module:
doDecompress(new compress.GunzipStream(), chunk);
which refers to:
function doDecompress(decompressor, input) {
  var d1 = input.substr(0, 25);
  var d2 = input.substr(25);
  sys.puts('Making decompression requests...');
  var output = '';
  decompressor.setInputEncoding('binary');
  decompressor.setEncoding('utf8');
  decompressor.addListener('data', function(data) {
    output += data;
  }).addListener('error', function(err) {
    throw err;
  }).addListener('end', function() {
    sys.puts('Decompressed length: ' + output.length);
    sys.puts('Raw data: ' + output);
  });
  decompressor.write(d1);
  decompressor.write(d2);
  decompressor.close();
  sys.puts('Requests done.');
}
But it fails because the chunk input is an object, so I tried supplying chunk.toString() instead, which also fails with invalid input data.
I was wondering whether I'm heading in the right direction at all?
Comments (1)
The decompressor expects binary-encoded input. The chunk that your response handler receives is an instance of Buffer, whose toString() method returns a UTF-8 encoded string by default. So you have to use chunk.toString('binary') to make it work; this can also be seen in the demo.
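A small sketch of why the encoding matters. It assumes a recent Node.js and uses the built-in zlib module only to produce some sample gzip bytes (it is not the compress module from the question); the helper name survivesRoundTrip is made up for illustration. Gzip data starts with the bytes 0x1f 0x8b, and 0x8b on its own is not valid UTF-8, so decoding as UTF-8 replaces it and the original bytes are lost, while the 'binary' encoding maps every byte to exactly one character:

```javascript
const zlib = require('zlib'); // built-in; used here only to create sample gzip bytes

// Hypothetical helper: does a Buffer survive Buffer -> string -> Buffer
// unchanged when the given encoding is used in both directions?
function survivesRoundTrip(buffer, encoding) {
  const asString = buffer.toString(encoding);
  return Buffer.from(asString, encoding).equals(buffer);
}

// Made-up sample payload, gzip-compressed.
const gzipped = zlib.gzipSync(Buffer.from('<head><a href="/x">link</a></head>'));

console.log(survivesRoundTrip(gzipped, 'binary')); // true
console.log(survivesRoundTrip(gzipped, 'utf8'));   // false
```

This is why a plain chunk.toString() hands the decompressor corrupted input, whereas chunk.toString('binary') preserves every byte.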