NodeJS: What is the appropriate way to handle TCP socket streams? Which delimiter should I use?

Posted 2024-11-29 17:04:06


From what I understood here, "V8 has a generational garbage collector. Moves objects around randomly. Node can’t get a pointer to raw string data to write to socket." So I shouldn't store data that comes from a TCP stream in a string, especially if that string grows beyond Math.pow(2,16) bytes. (I hope I'm right so far.)

What, then, is the best way to handle all the data coming from a TCP socket? So far I've been trying to use _:_:_ as a delimiter, because I think it's reasonably unique and won't clash with anything else in the data.

A sample of the incoming data would be: something_:_:_maybe a large text_:_:_ maybe tons of lines_:_:_more and more data

This is what I tried to do:

net = require('net');
var server = net.createServer(function (socket) {
    socket.on('connect',function() {
        console.log('someone connected');
        buf = new Buffer(Math.pow(2,16));  //new buffer with size 2^16
        socket.on('data',function(data) {
            if (data.toString().search('_:_:_') === -1) {    // If there's no separator in the data that just arrived...
                buf.write(data.toString());   // ... write it on the buffer. it's part of another message that will come.
            } else {        // if there is a separator in the data that arrived
                parts = data.toString().split('_:_:_'); // the first part is the end of a previous message, the last part is the start of a message to be completed in the future. Parts between separators are independent messages
                if (parts.length == 2) {
                    msg = buf.toString('utf-8',0,4) + parts[0];
                    console.log('MSG: '+ msg);
                    buf = (new Buffer(Math.pow(2,16))).write(parts[1]);
                } else {
                    msg = buf.toString() + parts[0];
                    for (var i = 1; i <= parts.length -1; i++) {
                        if (i !== parts.length-1) {
                            msg = parts[i];
                            console.log('MSG: '+msg);
                        } else {
                            buf.write(parts[i]);
                        }
                    }
                }
            }
        });
    });
});

server.listen(9999);

Whenever I try to console.log('MSG' + msg), it prints out the whole buffer, so I can't tell whether anything worked.

How can I handle this data properly? Would the lazy module work even though this data is not line-oriented? Is there another module for handling streams that are not line-oriented?


梦屿孤独相伴 2024-12-06 17:04:06

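It has indeed been said that extra work is going on, because Node has to take that buffer and then push it into V8 and cast it to a string. However, calling toString() on the buffer isn't any better. As far as I know there is currently no good solution to this, especially if your end goal is to get a string and work with it. It's one of the things Ryan mentioned at nodeconf as an area where work needs to be done.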
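For the `_:_:_` delimiter from the question, here is a minimal sketch of framing that stays at the Buffer level and only converts a message to a string once it is complete. It assumes a reasonably recent Node.js where `Buffer.from`, `Buffer.concat`, `buf.indexOf` and `buf.subarray` are available. `createFramer` and `feed` are hypothetical names made up for this illustration, not part of any module.

```javascript
// Delimiter taken from the question, kept as bytes so we can search Buffers directly.
const DELIMITER = Buffer.from('_:_:_');

// Hypothetical helper: returns a function that accepts raw chunks and calls
// onMessage(string) once for every complete, delimiter-terminated message.
function createFramer(onMessage) {
  let pending = Buffer.alloc(0); // bytes received so far that are not yet terminated

  return function feed(chunk) {
    // Join leftovers with the new chunk first, so a delimiter that was
    // split across two chunks is still found.
    pending = Buffer.concat([pending, chunk]);

    let index;
    while ((index = pending.indexOf(DELIMITER)) !== -1) {
      // Decode only the bytes of one complete message.
      onMessage(pending.subarray(0, index).toString('utf8'));
      // Keep whatever follows the delimiter for the next round.
      pending = pending.subarray(index + DELIMITER.length);
    }
  };
}

// Usage: simulate two chunks arriving, with the delimiter split between them.
const messages = [];
const feed = createFramer((msg) => messages.push(msg));
feed(Buffer.from('something_:_'));
feed(Buffer.from(':_maybe a large text_:_:_more'));
console.log(messages); // 'more' has no terminating delimiter yet and stays pending
```

On a real connection this would be wired up with something like `socket.on('data', feed)`, one framer per socket. `Buffer.concat` copies on every chunk, which is fine for a sketch but may be worth replacing with a list of chunks if messages get very large.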

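As for the delimiter, you can choose whatever you want. Many binary protocols instead include a fixed header, so that the data fits a regular structure, and that header very often contains a length. This way you slice off a known header and learn about the rest of the data without having to iterate over the entire buffer. With a scheme like that, you can use a tool such as: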