Web Server - How to Parse Requests? Asynchronous Stream Tokenizer?
I'm attempting to create a simple web server in C# in asynchronous socket programming style. The purpose is very narrow: a Comet server (HTTP long-polling).
I've got the Windows service running, accepting connections, dumping request info to the console and returning simple fixed content to the client.
Now, I can't figure out a manageable strategy for parsing the request data asynchronously and safely. I've written synchronous LL(1) parsers before. I'm not sure if an LL(1) parser is appropriate or necessary for HTTP. I don't know how to tokenize the input stream asynchronously. All I can think of is having an input buffer per client, reading into that, then copying that to a StringBuilder and periodically checking to see if I have a complete request. But that seems inefficient and might lead to code that is difficult to debug and maintain.
Also, the connection has two phases: receiving the request in full, and sending a response (in this case, after some delay). Only once the request is validated and actionable am I planning to enroll the connection in the long-polling manager. However, a misbehaving client could continue to send data and fill up a buffer, so I think I need to continue to monitor and empty the input stream during the response phase, right?
Any guidance on this is appreciated.
I guess the first step is knowing whether it is possible to efficiently tokenize a network stream asynchronously and without a large intermediate buffer. Even without a proper parser, the same challenges of creating a tokenizer apply to reading "lines" of input at a time, or even reading until double blank lines (one big token). I don't want to read one byte at a time from the network, but neither do I want to read too many bytes and have to store them in some intermediate buffer, right?
Comments (1)
For HTTP, the best way is to read the headers into memory completely (until you receive \r\n\r\n), then simply split by \r\n to get the individual header lines, and split each header line on : to separate name and value. There's no need to use a complex parser for that.
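To make that concrete, here is a minimal sketch of the approach in C#. It is only an illustration under some assumptions: the class name HeaderAccumulator, its members, and the 8192-byte cap are hypothetical and not from the original answer. The idea is that each completed asynchronous receive hands its bytes to Append, which reports whether the \r\n\r\n terminator has arrived yet, even when the terminator is split across two reads.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical per-connection helper: collects received bytes until the
// blank line ("\r\n\r\n") that ends the HTTP header block, then splits it.
public sealed class HeaderAccumulator
{
    private const int MaxHeaderBytes = 8192; // assumed cap against misbehaving clients
    private const byte CR = 13, LF = 10;

    private readonly List<byte> _buffer = new List<byte>();
    private int _scanFrom; // where the next terminator search starts

    // Number of bytes up to and including "\r\n\r\n"; -1 until it has been seen.
    public int HeaderLength { get; private set; } = -1;

    // Call from each completed receive with the bytes that arrived.
    // Returns true once the full header block is in memory.
    // Bytes passed after completion are ignored by this sketch.
    public bool Append(byte[] chunk, int count)
    {
        if (HeaderLength >= 0) return true;
        for (int i = 0; i < count; i++) _buffer.Add(chunk[i]);

        for (int i = _scanFrom; i + 3 < _buffer.Count; i++)
        {
            if (_buffer[i] == CR && _buffer[i + 1] == LF &&
                _buffer[i + 2] == CR && _buffer[i + 3] == LF)
            {
                HeaderLength = i + 4;
                return true;
            }
        }

        // Re-check the last three bytes next time: the terminator may span chunks.
        _scanFrom = Math.Max(0, _buffer.Count - 3);
        if (_buffer.Count > MaxHeaderBytes)
            throw new InvalidOperationException("Header block too large.");
        return false;
    }

    // Splits the completed header block into the request line and name/value pairs.
    public Dictionary<string, string> ParseHeaders(out string requestLine)
    {
        if (HeaderLength < 0)
            throw new InvalidOperationException("Headers are not complete yet.");

        string text = Encoding.ASCII.GetString(_buffer.ToArray(), 0, HeaderLength - 4);
        string[] lines = text.Split(new[] { "\r\n" }, StringSplitOptions.None);
        requestLine = lines[0];

        var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        for (int i = 1; i < lines.Length; i++)
        {
            int colon = lines[i].IndexOf(':'); // first colon only; values may contain ':'
            if (colon <= 0) continue;          // skip malformed lines in this sketch
            headers[lines[i].Substring(0, colon).Trim()] = lines[i].Substring(colon + 1).Trim();
        }
        return headers;
    }
}
```

In a receive callback you would call Append(buffer, bytesRead) and, once it returns true, call ParseHeaders and hand the connection to the long-polling manager. Splitting on the first colon only matters for values such as Host: example.com:8080, and the size cap gives a simple way to drop a client that never sends the terminator.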