How to detect HTTP requests in Python + Twisted?

Posted 2024-09-13 10:58:00 · 1515 characters · 4 views · 0 comments

I am learning network programming using Twisted 10 in Python. In the code below, is there any way to detect an HTTP request when data is received? Can I also retrieve the domain name, subdomain, and port values from it, and discard the data if it is not HTTP?

from twisted.internet import stdio, reactor, protocol
from twisted.protocols import basic
import re


class DataForwardingProtocol(protocol.Protocol):
    def __init__(self):
        self.output = None
        self.normalizeNewlines = False

    def dataReceived(self, data):
        if self.normalizeNewlines:
            data = re.sub(r"(\r\n|\n)", "\r\n", data)
        if self.output:
            self.output.write(data)


class StdioProxyProtocol(DataForwardingProtocol):
    def connectionMade(self):
        inputForwarder = DataForwardingProtocol()
        inputForwarder.output = self.transport
        inputForwarder.normalizeNewlines = True
        stdioWrapper = stdio.StandardIO(inputForwarder)
        self.output = stdioWrapper
        print "Connected to server.  Press ctrl-C to close connection."


class StdioProxyFactory(protocol.ClientFactory):
    protocol = StdioProxyProtocol

    def clientConnectionLost(self, transport, reason):
        reactor.stop()

    def clientConnectionFailed(self, transport, reason):
        print reason.getErrorMessage()
        reactor.stop()


if __name__ == '__main__':
    import sys
    if not len(sys.argv) == 3:
        print "Usage: %s host port" % __file__
        sys.exit(1)

    reactor.connectTCP(sys.argv[1], int(sys.argv[2]), StdioProxyFactory())
    reactor.run()

Comments (2)

泡沫很甜 2024-09-20 10:58:00

Protocol.dataReceived, which you're overriding, is too low-level for this purpose unless you add buffering, which your code does not do. As the docs for that method say:

Called whenever data is received.

Use this method to translate to a higher-level message. Usually, some callback will be made upon the receipt of each complete protocol message.

Parameters

data: a string of indeterminate length. Please keep in mind that you will probably need to buffer some data, as partial (or multiple) protocol messages may be received! I recommend that unit tests for protocols call through to this method with differing chunk sizes, down to one byte at a time.

You appear to be completely ignoring this crucial part of the docs.
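
To make the buffering point concrete, here is a minimal sketch in plain Python 3, independent of Twisted. The names `RequestHeadBuffer` and `looks_like_http_request` are hypothetical, not Twisted APIs. The sketch assumes requests without bodies and only checks the shape of the request line, so it is an illustration rather than a complete HTTP parser.

```python
import re

# Shape of an HTTP/1.x request line: METHOD SP request-target SP HTTP/x.y
REQUEST_LINE = re.compile(br"^[A-Z]+ \S+ HTTP/\d\.\d$")


class RequestHeadBuffer(object):
    """Collect chunks until a full request head (ended by a blank line) arrives."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk):
        """Add one chunk and return the list of request heads it completed."""
        self._buffer += chunk
        heads = []
        # A chunk may complete zero, one, or several heads.
        while b"\r\n\r\n" in self._buffer:
            head, self._buffer = self._buffer.split(b"\r\n\r\n", 1)
            heads.append(head)
        return heads


def looks_like_http_request(head):
    """Return True if the first line of the head has the shape of a request line."""
    first_line = head.split(b"\r\n", 1)[0]
    return REQUEST_LINE.match(first_line) is not None
```

In a Protocol subclass, dataReceived could pass each chunk to `feed()` and discard any head for which `looks_like_http_request` returns False. Feeding the same request one byte at a time, as the docs recommend for tests, should produce the same heads as feeding it in one piece.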

You could instead use LineReceiver.lineReceived (inheriting from protocols.basic.LineReceiver, of course) to take advantage of the fact that HTTP requests come in "lines". You'll still need to join up headers that are sent as multiple lines, since, as this tutorial says:

Header lines beginning with space or tab are actually part of the previous header line, folded into multiple lines for easy reading.
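
Joining folded header lines can be sketched as follows. This is plain Python 3 working on already-split text lines, such as those a lineReceived callback would collect. `unfold_headers` and `parse_headers` are hypothetical helper names, and the sketch assumes that a repeated header name may simply overwrite the earlier value.

```python
def unfold_headers(lines):
    """Join continuation lines (leading space or tab) onto the previous header."""
    unfolded = []
    for line in lines:
        if line[:1] in (" ", "\t") and unfolded:
            # Continuation: append to the header line that came before it.
            unfolded[-1] += " " + line.strip()
        else:
            unfolded.append(line)
    return unfolded


def parse_headers(lines):
    """Return a dict mapping lower-cased header names to their values."""
    headers = {}
    for line in unfold_headers(lines):
        name, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            headers[name.strip().lower()] = value.strip()
    return headers
```

Header names are lower-cased because HTTP header names are case-insensitive, so `parse_headers(lines).get("host")` works whether the client sent `Host` or `host`.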

Once you have a nicely parsed request (consider studying twisted.web's sources to see one way it could be done), you can turn to the second part of the question:

retrieve Domain name, Sub Domain, Port values from this?

The Host header (see RFC 2616, section 14.23) is the one containing this info.
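
As a rough illustration of pulling those values out of a Host header value, here is a plain Python 3 sketch. `split_host` and `split_subdomain` are hypothetical helpers. The subdomain split assumes the registered domain is the last two labels, which is wrong for suffixes such as co.uk, so a real implementation would need a public-suffix list.

```python
def split_host(host_value, default_port=80):
    """Split a Host header value into (hostname, port)."""
    value = host_value.strip()
    host, sep, port = value.rpartition(":")
    if sep and port.isdigit():
        return host.lower(), int(port)
    # No explicit port (or a bare IPv6 literal): fall back to the default.
    return value.lower(), default_port


def split_subdomain(hostname, domain_labels=2):
    """Split a hostname into (subdomain, domain).

    Assumption: the domain is the last `domain_labels` labels.
    """
    labels = hostname.split(".")
    if len(labels) <= domain_labels:
        return "", hostname
    return ".".join(labels[:-domain_labels]), ".".join(labels[-domain_labels:])
```

For example, `split_host("www.example.com:8080")` gives the hostname `www.example.com` and port 8080, and `split_subdomain` on that hostname separates `www` from `example.com`. When the header carries no port, the default of 80 applies to plain HTTP.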

爺獨霸怡葒院 2024-09-20 10:58:00

Based on what you seem to be attempting, I think the following would be the path of least resistance:
http://twistedmatrix.com/documents/10.0.0/api/twisted.web.proxy.html

That's the Twisted module for building an HTTP proxy. It lets you intercept requests and see both the destination and the sender. You can also inspect all the headers and the content going back and forth. You seem to be trying to rewrite the HTTP protocol and proxy classes that Twisted already provides. I hope this helps.
