Twisted Python getPage

Posted on 2024-08-29 03:49:13

I tried to get support on this but I am TOTALLY confused.

Here's my code:


from twisted.internet import reactor
from twisted.web.client import getPage
from twisted.web.error import Error
from twisted.internet.defer import DeferredList
from sys import argv

class GrabPage:
    def __init__(self, page):
        self.page = page

    def start(self, *args):
        if args == ():
            # We apparently don't need authentication for this
            d1 = getPage(self.page)
        else:
            if len(args) == 2:
                # We have our login information
                d1 = getPage(self.page, headers={"Authorization": " ".join(args)})
            else:
                raise Exception('Missing parameters')

        d1.addCallback(self.pageCallback)
        dl = DeferredList([d1])
        d1.addErrback(self.errorHandler)
        dl.addCallback(self.listCallback)

    def errorHandler(self, result):
        # Bad thingy!
        pass

    def pageCallback(self, result):
        return result

    def listCallback(self, result):
        print result

a = GrabPage('http://www.google.com')
data = a.start() # Not the HTML

I want to get at the HTML that is passed to pageCallback when start() is called. This has been a PITA for me. Ty! And sorry for my sucky coding.

Answer by 慕烟庭风, 2024-09-05 03:49:13

You're missing the basics of how Twisted operates. It all revolves around the reactor, which you're never even running. Think of the reactor like this:

[Image: diagram of the reactor (source: krondo.com)]

Until you start the reactor, setting up deferreds only chains callbacks together; there are no events yet to fire them.
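To make that concrete, here is a rough sketch of the idea using a toy stand-in for a deferred. `ToyDeferred`, `add_callback`, and `fire` are hypothetical names invented for illustration; this is not Twisted's `Deferred` implementation, only a model of why a chain of callbacks does nothing until something fires it.

```python
class ToyDeferred:
    """Toy stand-in for a deferred (hypothetical; NOT Twisted's implementation)."""

    def __init__(self):
        self.callbacks = []

    def add_callback(self, fn):
        # Registering a callback only stores it; nothing runs yet.
        self.callbacks.append(fn)
        return self

    def fire(self, result):
        # In Twisted this step is triggered by an event the running reactor
        # delivers. Each callback receives the previous callback's return value.
        for fn in self.callbacks:
            result = fn(result)
        return result


seen = []


def page_callback(html):
    # Record what arrived, then pass a transformed value down the chain.
    seen.append(html)
    return html.upper()


d = ToyDeferred()
d.add_callback(page_callback).add_callback(seen.append)

before = list(seen)
print(before)  # [] because the chain exists but nothing has fired it

d.fire("<html>hi</html>")
print(seen)    # ['<html>hi</html>', '<HTML>HI</HTML>']
```

In the question's code the chain is built in exactly this way, but with no running reactor nothing ever plays the role of `fire`.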

I recommend you give the Twisted Intro by Dave Peticolas a read. It's quick and it really gives you all the missing information that the Twisted documentation doesn't.

Anyway, here is about the most basic usage example of getPage possible:

from twisted.web.client import getPage
from twisted.internet import reactor

url = 'http://aol.com'

def print_and_stop(output):
    # Callback: receives the page body once the request has completed
    print output
    if reactor.running:
        reactor.stop()

if __name__ == '__main__':
    print 'fetching', url
    d = getPage(url)
    d.addCallback(print_and_stop)
    reactor.run()

Since getPage returns a deferred, I'm adding the callback print_and_stop to the deferred chain. After that, I start the reactor. Once running, the reactor carries out the getPage request and fires its deferred, which calls print_and_stop; that prints the data from aol.com and then stops the reactor.

Edit to show a working example of OP's code:

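from twisted.internet import reactor
from twisted.web.client import getPage
from twisted.internet.defer import DeferredList

class GrabPage: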
    def __init__(self, page):
        self.page = page
        ########### I added this:
        self.data = None

    def start(self, *args):
        if args == ():
            # We apparently don't need authentication for this
            d1 = getPage(self.page)
        else:
            if len(args) == 2:
                # We have our login information
                d1 = getPage(self.page, headers={"Authorization": " ".join(args)})
            else:
                raise Exception('Missing parameters')

        d1.addCallback(self.pageCallback)
        dl = DeferredList([d1])
        d1.addErrback(self.errorHandler)
        dl.addCallback(self.listCallback)

    def errorHandler(self,result):
        # Bad thingy!
        pass

    def pageCallback(self, result):
        ########### I added this, to hold the data:
        self.data = result
        return result

    def listCallback(self, result):
        print result
        # Added for effect:
        if reactor.running:
            reactor.stop()

a = GrabPage('http://google.com')
########### Just call it without assigning to data
#data = a.start() # Not the HTML
a.start()

########### I added this:
if not reactor.running:
    reactor.run()

########### Reference the data attribute from the class
data = a.data
print '------REACTOR STOPPED------'
print
########### First 100 characters of a.data:
print '------a.data[:100]------'
print data[:100]
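
If you want to see the ordering without Twisted or a network connection, the following is a minimal sketch of the same pattern with a toy event loop. Everything in it is an assumption made for illustration: `ToyReactor`, `ToyGrabPage`, `call_later`, the fake page body, and the `example.invalid` URL are all hypothetical and are not Twisted APIs. It only models why `start()` returns nothing useful and why the `data` attribute is filled in after the loop has run.

```python
from collections import deque


class ToyReactor:
    """Hypothetical miniature event loop; NOT Twisted's reactor."""

    def __init__(self):
        self.pending = deque()
        self.running = False

    def call_later(self, fn, *args):
        # Queue work; it does not execute until run() is called.
        self.pending.append((fn, args))

    def run(self):
        # Simplification: this loop exits when the queue is empty, whereas
        # Twisted's reactor.run() blocks until reactor.stop() is called.
        self.running = True
        while self.running and self.pending:
            fn, args = self.pending.popleft()
            fn(*args)
        self.running = False

    def stop(self):
        self.running = False


class ToyGrabPage:
    def __init__(self, reactor, page):
        self.reactor = reactor
        self.page = page
        self.data = None

    def start(self):
        # Stand-in for getPage: schedule delivery of a fake body for later.
        fake_body = "<html>fake body for %s</html>" % self.page
        self.reactor.call_later(self.page_callback, fake_body)
        # Nothing is returned: the HTML is not known at this point.

    def page_callback(self, result):
        # Hold on to the data, then stop the loop.
        self.data = result
        self.reactor.stop()


toy_reactor = ToyReactor()
grabber = ToyGrabPage(toy_reactor, "http://example.invalid")

returned = grabber.start()
data_before_run = grabber.data   # still None: the loop has not run yet

toy_reactor.run()
data_after_run = grabber.data    # set by page_callback during run()

print(returned, data_before_run)
print(data_after_run)
```

The design choice is the same one made in the edited GrabPage above: the callback stores the result on the instance, and the code that needs it reads the attribute only after the loop has finished.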
