Getting HTML with Pycurl

Posted 2024-11-18 09:52:04, 286 words, 4 views, 0 comments

I've been trying to retrieve a page of HTML using pycurl so that I can parse it for relevant information using str.split and some for loops. I know pycurl retrieves the HTML, since it prints it to the terminal. However, if I try to do something like

html = str(c.perform())

the variable just holds the string "None".

How can I use pycurl to get the HTML, or redirect whatever it sends to the console, so that it can be used as a string as described above?

Thanks a lot to anyone who has any suggestions!

余生再见 2024-11-25 09:52:04

This sends a request, then stores and prints the response body:

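from io import BytesIO
import pycurl

url = 'http://www.google.com/'

# On Python 3, pycurl passes the body to the write callback as bytes
storage = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, url)
c.setopt(c.WRITEFUNCTION, storage.write)
c.perform()  # returns None; the body is delivered to storage.write
c.close()
# Decode to str; UTF-8 is an assumption about the page's encoding
content = storage.getvalue().decode('utf-8', errors='replace')
print(content)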

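If you also want to store the response headers, set this option before calling perform(). Passing the same buffer stores the headers and the body together, so use a separate buffer if you need them apart: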

c.setopt(c.HEADERFUNCTION, storage.write)
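
To illustrate keeping the headers apart from the body, here is a minimal sketch assuming Python 3 and one buffer per stream. The helper names fetch_with_headers and parse_headers are made up for this example and are not part of pycurl, and the parsing is deliberately naive: a repeated header name overwrites the earlier value, and only the last header block is kept when redirects produce several.

```python
from io import BytesIO


def parse_headers(raw):
    """Turn raw header bytes into a dict keyed by lower-cased header name.

    Hypothetical helper, naive on purpose: repeated names overwrite each
    other, and only the final header block is kept if several responses
    (for example after redirects) were written to the same buffer.
    """
    # HTTP header bytes are conventionally decoded as ISO-8859-1.
    text = raw.decode("iso-8859-1")
    # Header blocks end with a blank line; keep the non-empty ones.
    blocks = [block for block in text.split("\r\n\r\n") if block.strip()]
    headers = {}
    if not blocks:
        return headers
    for line in blocks[-1].split("\r\n"):
        if ":" not in line:
            continue  # skips lines without a colon, such as the status line
        name, value = line.split(":", 1)
        headers[name.strip().lower()] = value.strip()
    return headers


def fetch_with_headers(url):
    """Fetch url and return (headers_dict, body_bytes). Hypothetical helper."""
    # Imported here so parse_headers can be used without pycurl installed.
    import pycurl

    body = BytesIO()
    header_buf = BytesIO()
    c = pycurl.Curl()
    try:
        c.setopt(c.URL, url)
        c.setopt(c.WRITEFUNCTION, body.write)         # body bytes go here
        c.setopt(c.HEADERFUNCTION, header_buf.write)  # header lines go here
        c.perform()
    finally:
        c.close()
    return parse_headers(header_buf.getvalue()), body.getvalue()
```

It could be called as headers, body = fetch_with_headers('http://www.google.com/'), after which headers.get('content-type') may help in choosing how to decode body.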

近箐 2024-11-25 09:52:04

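The perform() method runs the transfer and hands the response to a write function you specify; it returns None itself, which is why str(c.perform()) gives "None". You need to supply a buffer to hold the HTML and pass its write method as that function. On Python 3 this is usually done with an io.BytesIO object (StringIO.StringIO on Python 2), as follows:

import pycurl
from io import BytesIO

c = pycurl.Curl()
c.setopt(pycurl.URL, "http://www.google.com/")

# Bytes buffer that receives the response body
b = BytesIO()
c.setopt(pycurl.WRITEFUNCTION, b.write)
c.setopt(pycurl.FOLLOWLOCATION, 1)  # follow HTTP redirects
c.setopt(pycurl.MAXREDIRS, 5)       # but at most 5 of them
c.perform()
c.close()
# Decode to str; UTF-8 is an assumption about the page's encoding
html = b.getvalue().decode("utf-8", errors="replace")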

You could also use a file or tempfile or anything else that can store data.
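
As a rough sketch of the file-based variant, assuming Python 3: pycurl's WRITEDATA option accepts a file object opened in binary mode, so a temporary file can stand in for the in-memory buffer. The function name fetch_to_tempfile is invented for this example.

```python
import tempfile


def fetch_to_tempfile(url):
    """Download url into a temporary file and return it, rewound to the start.

    Hypothetical helper: the caller is responsible for closing the file.
    """
    # Imported here so the module can be loaded without pycurl installed.
    import pycurl

    f = tempfile.TemporaryFile()  # binary mode ("w+b") by default
    c = pycurl.Curl()
    try:
        c.setopt(c.URL, url)
        c.setopt(c.WRITEDATA, f)  # pycurl writes the body bytes to f
        c.perform()
    except Exception:
        f.close()  # do not leak the file if the transfer fails
        raise
    finally:
        c.close()
    f.seek(0)  # rewind so the caller can read from the beginning
    return f
```

A call such as f = fetch_to_tempfile('http://www.google.com/') followed by f.read() would then give the body as bytes; this may suit large responses better than holding everything in memory.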
