Does urllib2.urlopen() cache content?

Posted 2024-09-15 23:58:41

The Python documentation doesn't mention this. I've recently been testing a website by repeatedly fetching it with urllib2.urlopen() to extract certain content, and I've noticed that sometimes, after I update the site, urllib2.urlopen() doesn't seem to pick up the newly added content. So I wonder: does it cache stuff somewhere, right?

Comments (5)

苦笑流年记忆 2024-09-22 23:58:41

So I wonder it does cache stuff somewhere, right?

It doesn't.

If you don't see new data, this could have many reasons. Most bigger web services use server-side caching for performance reasons, for example caching proxies like Varnish and Squid, or application-level caching.

If the problem is caused by server-side caching, there is usually no way to force the server to give you the latest data.

For caching proxies like Squid, things are different. Usually, Squid adds some additional header fields to the HTTP response (response.info().headers).

If you see a header field called X-Cache or X-Cache-Lookup, it means that you aren't connected to the remote server directly, but through a transparent proxy.

If you have something like X-Cache: HIT from proxy.domain.tld, the response you got is cached. The opposite is X-Cache: MISS from proxy.domain.tld, which means the response is fresh.
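
For example, you can check for those header fields from urllib2 itself (a minimal sketch; http://example.com/ is a placeholder URL):

import urllib2

response = urllib2.urlopen('http://example.com/')
# info() returns the response headers; getheader() looks one up by name
x_cache = response.info().getheader('X-Cache')
if x_cache:
    # e.g. "HIT from proxy.domain.tld" or "MISS from proxy.domain.tld"
    print 'Response came through a caching proxy:', x_cache
else:
    print 'No X-Cache header; this looks like a direct connection'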

狼性发作 2024-09-22 23:58:41

Very old question, but I had a similar problem which that solution did not resolve.
In my case I had to spoof the User-Agent, like this:

import urllib2

request = urllib2.Request(url)  # url is the address of the page you want to fetch
request.add_header('User-Agent', 'Mozilla/5.0')
content = urllib2.build_opener().open(request)

Hope this helps anyone...

九歌凝 2024-09-22 23:58:41

Your web server or an HTTP proxy may be caching content. You can try to disable caching by adding a Pragma: no-cache request header:

import urllib2

request = urllib2.Request(url)  # url is the address of the page you want to fetch
request.add_header('Pragma', 'no-cache')
content = urllib2.build_opener().open(request)
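
Note that Pragma: no-cache is the older HTTP/1.0 mechanism; HTTP/1.1 caches generally look at Cache-Control instead, so it may help to send both headers (a small variation on the snippet above, with url assumed to be defined the same way):

import urllib2

request = urllib2.Request(url)
request.add_header('Pragma', 'no-cache')         # respected by HTTP/1.0 caches
request.add_header('Cache-Control', 'no-cache')  # respected by HTTP/1.1 caches
content = urllib2.build_opener().open(request)
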
牛↙奶布丁 2024-09-22 23:58:41

If you make changes and then test the behaviour both from a browser and from urllib, it is easy to make a silly mistake.
In the browser you are logged in, but through urllib.urlopen your app may keep redirecting you to the same login page, so if you only check the page size or the top of your common layout, you could think that your changes have no effect.
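
One way to spot this from urllib2 is to compare the URL you requested with the URL that was actually fetched after redirects (a minimal sketch; the URL is a hypothetical placeholder):

import urllib2

url = 'http://example.com/members/page'  # placeholder for the page under test
response = urllib2.urlopen(url)
# geturl() returns the final URL after any redirects have been followed
if response.geturl() != url:
    print 'Redirected (perhaps to a login page):', response.geturl()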

情归归情 2024-09-22 23:58:41

I find it hard to believe that urllib2 does not do caching, because in my case the data is refreshed only after the program restarts. If the program is not restarted, the data appears to be cached forever. Also, retrieving the same data from Firefox never returns stale data.
