Does urllib2.urlopen() cache content?
They don't mention this in the Python documentation. Recently I was testing a website, simply refreshing it with urllib2.urlopen() to extract certain content, and I noticed that sometimes when I updated the site, urllib2.urlopen() did not seem to pick up the newly added content. So I wonder: does it cache stuff somewhere?
5 Answers
It doesn't.
If you don't see new data, this could have many reasons. Most larger web services use server-side caching for performance reasons, for example caching proxies like Varnish and Squid, or application-level caching.

If the problem is caused by server-side caching, there's usually no way to force the server to give you the latest data.

For caching proxies like Squid, things are different. Usually, Squid adds some additional headers to the HTTP response (see response.info().headers). If you see a header field called X-Cache or X-Cache-Lookup, this means that you aren't connected to the remote server directly, but through a transparent proxy. If you have something like X-Cache: HIT from proxy.domain.tld, this means that the response you got is cached. The opposite is X-Cache: MISS from proxy.domain.tld, which means that the response is fresh.
Very old question, but I had a similar problem which this solution did not resolve.
In my case I had to spoof the User-Agent like this:
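The original code sample did not survive the page scrape; the following is a minimal reconstruction of the idea (assuming urllib2, with a urllib.request fallback for Python 3; the URL and User-Agent string are placeholders):

```python
# Reconstruction of the lost snippet: spoof the User-Agent so the
# server does not treat the client as "Python-urllib/x.y".
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback

req = urllib2.Request('http://example.com/')  # placeholder URL
req.add_header('User-Agent',
               'Mozilla/5.0 (Windows NT 10.0; Win64; x64)')
# response = urllib2.urlopen(req)
# content = response.read()
```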
Hope this helps anyone...
Your web server or an HTTP proxy may be caching content. You can try to disable caching by adding a Pragma: no-cache request header:
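A minimal sketch of setting that header (assuming urllib2, with a urllib.request fallback; Cache-Control: no-cache is added as well since HTTP/1.1 caches honor it rather than Pragma):

```python
# Sketch: ask intermediate caches for a fresh copy of the page.
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback

req = urllib2.Request('http://example.com/')  # placeholder URL
req.add_header('Pragma', 'no-cache')
req.add_header('Cache-Control', 'no-cache')  # HTTP/1.1 caches
# response = urllib2.urlopen(req)
```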
If you make changes and then test the behaviour from both a browser and urllib, it is easy to make a silly mistake.
In the browser you are logged in, but through urllib.urlopen the app may always redirect you to the same login page. So if you only glance at the page size or the top of your common layout, you could think that your changes have no effect.
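One way to catch this kind of mistake is to compare the requested URL with the final URL after redirects (a sketch; geturl() exists on the response object in both urllib2 and urllib.request, and the URLs are placeholders):

```python
# Sketch: detect a silent redirect (e.g. to a login page) by comparing
# the requested URL with the final URL reported by the response.
def was_redirected(requested_url, response):
    """True if the response ended up at a different URL than requested."""
    return response.geturl() != requested_url

# Usage (hypothetical):
# response = urllib2.urlopen('http://example.com/private/page')
# if was_redirected('http://example.com/private/page', response):
#     print('Probably bounced to the login page; check your session')
```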
I find it hard to believe that urllib2 does not do caching, because in my case the data is refreshed upon restart of the program. If the program is not restarted, the data appears to be cached forever. Also, retrieving the same data from Firefox never returns stale data.