In Python, how do I check whether two different links actually point to the same page?
For example, these 2 links point to the same location:
http://www.independent.co.uk/life-style/gadgets-and-tech/news/2292113.html
How do I check this in Python?
Call geturl() on the result of urllib2.urlopen(). geturl() "returns the URL of the resource retrieved, commonly used to determine if a redirect was followed." For example:
The output is:
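A minimal sketch of this approach for Python 3, where urllib2.urlopen() is available as urllib.request.urlopen(). The helper names final_url and same_destination, and the opener parameter, are assumptions made for illustration. The opener parameter is there so the comparison can be tried without network access.

```python
import urllib.request


def final_url(url, opener=urllib.request.urlopen):
    """Return the URL actually served once any redirects have been followed."""
    # Python 3 equivalent of urllib2.urlopen(url).geturl()
    response = opener(url)
    try:
        return response.geturl()
    finally:
        response.close()


def same_destination(url_a, url_b, opener=urllib.request.urlopen):
    """Hypothetical helper: True if both URLs end up at the same final URL."""
    return final_url(url_a, opener) == final_url(url_b, opener)
```

Calling same_destination(first_link, second_link) with the default opener performs two real HTTP requests. It only detects links that redirect to an identical final URL, so two distinct URLs serving the same article would still compare as different.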
It's impossible to discern that merely from the URLs, obviously.
You could fetch the content and compare it, but then I imagine you'd have to use a smart criterion to decide when two pages are the same -- say, for example, that both point to the same article, but a randomly chosen advertisement differs, or the related articles change depending on other factors.
Design your program in such a way that the criterion for matching pages is easily replaced, even dynamically, and try until you find one that doesn't fail -- for example, for a newspaper page, you could try finding headlines.
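One way to keep the matching criterion replaceable is to pass it in as a function. The sketch below is only an illustration under stated assumptions: the names title_of and same_page are hypothetical, and the default criterion compares the text of the <title> element as a stand-in for "finding headlines". It uses only the standard library and works on HTML strings that have already been fetched.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def title_of(html):
    """Example criterion: the page title, lowercased, whitespace collapsed."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split()).lower()


def same_page(html_a, html_b, criterion=title_of):
    """Two pages match when the chosen criterion yields equal values."""
    return criterion(html_a) == criterion(html_b)
```

Swapping the criterion is then a matter of passing a different function, for example same_page(a, b, criterion=my_headline_extractor), where my_headline_extractor is whatever extractor turns out to work for the site in question. Note that two pages with no title at all would both yield an empty string and compare as equal under the default criterion.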