WebClient forbidden from opening a Wikipedia page?
Here's the code I'm trying to run:
var wc = new WebClient();
var stream = wc.OpenRead(
"http://en.wikipedia.org/wiki/List_of_communities_in_New_Brunswick");
But I keep getting a 403 Forbidden error, and I don't understand why. The same code works fine for other pages, and I can open this page in my browser without any problem. How can I fix this?
I wouldn't normally use OpenRead(); try DownloadData() or DownloadString() instead. It might also be that Wikipedia is deliberately blocking your request because you have not provided a user agent string.
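The answer's original code sample is not preserved here. As a minimal sketch of the idea, assuming an illustrative Firefox-style User-Agent value and a hypothetical helper class named WikiFetch (neither comes from the original answer), setting the header before the download might look like this:

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration only; not part of the original answer.
public static class WikiFetch
{
    // Assumption: an illustrative Firefox-style User-Agent string.
    // Any current browser-style string could be substituted.
    public const string UserAgent =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0";

    // Creates a WebClient that sends the User-Agent header with each request.
    public static WebClient CreateClient()
    {
        var client = new WebClient();
        client.Headers.Add("user-agent", UserAgent);
        return client;
    }

    // Downloads the page body as a string using DownloadString()
    // rather than OpenRead().
    public static string DownloadPage(string url)
    {
        using (var client = CreateClient())
        {
            return client.DownloadString(url);
        }
    }
}
```

Calling WikiFetch.DownloadPage with the URL from the question would then send the request with that header. Whether this clears the 403 depends on how the site filters requests at the time.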
I use WebClient quite often, and I learned quickly that websites can and will block your request if you don't provide a user agent string that matches a known web browser. If you make up your own user agent string (e.g. "my super cool web scraper"), you may be blocked as well.
[Edit]
I changed my example user agent string to that of a modern version of Firefox. The original example I gave was the user agent string for IE6, which is not a good idea. Why? Some websites may filter on IE6 and show anyone using that browser a message, or redirect them to a different page, saying "Please update your browser". That means you will not get the content you wanted.