matweb.com: How do I get the page source?
I have a URL like this:
http://www.matweb.com/search/DataSheet.aspx?MatGUID=849e2916ab1541be9ff6a17b78f95c82
I want to download the source of that page using this code:
using System;
using System.IO;
using System.Net;

private static string urlTemplate = @"http://www.matweb.com/search/DataSheet.aspx?MatGUID=";

// Downloads the HTML source of the data sheet identified by guid; returns null on failure.
static string GetSource(string guid)
{
    try
    {
        Uri url = new Uri(urlTemplate + guid);
        HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
        webRequest.Method = "GET";
        // Dispose the response, stream, and reader so the connection is released.
        using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
        using (Stream responseStream = webResponse.GetResponseStream())
        using (StreamReader responseStreamReader = new StreamReader(responseStream))
        {
            return responseStreamReader.ReadToEnd();
        }
    }
    catch (Exception)
    {
        // Any failure (bad URL, network error, HTTP error) is reported as null.
        return null;
    }
}
When I do so I get:
You do not seem to have cookies enabled. MatWeb Requires cookies to be enabled.
OK, that I understand, so I added these lines:
CookieContainer cc = new CookieContainer();
webRequest.CookieContainer = cc;
I got:
Your IP Address has been restricted due to excessive use. The problem may be compounded when an IP address may be shared by many people in a company or through an internet service provider. We apologize for any inconvenience.
I can understand this, but I don't get this message when I visit the page in a web browser. What can I do to get the source code? Do I need to send some cookies or HTTP headers?
Comments (3)
It probably doesn't like your UserAgent. Try this:
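A minimal sketch of what setting the header could look like, assuming the question's `HttpWebRequest` approach. The class name `MatWebRequestFactory` and the User-Agent string are illustrative assumptions, not values the site is known to require:

```csharp
using System;
using System.Net;

// Hypothetical helper: builds a request that identifies itself like a browser.
static class MatWebRequestFactory
{
    // Assumption: an illustrative browser-like User-Agent string.
    public const string BrowserUserAgent =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";

    // Builds (but does not send) a GET request with a User-Agent header
    // and the caller's cookie container attached.
    public static HttpWebRequest Create(string url, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(new Uri(url));
        request.Method = "GET";
        request.UserAgent = BrowserUserAgent;
        request.CookieContainer = cookies;
        return request;
    }
}
```

Passing the same `CookieContainer` instance to every call keeps any cookies the site sets available to later requests.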
It looks like you're doing something that the company doesn't like, if you got an "excessive use" response.
You are downloading pages too fast.
When you use a browser, you might request up to one page per second. Using an application, you can request several pages per second, and that is probably what their web server is detecting, hence the "excessive use" message.
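One way to slow the downloads is to enforce a minimum pause between requests. The sketch below is only an illustration: `RequestThrottle` is a hypothetical helper, and the interval you pass in is a guess, since the site's actual limit is not stated anywhere in this thread.

```csharp
using System;
using System.Threading;

// Hypothetical helper: enforces a minimum interval between successive requests.
sealed class RequestThrottle
{
    private readonly TimeSpan minInterval;
    private DateTime? lastRequestUtc;

    public RequestThrottle(TimeSpan minInterval)
    {
        this.minInterval = minInterval;
    }

    // Returns how long the caller still has to wait before the next request may start.
    public TimeSpan GetDelay(DateTime nowUtc)
    {
        if (lastRequestUtc == null) return TimeSpan.Zero;
        TimeSpan elapsed = nowUtc - lastRequestUtc.Value;
        return elapsed >= minInterval ? TimeSpan.Zero : minInterval - elapsed;
    }

    // Blocks until the minimum interval has passed, then records the request time.
    public void Wait()
    {
        TimeSpan delay = GetDelay(DateTime.UtcNow);
        if (delay > TimeSpan.Zero) Thread.Sleep(delay);
        lastRequestUtc = DateTime.UtcNow;
    }
}
```

In a download loop, you would create one throttle (for example `new RequestThrottle(TimeSpan.FromSeconds(2))`, an assumed value) and call `Wait()` immediately before each `GetSource(guid)` call.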