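WebRequest/POST question about extracting specific data from a website
I am trying to gather data on specific events from a company website: http://pipeline.kindermorgan.com/infoposting/notices.aspx?type=CRIT I have worked with a lot of similar websites, but so far they have been pretty simple: it was just a matter of requesting the page and working with the response stream. This website requires you to select a value from the first combo box (TSP/TSP Name). If no information is passed, the URL returns the data associated with the first item in the list. I need to be able to get the data associated with any of the items in the list.
The code I have been using so far (shown at the end of this post) fails with a Server Error 500, so I am guessing that either I did not form the POST properly or I am missing some data in the POST body.
For the page listed above, I just want to get a response stream containing the table of notices for a particular TSP from the combo box (starting with Trailblazer). I know the control is "ctl00$ContentPlaceHolder1$ddlpipeline" and the value I want to send is 24. When I navigate via IE, I also have to press the "Retrieve" button.
When I look at the POST request using Firebug, I notice that a lot of other name/value pairs are included. I am not sure whether I need to send all of those as well and, having never done a POST before, I am not sure how to format the data in the POST to do that.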
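On the formatting question: an application/x-www-form-urlencoded body is a list of name=value pairs joined with &, with each name and value URL-encoded. The following is only a minimal sketch of that format; the class and method names (FormBody, Build) are made up for illustration and are not part of any library.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Hypothetical helper (not a library type): joins name/value pairs
// into an application/x-www-form-urlencoded request body.
public static class FormBody
{
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        // WebUtility.UrlEncode percent-encodes reserved characters such as
        // '$', '+', '/' and '=' and turns spaces into '+'.
        return string.Join("&", fields.Select(f =>
            WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value)));
    }
}
```

Passing the single pair ctl00$ContentPlaceHolder1$ddlpipeline / 24 should yield ctl00%24ContentPlaceHolder1%24ddlpipeline=24. Every additional pair seen in Firebug would be added to the same collection before calling Build.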
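Bear with me if this request seems odd. I am more of a database person and am looking to automate a lot of the things we are required to look at manually every day. Any help would be greatly appreciated!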
using System.IO;
using System.Net;
using System.Text;

// Form body: only the drop-down selection is sent.
string postData = "ctl00$ContentPlaceHolder1$ddlpipeline=24";
byte[] data = Encoding.ASCII.GetBytes(postData);
string remoteUri = "http://pipeline.kindermorgan.com/infoposting/notices.aspx?type=CRIT";

var myRequest = (HttpWebRequest)WebRequest.Create(remoteUri);
myRequest.Method = "POST";
myRequest.ContentType = "application/x-www-form-urlencoded";
myRequest.ContentLength = data.Length;

// Write the form body and dispose the request stream.
using (var requestStream = myRequest.GetRequestStream())
{
    requestStream.Write(data, 0, data.Length);
}

// GetResponse() throws a WebException when the server answers with 500.
using (var response = myRequest.GetResponse())
using (var responseReader = new StreamReader(response.GetResponseStream()))
{
    string html = responseReader.ReadToEnd();
}
Comments (1)
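I actually resolved the issue, and I found out a number of things in the process that I will share for the benefit of others who may look at this thread.
First, I had to build the POST data exactly as it appears in the POST sent by a browser (I used Firebug to see the POST data). This meant sending the hidden arguments as well (particularly VIEWSTATE and EVENTVALIDATION). I was able to get these by downloading the default page source for the page (I do this in code, because the values are not static for this site) and parsing out the values of the hidden fields. I then build the POST data string with any changes I want to make (in my case changing the date was important, but in the future I may change other things).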
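The hidden-field step could look roughly like the sketch below. It is not the code from this answer: the class and method names (HiddenFields, Extract) are invented for illustration, and the regular expression assumes the attribute order that ASP.NET WebForms usually emits (type, name, optional id, value). On the rendered page the fields are named __VIEWSTATE and __EVENTVALIDATION, with two leading underscores.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: collects every <input type="hidden"> name/value
// pair from the downloaded page source.
public static class HiddenFields
{
    // Assumes markup shaped like:
    // <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="..." />
    private static readonly Regex HiddenInput = new Regex(
        "<input\\s+type=\"hidden\"\\s+name=\"(?<name>[^\"]+)\"[^>]*?\\svalue=\"(?<value>[^\"]*)\"",
        RegexOptions.IgnoreCase);

    public static Dictionary<string, string> Extract(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in HiddenInput.Matches(html))
        {
            // Attribute values are HTML-encoded in the page source.
            fields[m.Groups["name"].Value] =
                WebUtility.HtmlDecode(m.Groups["value"].Value);
        }
        return fields;
    }
}
```

The returned dictionary can then be extended with the values to change, for example fields["ctl00$ContentPlaceHolder1$ddlpipeline"] = "24", before each name and value is URL-encoded into the POST body. An HTML parser would be more robust than a regular expression if the markup varies.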
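Now for the thing that really had me stumped. I confirmed, through a character-by-character comparison, that the POST data string was exactly the same as the one sent by Firefox/Firebug, and it still would not work. I then remembered that in a previous scraping case I had to set the user agent.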
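Setting the user agent on an HttpWebRequest is a single property assignment. The sketch below shows one way to do it; the class and method names (FormPoster, CreateRequest, Send) and the user-agent string used in the note afterwards are placeholders, not the values used in this answer.

```csharp
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper: builds and sends a form POST with a User-Agent header.
public static class FormPoster
{
    public static HttpWebRequest CreateRequest(string url, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        // HttpWebRequest sends no User-Agent header unless one is set.
        request.UserAgent = userAgent;
        return request;
    }

    public static string Send(HttpWebRequest request, string postData)
    {
        // The body is already URL-encoded, so ASCII is sufficient.
        byte[] body = Encoding.ASCII.GetBytes(postData);
        request.ContentLength = body.Length;
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }
        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call such as FormPoster.CreateRequest(url, "Mozilla/5.0 (compatible; ExampleScraper/1.0)") followed by FormPoster.Send(request, postData) would post the prepared body. Copying the exact User-Agent string the browser sends, as shown in Firebug, is probably the safest choice.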
So here is the code I ended up with:
Hope this helps someone else.