WebRequest/POST question about extracting specific data from a website

Posted 2024-10-22 19:09:17

I am trying to gather data on specific events from a company website: http://pipeline.kindermorgan.com/infoposting/notices.aspx?type=CRIT
I have worked with a lot of similar websites, but so far they have been pretty simple: it's just a matter of requesting the page and working with the response stream. In this case, the website requires you to select a value from the first combo box (TSP/TSP Name). If nothing is passed, the URL returns the data associated with the first item in the list. I really need to be able to get the data associated with any of the items in the list.

The code I have been using so far is below. It fails with a Server Error 500, so I am guessing that either I did not form the POST properly or I am missing some data in the post data.

For the page listed above, I just want to get a response stream containing the table of notices for a particular TSP from the combo box (starting with Trailblazer). I know the control is “ctl00$ContentPlaceHolder1$ddlpipeline” and the value I want to send is 24. When I navigate via IE, I also have to press the “Retrieve” button.

When I look at the POST request in Firebug, I notice that it includes a lot of other name/value pairs. I'm not sure whether I need to send all of those as well, and (having never done a POST before) I am not sure how to format the data in the POST to do that.
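On the formatting question: an `application/x-www-form-urlencoded` body is just `name=value` pairs joined with `&`, with every name and value percent-encoded. The sketch below assumes C# 9+ top-level statements; `BuildFormBody` is a hypothetical helper, not a .NET API. It shows how the `$` in ASP.NET control names becomes `%24` once encoded, and that base64 values such as `__VIEWSTATE` need their `+`, `/` and `=` escaped too.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: joins name/value pairs into an
// application/x-www-form-urlencoded body. Uri.EscapeDataString escapes every
// character outside the RFC 3986 unreserved set (A-Z a-z 0-9 - _ . ~), so the
// '$' in control names becomes %24 and base64 '+', '/', '=' are escaped too.
static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields) =>
    string.Join("&", fields.Select(f =>
        Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value ?? "")));

// Example: an empty hidden field plus the TSP drop-down from the question.
var example = BuildFormBody(new List<KeyValuePair<string, string>>
{
    new KeyValuePair<string, string>("__EVENTTARGET", ""),
    new KeyValuePair<string, string>("ctl00$ContentPlaceHolder1$ddlpipeline", "24"),
});
Console.WriteLine(example); // __EVENTTARGET=&ctl00%24ContentPlaceHolder1%24ddlpipeline=24
```

A list is used rather than a `Dictionary` so the fields stay in the order they were added, which makes it easier to compare the result against the POST data the browser sends.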

Bear with me if this request seems odd. I am more of a database person and am looking to automate a lot of the stuff we are required to look at manually every day. Any help would be greatly appreciated!

    using System.IO;
    using System.Net;
    using System.Text;

    // Post only the TSP drop-down selection (value 24, Trailblazer).
    // This version fails with HTTP 500 (Internal Server Error).
    var encoding = new ASCIIEncoding();
    var postData = "ctl00$ContentPlaceHolder1$ddlpipeline=24";
    byte[] data = encoding.GetBytes(postData);

    string RemoteURI = "http://pipeline.kindermorgan.com/infoposting/notices.aspx?type=CRIT";

    var myRequest = (HttpWebRequest)WebRequest.Create(RemoteURI);
    myRequest.Method = "POST";
    myRequest.ContentType = "application/x-www-form-urlencoded";
    myRequest.ContentLength = data.Length;

    // Write the form body to the request.
    var newStream = myRequest.GetRequestStream();
    newStream.Write(data, 0, data.Length);
    newStream.Close();

    // The 500 error surfaces here, as a WebException thrown by GetResponse().
    var response = myRequest.GetResponse();
    var responseStream = response.GetResponseStream();
    var responseReader = new StreamReader(responseStream);

Answer by 蓬勃野心, 2024-10-29 19:09:17

I actually resolved the issue and there were a number of things I found out in the process that I will share for the benefit of others who may look at this thread.

First, I had to build the POST data exactly as it appears in the browser's POST (I used Firebug to see the POST data). This meant including the hidden fields as well (particularly __VIEWSTATE and __EVENTVALIDATION). I got these by downloading the page's default source (I do this in code, because the values are not static for this site) and parsing out the values of the hidden fields. I then build the POST data string with whatever changes I need (in my case changing the date was important, but in the future I may change other things).
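The hidden-field step can be sketched as follows. `ExtractHiddenFields` is a hypothetical helper, and a regular expression is only an approximation of HTML parsing: it assumes double-quoted attributes, which is how ASP.NET WebForms renders fields like `__VIEWSTATE` and `__EVENTVALIDATION`. An HTML parser would be more robust. The values are HTML-decoded here and still need URL-encoding when they go into the POST body.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: returns name -> value for every <input type="hidden">
// in a page, e.g. __VIEWSTATE, __EVENTVALIDATION and __PREVIOUSPAGE.
// Assumes double-quoted attributes; a regex is not a general HTML parser.
static Dictionary<string, string> ExtractHiddenFields(string html)
{
    var fields = new Dictionary<string, string>();
    foreach (Match tag in Regex.Matches(html, @"<input\b[^>]*>", RegexOptions.IgnoreCase))
    {
        string t = tag.Value;
        if (!Regex.IsMatch(t, @"\btype\s*=\s*""hidden""", RegexOptions.IgnoreCase))
        {
            continue; // skip text boxes, buttons, etc.
        }

        Match name = Regex.Match(t, @"\bname\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);
        Match value = Regex.Match(t, @"\bvalue\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);
        if (name.Success)
        {
            // Attribute values are HTML-encoded in the source (&amp; etc.), so
            // decode them here; URL-encode them later for the POST body.
            fields[name.Groups[1].Value] =
                value.Success ? WebUtility.HtmlDecode(value.Groups[1].Value) : "";
        }
    }
    return fields;
}
```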

Now for the thing that really had me stumped. A character-by-character comparison confirmed that my POST data string was exactly the same as the one sent by Firefox (as captured in Firebug), and it still wouldn't work. I then remembered that in a previous scraping case I had to set the user agent, and setting it here is what fixed it.
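A sketch of the request setup, assuming the same `HttpWebRequest` API the thread uses. `CreateBrowserLikePost` and `SendForm` are hypothetical helpers. The User-Agent string is the one from the code below; the shared `CookieContainer` goes beyond what the answer describes and is an assumption: if the site issues a session cookie on the initial GET, attaching the same container to the POST sends it back.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper: configures (but does not send) a form POST with a
// browser-like User-Agent. WebRequest.Create is marked obsolete from .NET 6
// on (HttpClient is the replacement) but still works.
static HttpWebRequest CreateBrowserLikePost(string url, byte[] body, CookieContainer cookies)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "POST";
    request.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = body.Length;
    // Assumption: reuse the container from the initial GET so any session
    // cookie the site set is sent back with the POST.
    request.CookieContainer = cookies;
    return request;
}

// Writes the (already URL-encoded, hence ASCII) body and returns the response HTML.
static string SendForm(HttpWebRequest request, byte[] body)
{
    using (Stream requestStream = request.GetRequestStream())
    {
        requestStream.Write(body, 0, body.Length);
    }
    using (WebResponse response = request.GetResponse())
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        return reader.ReadToEnd();
    }
}
```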

So here is the code I ended up with:

    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    // RemoteURI is the notices page URL. viewstate, eventvalidation and previouspage
    // come from the hidden fields of a fresh GET of the page; the other arguments are
    // the values of the page's form controls (pplcode is the TSP, todaydate the date).
    // Every value must be URL-encoded (e.g. with WebUtility.UrlEncode) before it goes
    // into the string: __VIEWSTATE is base64, and an unencoded '+' decodes as a space.
    string postData = String.Format("__EVENTTARGET=&__EVENTARGUMENT=&__LASTFOCUS="
        + "&__VIEWSTATE={0}"
        + "&ctl00%24UltraWebTree1={1}"
        + "&ctl00%24ContentPlaceHolder1%24ddlNoticeCategory={2}"
        + "&ctl00%24ContentPlaceHolder1%24ddlpipeline={3}"
        + "&ctl00%24ContentPlaceHolder1%24Button1={4}"
        + "&ctl00%24ContentPlaceHolder1%24tbDate={5}"
        + "&ctl00%24ContentPlaceHolder1%24ddlNoticeType={6}"
        + "&ctl00%24ContentPlaceHolder1%24tbSubject={7}"
        + "&ctl00%24ContentPlaceHolder1%24ddlNoticeSubType={8}"
        + "&ctl00%24ContentPlaceHolder1%24ddlOrderBy={9}"
        + "&ctl00%24ContentPlaceHolder1%24hfmode={10}"
        + "&ctl00%24ContentPlaceHolder1%24hfODSCommand={11}&ctl00%24hfPipeline={12}"
        + "&__PREVIOUSPAGE={13}&__EVENTVALIDATION={14}",
        viewstate, webtree, noticecategory, pplcode,
        button1, todaydate, noticetype, subject,
        noticesubtype, orderby, hfmode, hfODSCommand,
        hfPipeline, previouspage, eventvalidation);

    // The URL-encoded body is plain ASCII.
    var encoding = new ASCIIEncoding();
    byte[] data = encoding.GetBytes(postData);

    var myRequest = (HttpWebRequest)WebRequest.Create(RemoteURI);
    // The request only succeeded once a browser-like User-Agent was set.
    myRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)";
    myRequest.Method = "POST";
    myRequest.ContentType = "application/x-www-form-urlencoded";
    myRequest.ContentLength = data.Length;

    using (var newStream = myRequest.GetRequestStream())
    {
        newStream.Write(data, 0, data.Length);
    }

    string webpagesource;
    using (var myresponse = myRequest.GetResponse())
    using (var responseReader = new StreamReader(myresponse.GetResponseStream()))
    {
        webpagesource = responseReader.ReadToEnd();
    }

Hope this helps someone else.