How to parse a FORM from a WebResponse into the POST body of a WebRequest
I'm new to this. The task at hand is to create a transaction in C# that navigates through the page flow of a web app via WebRequest/WebResponse. I have the request/response mechanism working, cookies and all (I can successfully execute a transaction with hardcoded values for the POST URLs and POST bodies). The difficulty is generating the POST body and POST URL for each WebRequest dynamically from the value pairs in the previous WebResponse.
Essentially, once the flow is started with the first WebRequest, which always has the same static URL and a "hardcoded" body, each following request is built from the FORM value pairs of the previous response. For example, here is part of the FORM in the response:
<form id="expressform" method="post" action="">
<div>
<input type="hidden" name="ScreenData.widgets.modified" value=""/><input type="hidden" name="ScreenData.header.hidden.name" value="ScreenData.widgets.modified"/><input type="hidden" name="ScreenData.marshalled" value="true"/><input type="hidden" name="ScreenData.header.hidden.name" value="ScreenData.marshalled"/><input type="hidden" name="isCreateAccountWizard" value="true"/><input type="hidden" name="ScreenData.header.hidden.name" value="isCreateAccountWizard"/>
<input type="hidden" name="versionPoint" value="77777"/>
and then some text areas in the form to submit values, like this:
<tr>
<td class="dataOut" style="padding-left:30px">
<textarea name="ScreenData.sicInfo.natureOfBusiness" rows="5" cols="60" class="dataOut" onmouseup="textAreaCounter(this,250);;" onkeypress="textAreaCounter(this,250);;" onkeyup="textAreaCounter(this,250);;" onchange="markDataDirty(this);;"></textarea>
</td>
</tr>
and then on Submit there's the URL:
<a class="detailBtnOn" href="javascript:submitForm('express?displayAction=CreateAccountWizard&saveAction=SaveCreateSICCode&flow=forward&saveActionToken=84454A7D-50FE-5856-CE17-916B70EDFE1A&flowToken=CF3827F4-1DE7-54B1-D87B-D72F01C454C3')">Submit</a>
And then the next WebRequest should have this in its POST body:
ScreenData.widgets.modified=&ScreenData.header.hidden.name=ScreenData.widgets.modified&ScreenData.marshalled=true&ScreenData.header.hidden.name=ScreenData.marshalled&isCreateAccountWizard=true&ScreenData.header.hidden.name=isCreateAccountWizard&versionPoint=77777&ScreenData.commonHeaderInfo.accountName=SomeAccountName&ScreenData.commonHeaderInfo.effectiveDate=08%2F01%2F2011&ScreenData.sicInfo.natureOfBusiness=business&ScreenData.sicInfo.sic=7777&ScreenData.widgets.modified=ScreenData.sicInfo.natureOfBusiness&ScreenData.widgets.modified=ScreenData.sicInfo.sic
and this as a URL:
express?displayAction=CreateAccountWizard&saveAction=SaveCreateSICCode&flow=forward&saveActionToken=84454A7D-50FE-5856-CE17-916B70EDFE1A&flowToken=CF3827F4-1DE7-54B1-D87B-D72F01C454C3
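Assuming the name/value pairs and the Submit link's href have already been pulled out of the response (the part that is not working yet), the target of this step can be sketched roughly as follows. The class and method names (PostBuilder, ExtractSubmitUrl, BuildBody) are made up for illustration; only standard .NET calls are used. Uri.EscapeDataString encodes a space as %20 rather than +, which is assumed to be acceptable to the server.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class PostBuilder
{
    // Pulls the relative URL out of href="javascript:submitForm('...')".
    // Returns null when the href does not follow that pattern.
    public static string ExtractSubmitUrl(string href)
    {
        var match = Regex.Match(href ?? "", @"submitForm\('([^']*)'\)");
        return match.Success ? match.Groups[1].Value : null;
    }

    // Encodes name/value pairs as application/x-www-form-urlencoded.
    // A sequence of pairs (not a dictionary) keeps repeated names such as
    // ScreenData.header.hidden.name, and keeps them in document order.
    public static string BuildBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value ?? "")));
    }
}
```

The extracted URL is relative, so it would still need to be resolved against the page it came from, for instance with new Uri(response.ResponseUri, relativeUrl).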
But not only can I not figure out how to build this parsing engine, I can't even grab the value pairs from the FORM. I'm trying to use HtmlAgilityPack; here's a bit that should at least print out the FORM's "important" content:
var page = new HtmlDocument();
page.OptionReadEncoding = false;
var stream = HttpWResponse.GetResponseStream();
page.Load(stream);
foreach (var f in page.DocumentNode.Descendants("form"))
{
    foreach (var d in page.DocumentNode.Descendants("div"))
    {
        Loggers.EventsLogger.Info("");
        Loggers.EventsLogger.Info((f.GetAttributeValue("name", null) ?? f.GetAttributeValue("id", "<no name>")) + ": ");
        Loggers.EventsLogger.Info("");
        Loggers.EventsLogger.Info(f.GetAttributeValue("method", "<no method>") + ' ');
        Loggers.EventsLogger.Info("");
        Loggers.EventsLogger.Info(f.GetAttributeValue("action", "<no action>"));
        foreach (var i in f.Descendants("input"))
        {
            Loggers.EventsLogger.Info("");
            Loggers.EventsLogger.Info('\t' + (i.GetAttributeValue("name", null) ?? f.GetAttributeValue("id", "<no name>")));
            Loggers.EventsLogger.Info("");
            Loggers.EventsLogger.Info(" (");
            Loggers.EventsLogger.Info("");
            Loggers.EventsLogger.Info(i.GetAttributeValue("type", "<no type>"));
            Loggers.EventsLogger.Info("");
            Loggers.EventsLogger.Info("): " + i.GetAttributeValue("value", "<no value>"));
        }
        Loggers.EventsLogger.Info("");
        Loggers.EventsLogger.Info("");
    }
}
but it only prints out this:
INFO EventsLogger -
INFO EventsLogger - expressform:
INFO EventsLogger -
INFO EventsLogger - post
(If I get rid of the "div" bit, foreach (var d in page.DocumentNode.Descendants("div")), nothing changes.)
Any help or suggestions on what's going wrong with the FORM printout and on how to build a parsing engine that builds requests from responses would be greatly appreciated.
Comments (1)
Check out "Parsing HTML page with HtmlAgilityPack", http://refactoringaspnet.blogspot.com/2010/04/using-htmlagilitypack-to-get-and-post_19.html, http://htmlagilitypack.codeplex.com/discussions/247206, and "How would I get the inputs from a certain form with HtmlAgility Pack? Lang: C#.net".
EDIT - some more info:
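You loop over the forms in the HTML document with foreach, but the next foreach goes after the DIVs of the whole document without referencing the current form. The inner foreach loop(s) need to start from the current node rather than from page.DocumentNode, something similar to f.Descendants("div") for the DIVs and then d.Descendants("input") for the inputs.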
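A minimal sketch of that idea, collecting the fields of one form into an ordered list of pairs. FormReader and ReadFields are hypothetical names. The ElementsFlags line is an assumption about the cause of the missing inputs: in some HtmlAgilityPack versions the form tag is registered as an element that may be empty, so its inputs end up parsed as siblings rather than descendants, which would match the output shown in the question. Removing that flag before loading is a commonly used workaround, not a confirmed diagnosis for this page.

```csharp
using System.Collections.Generic;
using System.IO;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class FormReader
{
    public static List<KeyValuePair<string, string>> ReadFields(Stream html, string formId)
    {
        // Assumption: must run before Load so that form keeps its children.
        HtmlNode.ElementsFlags.Remove("form");

        var page = new HtmlDocument();
        page.Load(html);

        var fields = new List<KeyValuePair<string, string>>();
        foreach (var form in page.DocumentNode.Descendants("form"))
        {
            if (form.GetAttributeValue("id", "") != formId) continue;

            // Walk the current form, not page.DocumentNode.
            foreach (var node in form.Descendants())
            {
                var name = node.GetAttributeValue("name", null);
                if (name == null) continue;

                if (node.Name == "input")
                    fields.Add(new KeyValuePair<string, string>(
                        name, node.GetAttributeValue("value", "")));
                else if (node.Name == "textarea")
                    fields.Add(new KeyValuePair<string, string>(
                        name, HtmlEntity.DeEntitize(node.InnerText)));
            }
        }
        return fields;
    }
}
```

Walking form.Descendants() once keeps inputs and textareas in document order, which matters here because names such as ScreenData.header.hidden.name repeat. Values the user is supposed to type in (the textarea contents, for example) would still have to be overwritten in the list before it is encoded into the POST body.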