Page posting issue when screen scraping

Posted on 2024-09-03 09:06:16

I am working on screen scraping and have done it successfully for three websites, but I have an issue with the last one.

Here is my URL. When I submit it with my parameters in a browser, the result is shown on the next page: the form simply posts to another page, and that page displays the result correctly.

Here is my test.

However, when I request it from my application, I have no way to post, so I only get the HTML of the page I requested, which is the HTML test link mentioned above, even though the URL carries the parameters needed to get the result.

How can I handle this situation? Please give me a hint.

Thanks

Here is my C# code. I am using HtmlAgilityPack:

String url;
HtmlWeb hw = new HtmlWeb();
HtmlDocument doc;
url = "http://mysampleURL";
doc = hw.Load(url);


Comments (3)

暖树树初阳… 2024-09-10 09:06:16

Use the WebClient class to post the form from the first page with the expected input values. The input values can be found in the source of the first page, but you can also capture them with Fiddler, which in my opinion is a great tool for these scenarios.

Example:

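// Requires: using System.Collections.Specialized; using System.Net; using System.Text;
NameValueCollection values = new NameValueCollection();
values.Add("action", "hotelPackageWizard@searchHotelOnly");
values.Add("packageType", "HOTEL_ONLY");
// etc..
string response;
using (WebClient webclient = new WebClient())
{
    webclient.Headers.Add("Content-Type", "application/x-www-form-urlencoded");
    // UploadValues URL-encodes the collection and sends it as the POST body.
    byte[] responseArray = webclient.UploadValues("http://www.expedia.com/Hotels?rfrr=-905&", "POST", values);
    // Decode with the encoding the site actually uses; UTF-8 is assumed here.
    response = Encoding.UTF8.GetString(responseArray);
}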
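
For readers on a newer .NET version, the same form post can be sketched with HttpClient instead of the WebClient shown above. This is only a minimal sketch, not the answerer's method: the class name `FormPoster` and its two methods are hypothetical, and the field names are the ones from the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: posts form fields and returns the response HTML.
public static class FormPoster
{
    // One shared client; HttpClient is meant to be reused across requests.
    private static readonly HttpClient Client = new HttpClient();

    // Encodes the name/value pairs as application/x-www-form-urlencoded.
    public static FormUrlEncodedContent BuildForm(
        IEnumerable<KeyValuePair<string, string>> fields)
    {
        return new FormUrlEncodedContent(fields);
    }

    // Sends the fields as a POST and returns the body of the result page.
    public static async Task<string> PostFormAsync(
        string url, IEnumerable<KeyValuePair<string, string>> fields)
    {
        using (var content = BuildForm(fields))
        using (var response = await Client.PostAsync(url, content))
        {
            // Throws if the server did not answer with a 2xx status.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

The returned string can then be handed to HtmlAgilityPack with `HtmlDocument.LoadHtml(html)` instead of calling `HtmlWeb.Load(url)`, which only performs a GET.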
素罗衫 2024-09-10 09:06:16

If the resource requires a POST, then you must submit a POST.

This is a fairly simple task. Here is an example from Rick Strahl's blog. The code is a bit rustic, but it works and will get you heading in the right direction:

// Requires: using System.IO; using System.Net; using System.Text; using System.Web;
string lcUrl = "http://www.west-wind.com/testpage.wwd";
HttpWebRequest loHttp = (HttpWebRequest) WebRequest.Create(lcUrl);

// *** Send any POST data
string lcPostData =
   "Name=" + HttpUtility.UrlEncode("Rick Strahl") +
   "&Company=" + HttpUtility.UrlEncode("West Wind ");

loHttp.Method = "POST";
// Mark the body as form data so the server parses the fields.
loHttp.ContentType = "application/x-www-form-urlencoded";
byte[] lbPostBuffer = Encoding.GetEncoding(1252).GetBytes(lcPostData);
loHttp.ContentLength = lbPostBuffer.Length;

// Write the encoded fields into the request body
Stream loPostData = loHttp.GetRequestStream();
loPostData.Write(lbPostBuffer, 0, lbPostBuffer.Length);
loPostData.Close();

HttpWebResponse loWebResponse = (HttpWebResponse) loHttp.GetResponse();

Encoding enc = Encoding.GetEncoding(1252);

StreamReader loResponseStream =
   new StreamReader(loWebResponse.GetResponseStream(), enc);

string lcHtml = loResponseStream.ReadToEnd();

// Close the reader first, then the response
loResponseStream.Close();
loWebResponse.Close();
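
When a form has more than a couple of fields, concatenating the POST string by hand gets error-prone. As a rough sketch, the body could be built from a list of name/value pairs instead. The `PostBody` class and its methods are hypothetical names introduced here, and `WebUtility.UrlEncode` from System.Net is swapped in for `HttpUtility.UrlEncode` so that no reference to System.Web is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical helper: builds an application/x-www-form-urlencoded body.
public static class PostBody
{
    // URL-encodes every name and value and joins the pairs with '&'.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value)));
    }

    // Returns the bytes to write to the request stream.
    // The encoded string is plain ASCII, so UTF-8 yields the same bytes.
    public static byte[] ToBytes(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return Encoding.UTF8.GetBytes(Encode(fields));
    }
}
```

The byte array from `PostBody.ToBytes` would take the place of the buffer written to the request stream, with `ContentLength` set to its length.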
挽心 2024-09-10 09:06:16

For screen scraping tasks that involve posting forms, such as logging in, maintaining cookies, and taking care of XSRF tokens, one solution is to use cURL. But it is not easy.

I then explored Selenium, and I love it. There are two things to do: 1) install Selenium IDE (works only in Firefox), and 2) install Selenium RC Server.

After starting Selenium IDE, go to the site that you are trying to automate and start recording the actions you perform on it. Think of it as recording a macro in the browser. Afterwards, you get the code output in the language you want.

Just so you know, Browsermob uses Selenium for load testing and for automating tasks in the browser.

I've uploaded a PPT that I made a while back. This should save you a good amount of time: http://www.4shared.com/get/tlwT3qb_/SeleniumInstructions.html

In the above link, select the regular download option.

I spent a good amount of time figuring it out, so I thought it might save somebody else's time.
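
If driving a real browser is more than the task needs, cookies at least can be kept across requests from plain C#. The following is a minimal sketch under assumptions, not something taken from the answer above: `ScrapeSession` and its methods are hypothetical names, and the form URL, post URL, and fields are whatever the target site expects.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: an HTTP session that keeps cookies between requests.
public static class ScrapeSession
{
    // Creates a client whose requests all share the given cookie jar.
    public static HttpClient Create(CookieContainer cookies)
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = cookies,
            UseCookies = true
        };
        return new HttpClient(handler);
    }

    // Loads the form page first so the server can set its cookies,
    // then posts the fields and returns the HTML of the result page.
    public static async Task<string> GetThenPostAsync(
        HttpClient client, string formUrl, string postUrl,
        IDictionary<string, string> fields)
    {
        using (var formPage = await client.GetAsync(formUrl))
        {
            formPage.EnsureSuccessStatusCode();
        }

        using (var content = new FormUrlEncodedContent(fields))
        using (var response = await client.PostAsync(postUrl, content))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

A hidden XSRF token is not handled here; it would have to be read from the form page's HTML and added to `fields` before the post.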
