Reading from and posting to web pages using C#

Posted 2024-07-06 16:03:40

I have a project at work that requires me to enter information into a web page, read the next page I get redirected to, and then take further action. A simplified real-world example would be going to google.com, entering "Coding tricks" as the search criteria, and reading the resulting page.

Small coding examples like the ones linked to at http://www.csharp-station.com/HowTo/HttpWebFetch.aspx tell how to read a web page, but not how to interact with it by submitting information into a form and continuing on to the next page.

For the record, I'm not building a malicious and/or spam related product.

So how do I read web pages that can only be reached after a few steps of normal browsing?

Comments (4)

夏末的微笑 2024-07-13 16:03:40

You can programmatically create an Http request and retrieve the response:

    // Requires: using System; using System.IO; using System.Net; using System.Text;
    string uri = "http://www.google.com/search";
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";

    // Encode the form data to POST.
    string postData = "q=searchterm&hl=en";
    byte[] encodedData = Encoding.ASCII.GetBytes(postData);
    request.ContentLength = encodedData.Length;

    // Write the request body; disposing the stream closes it before the response is read.
    using (Stream requestStream = request.GetRequestStream())
    {
        requestStream.Write(encodedData, 0, encodedData.Length);
    }

    // Send the request and get the response.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        // As an example, copy the response to the console through a 256-character buffer.
        char[] buffer = new char[256];
        int count;
        while ((count = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Console.Write rather than WriteLine, so no line break is added after each chunk.
            Console.Write(new string(buffer, 0, count));
        }
    } // reader and response are disposed here

Of course, this code will return an error since Google uses GET, not POST, for search queries.

This method will work if you are dealing with specific web pages, as the URLs and POST data are all basically hard-coded. If you needed something that was a little more dynamic, you'd have to:

  1. Capture the page
  2. Strip out the form
  3. Create a POST string based on the form fields
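
As a rough illustration of steps 2 and 3, here is a minimal sketch that pulls the named <input> fields out of a page's HTML and turns them into a URL-encoded POST string. The class and method names (FormPostBuilder, ExtractInputs, BuildPostData) are made up for this example. The regular expressions are an assumption chosen to keep the sketch dependency-free: they only handle quoted attributes on <input> tags and ignore <select>, <textarea>, and unchecked checkboxes, so a real HTML parser would be a safer choice for anything non-trivial.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: names are illustrative, not from any library.
static class FormPostBuilder
{
    // Step 2: collect name/value pairs from the <input> tags in the page.
    public static List<KeyValuePair<string, string>> ExtractInputs(string html)
    {
        var fields = new List<KeyValuePair<string, string>>();
        foreach (Match tag in Regex.Matches(html, @"<input\b[^>]*>", RegexOptions.IgnoreCase))
        {
            string name = GetAttribute(tag.Value, "name");
            if (name == null) continue; // inputs without a name are not submitted
            string value = GetAttribute(tag.Value, "value") ?? "";
            fields.Add(new KeyValuePair<string, string>(name, value));
        }
        return fields;
    }

    // Reads one quoted attribute from a tag and decodes HTML entities such as &amp;.
    static string GetAttribute(string tag, string attribute)
    {
        Match m = Regex.Match(tag, @"\b" + attribute + @"\s*=\s*[""']([^""']*)[""']",
                              RegexOptions.IgnoreCase);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : null;
    }

    // Step 3: build an application/x-www-form-urlencoded body from the fields.
    public static string BuildPostData(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (var field in fields)
            parts.Add(WebUtility.UrlEncode(field.Key) + "=" + WebUtility.UrlEncode(field.Value));
        return string.Join("&", parts);
    }
}
```

The resulting string can be used in place of the hard-coded postData above, after overwriting whichever field values you want to submit.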

FWIW, I think something like Perl or Python might be better suited to that sort of task.

edit: x-www-form-urlencoded

眉目亦如画i 2024-07-13 16:03:40

You might try Selenium. Record the actions in Firefox using Selenium IDE, save the script in C# format, then play it back using the Selenium RC C# wrapper. As others have mentioned, you could also use System.Net.HttpWebRequest or System.Net.WebClient. If this is a desktop application, see also System.Windows.Forms.WebBrowser.

Addendum: Similar to Selenium IDE and Selenium RC, which are Java-based, WatiN Test Recorder and WatiN are .NET-based.

不如归去 2024-07-13 16:03:40

What you need to do is keep retrieving and analyzing the HTML source for each page in the chain. For each page, you need to figure out what the form submission will look like and send a matching request to get the next page in the chain.

What I do is build a custom class that wraps System.Net.HttpWebRequest/HttpWebResponse, so retrieving pages is as simple as using System.Net.WebClient. However, my custom class also keeps the same cookie container across requests and makes it a little easier to send post data, customize the user agent, etc.
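
The answer does not show its class, so the following is only a guess at what such a wrapper might look like. BrowserSession, its member names, and the default user agent string are all invented for illustration. The one essential idea is that every request is given the same CookieContainer, so cookies set by one page (a login, for example) are sent with the next request.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical wrapper: class and member names are illustrative only.
class BrowserSession
{
    // One container for the whole session, so cookies carry over between requests.
    private readonly CookieContainer cookies = new CookieContainer();

    // Placeholder value; set it to whatever the target site expects.
    public string UserAgent { get; set; } = "Mozilla/5.0 (compatible; ExampleClient/1.0)";

    // Creates a request that shares this session's cookies and user agent.
    public HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        request.UserAgent = UserAgent;
        return request;
    }

    // Fetches a page with GET and returns its body as text.
    public string Get(string url)
    {
        return ReadBody(CreateRequest(url));
    }

    // Submits URL-encoded form data with POST and returns the response body.
    public string Post(string url, string formData)
    {
        HttpWebRequest request = CreateRequest(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        byte[] body = Encoding.UTF8.GetBytes(formData);
        request.ContentLength = body.Length;
        using (Stream stream = request.GetRequestStream())
            stream.Write(body, 0, body.Length);
        return ReadBody(request);
    }

    private static string ReadBody(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }
}
```

A multi-step flow would then be a sequence of Get and Post calls on a single BrowserSession instance. On current .NET versions WebRequest.Create compiles with an obsolescence warning, and HttpClient is the recommended replacement; the same shared-cookie idea applies there.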

悲欢浪云 2024-07-13 16:03:40

Depending on how the website works, you may be able to manipulate the URL to do what you want. For example, to search for the word "beatles" you could just open a request to google.com?q=beatles and then read the results.
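
A minimal sketch of the query-string approach, assuming the site accepts the search term in a q parameter as in the Google example. QuerySearch, BuildSearchUrl, and Fetch are hypothetical names used only here. The term is escaped so that spaces and characters such as & do not break the URL.

```csharp
using System;
using System.Net;

// Hypothetical helper: names are illustrative only.
static class QuerySearch
{
    // Appends the escaped search term as the "q" query-string parameter.
    public static string BuildSearchUrl(string baseUrl, string term)
    {
        return baseUrl + "?q=" + Uri.EscapeDataString(term);
    }

    // Downloads the page at the given URL and returns its body as text.
    public static string Fetch(string url)
    {
        using (var client = new WebClient())
            return client.DownloadString(url);
    }
}
```

For example, QuerySearch.Fetch(QuerySearch.BuildSearchUrl("http://www.google.com/search", "beatles")) would request the results page in a single GET, with no form submission involved.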

Alternatively, if the website does not use query string values (in the URL) to process page actions, you will need to build a WebRequest that posts the required values to the website instead. Search Google for examples of working with WebRequest and WebResponse.
