Reading from and posting to web pages with C#

Posted 2024-07-06 16:03:40

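A project at work requires me to enter information into a web page, read the page I am redirected to, and then take further action. A simplified real-world example would be going to google.com, entering "Coding tricks" as the search criteria, and reading the results page.

Small coding examples like the ones linked at http://www.csharp-station.com/HowTo/HttpWebFetch.aspx show how to read a web page, but not how to interact with it by submitting information to a form and continuing on to the next page.

For the record, I'm not building a malicious or spam-related product.

So how do I read web pages that take a few steps of normal browsing to reach?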

夏末的微笑 2024-07-13 16:03:40

You can programmatically create an HTTP request and retrieve the response:

    // Requires: using System; using System.IO; using System.Net; using System.Text;
    string uri = "http://www.google.com/search";
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";

    // Encode the data to POST
    string postData = "q=searchterm&hl=en";
    byte[] encodedData = Encoding.ASCII.GetBytes(postData);
    request.ContentLength = encodedData.Length;

    // Write the request body; disposing the stream closes it
    using (Stream requestStream = request.GetRequestStream())
    {
        requestStream.Write(encodedData, 0, encodedData.Length);
    }

    // Send the request and get the response
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        // Do something with the response stream. As an example, we'll
        // stream the response to the console via a 256-character buffer
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            char[] buffer = new char[256];
            int count = reader.Read(buffer, 0, buffer.Length);
            while (count > 0)
            {
                // Console.Write avoids inserting a line break after every chunk
                Console.Write(new string(buffer, 0, count));
                count = reader.Read(buffer, 0, buffer.Length);
            }
        } // reader is disposed here
    } // response is disposed here

Of course, this code will return an error, since Google uses GET, not POST, for search queries.
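For a form that submits with GET, the fields go into the query string instead of the request body. The following is a minimal sketch of that variant, not a tested Google client: the class and method names (`SearchFetcher`, `BuildSearchUri`, `Fetch`) are made up for illustration, and the `q` and `hl` parameters are simply carried over from the POST example above.

```csharp
using System;
using System.IO;
using System.Net;

public static class SearchFetcher
{
    // Builds a GET URL by appending URL-encoded parameters to the query string.
    // The parameter names "q" and "hl" are taken from the POST example.
    public static string BuildSearchUri(string baseUri, string term)
    {
        return baseUri + "?q=" + Uri.EscapeDataString(term) + "&hl=en";
    }

    // Sends a GET request and returns the whole response body as a string.
    public static string Fetch(string uri)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "GET";

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling `SearchFetcher.Fetch(SearchFetcher.BuildSearchUri("http://www.google.com/search", "coding tricks"))` would then request the results page. Whether a given site accepts such automated requests is a separate question.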

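Hard-coding the URL and POST data, as in the POST example above, works when you are dealing with specific web pages. If you need something more dynamic, you have to:

Capture the page
Strip out the form
Create a POST string based on the form fields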
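The last two steps above can be sketched roughly as follows. This is an illustration under simplifying assumptions: it only looks at `<input>` tags with quoted `name` and `value` attributes, it ignores `<select>` and `<textarea>` elements, and it uses regular expressions where a real HTML parser would be more robust. The names `FormScraper`, `ExtractFields` and `BuildPostData` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class FormScraper
{
    // Matches each <input ...> tag in the page source.
    private static readonly Regex InputTag =
        new Regex(@"<input\b[^>]*>", RegexOptions.IgnoreCase);

    // Returns the value of a single- or double-quoted attribute, or null if absent.
    private static string GetAttribute(string tag, string name)
    {
        Match m = Regex.Match(
            tag,
            @"(?<![\w-])" + name + @"\s*=\s*(?:""([^""]*)""|'([^']*)')",
            RegexOptions.IgnoreCase);
        if (!m.Success) return null;
        return m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
    }

    // Collects name/value pairs from the <input> tags found in the HTML.
    public static Dictionary<string, string> ExtractFields(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in InputTag.Matches(html))
        {
            string name = GetAttribute(m.Value, "name");
            if (string.IsNullOrEmpty(name)) continue; // unnamed inputs are not submitted
            string value = GetAttribute(m.Value, "value") ?? "";
            fields[name] = WebUtility.HtmlDecode(value); // undo entities such as &amp;
        }
        return fields;
    }

    // Joins the fields into an application/x-www-form-urlencoded body.
    public static string BuildPostData(IDictionary<string, string> fields)
    {
        var parts = new List<string>();
        foreach (KeyValuePair<string, string> field in fields)
        {
            parts.Add(Uri.EscapeDataString(field.Key) + "=" + Uri.EscapeDataString(field.Value));
        }
        return string.Join("&", parts);
    }
}
```

A typical flow would extract the fields, overwrite the ones the user would normally type (for example `fields["q"] = "coding tricks"`), and pass the result of `BuildPostData` as the POST body. Hidden fields such as session tokens are carried along unchanged, which is usually what the server expects.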

FWIW, I think something like Perl or Python might be better suited to this sort of task.

Edit: x-www-form-urlencoded


眉目亦如画i 2024-07-13 16:03:40

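You might try Selenium. Record the actions in Firefox using Selenium IDE, save the script in C# format, then play it back using the Selenium RC C# wrapper. As others have mentioned, you could also use System.Net.HttpWebRequest or System.Net.WebClient. If this is a desktop application, see also System.Windows.Forms.WebBrowser.

Addendum: Similar to Selenium IDE and Selenium RC, which are Java-based, WatiN Test Recorder and WatiN are .NET-based.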

不如归去 2024-07-13 16:03:40

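What you need to do is keep retrieving and analyzing the HTML source of each page in the chain. For each page, work out what its form submission looks like, then send a request that matches it to get the next page in the chain.

What I did was build a custom class that wraps System.Net.HttpWebRequest/HttpWebResponse, so that retrieving a page is as simple as using System.Net.WebClient. My custom class also keeps the same cookie container across requests and makes it easier to send POST data, set a custom user agent, and so on.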
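A wrapper along those lines might look like the sketch below. It is not the class described in this answer, which was not posted. `WebSession` and its members are hypothetical names, and the default user agent string is a placeholder. The point it illustrates is that every request shares one `CookieContainer`, so cookies set by one page (a login, for instance) are sent with the next request.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public class WebSession
{
    // One container for the whole session, so cookies persist between requests.
    private readonly CookieContainer cookies = new CookieContainer();

    public CookieContainer Cookies { get { return cookies; } }
    public string UserAgent { get; set; }

    public WebSession()
    {
        UserAgent = "Mozilla/5.0 (compatible; ExampleClient/1.0)"; // placeholder value
    }

    // Creates a request that carries the shared cookies and user agent.
    public HttpWebRequest CreateRequest(string uri, string method)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = method;
        request.UserAgent = UserAgent;
        request.CookieContainer = cookies;
        return request;
    }

    // Reads the full response body of a prepared request.
    private static string ReadBody(HttpWebRequest request)
    {
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    public string Get(string uri)
    {
        return ReadBody(CreateRequest(uri, "GET"));
    }

    // postData is expected to be already URL-encoded, e.g. "user=me&pass=secret".
    public string Post(string uri, string postData)
    {
        HttpWebRequest request = CreateRequest(uri, "POST");
        byte[] body = Encoding.UTF8.GetBytes(postData);
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;

        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }
        return ReadBody(request);
    }
}
```

With this in place, a multi-step flow becomes a sequence of `Get` and `Post` calls on the same `WebSession` instance, with each returned page inspected to decide what to send next.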
