How do I screen scrape a webmail page?

Posted on 2024-07-16 15:28:50

I am doing a project in which I need to log in to a site and scrape the webpage contents. I tried the following code:

// Requires: using System.Net; using System.Text;
protected void Page_Load(object sender, EventArgs e)
{
    WebClient webClient = new WebClient();
    string strUrl = "http://www.mail.yahoo.com?username=sakthivel123&password=operator&login=1";
    byte[] reqHTML;
    // Download the raw bytes of the page at strUrl
    reqHTML = webClient.DownloadData(strUrl);
    UTF8Encoding objUTF8 = new UTF8Encoding();
    // Decode the bytes as UTF-8 and show the HTML in the label
    Label1.Text = objUTF8.GetString(reqHTML);
}

This scrapes the login page of the mail site, but I need to scrape my inbox details. Please tell me how to proceed. Thanks in advance.

Comments (3)

寂寞笑我太脆弱 2024-07-23 15:28:50

Please see this question and the related questions. We have to study the HTML source of a webpage before we can scrape it properly. So log in manually, get the source of the inbox page, and study it to work out how to scrape it.

Why don't you use Yahoo's webmail API? That would be a better solution.

つ低調成傷 2024-07-23 15:28:50

See this question - Writing a C# program that scans ecommerce website and extracts products pictures + prices + description from them

P.S.: It's called "scrape", and the act of performing a screen scrape is called (you guessed it!) "screen scraping". The word "scrap", when used as a verb, means to discard, as in "the project has been scrapped!" ;-)

梅倚清风 2024-07-23 15:28:50

I'd suggest you first use a tool called Fiddler to analyze the communication between the target site and your browser. You can look at all the HTTP headers, cookies, content, etc.

Once your webClient object is able to replicate the actions of a browser, including logging in and setting the appropriate cookies, you can automate the procedure.
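
As a rough illustration of the cookie part: a plain WebClient does not carry cookies from one request to the next, so a common workaround is to subclass it and attach a shared CookieContainer. The sketch below assumes a simple form-based login; the class names, the URLs passed in, and the form field names "username" and "password" are all placeholders, and the real values have to be taken from what Fiddler shows. Real webmail logins often also involve hidden form tokens, redirects, or JavaScript, so this alone may not be enough.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;
using System.Text;

// WebClient does not keep cookies between requests by default, so a login
// followed by a second request would lose the session. This subclass attaches
// one shared CookieContainer to every HTTP request it creates.
public class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest httpRequest)
        {
            httpRequest.CookieContainer = Cookies;
        }
        return request;
    }
}

public static class LoginSketch
{
    // loginUrl, inboxUrl and the form field names are hypothetical; read the
    // real ones from the traffic captured in Fiddler.
    public static string FetchPageAfterLogin(
        string loginUrl, string inboxUrl, string user, string password)
    {
        using (var client = new CookieAwareWebClient())
        {
            var form = new NameValueCollection
            {
                { "username", user },
                { "password", password }
            };

            // POST the credentials; cookies set by the response are stored
            // in client.Cookies.
            client.UploadValues(loginUrl, "POST", form);

            // The stored cookies are sent along with this follow-up request.
            byte[] html = client.DownloadData(inboxUrl);
            return Encoding.UTF8.GetString(html);
        }
    }
}
```

Posting the credentials in the request body, rather than in the query string as in the question, is what a browser form normally does, and it keeps the password out of the URL.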

And finally, once you have the desired HTML, use regular expressions to extract the information you want from it.
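
A minimal sketch of that last step, assuming (purely for illustration) that each message subject in the inbox HTML is rendered as a `<td class="subject">` cell. The markup, the class name InboxParser, and the method ExtractSubjects are hypothetical; the real pattern depends on the page source you get back.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class InboxParser
{
    // Hypothetical markup: <td class="subject">...</td>. Adjust the pattern
    // to match the actual inbox HTML.
    private static readonly Regex SubjectPattern = new Regex(
        "<td\\s+class=\"subject\"[^>]*>(.*?)</td>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> ExtractSubjects(string html)
    {
        var subjects = new List<string>();
        foreach (Match match in SubjectPattern.Matches(html))
        {
            // Group 1 is the text between the opening and closing tags;
            // trim it and decode entities such as &amp;.
            string raw = match.Groups[1].Value.Trim();
            subjects.Add(WebUtility.HtmlDecode(raw));
        }
        return subjects;
    }
}
```

For example, calling `InboxParser.ExtractSubjects` on `<tr><td class="subject">Hello</td></tr>` would yield a list containing the single string `Hello`. Patterns like this tend to break whenever the site changes its markup, so expect to revisit them.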
