How do I screen scrape a webmail page?
I am working on a project in which I need to log in to a site and scrape the web page contents. I tried the following code:
protected void Page_Load(object sender, EventArgs e)
{
    // Requires: using System.Net; using System.Text;
    WebClient webClient = new WebClient();
    string strUrl = "http://www.mail.yahoo.com?username=sakthivel123&password=operator&login=1";
    // Download the raw bytes of the page, then decode them as UTF-8.
    byte[] reqHTML = webClient.DownloadData(strUrl);
    UTF8Encoding objUTF8 = new UTF8Encoding();
    Label1.Text = objUTF8.GetString(reqHTML);
}
This scrapes the login page of the mail site, but I need to scrape my inbox details. Please tell me how to proceed. Thanks in advance.
Comments (3)
Please see this question and the related questions. We have to study the HTML source of a web page before we can scrape it properly. So log in manually, get the source of the inbox page, and study it to work out how to scrape it.
Why don't you use Yahoo's webmail API? That would be a better solution.
See this question - Writing a C# program that scans ecommerce website and extracts products pictures + prices + description from them
P.S.: It's called "scrape", and the act of performing a screen scrape is called (you guessed it!) "screen scraping". The word "scrap", when used as a verb, means to discard, as in "the project has been scrapped!" ;-)
I'd suggest you first use a tool called Fiddler to analyze the communication between the target site and your browser. You can look at all the HTTP headers, cookies, content, etc.
Once your webClient object is able to replicate the actions of a browser, including logging in, setting the appropriate cookies, etc., you can automate the procedure.
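As a rough illustration of the cookie part: WebClient does not carry cookies from one request to the next by default, so a common workaround is a small subclass that attaches a shared CookieContainer to every request. The sketch below assumes the site accepts a plain form POST for login. The class names, the URLs passed in, and the form field names ("username", "password") are hypothetical placeholders; the real ones have to be read from the traffic captured in Fiddler, and a real login form may also need hidden fields or tokens.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;
using System.Text;

// WebClient that sends and stores cookies through one shared CookieContainer.
public class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    public CookieContainer Cookies
    {
        get { return cookies; }
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            // Attach the shared container so session cookies persist across requests.
            httpRequest.CookieContainer = cookies;
        }
        return request;
    }
}

public static class LoginSketch
{
    // loginUrl, inboxUrl and the form field names are assumptions, not the site's real values.
    public static string FetchInboxHtml(string loginUrl, string inboxUrl, string user, string password)
    {
        using (CookieAwareWebClient client = new CookieAwareWebClient())
        {
            NameValueCollection form = new NameValueCollection();
            form.Add("username", user);
            form.Add("password", password);

            // UploadValues sends the fields as a form-encoded POST body.
            client.UploadValues(loginUrl, "POST", form);

            // Same client instance, so cookies set by the login response are sent here.
            byte[] html = client.DownloadData(inboxUrl);
            return Encoding.UTF8.GetString(html);
        }
    }
}
```

WebClient is kept here only because the question uses it; the same idea applies to HttpWebRequest directly.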
And finally, once you have the desired HTML, use regular expressions to extract the information you want from it.
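A minimal sketch of the extraction step, assuming (purely for illustration) that each message subject in the inbox HTML sits in a cell like <td class="subject">...</td>. That markup, the InboxParser class, and the ExtractSubjects method are hypothetical; the real pattern has to be written against the page source you actually get back.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class InboxParser
{
    // Hypothetical markup: <td class="subject">Subject text</td>
    private static readonly Regex SubjectCell = new Regex(
        "<td\\s+class=\"subject\"[^>]*>(.*?)</td>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> ExtractSubjects(string html)
    {
        List<string> subjects = new List<string>();
        foreach (Match match in SubjectCell.Matches(html))
        {
            // Decode HTML entities such as &amp; and trim surrounding whitespace.
            subjects.Add(WebUtility.HtmlDecode(match.Groups[1].Value).Trim());
        }
        return subjects;
    }
}
```

For example, calling InboxParser.ExtractSubjects on "<tr><td class=\"subject\">Hello &amp; welcome</td></tr><tr><td class=\"subject\"> Invoice </td></tr>" gives a list with "Hello & welcome" and "Invoice". A regular expression like this is tied to one exact markup shape, so it will need adjusting whenever the page layout changes.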