HtmlAgilityPack.HtmlDocument Cookie

Posted on 2024-10-30 19:51:03

This pertains to cookies set inside a script (maybe inside a script tag).

System.Windows.Forms.HtmlDocument executes those scripts, and the cookies they set (like document.cookie=etc...) can be retrieved through its Cookie property.

I assume HtmlAgilityPack.HtmlDocument doesn't do this (execution). I wonder if there is an easy way to emulate the System.Windows.Forms.HtmlDocument capabilities (the cookies part).

Anyone?

Comments (2)

不即不离 2024-11-06 19:51:03

When I need to use cookies and HtmlAgilityPack together, or just create custom requests (for example, to set the User-Agent property), here is what I do:

  • Create a class that encapsulates the request/response. Let's call this class WebQuery.
  • Have a private CookieContainer property (public, in your case) inside that class.
  • Create a method inside the class that performs the request manually. The signature could be:

...

public HtmlAgilityPack.HtmlDocument GetSource(string url);

What do we need to do inside this method?

Well, using HttpWebRequest and HttpWebResponse, make the HTTP request manually (there are several examples of how to do this on the Internet), then create an HtmlDocument instance and fill it by calling its Load method with a stream.

What stream do we have to use? Well, the one returned by:

httpResponse.GetResponseStream();

If you use HttpWebRequest to make the query, you can easily set its CookieContainer property to the variable you declared earlier every time you access a new page. That way, all cookies set by the sites you access are stored in the CookieContainer you declared in your WebQuery class, provided you use only one instance of the WebQuery class.
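
The steps above can be sketched roughly as follows. This is only a minimal sketch: the WebQuery class, its Cookies and UserAgent members, and the default User-Agent string are names assumed here for illustration, not part of HtmlAgilityPack. Note that it only collects cookies sent in HTTP response headers; cookies assigned by script through document.cookie are not captured, because nothing executes the scripts. On recent .NET versions HttpWebRequest is marked obsolete, so expect a compiler warning.

```csharp
using System.IO;
using System.Net;
using HtmlAgilityPack;

// Hypothetical helper following the description above; not part of HtmlAgilityPack.
public class WebQuery
{
    // One container for the lifetime of the instance, so cookies persist across requests.
    public CookieContainer Cookies { get; } = new CookieContainer();

    // Assumed default value; replace it with whatever User-Agent you need.
    public string UserAgent { get; set; } = "Mozilla/5.0 (compatible; WebQuery sketch)";

    public HtmlDocument GetSource(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = Cookies;   // send stored cookies, collect new ones
        request.UserAgent = UserAgent;

        using (var response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        {
            var doc = new HtmlDocument();
            doc.Load(stream);                // parse the response body
            return doc;
        }
    }
}
```

Reusing the same WebQuery instance for every page is what keeps the session alive; a new instance starts with an empty container.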

I hope you find this explanation useful. Keep in mind that with this approach you can do whatever you want, whether or not HtmlAgilityPack supports it.

小兔几 2024-11-06 19:51:03

I also worked with Rohit Agarwal's BrowserSession class together with HtmlAgilityPack.
But for me, subsequent calls of the "Get-function" didn't work, because new cookies were set every time.
That's why I added some functions of my own. (My solution is far from perfect; it's just a quick and dirty fix.) But it worked for me, and if you don't want to spend a lot of time investigating the BrowserSession class, here is what I did:

The added/modified functions are the following:

class BrowserSession{
   private bool _isPost;
   private HtmlDocument _htmlDoc;
   public CookieContainer cookiePot = new CookieContainer();   //<- This is the new CookieContainer (initialized here so cookiePot.Add below cannot hit a null reference)

 ...

    public string Get2(string url)
    {
        HtmlWeb web = new HtmlWeb();
        web.UseCookies = true;
        web.PreRequest = new HtmlWeb.PreRequestHandler(OnPreRequest2);
        web.PostResponse = new HtmlWeb.PostResponseHandler(OnAfterResponse2);
        HtmlDocument doc = web.Load(url);
        return doc.DocumentNode.InnerHtml;
    }
    public bool OnPreRequest2(HttpWebRequest request)
    {
        request.CookieContainer = cookiePot;
        return true;
    }
    protected void OnAfterResponse2(HttpWebRequest request, HttpWebResponse response)
    {
        //do nothing
    }
    private void SaveCookiesFrom(HttpWebResponse response)
    {
        if ((response.Cookies.Count > 0))
        {
            if (Cookies == null)
            {
                Cookies = new CookieCollection();
            }    
            Cookies.Add(response.Cookies);
            cookiePot.Add(Cookies);     //-> add the Cookies to the cookiePot
        }
    }
}

What it does: it basically saves the cookies from the initial "Post-Response" and assigns the same CookieContainer to the requests made later. I do not fully understand why the initial version didn't work, because the AddCookiesTo function does roughly the same thing. (if (Cookies != null && Cookies.Count > 0) request.CookieContainer.Add(Cookies);)
Anyhow, with these added functions it should work fine now.
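
To illustrate why sharing one CookieContainer is enough, here is a small standalone sketch that uses only System.Net. The CookiePotDemo class, the HeaderFor helper, the cookie name and value, and the example.com hosts are all made up for the illustration; the cookie added by hand stands in for one received in the login response.

```csharp
using System;
using System.Net;

public static class CookiePotDemo
{
    // Returns the Cookie header the container would send for the given URL.
    public static string HeaderFor(CookieContainer pot, string url)
    {
        return pot.GetCookieHeader(new Uri(url));
    }

    public static void Main()
    {
        var pot = new CookieContainer();

        // Stand-in for a cookie received in the login response (made-up values).
        pot.Add(new Cookie("session", "abc123", "/", "www.example.com"));

        // Any later request to the same host gets the cookie from the shared container.
        Console.WriteLine(HeaderFor(pot, "http://www.example.com/secondpage"));

        // A request to a different host gets nothing.
        Console.WriteLine(HeaderFor(pot, "http://other.example.org/").Length);
    }
}
```

As long as every request is given the same container, as OnPreRequest2 does with cookiePot, the cookies from the login carry over to the later pages.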

It can be used like this:

//initial "Login-procedure"
BrowserSession b = new BrowserSession();
b.Get("http://www.blablubb/login.php");
b.FormElements["username"] = "yourusername";
b.FormElements["password"] = "yourpass";
string response = b.Post("http://www.blablubb/login.php");

All subsequent calls should use:

response = b.Get2("http://www.blablubb/secondpageyouwannabrowseto");
response = b.Get2("http://www.blablubb/thirdpageyouwannabrowseto");
...

I hope it helps when you're facing the same problem.
