jsoup posting and cookie

Posted on 2024-11-16 13:30:18

I'm trying to use jsoup to log in to a site and then scrape information, but I'm running into a problem: I can log in successfully and create a Document from index.php, but I cannot get other pages on the site. I know I need to store a cookie after I post and then send it when I open another page on the site, but how do I do this? The following code logs me in and gets index.php:

Document doc = Jsoup.connect("http://www.example.com/login.php")
               .data("username", "myUsername", 
                     "password", "myPassword")
               .post();

I know I can use Apache HttpClient to do this, but I don't want to.

Comments (6)

薆情海 2024-11-23 13:30:18

When you login to the site, it is probably setting an authorised session cookie that needs to be sent on subsequent requests to maintain the session.

You can get the cookie like this:

Connection.Response res = Jsoup.connect("http://www.example.com/login.php")
    .data("username", "myUsername", "password", "myPassword")
    .method(Method.POST)
    .execute();

Document doc = res.parse();
String sessionId = res.cookie("SESSIONID"); // you will need to check what the right cookie name is

And then send it on the next request like:

Document doc2 = Jsoup.connect("http://www.example.com/otherPage")
    .cookie("SESSIONID", sessionId)
    .get();
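
To make the round trip above concrete, here is a minimal sketch that uses only the JDK rather than jsoup, so it shows what happens on the wire and not how jsoup is implemented. Everything in it is an assumption made for illustration: the local fake site, the cookie value abc123, and the names SessionCookieDemo, startFakeSite, login and fetchStatus are hypothetical. The fake /login.php sets a SESSIONID cookie, and /otherPage answers 200 only when that cookie is sent back.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class SessionCookieDemo {

    // Hypothetical stand-in for the real site: /login.php issues a cookie,
    // /otherPage answers 200 only when that cookie comes back, else 403.
    static HttpServer startFakeSite() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/login.php", ex -> {
            ex.getRequestBody().readAllBytes();      // consume the posted form
            ex.getResponseHeaders().add("Set-Cookie", "SESSIONID=abc123; Path=/");
            ex.sendResponseHeaders(200, -1);         // -1 means no response body
            ex.close();
        });
        server.createContext("/otherPage", ex -> {
            String cookie = ex.getRequestHeaders().getFirst("Cookie");
            boolean loggedIn = cookie != null && cookie.contains("SESSIONID=abc123");
            ex.sendResponseHeaders(loggedIn ? 200 : 403, -1);
            ex.close();
        });
        server.start();
        return server;
    }

    // POSTs the credentials and returns the "name=value" part of Set-Cookie.
    static String login(String base) throws IOException {
        HttpURLConnection con =
                (HttpURLConnection) URI.create(base + "/login.php").toURL().openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = con.getOutputStream()) {
            out.write("username=myUsername&password=myPassword".getBytes(StandardCharsets.UTF_8));
        }
        String setCookie = con.getHeaderField("Set-Cookie");
        con.disconnect();
        return setCookie.split(";", 2)[0];           // drop attributes such as Path
    }

    // GETs /otherPage, sending the cookie only if one is supplied.
    static int fetchStatus(String base, String cookie) throws IOException {
        HttpURLConnection con =
                (HttpURLConnection) URI.create(base + "/otherPage").toURL().openConnection();
        if (cookie != null) {
            con.setRequestProperty("Cookie", cookie);
        }
        int status = con.getResponseCode();
        con.disconnect();
        return status;
    }

    // Returns {status without the cookie, status with the cookie}.
    static int[] runDemo() {
        try {
            HttpServer server = startFakeSite();
            try {
                String base = "http://127.0.0.1:" + server.getAddress().getPort();
                int without = fetchStatus(base, null);
                int with = fetchStatus(base, login(base));
                return new int[] {without, with};
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = runDemo();
        System.out.println("without cookie: " + r[0] + ", with cookie: " + r[1]);
    }
}
```

The request without the cookie is rejected and the one that carries it is accepted, which is the same effect that .cookie("SESSIONID", sessionId) has on the second jsoup request.
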
昇り龍 2024-11-23 13:30:18
//This will get you the response.
Response res = Jsoup
    .connect("loginPageUrl")
    .data("loginField", "[email protected]", "passField", "pass1234")
    .method(Method.POST)
    .execute();

//This will get you cookies
Map<String, String> loginCookies = res.cookies();

//And this is the easiest way I've found to remain in session
Document doc = Jsoup.connect("urlYouNeedToBeLoggedInToAccess")
      .cookies(loginCookies)
      .get();
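
As a rough mental model of the cookie map used above, the sketch below turns Set-Cookie header values into a Map<String, String> and joins that map back into a single Cookie request header. It is a simplified illustration in plain Java, not jsoup's actual code; the class CookieMapSketch, its two methods, and the sample header values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class CookieMapSketch {

    // Collects the name=value pair from each Set-Cookie header value,
    // ignoring attributes such as Path or HttpOnly.
    static Map<String, String> fromSetCookie(List<String> setCookieValues) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String value : setCookieValues) {
            String pair = value.split(";", 2)[0].trim();
            int eq = pair.indexOf('=');
            if (eq > 0) {
                cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return cookies;
    }

    // Joins the map into the value of a single Cookie request header.
    static String toCookieHeader(Map<String, String> cookies) {
        StringJoiner header = new StringJoiner("; ");
        cookies.forEach((name, val) -> header.add(name + "=" + val));
        return header.toString();
    }

    public static void main(String[] args) {
        Map<String, String> loginCookies = fromSetCookie(List.of(
                "SESSIONID=abc123; Path=/; HttpOnly",
                "lang=en; Max-Age=3600"));
        System.out.println(loginCookies);
        System.out.println("Cookie: " + toCookieHeader(loginCookies));
    }
}
```

Passing the whole map along means you do not need to know which cookie name the site uses for its session.
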
迷爱 2024-11-23 13:30:18

Where the code was:

Document doc = Jsoup.connect("urlYouNeedToBeLoggedInToAccess").cookies().get(); 

I was having difficulties until I changed it to:

Document doc = Jsoup.connect("urlYouNeedToBeLoggedInToAccess").cookies(cookies).get();

Now it is working flawlessly.

峩卟喜欢 2024-11-23 13:30:18

Here is what you can try...

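import java.io.IOException;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;


Connection.Response res = null;
try {
    // Log in by posting the credentials
    res = Jsoup
            .connect("http://www.example.com/login.php")
            .data("username", "your login id", "password", "your password")
            .method(Connection.Method.POST)
            .execute();
} catch (IOException e) {
    e.printStackTrace();
}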

Now save all the cookies so you can send them with requests to other pages.

// Store the cookies
Map<String, String> cookies = res.cookies();

Then make the request to another page:

try {
    Document doc = Jsoup.connect("your-second-page-link").cookies(cookies).get();
}
catch(Exception e){
    e.printStackTrace();
}

Ask if you need further help.

玩物 2024-11-23 13:30:18
// Connect to the server with the login details
Connection.Response res = Jsoup.connect("http://www.example.com/login.php")
    .data("username", "myUsername")
    .data("password", "myPassword")
    .method(Connection.Method.POST)
    .execute();
// Parse the page returned after login (the redirect target)
Document doc = res.parse();
// Store the cookies set by the login response
Map<String, String> cookies = res.cookies();
// Fetch the required page, sending the stored cookies
Document docs = Jsoup.connect("http://www.example.com/otherPage")
    .cookies(cookies)
    .get();
若沐 2024-11-23 13:30:18

Why reconnect?
If cookies are needed to avoid a 403 status, I reuse the same connection like this:

Document doc = null;
int statusCode = -1;
String statusMessage = null;
String strHTML = null;

try {
    // connect one time
    Connection con = Jsoup.connect(urlString);
    // get response
    Connection.Response res = con.execute();
    // get cookies
    Map<String, String> loginCookies = res.cookies();

    // print cookie content and status message
    if (loginCookies != null) {
        for (Map.Entry<String, String> entry : loginCookies.entrySet()) {
            System.out.println(entry.getKey() + ":" + entry.getValue() + "\n");
        }
    }

    statusCode = res.statusCode();
    statusMessage = res.statusMessage();
    System.out.print("Status CODE\n" + statusCode + "\n\n");
    System.out.print("Status Message\n" + statusMessage + "\n\n");

    // set login cookies to connection here
    con.cookies(loginCookies).userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0");

    // now do whatever you want, get document for example
    doc = con.get();
    // get HTML
    strHTML = doc.head().html();

} catch (org.jsoup.HttpStatusException hse) {
    hse.printStackTrace();
} catch (IOException ioe) {
    ioe.printStackTrace();
}