Can jsoup handle meta refresh redirects?

Posted on 2024-12-03 09:40:50

I have a problem with jsoup. I want to fetch a document from a URL that redirects to another URL through a meta refresh tag, but the redirect is not followed. To be specific: http://www.amerisourcebergendrug.com automatically redirects to http://www.amerisourcebergendrug.com/abcdrug/ via its meta refresh URL, but jsoup stays on http://www.amerisourcebergendrug.com and never fetches http://www.amerisourcebergendrug.com/abcdrug/.

Document doc = Jsoup.connect("http://www.amerisourcebergendrug.com").get();

I have also tried:

Document doc = Jsoup.connect("http://www.amerisourcebergendrug.com").followRedirects(true).get();

Neither works.

Is there a workaround for this?

Update:
The page may use the meta refresh redirect method.


Comments (2)

梦魇绽荼蘼 2024-12-10 09:40:50

Update (case insensitive and pretty fault tolerant)


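import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MetaRefreshExample {
    public static void main(String[] args) throws Exception {
        URI uri = URI.create("http://www.amerisourcebergendrug.com");
        Document d = Jsoup.connect(uri.toString()).get();

        for (Element refresh : d.select("html head meta[http-equiv=refresh]")) {
            // content looks like "0; url=/abcdrug/", or just "5" for a plain reload
            Matcher m = Pattern.compile("(?si)\\d+;\\s*url=(.+)|\\d+")
                               .matcher(refresh.attr("content"));

            // find the first one that is valid
            if (m.matches()) {
                // group(1) is null when the tag only reloads the same page
                if (m.group(1) != null)
                    d = Jsoup.connect(uri.resolve(m.group(1)).toString()).get();
                break;
            }
        }

        // print the URL of the document that was finally loaded
        System.out.println(d.baseUri());
    }
}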

Outputs correctly:

http://www.amerisourcebergendrug.com/abcdrug/
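
The parsing of the content attribute can also be pulled out into a small helper that runs without any network access, which makes it easier to try against odd inputs. This is only a sketch: the class name MetaRefresh and the method target are made up for illustration, and the handling of quoted URLs is an assumption about what some pages emit, not something the answer above covers.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class MetaRefresh {
    // "<seconds>; url=<target>", case-insensitive, whitespace-tolerant.
    private static final Pattern CONTENT =
            Pattern.compile("(?si)\\s*\\d+\\s*;\\s*url\\s*=\\s*(.+)");

    /** Returns the absolute redirect target, or empty if the content has no usable URL. */
    public static Optional<URI> target(URI base, String content) {
        Matcher m = CONTENT.matcher(content);
        if (!m.matches()) {
            return Optional.empty();   // e.g. "5" only reloads the same page
        }
        String raw = m.group(1).trim();
        // Assumption: some pages wrap the URL in single or double quotes.
        if (raw.length() >= 2
                && (raw.charAt(0) == '\'' || raw.charAt(0) == '"')
                && raw.charAt(raw.length() - 1) == raw.charAt(0)) {
            raw = raw.substring(1, raw.length() - 1);
        }
        try {
            return Optional.of(base.resolve(raw));   // resolves relative targets
        } catch (IllegalArgumentException e) {
            return Optional.empty();   // not a syntactically valid URI
        }
    }

    public static void main(String[] args) {
        URI base = URI.create("http://www.amerisourcebergendrug.com/");
        System.out.println(target(base, "0; URL=/abcdrug/"));
        System.out.println(target(base, "5"));
    }
}
```

The returned URI could then be passed to Jsoup.connect(...) in the same way as in the loop above.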

Old answer:

Are you sure it isn't working? For me,

System.out.println(Jsoup.connect("http://www.ibm.com").get().baseUri());

correctly outputs http://www.ibm.com/us/en/.

套路撩心 2024-12-10 09:40:50

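A variant with better error handling that also deals with the case-sensitivity problem:

// Fragment of a method that returns the page title. Needs java.io.IOException,
// java.net.URI, org.jsoup.Jsoup, org.jsoup.nodes.Document and org.jsoup.nodes.Element.
try
{
    Document doc = Jsoup.connect("http://www.ibm.com").get();
    // Check each meta tag on its own: calling attr() on the whole selection only
    // reads the first element that has the attribute, which may not be the refresh tag.
    for (Element lvMeta : doc.select("html head meta"))
    {
        String lvHttpEquiv = lvMeta.attr("http-equiv");
        if (lvHttpEquiv.toLowerCase().contains("refresh"))
        {
            // Split at the first "=" only, so a query string in the target survives.
            String[] lvContentArray = lvMeta.attr("content").split("=", 2);
            if (lvContentArray.length > 1)
            {
                // Resolve a relative target against the page that was just loaded.
                String lvTarget = URI.create(doc.baseUri()).resolve(lvContentArray[1].trim()).toString();
                doc = Jsoup.connect(lvTarget).get();
                break;
            }
        }
    }

    // get page title
    return doc.title();

}
catch (IOException e)
{
    e.printStackTrace();
}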
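
One detail to watch when the target is extracted by splitting the content attribute on "=": a URL with a query string contains further "=" characters, so an unlimited split cuts it short. A minimal sketch of the difference, using a made-up example.com URL and a hypothetical helper named afterFirstEquals:

```java
public class SplitDemo {
    /** Returns everything after the first "=" in a meta refresh content value, or null. */
    static String afterFirstEquals(String content) {
        String[] parts = content.split("=", 2);   // limit 2: split at the first "=" only
        return parts.length > 1 ? parts[1].trim() : null;
    }

    public static void main(String[] args) {
        String content = "0; url=http://example.com/page?id=5";
        // Unlimited split: the query value after the second "=" is lost.
        System.out.println(content.split("=")[1]);      // http://example.com/page?id
        // Limited split keeps the whole target.
        System.out.println(afterFirstEquals(content));  // http://example.com/page?id=5
    }
}
```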