JSP page crawler that extracts all input parameters

Posted on 2024-10-09 00:42:56

Do you happen to know of an open-source Java component that can scan a set of dynamic pages (JSP) and extract all the input parameters from them? Of course, a crawler can only crawl the static output, not the dynamic code, but my idea is to extend it to crawl a web server including all of its server-side code. Naturally, I am assuming the tool has full access to the crawled web server rather than relying on any hacks.

The idea is to build a static analyzer that can detect all parameter fields (request.getParameter() and such) in all dynamic pages.
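As a rough illustration of that idea, here is a minimal sketch of a purely static pass over the JSP sources. JspParamScanner is a hypothetical class, not an existing library. It assumes Java 11+, UTF-8 source files, and that parameter names appear as string literals in request.getParameter("...") / getParameterValues("...") scriptlet calls or in EL expressions such as ${param.name} / ${paramValues.name}. It is plain regex matching rather than real parsing, so it will miss names computed at runtime and anything read in servlets, tag files or helper classes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical, regex-based sketch (not an existing library): collects the
 * request parameter names that appear as string literals in JSP source.
 * It misses names built at runtime and anything read in servlets, tag files
 * or included Java classes.
 */
public class JspParamScanner {
    // Scriptlet calls: getParameter("x") or getParameterValues("x"); group 1 is the name
    private static final Pattern SCRIPTLET_PARAM =
            Pattern.compile("\\bgetParameter(?:Values)?\\s*\\(\\s*\"([^\"]+)\"\\s*\\)");
    // EL implicit objects: ${param.x} or ${paramValues.x}; group 1 is the name
    private static final Pattern EL_PARAM =
            Pattern.compile("\\$\\{\\s*param(?:Values)?\\.(\\w+)");

    /** Returns the parameter names found in one JSP source string, sorted. */
    public static Set<String> scanSource(String jspSource) {
        Set<String> names = new TreeSet<>();
        for (Pattern p : List.of(SCRIPTLET_PARAM, EL_PARAM)) {
            Matcher m = p.matcher(jspSource);
            while (m.find()) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    /** Walks a web app directory (e.g. an exploded WAR) and scans every *.jsp file. */
    public static Set<String> scanDirectory(Path webRoot) throws IOException {
        List<Path> jspFiles;
        try (Stream<Path> files = Files.walk(webRoot)) {
            jspFiles = files.filter(f -> f.toString().endsWith(".jsp"))
                            .collect(Collectors.toList());
        }
        Set<String> names = new TreeSet<>();
        for (Path jsp : jspFiles) {
            // Assumes UTF-8 source files; Files.readString throws on malformed input
            names.addAll(scanSource(Files.readString(jsp)));
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String sample = "<% String q = request.getParameter(\"q\"); %>\n"
                + "<% String[] tags = request.getParameterValues( \"tag\" ); %>\n"
                + "<p>Page ${param.page}</p>";
        System.out.println(scanSource(sample)); // prints [page, q, tag]
        if (args.length > 0) {
            // Optional: scan a real directory passed on the command line
            System.out.println(scanDirectory(Path.of(args[0])));
        }
    }
}
```

Passing the root of an exploded WAR as the first argument prints the union of names found across all JSP files; keeping a per-file mapping instead would be a small change.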

吻风 2024-10-16 00:42:56

Quoting the question: "The idea is to build a static analyzer that has the capacity to detect all parameter fields from all dynamic pages."

You cannot use a web crawler (basically, an HTML parser) to extract request parameters. At most, it can scan the HTML structure, such as the forms and their input fields. You can use Jsoup for this, for example:

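import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

// Fetch the page and list every <form> with its action/method (get() throws IOException)
for (Element form : Jsoup.connect("http://google.com").get().select("form")) {
    System.out.printf("Form found: action=%s, method=%s%n", form.attr("action"), form.attr("method"));
    // Every field the form would submit: <input>, <select> and <textarea>
    for (Element input : form.select("input,select,textarea")) {
        System.out.printf("\tInput found: name=%s, value=%s%n", input.attr("name"), input.attr("value"));
    }
}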

At the time of writing, this prints:

Form found: action=, method=
    Input found: name=hl, value=en
    Input found: name=source, value=hp
    Input found: name=ie, value=ISO-8859-1
    Input found: name=q, value=
    Input found: name=btnG, value=Google Search
    Input found: name=btnI, value=I'm Feeling Lucky
    Input found: name=, value=
Form found: action=/search, method=
    Input found: name=hl, value=en
    Input found: name=source, value=hp
    Input found: name=ie, value=ISO-8859-1
    Input found: name=q, value=
    Input found: name=btnG, value=Google Search
    Input found: name=btnI, value=I'm Feeling Lucky

If you want to scan the JSP source code for forms/inputs, you have to look in a different direction; that is definitely not what a "web crawler" does. Unfortunately, no such static analysis tool comes to mind. The closest you can get is to create a Filter that logs all submitted request parameters. Note that this is runtime logging rather than static analysis: it only sees the parameters that clients actually submit.

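// Inside your Filter's doFilter(ServletRequest request, ServletResponse response, FilterChain chain):
Map<String, String[]> params = request.getParameterMap(); // every submitted parameter name -> its values
// ... log params, then continue with chain.doFilter(request, response);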
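The "// ..." in the Filter snippet above is where the logging would go. A minimal sketch, assuming the goal is an inventory of which parameter names each URI has actually received: ParamInventory is a hypothetical helper that uses only the JDK, and the Servlet calls that would feed it (HttpServletRequest.getRequestURI(), ServletRequest.getParameterMap(), FilterChain.doFilter()) appear only in a comment so the block compiles without the Servlet API. Because it observes live traffic, it only knows about parameters that clients actually submitted; pages or fields nobody exercised will be missing.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not part of any library) that a logging Filter could
 * call on each request to build an inventory of the parameter names each URI
 * has actually received.
 */
public class ParamInventory {
    // URI -> parameter names seen so far; concurrent because a Filter runs on many request threads
    private final ConcurrentHashMap<String, Set<String>> seen = new ConcurrentHashMap<>();

    /** Records the names from one request's getParameterMap() result. */
    public void record(String uri, Map<String, String[]> params) {
        seen.computeIfAbsent(uri, k -> ConcurrentHashMap.newKeySet()).addAll(params.keySet());
    }

    /** Sorted copy of everything recorded so far, e.g. for dumping to a log. */
    public Map<String, Set<String>> snapshot() {
        Map<String, Set<String>> copy = new TreeMap<>();
        seen.forEach((uri, names) -> copy.put(uri, new TreeSet<>(names)));
        return copy;
    }

    // Possible wiring inside Filter#doFilter (Servlet API, not compiled here):
    //   String uri = ((HttpServletRequest) request).getRequestURI();
    //   inventory.record(uri, request.getParameterMap());
    //   chain.doFilter(request, response);

    public static void main(String[] args) {
        ParamInventory inv = new ParamInventory();
        inv.record("/search", Map.of("q", new String[] {"jsp"}, "hl", new String[] {"en"}));
        inv.record("/search", Map.of("start", new String[] {"10"}));
        System.out.println(inv.snapshot()); // prints {/search=[hl, q, start]}
    }
}
```

Only the names are kept, not the values, which keeps the inventory small and avoids writing submitted data (possibly passwords) to the log.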