How do I get the list of servlet parameters on a web server?

Posted on 2024-12-08 11:46:26

I'm working on a web data mining project that extracts information directly from HTML by crawling server pages. My effort is concentrated on one specific website, which runs a Java web server with Caucho Resin installed.

Parameters are passed as name/value pairs in the URL, like www.xxxxxx.com/jm/search?act=see&id=909&... I have worked out many parameters by trial and error, but of course progress is very slow.
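Since the parameters travel as name/value pairs in the query string, a small parser can at least list every name and value a captured URL carries. A minimal sketch, assuming UTF-8 encoding; the class name `QueryParams` and the `example.com` host are placeholders, not part of the original question:

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryParams {

    // Splits a raw query string ("a=1&b=2&b=3") into name -> list of values.
    // A name can repeat, so each name maps to all of its values in order.
    public static Map<String, List<String>> parse(String rawQuery) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.computeIfAbsent(decode(name), k -> new ArrayList<>()).add(decode(value));
        }
        return params;
    }

    // Decodes %XX escapes and '+' as used in application/x-www-form-urlencoded data.
    private static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://www.example.com/jm/search?act=see&id=909");
        System.out.println(parse(uri.getRawQuery())); // prints {act=[see], id=[909]}
    }
}
```

This recovers only what the client sends; it says nothing about which other names the server would accept.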

My question is... do any of you Java gurus know how to get all the valid parameters of this kind of server? Is it possible?

I don't have access to the server and I know nothing about Caucho Resin. I'm coding a utility in Java to do the job.

土豪 2024-12-15 11:46:26

Unless the server you're communicating with publishes a complete API, there can be any number of parameters. Consider this: a web form may not post all the parameters the server responds to, such as parameters for internal use.

Since parameter handling is implemented on the server side, away from public view, it is opaque to the outside world.
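To illustrate why the list is opaque, here is a hypothetical request handler written in plain Java (not a real servlet, and not the site's code). The `Map<String, String[]>` argument mirrors the shape returned by `ServletRequest.getParameterMap()`; the `verbose` parameter is invented to stand for an internal parameter that no public form sends:

```java
import java.util.Map;

public class SearchHandler {

    // Hypothetical server-side logic. The only complete list of accepted
    // parameter names is whatever this code happens to read.
    public static String handle(Map<String, String[]> params) {
        String act = first(params, "act");
        String id = first(params, "id");
        // "verbose" appears in no public form: only someone reading this code knows it exists.
        boolean verbose = "1".equals(first(params, "verbose"));

        if (!"see".equals(act) || id == null) {
            return "default page";
        }
        return verbose ? "item " + id + " (verbose)" : "item " + id;
    }

    // Returns the first value for a name, or null if it was not sent.
    private static String first(Map<String, String[]> params, String name) {
        String[] values = params.get(name);
        return (values == null || values.length == 0) ? null : values[0];
    }

    public static void main(String[] args) {
        System.out.println(handle(Map.of("act", new String[] {"see"}, "id", new String[] {"909"})));
        System.out.println(handle(Map.of("act", new String[] {"see"}, "id", new String[] {"909"},
                "verbose", new String[] {"1"})));
        // An unknown name is simply ignored, so the response matches the first call.
        System.out.println(handle(Map.of("act", new String[] {"see"}, "id", new String[] {"909"},
                "foo", new String[] {"bar"})));
    }
}
```

The third call shows the practical consequence: an unknown name such as `foo` is silently ignored, so from the outside it is indistinguishable from a valid parameter that simply had no visible effect.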

If you're referring to the possible values of the parameters, the answer is basically the same. For example, how many valid product SKUs does Amazon have?

(Also note that it is better to call these "request parameters", since servlets also have "init parameters", which are an entirely different topic :)

冷…雨湿花 2024-12-15 11:46:26

Whether a parameter is valid is not defined by the web server; it is defined by the custom servlet code itself. That in turn is usually defined in a functional requirements and/or technical specification document, and probably also in the generated javadoc of the custom servlet.

Your best bet is to contact the owner/maintainer of the website for this information. If you cannot or may not, then you're probably doing something that violates the website's policy. You can at least find all valid parameter names in the input elements of any public HTML form that submits to this servlet.
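A rough way to automate that step in a Java utility is to scan fetched pages for the `name` attributes of form controls. This is a regex-based sketch that assumes reasonably well-formed HTML; the class `FormFieldNames` is hypothetical, a real HTML parser (for example jsoup) would be more robust, and fields added by JavaScript will not appear:

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormFieldNames {

    // Matches the name attribute of input/select/textarea/button tags, with the
    // value in double quotes, single quotes, or unquoted. Requiring whitespace
    // before "name" avoids matching attributes such as data-name.
    private static final Pattern FIELD = Pattern.compile(
            "<(?:input|select|textarea|button)\\b[^>]*?\\sname\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>\"']+))",
            Pattern.CASE_INSENSITIVE);

    // Returns the distinct field names found in the HTML, sorted alphabetically.
    public static Set<String> extract(String html) {
        Set<String> names = new TreeSet<>();
        Matcher m = FIELD.matcher(html);
        while (m.find()) {
            // Exactly one of the three groups matched, depending on the quoting style.
            for (int g = 1; g <= 3; g++) {
                if (m.group(g) != null) {
                    names.add(m.group(g));
                    break;
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<form action=\"/jm/search\">"
                + "<input type=\"hidden\" name=\"act\" value=\"see\">"
                + "<input type=text name=id>"
                + "<select name='sort'><option>1</option></select>"
                + "</form>";
        System.out.println(extract(html)); // prints [act, id, sort]
    }
}
```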


Update: as per your comment:

I'm talking about parameters, not values. I did manage to find many of them by looking through the HTML source code for "hidden" tags, but those are not the only ones, as I was able to find more of them by trial and error.

Just use Firebug or Fiddler to track the HTTP requests made by a real web browser. You'll get all the parameters that are sent, in a nice table of name=value pairs. No need for trial and error.
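Once you have captured traffic, for example request URLs copied from Fiddler or the browser's network panel, the names can be tallied automatically. A minimal sketch; `CapturedParams` is a hypothetical helper, and it assumes each entry is either a full URL or a raw `application/x-www-form-urlencoded` POST body:

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CapturedParams {

    // Counts occurrences of each parameter name across the captured entries.
    // An entry containing "://" is treated as a URL (simplifying assumption);
    // anything else is treated as a raw form-encoded POST body.
    public static Map<String, Integer> countNames(List<String> captured) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String entry : captured) {
            // URI.create throws IllegalArgumentException on malformed URLs.
            String query = entry.contains("://") ? URI.create(entry).getRawQuery() : entry;
            if (query == null) {
                continue; // URL without a query string
            }
            for (String pair : query.split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String rawName = eq >= 0 ? pair.substring(0, eq) : pair;
                String name = URLDecoder.decode(rawName, StandardCharsets.UTF_8);
                counts.merge(name, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> captured = List.of(
                "http://www.example.com/jm/search?act=see&id=909",
                "http://www.example.com/jm/search?act=list&page=2",
                "act=see&id=910&sort=date"); // body of a captured POST
        System.out.println(countNames(captured)); // prints {act=3, id=2, page=1, sort=1}
    }
}
```

This only covers requests the browser actually made during the session, which is what the answer promises: the parameters real pages send, without guessing.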
