Find every page at a given domain
Is there any tool/library for Ruby that, when given a domain name, will return a list of all the pages at that domain?
2 Answers
You could use Anemone, it is a Ruby web spider framework. It requires Nokogiri as a dependency, since it needs to parse the (X)HTML.
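A minimal sketch of what that looks like (the URL is a placeholder; by default Anemone stays on the starting domain and yields each page it fetches):

    require 'anemone'

    # Crawl the domain and print the URL of every page Anemone discovers.
    Anemone.crawl('http://example.com/') do |anemone|
      anemone.on_every_page do |page|
        puts page.url
      end
    end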
Enumeration is a difficult task if a site is anything other than a collection of static HTML pages. Once you get into server-side scripting of any kind, the "page" returned can rely heavily on the state of your session. An obvious example would be pages or resources only accessible after you log in. Because of this, many automated enumeration tools (usually part of web application security auditing programs) get it wrong and miss large portions of the site. My point here is that there is often more to enumeration than simply running a tool.
The good news is that it's quite easy to write your own enumerator that works well, given a bit of knowledge you can obtain mostly from just poking around on a site. I wrote something similar using Mechanize, which handily tracks your history as you request pages. So it's a pretty simple task to get Mechanize to set up the server-side state you need (namely, logging in) and then visit every link you find. Simply request the front page, or any "list" pages you need, and keep an array of links. Iterate over this list of links and, if a link is not in the history, go to it and store the links on that page. Repeat until the list of links is empty.
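As a rough illustration of that loop, here is a minimal Mechanize sketch. The start URL is a placeholder, the login step is only hinted at in comments (it depends entirely on the site), and the crawl is restricted to the starting host:

    require 'mechanize'
    require 'uri'

    START_URL = 'http://example.com/'  # placeholder start page

    agent = Mechanize.new

    # If the site requires login, set up the server-side state first, e.g.:
    #   login_page = agent.get('http://example.com/login')
    #   form = login_page.forms.first
    #   form['username'] = 'user'
    #   form['password'] = 'secret'
    #   agent.submit(form)

    queue   = [START_URL]  # links still to visit
    visited = []           # links already requested

    until queue.empty?
      url = queue.shift
      next if visited.include?(url)

      begin
        page = agent.get(url)
      rescue Mechanize::ResponseCodeError, SocketError
        next  # skip broken or unreachable links
      end

      visited << url
      next unless page.is_a?(Mechanize::Page)  # skip non-HTML resources

      # Collect same-domain links from this page and queue the unseen ones.
      page.links.each do |link|
        href = link.href or next
        absolute = (page.uri.merge(href).to_s rescue next)
        next unless URI(absolute).host == URI(START_URL).host
        queue << absolute unless visited.include?(absolute)
      end
    end

    puts visited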
But like I said, it all depends on what's happening server-side. There may be pages that aren't linked to, or that you can't access, and you won't be able to discover those this way.