How to trace links within a website
So I am transferring an old website to a new server and attempting some cleanup in the process.
What I am looking for is a script or some free software that can:
a) show the paths through the website (following hyperlinks, etc.), so I can see what links to what,
and b) find which HTML files are orphans (not linked to from anywhere) in the folder structure.
Any help with either or both of these would be greatly appreciated :)
4 Answers
http://haveamint.com/ says it all: beautiful GUI, simple integration, lightweight, database storage, JavaScript tracking.
Have a Mint (y)
Or you can just use Google Analytics, which is used by pretty much every site these days.
So basically a crawler? You could whip something together with an HTTP library, an HTML parser, and any brand of scripting language; I don't know of any off-the-shelf scripts, though.
Does your site consist of plain HTML files, or is there some sort of server-side technology, such as PHP? If so, there is no way of automatically detecting said orphans, since those pages are generated as a function of the server-side application and aren't actual files, even though they may appear as such in a browser.
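For example, here is a minimal sketch of that idea in Python, using only the standard library (urllib for HTTP, html.parser for the parsing). The start URL is a placeholder, and a real run would also want politeness delays, robots.txt handling, and better error reporting:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen

START = "http://example.com/"  # placeholder: your site's entry page


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start):
    """Follow links within one host; return {page: [pages it links to]}."""
    host = urlparse(start).netloc
    seen, queue, graph = set(), [start], {}
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        try:
            with urlopen(url) as resp:
                if "html" not in resp.headers.get("Content-Type", ""):
                    continue  # skip images, PDFs, etc.
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # unreachable page; skip it
        parser = LinkParser()
        parser.feed(html)
        targets = []
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            absolute = urldefrag(urljoin(url, href)).url
            if urlparse(absolute).netloc == host:  # stay on this site
                targets.append(absolute)
                queue.append(absolute)
        graph[url] = targets
    return graph


if __name__ == "__main__":
    for page, links in crawl(START).items():
        print(page, "->", links)
```

The resulting graph answers part a) directly (what links to what), and any file on disk that never appears as a key or a target is a candidate for part b).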
a) Depending on the complexity of your site and how dynamic the content is, you can download any spider, restrict it to your website, and check the results ("Burp Suite" contains a pretty good spider and is altogether a tool everyone should know).
b) After the spider has done its work, check the access time of all the files in your website's directory; any file whose access time is older than the spider's execution time is probably an orphan. A sketch of that check follows below.
(Both solutions will be less effective on a website that uses user input to refer to pages.)
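Here is a minimal sketch of step b) in Python. The document root and the spider start time are placeholders, and note that access times are unreliable on filesystems mounted with noatime or relatime:

```python
import os
import time

SITE_ROOT = "/var/www/oldsite"       # placeholder: your site's document root
SPIDER_STARTED = time.time() - 3600  # placeholder: when the spider run began

for dirpath, _dirnames, filenames in os.walk(SITE_ROOT):
    for name in filenames:
        if not name.endswith((".html", ".htm")):
            continue
        path = os.path.join(dirpath, name)
        # st_atime is the last access time; a file the spider never
        # requested keeps its old access time and is a likely orphan.
        if os.stat(path).st_atime < SPIDER_STARTED:
            print("possible orphan:", path)
```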
home.snafu.de/tilman/xenulink.html (Xenulink) provides link spidering, and, with FTP access, orphan file checking.