ASP.NET: hide the entire website structure. Is it possible?
I guess everybody else would like the opposite, but I need to hide the "directory" structure of an ASP.NET website.
For this reason I am thinking of:
1. Using robots.txt as follows:

       User-agent: *
       Disallow: /

2. Using URL rewriting to create "ghost" paths.
3. Disabling directory browsing ("Directory listing denied...").
4. Using .ashx handlers to serve images (a minimal handler sketch follows this list).
5. Other ways that you may suggest.
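A minimal sketch of the .ashx idea in point 4, assuming an IHttpHandler that looks images up by an opaque id under ~/App_Data (the handler name, folder and id parameter are illustrative placeholders, not part of the original question):

    // ImageHandler.ashx code-behind -- illustrative sketch only.
    // Streams images by an opaque id so the physical folder layout never appears in URLs.
    using System;
    using System.IO;
    using System.Web;

    public class ImageHandler : IHttpHandler
    {
        public void ProcessRequest(HttpContext context)
        {
            // Clients request e.g. /ImageHandler.ashx?id=logo, never a physical path.
            string id = context.Request.QueryString["id"];
            if (string.IsNullOrEmpty(id) ||
                id.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0)
            {
                // Reject missing ids and anything that looks like a path.
                context.Response.StatusCode = 404;
                return;
            }

            // Hypothetical store under App_Data, which ASP.NET never serves directly.
            string file = context.Server.MapPath("~/App_Data/images/" + id + ".png");
            if (!File.Exists(file))
            {
                context.Response.StatusCode = 404;
                return;
            }

            context.Response.ContentType = "image/png";
            context.Response.WriteFile(file);
        }

        public bool IsReusable
        {
            get { return true; }
        }
    }

Pages would then reference ImageHandler.ashx?id=logo instead of a real file path, so the markup leaks nothing about the folder layout.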
In other words, I would not like someone with a "downloader-structure reader" to strip my site.
As you can see, the security tag is missing :)
P.S. I do not care about SEO
2 Answers
If you want users to be able to browse your site, you're obviously going to need links to other pages. To (attempt to) make these links difficult for a crawler, you could try rendering all your links dynamically in JavaScript. This means a robot that doesn't do full DOM rendering like a browser won't be able to extract the links. However, of course, someone could take a look at your site and build something that does parse out the links, if they were so inclined.
With respect to hiding the directory structure from users and/or bots: yes, you'll have to implement some kind of URL rewriting; otherwise they'll be able to inspect the links (whether static or dynamically rendered) in their browser and determine the directory structure.
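A minimal sketch of such rewriting with Web Forms routing (System.Web.Routing, available since .NET 4); the route URLs and physical paths below are made-up placeholders rather than anything from the question:

    // Global.asax.cs -- illustrative routing sketch.
    using System;
    using System.Web;
    using System.Web.Routing;

    public class Global : HttpApplication
    {
        protected void Application_Start(object sender, EventArgs e)
        {
            // Public "ghost" URL on the left, real physical page on the right.
            // Visitors see /products/list, never ~/Internal/ProductList.aspx.
            RouteTable.Routes.MapPageRoute(
                "ProductList",                      // route name (arbitrary)
                "products/list",                    // URL pattern exposed to clients
                "~/Internal/ProductList.aspx");     // physical page kept out of the markup

            RouteTable.Routes.MapPageRoute(
                "ProductDetail",
                "products/{id}",
                "~/Internal/ProductDetail.aspx");
        }
    }

This keeps physical paths out of the rendered links, but as noted above it only hides the structure; it does nothing to stop someone from mirroring the pages themselves.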
A site downloader like wget -r will work anyway. It follows links and doesn't care about directories (except for the fact that you can limit the depth). If you want to exclude legitimate crawlers like Google, using robots.txt is fine; wget and rogue crawlers don't care about it, though.
The only really good solution is either requiring a login (but that still doesn't protect you against the people who'd just use wget to download your whole site; they'll just provide it with the login information/session id), cluttering your content with annoying CAPTCHAs (which pisses off legitimate users quickly), or making the whole site use JavaScript/AJAX to display/load content. That makes the user experience even better (if it's done properly) and effectively locks out most/all crawlers.
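If you do go the login route, the classic ASP.NET way is forms authentication plus an authorization rule that denies anonymous users. A minimal web.config sketch, with ~/Login.aspx as a placeholder login page:

    <!-- web.config sketch: require a login for the whole site (forms authentication assumed). -->
    <configuration>
      <system.web>
        <authentication mode="Forms">
          <forms loginUrl="~/Login.aspx" timeout="30" />
        </authentication>
        <authorization>
          <!-- "?" means anonymous (unauthenticated) users -->
          <deny users="?" />
        </authorization>
      </system.web>
    </configuration>

As noted above, this only raises the bar: anyone with valid credentials can still hand them to wget and mirror everything behind the login.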