Creating a robots.txt for an ASP.NET MVC site
I'm creating a robots.txt file for my website, but looking through my project structure, I'm not sure what to disallow.
Do I need to disallow standard .NET MVC directories and files like /App_Data, /web.config, /Controllers, /Models, /Global.asax? Or will those not be indexed already?
What about directories like /bin and /obj?
If I want to disallow a page, do I disallow /Views/MyPage/Index.cshtml, or /MyPage?
Also, when specifying the sitemap in the robots.txt file, can I use my Web.sitemap, or does it need to be a different xml file?
Comments (1)
'robots.txt' refers to paths as they are publicly seen by Web crawlers.
There's nothing particularly special about a crawler: it merely uses HTTP to request pages from your site precisely like a user does.
So, provided your MVC site is properly configured, files like /web.config and the other paths you mention won't be visible to the outside world, because neither IIS nor your application is configured to serve them. Even if a spider were pointed at those files, it would receive a 404 Not Found and move on.
Similarly, your .cshtml or .aspx content files won't be seen with those extensions. Rather, a Web crawler will see precisely what you show to users, so a robots.txt rule should name the public URL (for example /MyPage), not the view file path.
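To make that concrete, here is a minimal sketch of what such a robots.txt might look like, checked with Python's standard urllib.robotparser. The /MyPage rule comes from the question; the example.com host and the sitemap.xml file name are assumptions for illustration only, and is_allowed is a hypothetical helper. On the sitemap question: the Sitemap line is generally expected to point at a sitemap in the sitemaps.org XML format, which is a different format from ASP.NET's Web.sitemap navigation file, so a separate XML file would likely be needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the site in the question.
# Only public URL paths are listed; nothing under /Views, /bin, etc.
# The Sitemap URL is an assumed placeholder, not a real address.
ROBOTS_TXT = """\
User-agent: *
Disallow: /MyPage

Sitemap: https://www.example.com/sitemap.xml
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/", "/MyPage", "/MyPage/Details", "/Views/MyPage/Index.cshtml"):
        print(path, is_allowed(path))
```

Rules are prefix matches on the public path, so Disallow: /MyPage also covers /MyPage/Details. The view file path is not matched by the rule at all, which is harmless here because, as described above, the server does not serve that path in the first place.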