How do I prevent robots.txt from being pushed from staging to production?
In the past, one of our IT specialists accidentally moved the robots.txt from staging to production, blocking Google and others from indexing our customers' site in production. Is there a good way of managing this situation?
Thanks in advance.
Ask your IT guys to change the file permissions on robots.txt to "read-only" for all users, so that overwriting it takes the extra, deliberate step of changing the permissions back first.
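As a minimal sketch of that idea (the path is a placeholder for your production web root, not taken from the thread), the lock is a one-line permission change:

```shell
# Sketch: make the live robots.txt read-only for every user, including
# its owner, so an accidental copy over it is refused unless someone
# deliberately relaxes the permissions first. Path is hypothetical.
lock_robots() {
  chmod 444 "$1"   # r--r--r--: reads succeed, writes are refused
}

# lock_robots /var/www/production/robots.txt
```

Before any intentional replacement, the deliberate unlock step would be a `chmod 644` on the same file.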
As an SEO, I feel your pain.
Forgive me if I'm wrong, but I'm assuming the robots.txt on your staging server exists because you need to block search engines from finding and crawling the whole staging environment.
If that's the case, I would suggest placing your staging environment internally, where this isn't an issue (an intranet-type or network configuration for staging). This avoids a lot of search engine problems with that content getting crawled; for instance, someone deletes the robots.txt file from your staging by accident and a duplicate site gets crawled and indexed.
If that isn't an option, I'd recommend placing staging in a folder on the server, like domain.com/staging/, and using just one robots.txt file in the root folder to block out that /staging/ folder entirely. That way you don't need two files, and you can sleep at night knowing another robots.txt won't replace yours.
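Assuming staging lives under a /staging/ path on the production host (a hypothetical layout, as this answer suggests), the single root robots.txt would look like:

```text
# domain.com/robots.txt — one file serves both purposes
User-agent: *
Disallow: /staging/
```

Note that Disallow stops crawling; a URL that search engines already know about can still appear in results, so moving staging off the public internet remains the stronger option.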
If THAT isn't an option, maybe ask them to add "do NOT move that file" to their checklist? You'll still have to check it yourself: a little less sleep, but a little more precaution.
Create a deployment script to move the various artifacts (web pages, images, supporting files, etc.) and have the IT guy do the move by running your script. Be sure not to include robots.txt in that script.
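A minimal sketch of such a script, using only POSIX tools (the staging and production paths are placeholders, not from the thread):

```shell
# deploy: copy every file from the staging root ($1) to the production
# root ($2, an absolute path) EXCEPT robots.txt, so the production copy
# can never be clobbered by the staging one. Paths are hypothetical.
deploy() {
  src=$1
  dst=$2
  ( cd "$src" || exit 1
    # Walk the staging tree, skipping robots.txt wherever it appears.
    find . -type f ! -name 'robots.txt' | while read -r f; do
      mkdir -p "$dst/$(dirname "$f")"
      cp "$f" "$dst/$f"
    done )
}

# deploy /var/www/staging /var/www/production
```

In practice a tool like `rsync --exclude=robots.txt` does the same job with less ceremony; the point is that the exclusion lives in the script, not in anyone's memory.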
I'd set up code on the production server that holds the production robots.txt in another location and monitors the one in use.
If they differ, it immediately overwrites the in-use copy with the production version. Then it wouldn't matter if the file gets overwritten, since the bad version won't exist for long. In a UNIX environment, I'd do this periodically with cron.
Why is your staging environment publicly exposed instead of behind a firewall?
The problem is not robots.txt... the problem is your network infrastructure.