robots.txt: preventing search engines from indexing specified keywords to protect privacy

Posted 2024-08-13 03:00:00

I have a large directory of individual names, along with generic publicly available and category-specific information, that I want indexed as much as possible in search engines. People are not concerned about these names being listed on the site itself, but some don't want to appear in search results when they "Google" themselves.

We want to continue listing these names within a page AND still index the page BUT not index specified names or keywords in search engines.

Can this be done page by page, or would setting up two pages be a better workaround?

Options available:

  • PHP can censor keywords if user-agent=robot/search engine
  • htaccess to restrict robots' access to the non-censored content while allowing them a second, censored version
  • meta tags defining words not to index?
  • JavaScript could hide keywords from robots while keeping them viewable otherwise

Comments (3)

深海不蓝 2024-08-20 03:00:00

I will go through the options and tell you some problems I can see:

PHP: If you don't mind trusting the user agent, this will work well. I am unsure how some search engines will react to their bots being shown different content from what users see.
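
As a rough illustration of the user-agent check described above, here is a minimal sketch. It is written in JavaScript rather than PHP, but the logic ports directly. The names in `HIDDEN_NAMES`, the crawler substrings in `BOT_PATTERN`, and the helper `censorForBots` are all assumptions made up for this example; they are not part of the original question.

```javascript
// Hypothetical list of names whose owners asked not to appear in search results.
const HIDDEN_NAMES = ["Jane Example", "John Q. Sample"];

// Assumed substrings for spotting crawlers. A real list would be longer,
// and any client can send any User-Agent string it likes.
const BOT_PATTERN = /googlebot|bingbot|slurp|duckduckbot|baiduspider/i;

function isSearchBot(userAgent) {
  return BOT_PATTERN.test(userAgent || "");
}

// Escape regex metacharacters so each name is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the page unchanged for ordinary visitors; for requests that look
// like crawlers, replace every hidden name with a placeholder.
function censorForBots(html, userAgent, names = HIDDEN_NAMES) {
  if (!isSearchBot(userAgent)) return html;
  return names.reduce(
    (out, name) =>
      out.replace(new RegExp(escapeRegExp(name), "gi"), "[name withheld]"),
    html
  );
}
```

A server would call something like `censorForBots(pageHtml, request.headers["user-agent"])` just before sending the response. Because the check relies entirely on the User-Agent header, it only works for crawlers that identify themselves honestly.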

htaccess: You would probably need to redirect the bot to a different page. You could use URL parameters, but this would be no different than using a pure PHP solution. The bot would index the page it is redirected to, not the page you want users to visit. You may be able to use the rewrite engine to overcome this.

meta tags: Even if you could use meta tags to make a bot ignore certain words, there would be no guarantee that search engines would honor them, since there is no set "standard" for meta tags. In any case, I don't know of any way to make a bot ignore certain words or phrases using meta tags.

JavaScript: No bot I have ever heard of executes (or even reads) JavaScript when looking at a page, so I don't see this working. You could display the content you want hidden to the users using JavaScript and bots won't be able to see it but neither will users who have JavaScript disabled.

I would go the PHP route.

岁月打碎记忆 2024-08-20 03:00:00

You can tell robots to skip indexing a particular page by adding a ROBOTS meta tag:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

UPDATE: The ways I can think of to restrict indexing of particular words are:

  1. Use JS to add those words to the page (see below).
  2. Add a module to the server that would strip those words from the rendered page.

JavaScript could be something like this:

<p>
  <span id="secretWord">
    <script type="text/javascript">
      // The word exists only in script, never in the static HTML.
      // Concatenating strings or using hex escapes obscures it further.
      document.write('sec' + 'ret' + '\x20' + 'word');
    </script>
  </span>
</p>
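
The hex-code idea mentioned in the snippet could be sketched as follows. This is only an illustration under assumptions: the helper names `toHex` and `fromHex` and the `data-name-hex` attribute are invented for this example. The name would be encoded ahead of time (on the server or at build time) and decoded in the browser just before display.

```javascript
// Encode a name as space-separated hex code points (done ahead of time).
function toHex(name) {
  return Array.from(name, (ch) => ch.codePointAt(0).toString(16)).join(" ");
}

// Decode the hex code points back into the readable name (done in the browser).
function fromHex(hex) {
  return hex
    .split(" ")
    .filter(Boolean)
    .map((code) => String.fromCodePoint(parseInt(code, 16)))
    .join("");
}

// Browser-only wiring: fill every element carrying the hypothetical
// data-name-hex attribute. Skipped when no DOM is present.
if (typeof document !== "undefined") {
  document.querySelectorAll("[data-name-hex]").forEach((el) => {
    el.textContent = fromHex(el.dataset.nameHex);
  });
}
```

With this, the static HTML would contain only something like `<span data-name-hex="..."></span>`. It hides the name only from crawlers that do not run scripts, and users with JavaScript disabled would not see the name either.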

The server module is probably the best option. In ASP.NET it should be fairly easy to do that. I'm not sure about PHP, though.

一指流沙 2024-08-20 03:00:00

What's not clear from your post is whether you want to protect your names and keywords against Google or against all search engines. Google is generally well-behaved. You can use the ROBOTS meta tag to prevent that page from being indexed. But it won't prevent search engines that ignore the ROBOTS tag from indexing your site.

Other approaches you did not suggest:

  • Having the content of the page fetched with client-side JavaScript.
  • Require the user to solve a CAPTCHA before displaying the text. I recommend the reCAPTCHA package, which is easy to use.

Of all these, the reCAPTCHA approach is probably the best, as it will also protect against ill-behaved spiders. But it is the most onerous for your users.
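
For the first approach in the list, fetching the page content with client-side JavaScript, a minimal sketch might look like the following. The `/names.json` endpoint, the `names` element id, and the helper names are hypothetical and chosen only for illustration; the endpoint is assumed to return a JSON array of strings.

```javascript
// Escape text before placing it into HTML.
function escapeHtml(text) {
  return text.replace(/[&<>"']/g, (ch) => ({
    "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;",
  }[ch]));
}

// Build the list markup from an array of names.
function renderNames(names) {
  return "<ul>" + names.map((n) => `<li>${escapeHtml(n)}</li>`).join("") + "</ul>";
}

// Fetch the names after page load. "/names.json" is a hypothetical endpoint.
async function loadNames(url = "/names.json", fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
  return response.json();
}

// Browser-only wiring; skipped when no DOM is present.
if (typeof document !== "undefined") {
  loadNames()
    .then((names) => {
      document.getElementById("names").innerHTML = renderNames(names);
    })
    .catch((err) => console.error(err));
}
```

The static page then contains only an empty `<div id="names"></div>`, so the names are visible only to clients that run the script and request the data.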
