How does Google know you are cloaking?
I can't seem to find any information on how google determines if you are cloaking your content. How, from a technical standpoint, do you think they are determining this? Are they sending in things other than the googlebot and comparing it to the googlebot results? Do they have a team of human beings comparing? Or can they somehow tell that you have checked the user agent and executed a different code path because you saw "googlebot" in the name?
It's in relation to this question on legitimate url cloaking for seo. If textual content is exactly the same, but the rendering is different (1995-style html vs. ajax vs. flash), is there really a problem with cloaking?
Thanks for your input on this one.
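To make the scenario concrete, here is a minimal sketch (Python/Flask, with made-up content and paths) of the kind of user-agent branch I mean when I say "checked the user agent and executed a different code path":

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def index():
    ua = request.headers.get("User-Agent", "")
    if "googlebot" in ua.lower():
        # Crawler-only branch: plain 1995-style HTML with the text inline.
        return "<h1>Widgets</h1><p>All the product text, right in the markup.</p>"
    # Visitor branch: same text, but pulled in by JavaScript after page load.
    return "<div id='app'></div><script src='/app.js'></script>"
```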
7 Answers
As far as I know, how Google prepares search engine results is secret and constantly changing. Spoofing different user-agents is easy, so they might do that. They also might, in the case of Javascript, actually render partial or entire pages. "Do they have a team of human beings comparing?" This is doubtful. A lot has been written on Google's crawling strategies including this, but if humans are involved, they're only called in for specific cases. I even doubt this: any person-power spent is probably spent by tweaking the crawling engine.
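As a rough sketch of the "spoof different user-agents and compare" idea (my guess at the approach, not Google's actual pipeline), you could fetch the same URL as Googlebot and as a browser and compare the visible text:

```python
import re
import urllib.request

GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

def fetch(url, user_agent):
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")

def visible_text(html):
    # Crude text extraction: drop scripts/styles and tags, collapse whitespace.
    html = re.sub(r"(?is)<(script|style)[^>]*>.*?</\1>", " ", html)
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip().lower()

def served_differently(url):
    # True if the text served to a "browser" differs from what "Googlebot" gets.
    return visible_text(fetch(url, BROWSER_UA)) != visible_text(fetch(url, GOOGLEBOT_UA))
```

Rendering JavaScript-heavy pages would need a real browser engine rather than this kind of raw fetch, which is presumably why the rendering question matters.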
Google looks at your site while presenting user-agents other than googlebot.
See page 11 of the Google Chrome comic book, where it describes (in better than layman's terms) how a Google tool can take a schematic of a web page. They could be using this or similar technology for Google search indexing and cloak detection - at least that would be another good use for it.
Google does hire contractors (indirectly, through an outside agency, for very low pay) to manually review documents returned as search results and judge their relevance to the search terms, quality of translations, etc. I highly doubt that this is their only tool for detecting cloaking, but it is one of them.
In reality, many of Google's algos are trivially reversed and are far from rocket science. In the case of so-called "cloaking detection", all of the previous guesses are on the money (apart from, somewhat ironically, John K lol). If you don't believe me, set up some test sites (inputs) and some 'cloaking test cases' (further inputs), submit your sites to uncle Google (processing) and test your non-assumptions via pseudo-advanced human-based cognitive correlationary quantum perceptions (<-- btw, I made that up for entertainment value (and now I'm nesting parentheses to really mess with your mind :)) AKA "checking google results to see if you are banned yet" (outputs). Loop until enlightenment == True (noob!) lol
A very simple test would be to compare the file size of the webpage the Googlebot saw against the file size of the same page scanned by a Google alias user that looks like a normal user.
This would detect the most suspect candidates for closer examination.
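A minimal sketch of that size comparison (the user-agent strings and the threshold ratio are made up for illustration):

```python
import urllib.request

def page_size(url, user_agent):
    # Download the page as the given user-agent and return its size in bytes.
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return len(resp.read())

def worth_a_closer_look(url, bot_ua, user_ua, max_ratio=1.5):
    bot_bytes = page_size(url, bot_ua)
    user_bytes = page_size(url, user_ua)
    # Flag pages whose sizes differ by more than the (arbitrary) ratio.
    return max(bot_bytes, user_bytes) / max(min(bot_bytes, user_bytes), 1) > max_ratio
```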
They call your page using tools like curl and construct a hash based on the page fetched without the Googlebot user-agent, then they construct another hash using the googlebot user-agent. The two hashes should be similar; they have algorithms to compare the hashes and tell whether it is cloaking or not.
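A sketch of the hash comparison described above (a guess at the idea, not a known Google implementation; the URL and user-agent strings are placeholders):

```python
import hashlib
import urllib.request

def page_hash(url, user_agent):
    # Fetch the page as the given user-agent and hash the raw response body.
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

# Exact hashes only catch byte-identical responses; real pages with timestamps
# or session IDs would need a similarity ("fuzzy") hash rather than SHA-256.
as_user = page_hash("http://example.com/", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
as_bot = page_hash("http://example.com/",
                   "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
print("possible cloaking" if as_user != as_bot else "responses are byte-identical")
```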