About search engines: how do they take screenshots of websites?
This may be a dumb question, but I really have no idea and I'm utterly curious! So please bear with me.
As far as I know, search engines just read the HTML and the words on a site. They usually ignore CSS, or at least part of it, and they arguably cannot read images. Or can they?
If they really cannot read those, or choose to ignore them, then how do they make a screenshot, which shows the page exactly as the CSS lays it out, images included?
If they do not read CSS or images, and they also do not open the page on a screen the way a human does, how do they make the screenshot?
Thanks!
Are you referring to Google's new screenshot feature, or their old cache feature? Your question is talking about screenshots and doesn't mention the cache at all, but your comments on your question seem to imply that you're referring to the cache, not the screenshots.
In the case of the screenshots:
You are correct in that search engines usually only read the HTML and text on a website, because that's all they need. But that doesn't mean they can't.
When they want to take a screenshot of a site, they do exactly what a normal browser does when a user visits it: download the HTML, the CSS, the images, and everything else, and render it all with the rendering engine of a web browser, such as WebKit.
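The first half of that process, working out what has to be downloaded before a page can be rendered, can be sketched with Python's standard library. This is only an illustration: the `SubresourceCollector` class, the `find_subresources` helper, and the example URL are made up for this sketch, and a real screenshot service would hand the page to an actual browser engine rather than parse it by hand.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SubresourceCollector(HTMLParser):
    """Collect the stylesheets, images and scripts an HTML page refers to."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.resources = []  # list of (kind, absolute_url) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel:
            self._add("css", attrs.get("href"))
        elif tag == "img":
            self._add("image", attrs.get("src"))
        elif tag == "script":
            self._add("script", attrs.get("src"))

    def _add(self, kind, url):
        # Inline scripts and broken tags have no URL; skip them.
        if url:
            # Resolve relative references against the page's own URL,
            # as a browser would before fetching them.
            self.resources.append((kind, urljoin(self.page_url, url)))


def find_subresources(html, page_url):
    """Return the files a renderer would need to fetch for this page."""
    collector = SubresourceCollector(page_url)
    collector.feed(html)
    collector.close()
    return collector.resources


if __name__ == "__main__":
    sample = """
    <html><head><link rel="stylesheet" href="/style.css"></head>
    <body><img src="logo.png"><script src="app.js"></script></body></html>
    """
    for kind, url in find_subresources(sample, "http://example.com/blog/"):
        print(kind, url)
```

Once everything in that list has been fetched, the rendering engine lays the page out and paints it, and the screenshot is simply a saved copy of the painted result.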
In the case of the cache:
The search engine usually just stores the HTML without (or before) parsing it. It sends the saved HTML to your browser, and your browser pulls everything else on the page (images, etc.) from the original website. The search engine isn't reading anything; it's just saving the page verbatim (well, with minor changes, namely URL rewriting) and giving it to your browser.
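To make the "minor changes" concrete, here is a minimal sketch of one way a saved copy could be adjusted so that relative image and stylesheet links still resolve against the original site: inserting a `<base>` element into the stored HTML. The `add_base_tag` function is a hypothetical helper written for this example; it is not a description of how any particular search engine rewrites URLs.

```python
import re
from html import escape

# Matches an opening <head> tag (with or without attributes), but not <header>.
_HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def add_base_tag(saved_html, original_url):
    """Return the saved HTML with a <base> element pointing at the original URL.

    With the <base> in place, a browser that receives the cached copy resolves
    relative links such as src="logo.png" against the original website.
    """
    base = '<base href="%s">' % escape(original_url, quote=True)
    match = _HEAD_OPEN.search(saved_html)
    if match:
        # Put the <base> first inside <head> so it applies to every later link.
        return saved_html[:match.end()] + base + saved_html[match.end():]
    # No <head> found: prepend the tag and let the browser's parser cope.
    return base + saved_html


if __name__ == "__main__":
    cached = '<html><head><title>Demo</title></head><body><img src="logo.png"></body></html>'
    print(add_base_tag(cached, "http://example.com/blog/"))
```

The rest of the document is left byte for byte as it was stored, which matches the point above: the cache does not need to understand the page, only to hand it back.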
There are apps that take screenshots of pages as they would be displayed in a chosen browser.
Browsershots is an example of an online service that does this.
Here are some links and projects for webpage thumbnail generators:
Maybe I'm not understanding your question, but...
You seem to be using "read an image" to mean loading the image's data into the search engine. The search engine does do this (for CSS as well). When people say search engines ignore images, they mean that the engine doesn't treat them as meaningful, searchable data. In other words, if I make an image that has the word "Hello" on it, you and I "read" it in the sense that we see and understand that the image contains a word. A search engine typically will not attempt to do this. It will, however, "read" the image into its storage if it wants to be able to present it to a user at a later time.
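The difference between the two senses of "read" can be shown with a small sketch. The `BlobStore` class below is invented for this example: it keeps an image's bytes so they can be served again later, but it never looks inside them, so it has no idea whether the picture contains the word "Hello".

```python
import hashlib


class BlobStore:
    """Keep raw file contents, keyed by a hash of the bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The bytes are stored as-is; nothing here decodes or inspects the image.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


if __name__ == "__main__":
    store = BlobStore()
    fake_image = b"\x89PNG\r\n\x1a\n...pixels that happen to spell Hello..."
    key = store.put(fake_image)
    # The store can hand the image back unchanged, but it cannot be searched for "Hello".
    assert store.get(key) == fake_image
```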
Search engines don't use CSS and image content for indexing, but they can store them on their servers to make a cached version of the site.
In the case of Google, I think they store only text files: HTML, CSS, and maybe JavaScript, but no images.