Where does Google get the abstract it shows for each website result on the search results page?

Posted on 2024-12-08 14:33:39

I am working on a project in which I have to search for terms on a search engine and then cluster the results by their contextual meaning, so I have to treat each result as a document. Unfortunately, the data shown with each result on the results page is too little for clustering. Hence, I wanted to know where search engines get the abstract they show for each result. If I could get that entire abstract, I could cluster the results by treating them as separate documents.
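If the snippet is too short, one option is to fetch each result URL yourself and use the page's visible text as the document. The sketch below is a minimal, hypothetical helper built on Python's standard-library `html.parser`; it assumes you already have the page HTML as a string (fetching it, e.g. with `urllib.request`, is left out), and the names `TextExtractor` and `page_to_document` are illustrative only.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text chunks, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside <script>/<style>.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_to_document(html: str) -> str:
    """Turn one result page's HTML into a plain-text document for clustering."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Each returned string can then be fed to whatever clustering method you use, one document per search result.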

Where does Google get the abstract from?
For example, if you search for "1000 Mile" on Google, the second result shows the following abstract:
"The women's 1000 Mile Collection is based on classic designs and reflects Wolverine's long heritage of crafting quality footwear. Complementing these classics ..."

This abstract is not present in the Meta tags of the page.

Where does Google find this data?

Thanks

小…红帽 2024-12-15 14:33:40

From "Does Google use the Meta Description Tag for Description of Page?":

Google will choose your search results snippets from the following places (not necessarily in this order):

  1. The page's Meta Description tag
  2. The page's Open Directory Project (ODP) Listing
  3. Page content relevant to the search query
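To check the first of these sources for a given page yourself, you can read the meta description straight from its HTML. This is a minimal sketch using Python's standard-library `html.parser`; `MetaCollector` and `meta_description` are hypothetical names, and nothing here reproduces the ODP listing or Google's own choice among the sources.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Maps lowercased <meta name="..."> values to their content attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name = (a.get("name") or "").lower()  # attribute values may be None
        if name and a.get("content") is not None:
            self.meta.setdefault(name, a["content"])  # keep the first occurrence


def meta_description(html: str):
    """Return the page's meta description, or None if it has none."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.meta.get("description")
```

A `None` result is consistent with the asker's observation that the snippet does not come from the meta tags.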

If you do not want Google to use the ODP listing's description then you can tell them not to do so with the following Meta tag:

<meta name="robots" content="NOODP">

If you want to encourage Google to use your Meta Description tag then make sure it is unique to each page. Also make sure it contains an accurate description of the page's content.
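One way to act on the "unique to each page" advice is to look for descriptions shared by several pages of a site. This hypothetical helper assumes you have already collected a URL-to-description mapping (for instance with a meta-tag parser); it only normalizes whitespace and case before comparing.

```python
from collections import defaultdict


def find_duplicate_descriptions(descriptions: dict[str, str]) -> dict[str, list[str]]:
    """Map each description used by two or more pages to the sorted list of those URLs."""
    by_text = defaultdict(list)
    for url, desc in descriptions.items():
        key = " ".join(desc.split()).lower()  # normalize whitespace and case
        by_text[key].append(url)
    return {text: sorted(urls) for text, urls in by_text.items() if len(urls) > 1}
```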

In the absence of an ODP description and Meta Description tag, Google will use a portion of the page's text as the description. This text will contain the closest matches to the search query. I have not seen any official limit on how long this can be, but a couple of sentences seems about right.
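The idea of "a portion of the page's text containing the closest matches to the query" can be approximated as below. This is an illustrative assumption, not Google's algorithm: it splits text into sentences on `.`, `!` or `?`, scores each sentence by how many distinct query words it contains, and returns up to `max_sentences` of the best-scoring ones in their original order.

```python
import re


def query_snippet(text: str, query: str, max_sentences: int = 2) -> str:
    """Build a naive query-dependent snippet from a page's plain text."""
    terms = set(re.findall(r"\w+", query.lower()))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def score(sentence: str) -> int:
        # Number of distinct query words that appear in the sentence.
        return len(terms & set(re.findall(r"\w+", sentence.lower())))

    # Stable sort: ties keep document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(i for i in ranked[:max_sentences] if score(sentences[i]) > 0)
    return " ".join(sentences[i] for i in chosen)
```

Because the result depends on the query, the same page can show different snippets for different searches, which matches the behavior described above.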

On a related note, if you don't want a snippet to be shown with a particular page you can use the following Meta tag to prevent one from being shown:

<meta name="robots" content="nosnippet">
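To see which of these robots directives a page actually declares, you can collect the comma-separated values of its `<meta name="robots">` tags. A minimal sketch with the standard-library `html.parser`; `RobotsMetaParser` and `robots_directives` are hypothetical names, and the parser only reads the HTML, it says nothing about how a search engine honors each directive.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects lowercased directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            for directive in (a.get("content") or "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives
```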

See this blog post for Google's tips on using the meta description tag.

According to this site, "The meta description should typically be at most 145 to 150 characters in length as these are the maximum number of characters typically displayed at Yahoo! and Google, respectively."
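If you want to keep a description within that quoted range, a simple check is to truncate at a word boundary and mark the cut. The 150-character default below is taken from the quote above; the `" ..."` marker and the helper name are assumptions for illustration.

```python
def truncate_description(text: str, limit: int = 150, ellipsis: str = " ...") -> str:
    """Shorten text to at most `limit` characters without splitting a word."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= limit:
        return text
    cut = text[: limit - len(ellipsis)]
    # Back up to the last space unless the cut already falls on a word boundary.
    if text[len(cut)] != " " and " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut.rstrip() + ellipsis
```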

嘦怹 2024-12-15 14:33:40

That site is Flash-based, and Google can index Flash content, so given that the snippet isn't in the HTML source of the page as you point out, nor is it in the cached version of the page, I'm guessing that it's somewhere in the Flash movie.

It's kind of arbitrary that the snippet mentions "The women's 1000 Mile Collection" while the site link itself points to the parent 1000 Mile category, not just women's, so I'm guessing that gathering snippet-friendly metadata from a Flash site is an imprecise science. That's my best guess.

In this Google Webmaster blog post, they explain how they use external text or HTML files loaded into the Flash movie, and in one of the comments Jonathan Simon says (sorry):

"We try our best to crawl Flash content but the results can sometimes be less than ideal. You are only seeing a title in the search results for your site because that's the only bit of HTML text that you have outside of your Flash content. You could add a Meta description element to offer more information in HTML. You could also add some other text that's not a part of your Flash content. Just doing this should improve the snippet you see associated with your site in the search results."
