当前位置：文江博客话题详情

倒排索引和普通旧索引有什么区别？

发布于 2024-12-09 12:39:52 字数 76 浏览 0 评论 0原文

在软件工程中，我们一直在创建索引（例如，在数据库中），但我也听到很多人谈论倒排索引。两者之间有什么本质上的不同吗？它们听起来像是同一件事。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

半寸时光 2024-12-16 12:39:52

一种常见用途是“...允许快速全文搜索。”

这两种类型表示方向性。一个带您向前浏览索引，另一个带您向后（反向）浏览索引。就是这样。这里没有什么秘密需要揭开。否则，这两种类型是相同的，问题只是您拥有什么信息，以及您想要查找什么信息。

为了解决您的询问，我我认为实际上没有办法知道为什么使用它是今天的样子。定义哪个是正向、哪个是反向很重要，这样我们就可以就它们进行对话，并且每个人都知道我们在谈论哪个方向关于。想想“左”和“右”这两个词：它们是相对的。哪一个并不重要，只是每个人都需要同意哪一个是“左”，哪一个是“右”才能使这些词有意义。如果作为一种文化，我们决定左右翻转，那么你会遇到同样的问题，因为商定的含义已经改变，所以你会在弄清楚“右转”和“左转”是什么时遇到同样的问题。然而，命名是任意的，所以哪个是哪个（本身）并不重要 - 重要的是我们都同意其含义。

在你的评论中，你问“请不要只定义术语”，你没有抓住要点，我认为你只是在措辞上挂了钩，而它们之间绝对没有区别。

为了方便未来的读者，我现在将提供几个“正向”和“反向”索引示例：

示例 1：Web 搜索

如果您认为反向索引类似于数学中函数的逆，其中逆是具有不同形式的特殊事物，那么您就错了：但这里的情况并非如此。

在搜索引擎中，您有一个文档列表（网站上的页面），您可以在其中输入一些关键字并获取结果。

正向索引（或只是索引）是文档列表，以及其中出现哪些单词。在网络搜索示例中，Google 抓取网络，构建文档列表，找出每个页面中出现的单词。

倒排索引是单词列表，以及其中包含的文档他们出现了。在网络搜索示例中，您提供单词列表（您的搜索查询），然后 Google 生成文档（搜索结果链接）。

它们都是索引——问题只是你要往哪个方向走。正向是从文档->到->单词，反向是从单词->到->文档。

示例 2：DNS

另一个示例是 DNS 查找（获取主机名并返回 IP 地址）和反向查找（获取 IP 地址并返回主机名）。

示例 3：一本书

一本书后面的索引实际上是一个倒排索引，如上面示例所定义 - 单词列表以及在书中的位置。在书中，目录就像一个正向索引：它是本书包含的文档（章节）的列表，除了列出这些部分中的单词之外，目录只是给出这些文档（章节）中包含的内容的名称/一般描述。

示例 4：您的手机

您手机中的转发索引是您的联系人列表，以及与这些联系人关联的电话号码（手机号码、家庭号码、工作号码）。 倒排索引允许您手动输入电话号码，当您点击“拨号”时，您会看到此人的姓名，而不是号码，因为您的手机已获取该电话号码并找到您与其关联的联系人。

One common use is "...to allow fast full-text searching."

The two types denote directionality. One takes you forward through the index, and the other takes you backward (the inverse) through the index. That's it. There's no mystery to uncover here. Otherwise the two types are identical, it's just a question of what information you have, and as a result what information you're trying to find.

To address your inquiry, I don't think there's actually a way to know why the use is what it is today. The only reason it's important to define which is forward and which one is inverted is so that we can all have a conversation about them, and everyone knows which direction we're talking about. Think about the terms "left" and "right": they are relative. Which is which doesn't matter, except that everyone needs to agree which one is "left" and which one is "right" in order for the words to have meaning. If, as a culture, we decided to flip left and right, then you'd have the same issue figuring out what a "right turn" vs a "left turn" is since the agreed upon meaning had changed. However, the naming is arbitrary, so which one is which (in and of itself) doesn't matter - what matters is that we all agree on the meaning.

In your comment where you ask, "please don't just define the terms", you're missing the point, and I think you're just getting hung up on the wording when there is absolutely no difference between them.

For the benefit of future readers, I will now provide several "forward" and "inverted" index examples:

Example 1: Web search

If you're thinking that the inverse of an index is something like the inverse of a function in mathematics, where the inverse is a special thing that has a different form, then you're mistaken: that's not the case here.

In a search engine you have a list of documents (pages on web sites), where you enter some keywords and get results back.

A forward index (or just index) is the list of documents, and which words appear in them. In the web search example, Google crawls the web, building the list of documents, figuring out which words appear in each page.

The inverted index is the list of words, and the documents in which they appear. In the web search example, you provide the list of words (your search query), and Google produces the documents (search result links).

They are both indexes - it's just a question of which direction you're going. Forward is from documents->to->words, inverted is from words->to->documents.

Example 2: DNS

Another example is a DNS lookup (which takes a host name, and returns an IP address) and a reverse lookup (which takes an IP address, and gives you the host name).

Example 3: A book

The index in the back of a book is actually an inverted index, as defined by the examples above - a list of words, and where to find them in the book. In a book, the table of contents is like a forward index: it's a list of documents (chapters) which the book contains, except instead of listing the words in those sections, the table of contents just gives a name/general description of what's contained in those documents (chapters).

Example 4: Your cell phone

The forward index in your cell phone is your list of contacts, and which phone numbers (cell, home, work) are associated with those contacts. The inverted index is what allows you to manually enter a phone number, and when you hit "dial" you see the person's name, rather than the number, because your phone has taken the phone number and found you the contact associated with it.

回复收藏 0 原文

何其悲哀 2024-12-16 12:39:52

他们称其为倒排，只是因为已经有了前向索引。以搜索引擎为例，它由两部分组成：第一部分是“网络爬虫和解析器”，建立从文档到单词的索引，第二部分是搜索数据库，建立从单词到文档的索引。由于第一个索引的存在，我们很自然地将第二个索引称为倒排索引。

如果将一本书的TOC（目录）命名为索引，那么应该将书末尾的索引称为“倒排索引”。或者，在另一方面，您可以将目录称为倒排索引。

回复收藏 0 原文

菩提树下叶撕阳。 2024-12-16 12:39:52

通常，当谈到索引时，您指的是为了加快应用程序的速度而添加的一些计算或存储的过程结果（例如 MySQL 或其他 RDBMS 查阅 MySQL 文档）。索引还可以与缓存等相关。

倒排索引创建的文件的结构主要用于（全文）搜索。

倒排索引由两个主要文件组成：

词汇
出现

词汇中是从文本中提取的常见单词（当然是在过滤代词等黑名单单词之后）。事件文件保存单词和文档之间的连接（word1 出现在 doc1 和 doc2 中，而不是出现在 doc3 中）。它以矩阵的形式表示。

索引过程 - 倒排索引

上图中显示了创建上述两个文件的过程。

如果您对这个问题进一步感兴趣，我可以向您推荐 Ricardo Yated 写的一本好书 - 现代信息检索 (在亚马逊上查看） - 我认为大约第 200 页。

希望它有帮助:-)

回复收藏 0 原文

初见 2024-12-16 12:39:52

正态已经很好地区分了正向索引和倒排索引，但问题是为什么一个叫正向索引，另一个叫倒排索引，也许这就是他们这么叫的原因吧——

以搜索引擎爬行和索引（或者为一本书建立索引）为例，当你建立一个正向索引时，你可以同时建立一个正向索引。是抓取网页（或阅读书籍）或继续。因此，如果您有 10 个网页要抓取（或一本书中的 10 个章节），您可以抓取第一个网页（阅读第一章），然后列出该网页中出现的单词（该章节中出现的单词）并继续此过程适用于其他网页（其他章节），因此，当您抓取了所有 10 个网页（阅读所有 10 章）时，您的正向索引已完成，每个网页（章节）都指向它的单词列表包含。

但是要创建倒排索引，您必须爬行所有 10 个网页（阅读 10 个章节），然后从每个文档列表中取出每个单词，并找出哪些文档包含该单词。 所以这就像在爬行网页后向后退（阅读本书的章节）。所以它被称为倒排索引。

这只是我的猜测。

回复收藏 0 原文

压抑⊿情绪 2024-12-16 12:39:52

术语“倒排词索引”是指关系的变化
包含许多单词的单个文档，每个唯一的单词包含
（或识别）多文档列表。这实际上是采用一对多关系（文档到单词）并将其反转（或反转），以便现在存在一个新的“反转”一对多关系，这是与许多相关的每个唯一单词 -文档（即所有包含该词的文档）。它的起源确实就是这么简单，早在计算机和电子高速索引存在之前，术语“倒排索引”就被用来描述相同类型的手动索引（是的，不可否认，我是一个老了的老程序员，几乎当 COBOL 还是一门闪亮的新语言时，她已经足够大了，可以将 Grace Hopper 视为“可爱的年轻女士”，适合重新追求的年龄）。请暂时不要抛弃我们这些怪人，因为我们偶尔可能会提供一两个有用的、甚至可能有价值的历史花絮——也就是说，当我们的个人 RAM 仍在工作时。 [咧嘴笑]

回复收藏 0 原文