设计牛津英语词典

发布于 2024-12-20 15:04:51 字数 89 浏览 0 评论 0原文

在一次采访中有人问我将如何设计《牛津英语词典》。

我告诉他我会使用 TREE 数据结构,但他回答说这会占用大量内存。那么应该使用哪种其他数据结构呢?

I was asked in an interview how I would design the Oxford English Dictionary.

I told him that I'd use a TREE data structure, but he replied that it would take a lot of memory. So which other data structure should be used?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

天冷不及心凉 2024-12-27 15:04:51

我听说过去在手机中用于存储 T9 词典的一种数据结构如下(好吧,这仅解决了关键问题,但没有解决定义存储):

条目已排序,每个条目应以偏移量开始上一个条目应该从哪里继续,也是延续。例如:

apple
4icable
7tion

将解码为 apple、applicable、a​​pplication。然而,这可能与合并链的尝试没有什么不同,请参阅

appl -> e 
     -> ica -> ble
            -> tion

Wikipedia 发现的有向非循环词图 ,它与树的不同之处在于它不仅有分支,而且分支可以合并,其中单词具有相同的后缀。这确实可以成为一个优越的存储空间。

        a
       / \
  pplic   utom
       \ /
      ation

One data structure I heard was used in the past in mobile phones for storing T9 dictionaries is the following (well, this addresses only the key issue, but not the definition storage):

Entries are sorted, and each entry should start with an offset into the previous entry from where it should be continued, and also the continuation. For example:

apple
4icable
7tion

would decode to apple, applicable, application. However this might not be that different from tries with merged chains, see

appl -> e 
     -> ica -> ble
            -> tion

Wikipedia uncovered the Directed acyclic word graph, which differs from trees that it not only branches, but branches can merge, where words have the same suffix. This indeed could be a superior storage.

        a
       / \
  pplic   utom
       \ /
      ation
埖埖迣鎅 2024-12-27 15:04:51

它不会使用很多内存。你的回答很好。也许是 1995 年。认为自己很幸运。

It wouldn't use a lot of memory. Your answer was fine. Maybe in 1995. Consider yourself lucky.

情话难免假 2024-12-27 15:04:51

正如其他人提到的,如果没有足够的屋顶来容纳精心设计的特里树,那么可能也没有空间容纳任何其他类型的索引。由于这是一个面试问题,听起来他试图引导您使用 B 树等经典的外核数据结构。

或者,一个好的回应可能是询问更多信息,例如“您想对此数据结构执行什么类型的操作,以及您需要什么类型的性能?”如果您只想要拼写检查,那么布隆过滤器可能是最有效的“数据结构”......

As others have mentioned, if there isn't enough roof for a well-designed trie, there probably isn't room for any other kind of index, either. Since this is about an interview question, it sounds like he was trying to steer you towards classic out-of-core datastructures like B-trees.

Alternately, a good response might have been to ask for more information, like "what kinds of operations will you want to do on this datastructure, and what kind of performance do you need?" If you just want a spellcheck, then a Bloom filter might be the most efficient "datastructure"...

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文