Erlang：这个 trie 实现最错误的地方是什么？

发布于 2024-08-16 06:35:06 字数 960 浏览 6 评论 0 原文

假期里，我的家人喜欢玩Boggle。问题是，我的Boggle 很糟糕。所以我做了任何优秀程序员都会做的事情：编写一个程序来给我玩。

该算法的核心是一个简单的前缀 trie，其中每个节点都是一个 dict 对下一个字母的引用。

这是 trie:add 实现：

add([], Trie) ->
    dict:store(stop, true, Trie);

add([Ch|Rest], Trie) ->
    % setdefault(Key, Default, Dict) ->
    %     case dict:find(Key, Dict) of
    %         { ok, Val } -> { Dict, Val }
    %         error -> { dict:new(), Default }
    %     end.
    { NewTrie, SubTrie } = setdefault(Ch, dict:new(), Trie),
    NewSubTrie = add(Rest, SubTrie),
    dict:store(Ch, NewSubTrie, NewTrie).

您可以在此处看到其余部分，以及如何使用它的示例（在底部）：

http://gist.github.com/263513

现在，这是我在 Erlang 中的第一个严肃的程序，我我知道它可能有很多问题……但我最关心的是它使用800 MB RAM。

那么，我做的最错的是什么？我怎样才能让它少一点错误呢？

原文

Over the holidays, my family loves to play Boggle. Problem is, I'm terrible at Boggle. So I did what any good programmer would do: wrote a program to play for me.

At the core of the algorithm is a simple prefix trie, where each node is a dict of references to the next letters.

This is the trie:add implementation:

add([], Trie) ->
    dict:store(stop, true, Trie);

add([Ch|Rest], Trie) ->
    % setdefault(Key, Default, Dict) ->
    %     case dict:find(Key, Dict) of
    %         { ok, Val } -> { Dict, Val }
    %         error -> { dict:new(), Default }
    %     end.
    { NewTrie, SubTrie } = setdefault(Ch, dict:new(), Trie),
    NewSubTrie = add(Rest, SubTrie),
    dict:store(Ch, NewSubTrie, NewTrie).

And you can see the rest, along with an example of how it's used (at the bottom), here:

http://gist.github.com/263513

Now, this being my first serious program in Erlang, I know there are probably a bunch of things wrong with it… But my immediate concern is that it uses 800 megabytes of RAM.

So, what am I doing most-wrong? And how might I make it a bit less-wrong?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

新一帅帅 2024-08-23 06:35:06

您可以通过简单地将单词存储在 ets 表中来实现此功能：

% create table; add words
> ets:new(words, [named_table, set]).
> ets:insert(words, [{"zed"}]).  
> ets:insert(words, [{"zebra"}]).    

% check if word exists
> ets:lookup(words, "zed").          
[{"zed"}]

% check if "ze" has a continuation among the words
78> ets:match(words, {"ze" ++ '$1'}).
[["d"],["bra"]]

如果 trie 是必须的，但您可以接受非功能性方法，那么您可以尝试有向图，正如 Paul 已经建议的那样。

如果您想保持功能，您可以通过使用使用较少内存的结构来节省一些字节的内存，例如 proplist 或记录，例如 -record(node, {a,b,....,x,y,z})。

You could implement this functionality by simply storing the words in an ets table:

% create table; add words
> ets:new(words, [named_table, set]).
> ets:insert(words, [{"zed"}]).  
> ets:insert(words, [{"zebra"}]).    

% check if word exists
> ets:lookup(words, "zed").          
[{"zed"}]

% check if "ze" has a continuation among the words
78> ets:match(words, {"ze" ++ '$1'}).
[["d"],["bra"]]

If trie is a must, but you can live with a non-functional approach, then you can try digraphs, as Paul already suggested.

If you want to stay functional, you might save some bytes of memory by using structures using less memory, for example proplists, or records, such as -record(node, {a,b,....,x,y,z}).

回复收藏 0 原文

若相惜即相离 2024-08-23 06:35:06

我不记得一个字典需要多少内存，但让我们估计一下。您有 2.5e6 个字符和 2e5 个单词。如果你的字典树根本没有共享，那么字典中将需要 2.7e6 个关联（每个字符和每个“停止”符号一个）。一个简单的纯功能字典表示可能每个关联 4 个单词——它可能会更少，但我正在尝试获得一个上限。在 64 位计算机上，这需要 8*4*270 万字节，即 86 兆字节。这只是 800M 的十分之一，所以这里肯定有问题。

更新： dict.erl 表示带有哈希表的字典；当你有很多非常小的命令时，这意味着大量的开销，就像你所做的那样。我会尝试更改您的代码以使用 proplists 模块，它应该与我上面的计算相匹配。