
Counting unique words in a file? A good alternative to linear search?

Posted 2024-09-15 16:47:21

I'm using a naive approach to this problem: I put the words into a linked list and then do a linear search on it. But on large files it takes far too long.

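I'm thinking about using a binary search tree, but I don't know whether it works with strings. I've also heard of skip lists, but I haven't really studied them yet.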

And I also have to use C...

弃爱 2024-09-22 16:47:21

You can put all the words into a trie, and then count the words once the whole file has been processed.
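
Here is a minimal C sketch of the trie idea. It rests on some assumptions that are not part of the answer. Words are already lowercased and contain only the ASCII letters 'a'..'z', so each node has 26 children. The names `trie_new`, `trie_insert`, `trie_count` and `trie_free` are hypothetical. As the answer suggests, the distinct words are counted by walking the trie once after all input is in.

```c
#include <stdlib.h>

/* Assumption: words are lowercase ASCII and use only 'a'..'z'. */
#define ALPHABET 26

struct trie {
    struct trie *child[ALPHABET]; /* one subtree per possible next letter */
    int is_word;                  /* nonzero if some word ends at this node */
};

/* calloc zero-fills the node; on mainstream platforms that makes every
   child pointer NULL and is_word 0. */
struct trie *trie_new(void)
{
    return calloc(1, sizeof(struct trie));
}

/* Returns 1 if word was new, 0 if it was already stored, and -1 on a
   character outside 'a'..'z' (relies on ASCII contiguity) or on OOM. */
int trie_insert(struct trie *root, const char *word)
{
    struct trie *node = root;
    for (const char *p = word; *p != '\0'; p++) {
        if (*p < 'a' || *p > 'z')
            return -1;
        int i = *p - 'a';
        if (node->child[i] == NULL) {
            node->child[i] = trie_new();
            if (node->child[i] == NULL)
                return -1;
        }
        node = node->child[i];
    }
    if (node->is_word)
        return 0;          /* duplicate: the path and end mark already exist */
    node->is_word = 1;
    return 1;
}

/* Count distinct words by visiting every node once. */
size_t trie_count(const struct trie *node)
{
    if (node == NULL)
        return 0;
    size_t n = node->is_word ? 1 : 0;
    for (int i = 0; i < ALPHABET; i++)
        n += trie_count(node->child[i]);
    return n;
}

void trie_free(struct trie *node)
{
    if (node == NULL)
        return;
    for (int i = 0; i < ALPHABET; i++)
        trie_free(node->child[i]);
    free(node);
}
```

Because `trie_insert` already reports whether a word was new, you could also keep a running counter instead of calling `trie_count` at the end.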

旧城空念 2024-09-22 16:47:21

A binary search tree works fine for strings: `strcmp` gives them an ordering to compare against.

If you don't care about the words' sorted order, you can simply use a hash table instead.

夜清冷一曲。 2024-09-22 16:47:21

Are you counting the number of unique words in a file?

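Why not build a simple hash table? For each word you read, add it to the hash table. Any duplicates are discarded because they are already in the table. At the end, you can count the number of elements in the data structure (by keeping a counter and incrementing it every time something is added to the table).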
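
Here is a minimal C sketch of that hash table. It uses separate chaining and the well-known djb2 string hash (h = h*33 + c). The fixed bucket count `NBUCKETS` is a hypothetical choice, and so are the names `wordset`, `wordset_add` and `wordset_free`. A production table would also grow as it fills. The `count` field is the counter the answer describes.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 4096   /* hypothetical fixed size; a real table would resize */

struct entry {
    char *word;
    struct entry *next;     /* other words that hashed to the same bucket */
};

struct wordset {
    struct entry *bucket[NBUCKETS];
    size_t count;           /* number of distinct words stored so far */
};

/* djb2 string hash. */
static unsigned long hash(const char *s)
{
    unsigned long h = 5381;
    for (; *s != '\0'; s++)
        h = h * 33 + (unsigned char)*s;
    return h;
}

/* Returns 1 if word was added, 0 if it was a duplicate, -1 on OOM. */
int wordset_add(struct wordset *set, const char *word)
{
    struct entry **slot = &set->bucket[hash(word) % NBUCKETS];
    for (struct entry *e = *slot; e != NULL; e = e->next)
        if (strcmp(e->word, word) == 0)
            return 0;                  /* already in the table: discard */

    size_t len = strlen(word) + 1;
    struct entry *e = malloc(sizeof *e);
    char *copy = malloc(len);
    if (e == NULL || copy == NULL) {
        free(e);
        free(copy);
        return -1;
    }
    memcpy(copy, word, len);           /* own a copy of the caller's string */
    e->word = copy;
    e->next = *slot;                   /* push onto the front of the chain */
    *slot = e;
    set->count++;                      /* increment once per new word */
    return 1;
}

/* Release every entry and reset the set to empty. */
void wordset_free(struct wordset *set)
{
    for (size_t i = 0; i < NBUCKETS; i++) {
        struct entry *e = set->bucket[i];
        while (e != NULL) {
            struct entry *next = e->next;
            free(e->word);
            free(e);
            e = next;
        }
        set->bucket[i] = NULL;
    }
    set->count = 0;
}
```

To use it, declare the set as `struct wordset set = {0};` (or give it static storage) so every bucket starts empty. Call `wordset_add` for each word you read, and read `set.count` at the end.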

汹涌人海 2024-09-22 16:47:21

The first upgrade to your algorithm could be keeping the list sorted. That way your linear search can be faster, because you only search until you find an element greater than yours. But this is still a naive solution.
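
Here is a sketch of that first upgrade. It assumes a singly linked list kept in `strcmp` order, and the names `lnode`, `sorted_list_add` and `sorted_list_free` are hypothetical. The walk stops at the first word that is not smaller than the new one. That word is either the duplicate or the spot where the new word belongs.

```c
#include <stdlib.h>
#include <string.h>

struct lnode {
    char *word;
    struct lnode *next;
};

/* Insert word, keeping the list sorted by strcmp.
   Returns 1 if added, 0 if already present, -1 on OOM. */
int sorted_list_add(struct lnode **head, const char *word)
{
    struct lnode **link = head;
    /* Advance only while the current word sorts before ours. */
    while (*link != NULL && strcmp((*link)->word, word) < 0)
        link = &(*link)->next;
    if (*link != NULL && strcmp((*link)->word, word) == 0)
        return 0;                      /* found it: duplicate */

    /* End of list or first greater word: word is absent, insert here. */
    size_t len = strlen(word) + 1;
    struct lnode *n = malloc(sizeof *n);
    char *copy = malloc(len);
    if (n == NULL || copy == NULL) {
        free(n);
        free(copy);
        return -1;
    }
    memcpy(copy, word, len);
    n->word = copy;
    n->next = *link;
    *link = n;
    return 1;
}

void sorted_list_free(struct lnode *head)
{
    while (head != NULL) {
        struct lnode *next = head->next;
        free(head->word);
        free(head);
        head = next;
    }
}
```

Each insertion still walks part of the list (about half of it on average), so the total cost still grows roughly with (number of words) × (number of distinct words). That is why the answer calls this naive.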

The best approaches are binary search trees and, even better, a prefix tree (or trie, already mentioned in another answer).

K&R's "The C Programming Language" has exactly the example you are looking for.
The first example of self-referential structures (section 6.5) is a binary search tree used to count the occurrences of every word in the input. (You don't need the counts :P)

The structure is something like this:

struct tnode {
        char *word;
        struct tnode *left;
        struct tnode *right;
};

The book shows the complete example of what you want to do.

Binary search trees work well with any type of data that has an ordering, and they will beat a linear search through a list.
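
Here is a minimal sketch in the spirit of K&R's `addtree`. It is not the book's code. It repeats the `tnode` struct (without K&R's `count` field) and inserts iteratively through a pointer to the link being followed. It also reports whether the word was new, so the caller can count distinct words. The names `tree_add`, `tree_sorted` and `tree_free` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct tnode {
    char *word;
    struct tnode *left;   /* words that compare less (strcmp < 0) */
    struct tnode *right;  /* words that compare greater */
};

/* Returns 1 if w was added, 0 if already in the tree, -1 on OOM. */
int tree_add(struct tnode **root, const char *w)
{
    struct tnode **link = root;
    while (*link != NULL) {
        int cond = strcmp(w, (*link)->word);
        if (cond == 0)
            return 0;                          /* duplicate */
        link = cond < 0 ? &(*link)->left : &(*link)->right;
    }
    size_t len = strlen(w) + 1;
    struct tnode *n = malloc(sizeof *n);
    char *copy = malloc(len);
    if (n == NULL || copy == NULL) {
        free(n);
        free(copy);
        return -1;
    }
    memcpy(copy, w, len);
    n->word = copy;
    n->left = n->right = NULL;
    *link = n;                                 /* hang the new leaf */
    return 1;
}

/* In-order walk: writes the words in sorted order into out[i], out[i+1], ...
   and returns the next free index. out must have room for every word. */
size_t tree_sorted(const struct tnode *p, const char **out, size_t i)
{
    if (p != NULL) {
        i = tree_sorted(p->left, out, i);
        out[i++] = p->word;
        i = tree_sorted(p->right, out, i);
    }
    return i;
}

void tree_free(struct tnode *p)
{
    if (p != NULL) {
        tree_free(p->left);
        tree_free(p->right);
        free(p->word);
        free(p);
    }
}
```

The in-order walk yields the distinct words already sorted, which a hash table cannot give you for free. Note that a plain BST does not balance itself. If the words arrive in sorted order, it degenerates into a linked list and each insert becomes linear again.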

Sorry for my poor English, and correct me if I got anything wrong. I'm a real noob with C :p

EDIT: I can't add comments to other answers, but I read a comment from the OP saying "The list isn't sorted so I can't use binary search". Using binary search on a linked list makes no sense. Why? Binary search is efficient only when access to an arbitrary element is fast, as in an array. Even in a doubly linked list, reaching the middle element takes about n/2 steps, so every probe of the search costs a linear walk. You could add lots of extra pointers into the list (to reach key elements directly), but that is a bad solution.
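
The point about random access can be illustrated with an array. Once the words sit in an array, the standard library's `qsort` and `bsearch` give you a sorted array and lookups in O(log n) comparisons. This sketch assumes all words fit in memory as an array of `const char *`. Counting distinct words by comparing neighbours after sorting is an alternative not mentioned in the answer. The names `cmp_str`, `count_unique_sorted` and `contains` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* qsort/bsearch pass pointers to the array elements, which are
   themselves pointers to char, so dereference once before strcmp. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sorts words in place, then counts positions where a word differs
   from its predecessor; equal words end up adjacent after sorting. */
size_t count_unique_sorted(const char **words, size_t n)
{
    if (n == 0)
        return 0;
    qsort(words, n, sizeof *words, cmp_str);
    size_t unique = 1;
    for (size_t i = 1; i < n; i++)
        if (strcmp(words[i - 1], words[i]) != 0)
            unique++;
    return unique;
}

/* Binary search on the sorted array: each probe is an O(1) index,
   which is exactly what a linked list cannot offer. */
int contains(const char **sorted, size_t n, const char *w)
{
    return bsearch(&w, sorted, n, sizeof *sorted, cmp_str) != NULL;
}
```

`contains` is only valid after the array has been sorted with the same comparison function, for example by `count_unique_sorted`.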
