是否有使用链接列表的countsorter的已知实现

发布于 2025-02-06 16:00:48 字数 1952 浏览 2 评论 0 原文

是否有针对伯爵夫人的已知实现或类似的实现,至少与我在此处所做的链接列表一样高效:

https://github.com/tfreiling989/count_sorter

如果是这样,是否有已知的Python类/库?

Countsorter

这是一个可以在计数上对项目进行排序时可以有效计数项目的类。它使用一个双重链接列表(我从

方法签名和运行时间

countsorter类的两个公共方法的运行时如下

    • “”“”
      添加了一个项目。
      运行时:O(1)
      ”“”
  • def get_mast_frequent(self,num_items:int) - > ordereddict [any int,int]:
    • “”“”
      获取最常见的 num_items 项目及其计数以排序顺序(最大到最小) 示例结果:[(“ C”,6),(“ B”,5),(“ A”,5)] 运行时:o(num_items)
      ”“”

它的工作原理 - 理论

  • 使用双重链接列表,其中列表中的每个节点都有计数和该计数的一组项目。
  • 还有一个hashmap,每个项目对于每个项目,将当前节点的指针存储了。
  • ​o(1)
  • 当调用get_mast_frequent时,它可以收集列表末尾的n个项目
  • 。 解决方案

c = CountSorter()
c.add_occurrence("a")
c.add_occurrence("a")
c.add_occurrence("a")
c.add_occurrence("b")
c.add_occurrence("c")
c.add_occurrence("b")
print(c.get_most_frequent(2).items())

最快

是 在get_mast_frequent方法中 基准结果,在添加每个项目后调用get_mast_frequent:

method, num_items, get_n_most_frequent, time
Counter, 10000, 2, 10000, 2.556185007095337
CountSorter, 10000, 2, 10000, 4.463066577911377
Counter, 10000, 2, 1000, 6.818758010864258
CountSorter, 10000, 2, 1000, 2.0864181518554688
Counter, 10000, 2, 100, 2.4893338680267334
CountSorter, 10000, 2, 100, 0.2403562068939209
Counter, 10000, 2, 10, 1.6655514240264893
CountSorter, 10000, 2, 10, 0.05485248565673828
Counter, 10000, 2, 1, 1.0242605209350586
CountSorter, 10000, 2, 1, 0.039891719818115234
Counter, 100000, 2, 1, 13.034128665924072
CountSorter, 100000, 2, 1, 0.7719311714172363

请记住,肯定会使用该python解决方案可以使用。 有关更多详细信息,请参见Benchmark.py

Is there a known implementation for a CountSorter or similar that is at least as efficient as one using a linked list as I have done here:

https://github.com/tfreiling989/count_sorter

If so, is there a known python class/library for it?

CountSorter

This is a class that can efficiently count items while sorting the items by their count. It uses a doubly linked list (a slightly modified version of the one I grabbed from https://favtutor.com/blogs/doubly-linked-list-python )

Method Signatures and Runtimes

The runtime of the 2 public methods of the CountSorter class are as follows:

  • def add_occurrence(self, item):
    • """
      Adds an occurrence of an item.
      Runtime: O(1)
      """
  • def get_most_frequent(self, num_items:int) -> OrderedDict[Any,int]:
    • """
      Get the most frequent num_items items along with their counts in sorted order (largest to smallest)
      Example Result: [("c", 6), ("b",5) , ("a", 5)]
      Runtime: O(num_items)
      """

How It Works - Theory

  • It uses a doubly linked list where each node in the list has a count and the set of items with that count.
  • There is also a hashmap that for each item, a pointer to its current node is stored. This allows O(1) lookups.
  • When add_occurence is called for an item, it moves the item to the next node with the current count + 1. If that node doesn't exist, it will create it. This is also constant time O(1)
  • When calling get_most_frequent, it simply can collect the n items that are at the end of the list. This is O(n) where n is the number of items requested.
  • Theoretically, in a big O time complexity sense, this is the fastest possible solution.

Example Usage:

c = CountSorter()
c.add_occurrence("a")
c.add_occurrence("a")
c.add_occurrence("a")
c.add_occurrence("b")
c.add_occurrence("c")
c.add_occurrence("b")
print(c.get_most_frequent(2).items())

Would be expected to print: [("a",3),("b",2)]

Benchmark

I compared this solution with collections.Counter, and this solution is much faster assuming num_items in get_most_frequent method < total number of items
Benchmark results, calling the get_most_frequent after each item is added:

method, num_items, get_n_most_frequent, time
Counter, 10000, 2, 10000, 2.556185007095337
CountSorter, 10000, 2, 10000, 4.463066577911377
Counter, 10000, 2, 1000, 6.818758010864258
CountSorter, 10000, 2, 1000, 2.0864181518554688
Counter, 10000, 2, 100, 2.4893338680267334
CountSorter, 10000, 2, 100, 0.2403562068939209
Counter, 10000, 2, 10, 1.6655514240264893
CountSorter, 10000, 2, 10, 0.05485248565673828
Counter, 10000, 2, 1, 1.0242605209350586
CountSorter, 10000, 2, 1, 0.039891719818115234
Counter, 100000, 2, 1, 13.034128665924072
CountSorter, 100000, 2, 1, 0.7719311714172363

Keeping in mind that there are definitely optimizations that this python solution could utilize.
See benchmark.py for more details

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。
列表为空,暂无数据
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文