是否有针对伯爵夫人的已知实现或类似的实现,至少与我在此处所做的链接列表一样高效:
https://github.com/tfreiling989/count_sorter
如果是这样,是否有已知的Python类/库?
Countsorter
这是一个可以在计数上对项目进行排序时可以有效计数项目的类。它使用一个双重链接列表(我从)
方法签名和运行时间
countsorter类的两个公共方法的运行时如下
- :
- “”“”
添加了一个项目。
运行时:O(1)
”“”
- def get_mast_frequent(self,num_items:int) - > ordereddict [any int,int]:
- “”“”
获取最常见的 num_items
项目及其计数以排序顺序(最大到最小)
示例结果:[(“ C”,6),(“ B”,5),(“ A”,5)]
运行时:o(num_items)
”“”
它的工作原理 - 理论
- 使用双重链接列表,其中列表中的每个节点都有计数和该计数的一组项目。
- 还有一个hashmap,每个项目对于每个项目,将当前节点的指针存储了。
- o(1)
- 当调用get_mast_frequent时,它可以收集列表末尾的n个项目
- 。 解决方案
的
c = CountSorter()
c.add_occurrence("a")
c.add_occurrence("a")
c.add_occurrence("a")
c.add_occurrence("b")
c.add_occurrence("c")
c.add_occurrence("b")
print(c.get_most_frequent(2).items())
最快
。
是 在get_mast_frequent方法中
基准结果,在添加每个项目后调用get_mast_frequent:
method, num_items, get_n_most_frequent, time
Counter, 10000, 2, 10000, 2.556185007095337
CountSorter, 10000, 2, 10000, 4.463066577911377
Counter, 10000, 2, 1000, 6.818758010864258
CountSorter, 10000, 2, 1000, 2.0864181518554688
Counter, 10000, 2, 100, 2.4893338680267334
CountSorter, 10000, 2, 100, 0.2403562068939209
Counter, 10000, 2, 10, 1.6655514240264893
CountSorter, 10000, 2, 10, 0.05485248565673828
Counter, 10000, 2, 1, 1.0242605209350586
CountSorter, 10000, 2, 1, 0.039891719818115234
Counter, 100000, 2, 1, 13.034128665924072
CountSorter, 100000, 2, 1, 0.7719311714172363
请记住,肯定会使用该python解决方案可以使用。
有关更多详细信息,请参见Benchmark.py
Is there a known implementation for a CountSorter or similar that is at least as efficient as one using a linked list as I have done here:
https://github.com/tfreiling989/count_sorter
If so, is there a known python class/library for it?
CountSorter
This is a class that can efficiently count items while sorting the items by their count. It uses a doubly linked list (a slightly modified version of the one I grabbed from https://favtutor.com/blogs/doubly-linked-list-python )
Method Signatures and Runtimes
The runtime of the 2 public methods of the CountSorter class are as follows:
- def add_occurrence(self, item):
- """
Adds an occurrence of an item.
Runtime: O(1)
"""
- def get_most_frequent(self, num_items:int) -> OrderedDict[Any,int]:
- """
Get the most frequent num_items
items along with their counts in sorted order (largest to smallest)
Example Result: [("c", 6), ("b",5) , ("a", 5)]
Runtime: O(num_items)
"""
How It Works - Theory
- It uses a doubly linked list where each node in the list has a count and the set of items with that count.
- There is also a hashmap that for each item, a pointer to its current node is stored. This allows O(1) lookups.
- When add_occurence is called for an item, it moves the item to the next node with the current count + 1. If that node doesn't exist, it will create it. This is also constant time O(1)
- When calling get_most_frequent, it simply can collect the n items that are at the end of the list. This is O(n) where n is the number of items requested.
- Theoretically, in a big O time complexity sense, this is the fastest possible solution.
Example Usage:
c = CountSorter()
c.add_occurrence("a")
c.add_occurrence("a")
c.add_occurrence("a")
c.add_occurrence("b")
c.add_occurrence("c")
c.add_occurrence("b")
print(c.get_most_frequent(2).items())
Would be expected to print: [("a",3),("b",2)]
Benchmark
I compared this solution with collections.Counter, and this solution is much faster assuming num_items in get_most_frequent method < total number of items
Benchmark results, calling the get_most_frequent after each item is added:
method, num_items, get_n_most_frequent, time
Counter, 10000, 2, 10000, 2.556185007095337
CountSorter, 10000, 2, 10000, 4.463066577911377
Counter, 10000, 2, 1000, 6.818758010864258
CountSorter, 10000, 2, 1000, 2.0864181518554688
Counter, 10000, 2, 100, 2.4893338680267334
CountSorter, 10000, 2, 100, 0.2403562068939209
Counter, 10000, 2, 10, 1.6655514240264893
CountSorter, 10000, 2, 10, 0.05485248565673828
Counter, 10000, 2, 1, 1.0242605209350586
CountSorter, 10000, 2, 1, 0.039891719818115234
Counter, 100000, 2, 1, 13.034128665924072
CountSorter, 100000, 2, 1, 0.7719311714172363
Keeping in mind that there are definitely optimizations that this python solution could utilize.
See benchmark.py for more details
发布评论