Is there a better way to count the frequency of every symbol in a file?

Posted 12-08 07:26


Okay, so, say I have a text file (not necessarily containing every possible symbol) and I'd like to calculate the frequency of each symbol. After calculating the frequencies, I need to access each symbol and its frequency, from most frequent to least frequent. The symbols are not necessarily ASCII characters; they could be arbitrary byte sequences, albeit all of the same length.

I was considering doing something like this (in pseudocode):

function add_to_heap (symbol)
    entry = heap.find(symbol)
    if (entry.exists? == true)
        entry.frequency++
    else
        symbol.frequency = 1
        heap.insert(symbol)

MaxBinaryHeap heap
while somefile != EOF
    symbol = read_byte(somefile)
    add_to_heap(symbol)
heap.sort_by_frequency()

while heap != empty
    root = heap.extract_root()
    do_stuff(root)

I was wondering: is there a better, simpler way to calculate and store how many times each symbol occurs in a file?


Answers (2)

自由如风 2024-12-15 07:26:55


You can always use a HashMap instead of the heap. That way you'll be performing O(1) operations for each symbol found instead of O(log n), where n is the number of items currently in the heap.

However, if the number of distinct symbols is bounded by a reasonable number (1 byte is ideal, 2 bytes should still be fine), you can just use an array of that size and again have O(1), but with a significantly lower constant cost.
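In Python terms, both variants this answer describes might look like the following sketch (function names are hypothetical, not from the answer):

```python
from collections import Counter

def count_symbols(path, symbol_size=1):
    """Hash-map counting: O(1) expected per symbol, one sort at the end."""
    counts = Counter()
    with open(path, "rb") as f:
        while True:
            symbol = f.read(symbol_size)
            if len(symbol) < symbol_size:
                break  # EOF (or a trailing partial symbol)
            counts[symbol] += 1
    # (symbol, frequency) pairs, most frequent first.
    return counts.most_common()

def count_bytes_array(path):
    """Array counting for 1-byte symbols: 256 slots, O(1) with a tiny constant."""
    counts = [0] * 256
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            for b in chunk:  # iterating over bytes yields ints 0..255
                counts[b] += 1
    return counts
```

The array variant avoids hashing entirely, which is the "significantly lower constant cost" the answer mentions; it only works when the symbol width is small enough that every possible symbol fits in a table.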

青衫负雪 2024-12-15 07:26:55


If you're looking for a "best" solution based on running time, here's what I'd suggest:

While you're reading the file, you should have your symbols sorted (or hashed) by the value of the symbols themselves, not their frequencies. This'll let you find the current symbol in your list of already-seen symbols quickly, rather than having to search through the entire list. You should also have that initial structure support fast inserts - I'd recommend a binary tree or a hash.

Once you've read all your symbols, switch your ordering to be based on the frequency counts. I'd read everything into an array and then perform an in-place sort, but there are a bunch of equivalent ways to do this.
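A minimal Python sketch of this two-phase idea (hash by symbol value while reading, then a single sort by frequency); the function name and parameters are hypothetical:

```python
def frequencies_sorted(data, symbol_size=2):
    # Phase 1: dictionary keyed by symbol value -> fast find/insert per symbol.
    counts = {}
    for i in range(0, len(data) - symbol_size + 1, symbol_size):
        symbol = data[i:i + symbol_size]
        counts[symbol] = counts.get(symbol, 0) + 1
    # Phase 2: dump everything into a list and sort in place by count, descending.
    pairs = list(counts.items())
    pairs.sort(key=lambda kv: kv[1], reverse=True)
    return pairs
```

The key point is that each structure is ordered by what that phase queries: symbol value while counting, frequency afterwards.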

Hope this helps!
