霍夫曼解码子表
我一直在尝试实现霍夫曼解码器,我最初的尝试由于性能低下而受到影响解码算法的次优选择。
我想我尝试使用表查找来实现霍夫曼解码。然而,我在生成子表方面有点卡住了,希望有人能给我指出正确的方向。
struct node
{
node* children; // 0 right, 1 left
uint8_t value;
uint8_t is_leaf;
};
struct entry
{
uint8_t next_table_index;
std::vector<uint8_t> values;
entry() : next_table_index(0){}
};
void build_tables(node* nodes, std::vector<std::array<entry, 256>>& tables, int table_index);
void unpack_tree(void* data, node* nodes);
std::vector<uint8_t, tbb::cache_aligned_allocator<uint8_t>> decode_huff(void* input)
{
// Initial setup
CACHE_ALIGN node nodes[512] = {};
auto data = reinterpret_cast<unsigned long*>(input);
size_t table_size = *(data++); // Size is first 32 bits.
size_t result_size = *(data++); // Data size is second 32 bits.
unpack_tree(data, nodes);
auto huffman_data = reinterpret_cast<long*>(input) + (table_size+32)/32;
size_t data_size = *(huffman_data++); // Size is first 32 bits.
auto huffman_data2 = reinterpret_cast<char*>(huffman_data);
// Build tables
std::vector<std::array<entry, 256>> tables(1);
build_tables(nodes, tables, 0);
// Decode
uint8_t current_table_index = 0;
std::vector<uint8_t, tbb::cache_aligned_allocator<uint8_t>> result;
while(result.size() < result_size)
{
auto& table = tables[current_table_index];
uint8_t key = *(huffman_data2++);
auto& values = table[key].values;
result.insert(result.end(), values.begin(), values.end());
current_table_index = table[key].next_table_index;
}
result.resize(result_size);
return result;
}
void build_tables(node* nodes, std::vector<std::array<entry, 256>>& tables, int table_index)
{
for(int n = 0; n < 256; ++n)
{
auto current = nodes;
for(int i = 0; i < 8; ++i)
{
current = current->children + ((n >> i) & 1);
if(current->is_leaf)
tables[table_index][n].values.push_back(current->value);
}
if(!current->is_leaf)
{
if(current->value == 0)
{
current->value = tables.size();
tables.push_back(std::array<entry, 256>());
build_tables(current, tables, current->value);
}
tables[table_index][n].next_table_index = current->value;
}
}
}
void unpack_tree(void* data, node* nodes)
{
node* nodes_end = nodes+1;
bit_reader table_reader(data);
unsigned char n_bits = ((table_reader.next_bit() << 2) | (table_reader.next_bit() << 1) | (table_reader.next_bit() << 0)) & 0x7; // First 3 bits are n_bits-1.
// Unpack huffman-tree
std::stack<node*> stack;
stack.push(&nodes[0]); // "nodes" is root
while(!stack.empty())
{
node* ptr = stack.top();
stack.pop();
if(table_reader.next_bit())
{
ptr->is_leaf = 1;
ptr->children = nodes[0].children;
for(int n = n_bits; n >= 0; --n)
ptr->value |= table_reader.next_bit() << n;
}
else
{
ptr->children = nodes_end;
nodes_end += 2;
stack.push(ptr->children+0);
stack.push(ptr->children+1);
}
}
}
I've been trying to implement a huffman decoder, and my initial attempt suffered from low performance due to a sub-optimal choice of decoding algorithm.
I thought I try to implement huffman decoding using table-lookups. However, I go a bit stuck on generating the subtables and was hoping someone could point me in the right direction.
struct node
{
node* children; // 0 right, 1 left
uint8_t value;
uint8_t is_leaf;
};
struct entry
{
uint8_t next_table_index;
std::vector<uint8_t> values;
entry() : next_table_index(0){}
};
void build_tables(node* nodes, std::vector<std::array<entry, 256>>& tables, int table_index);
void unpack_tree(void* data, node* nodes);
std::vector<uint8_t, tbb::cache_aligned_allocator<uint8_t>> decode_huff(void* input)
{
// Initial setup
CACHE_ALIGN node nodes[512] = {};
auto data = reinterpret_cast<unsigned long*>(input);
size_t table_size = *(data++); // Size is first 32 bits.
size_t result_size = *(data++); // Data size is second 32 bits.
unpack_tree(data, nodes);
auto huffman_data = reinterpret_cast<long*>(input) + (table_size+32)/32;
size_t data_size = *(huffman_data++); // Size is first 32 bits.
auto huffman_data2 = reinterpret_cast<char*>(huffman_data);
// Build tables
std::vector<std::array<entry, 256>> tables(1);
build_tables(nodes, tables, 0);
// Decode
uint8_t current_table_index = 0;
std::vector<uint8_t, tbb::cache_aligned_allocator<uint8_t>> result;
while(result.size() < result_size)
{
auto& table = tables[current_table_index];
uint8_t key = *(huffman_data2++);
auto& values = table[key].values;
result.insert(result.end(), values.begin(), values.end());
current_table_index = table[key].next_table_index;
}
result.resize(result_size);
return result;
}
void build_tables(node* nodes, std::vector<std::array<entry, 256>>& tables, int table_index)
{
for(int n = 0; n < 256; ++n)
{
auto current = nodes;
for(int i = 0; i < 8; ++i)
{
current = current->children + ((n >> i) & 1);
if(current->is_leaf)
tables[table_index][n].values.push_back(current->value);
}
if(!current->is_leaf)
{
if(current->value == 0)
{
current->value = tables.size();
tables.push_back(std::array<entry, 256>());
build_tables(current, tables, current->value);
}
tables[table_index][n].next_table_index = current->value;
}
}
}
void unpack_tree(void* data, node* nodes)
{
node* nodes_end = nodes+1;
bit_reader table_reader(data);
unsigned char n_bits = ((table_reader.next_bit() << 2) | (table_reader.next_bit() << 1) | (table_reader.next_bit() << 0)) & 0x7; // First 3 bits are n_bits-1.
// Unpack huffman-tree
std::stack<node*> stack;
stack.push(&nodes[0]); // "nodes" is root
while(!stack.empty())
{
node* ptr = stack.top();
stack.pop();
if(table_reader.next_bit())
{
ptr->is_leaf = 1;
ptr->children = nodes[0].children;
for(int n = n_bits; n >= 0; --n)
ptr->value |= table_reader.next_bit() << n;
}
else
{
ptr->children = nodes_end;
nodes_end += 2;
stack.push(ptr->children+0);
stack.push(ptr->children+1);
}
}
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
首先,避免所有这些向量。您可以将指针指向单个预分配的缓冲区,但您不希望出现这样的情况:
vector
在整个内存中分配这些非常小的缓冲区,并且您的缓存占用空间会达到极限。另请注意,非叶状态的数量可能远少于 256 个。实际上,它可能低至 128 个。通过为它们分配低状态 ID,我们可以避免为整个状态节点集生成表条目(这可能总共高达511个节点)。毕竟,在消耗输入之后,我们永远不会到达叶节点;如果这样做,我们会生成输出,然后返回到根。
那么,我们应该做的第一件事就是将那些与内部节点相对应的状态(即那些指向非叶节点的指针)重新分配给低状态编号。您还可以使用它来减少状态转换表的内存消耗。
一旦我们分配了这些低状态编号,我们就可以遍历每个可能的非叶状态和每个可能的输入字节(即,双重嵌套的 for 循环)。像基于位的解码算法一样遍历树,并记录输出字节集、最终到达的节点 ID(不能是叶子!)以及是否到达流结束标记。
First off, avoid all those vectors. You can have pointers into a single preallocated buffer, but you don't want the scenario where
vector
allocates these tiny, tiny buffers all over memory, and your cache footprint goes through the roof.Note also that the number of non-leaf states might be much less than 256. Indeed, it might be as low as 128. By assigning them low state IDs, we can avoid generating table entries for the entire set of state nodes (which may be as high as 511 nodes in total). After all, after consuming input, we'll never end up on a leaf node; if we do, we generate output, then head back to the root.
The first thing we should do, then, is reassign those states that correspond to internal nodes (ie, ones with pointers out to non-leaves) to low state numbers. You can use this to also reduce memory consumption for your state transition table.
Once we've assigned these low state numbers, we can go through each possible non-leaf state, and each possible input byte (ie, a doubly-nested for loop). Traverse the tree as you would for a bit-based decoding algorithm, and record the set of output bytes, the final node ID you end up on (which must not be a leaf!), and whether you hit an end-of-stream mark.