数据包缓冲区的适当数据结构
如何实现数据包缓冲区,其中每个数据包的形式
typedef struct{
int32 IP; //4-byte IP-address
int16 ID; //unique sequence id
}t_Packet;
如下:最合适的数据结构应该是什么:
(1) 允许收集至少 8000 个这样的数据包(快速插入和删除操作)
(2) 允许使用 IP 地址进行非常快速的过滤,以便仅选择具有给定 IP 的数据包
(3)允许使用ID作为键进行非常快速的查找操作
(4) 允许非常快 (2),然后 (3) 在过滤结果内?
RAM 大小确实很重要,例如,无法使用巨大的查找表。
How to implement a buffer of the packets where each packet is of the form:
typedef struct{
int32 IP; //4-byte IP-address
int16 ID; //unique sequence id
}t_Packet;
What should be the most appropriate data structure which:
(1) allows to collect at least 8000 such packets (fast Insert and Delete operations)
(2) allows very fast filtering using IP address, so that only packets with given IP will be selected
(3) allows very fast find operation using ID as a key
(4) allows very fast (2), then (3) within filtered results ?
RAM size does matter, e.g. no huge lookup table is possible to use.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
您可以使用 Patricia Trie 进行 IP 地址过滤。我相信大多数网络路由器都使用这种数据结构来表示 IPV4 IP 地址。文献中还有其他针对 IP 地址设计的尝试,您可以考虑使用。这是一个:http://citeseerx.ist.psu。 edu/viewdoc/summary?doi=10.1.1.35.4871
对应每个IP地址,现在可以有一个基于ID的哈希表了。
如果 IP 地址“足够”随机,那么您可能最好使用哈希表来基于 IP 进行过滤,因为 IP 地址非常适合大多数机器的单词,使得哈希查找非常快,并且 trie 可能会并不能真正节省你太多空间。
当然,正确的选择取决于你的情况......
You can use a Patricia Trie for the IP address filtering. I believe most Network Routers use this data structure for IPV4 IP addresses. There are also other tries in literature designed for IP addresses which you can consider using. Here is one: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.35.4871
Correponding to each IP address, you can now have a hashtable based on ID.
If the IP addresses are 'sufficiently' random, you might be better off using a hashtable to do filtering based on IP, though, as the IP addresses nicely fit in a word of most machines, making the hash lookups really fast and the trie might not really save you much space.
Of course, the right choice depends on your situation...
为数据创建两个索引(非顺序存储以实现快速插入等),一个是按 ID 分割的树,另一个是按 IP 分割的树。
如果您无法承担至少 8000 * sizeof(Packet) + 8000*12 + 8000*12 的数据 + 两个索引,那么老实说,仅迭代 8000 个项目不会花费很长时间。
这在 c++ 中比在 c 中更容易实现(如果只是因为您可以使用 boost::multi_index 或类似的)
Create two indices to the data (store it non-sequentially for fast insertions, etc), one a tree split by ID, the other a tree split by IP.
If you can't afford at least 8000 * sizeof(Packet) + 8000*12 + 8000*12 for data + two indices, then honestly, iterating over only 8000 items wouldn't really take very long.
This would be much easier to implement in c++ than c (if only because you could use boost::multi_index, or similar)