Given a trace of packets, how would you group them into flows?


Here's what I've tried so far:

1) Make a hash keyed by source IP/port and destination IP/port (see the sketch after this list). Each entry in the hash is a list of packets. The hash is then saved to a file, with each flow separated by a special character/line. Problem: not enough memory for large traces.

2) Make a hash with the same keys as above, but keep only file handles in memory. Each packet is then appended to the file that hash[key] points to. Problems: too many flows/files (~200k), and it might run out of memory as well.

3) Hash the source IP/port and destination IP/port, then write the info to a file. The difference between 2 and 3 is that here the file is opened and closed for each operation, so I don't have to worry about running out of memory from having too many open at the same time. Problems: WAY too slow, and the same number of files as in 2, so also impractical.

4) Make a hash of the source IP/port pairs and then iterate over the whole trace for each flow. Take the packets that are part of that flow and place them into the output file. Problem: Suppose I have a 60 MB trace that has 200k flows. This way, I would process, say, a 60 MB file 200k times. Maybe removing the packets as I iterate would make it not so painful, but so far I'm not sure this would be a good solution.

5) Split the packets by source/destination IP, create a single file for each pair, and separate the flows inside each file with special characters. Still too many files (50k+).
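For illustration, a minimal sketch of the keying step in 1), assuming each packet is already parsed into a hash with :src_ip, :src_port, :dst_ip and :dst_port fields (hypothetical names, as is the `packets` array) and that both directions of a connection should count as one flow:

```ruby
# Build a direction-independent flow key by sorting the two endpoints,
# so A->B and B->A packets land in the same bucket.
def flow_key(pkt)
  a = [pkt[:src_ip], pkt[:src_port]]
  b = [pkt[:dst_ip], pkt[:dst_port]]
  [a, b].sort.flatten.join('|')        # e.g. "10.0.0.1|1234|10.0.0.2|80"
end

# In-memory variant from 1): flow key => list of packets.
flows = Hash.new { |h, k| h[k] = [] }
packets.each { |pkt| flows[flow_key(pkt)] << pkt }   # `packets` is a placeholder

# Dump each flow as a block, separated by a marker line.
File.open('flows.txt', 'w') do |out|
  flows.each do |key, pkts|
    out.puts "=== #{key}"
    pkts.each { |pkt| out.puts pkt.inspect }
  end
end
```

This keeps everything in memory, so it hits exactly the problem described in 1) for large traces; the later approaches only change where the per-flow lists are stored.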

Right now I'm using Ruby to do it, which might have been a bad idea, I guess. I've already filtered the traces with tshark so that they only contain the relevant info, so I can't really make them any smaller.
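For illustration, a tshark invocation of this general shape (the field list here is an example of that kind of reduction, not necessarily the one actually used):

```sh
tshark -r trace.pcap -Y tcp -T fields -E separator=, \
  -e frame.time_epoch -e ip.src -e tcp.srcport \
  -e ip.dst -e tcp.dstport -e frame.len > trace.csv
```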

I thought about loading everything into memory as described in 1) using C#/Java/C++, but I was wondering whether there isn't a better approach here, especially since even with a more efficient language I might still run out of memory later on if I have to handle larger traces.

In summary, the problem I'm facing is that I either have too many files or that I run out of memory.

I've also tried searching for a tool to filter the info, but I don't think there is one. The ones I've found only return some statistics and don't extract every flow the way I need.


Comments (1)

乖乖哒 2024-09-05 14:56:39


Given your scenario, I might write the traces to files, but use an LRU (least-recently-used) caching mechanism to keep a limited number of files open at one time. If you need to access a file that isn't currently open, close the file that has gone the longest without activity, then open the one you need.

You may need to tune the number of files in your LRU cache in order to get the best performance. This technique will work especially well if you have a large number of short-lived connections.
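A rough sketch of that idea in Ruby, since that's what the question is using; MAX_OPEN, the file naming, and the flow_key/format_packet helpers are illustrative assumptions, not a drop-in implementation:

```ruby
# LRU cache of open file handles: at most MAX_OPEN files are open at once;
# when room is needed, the least-recently-used handle is closed first.
class FlowWriter
  MAX_OPEN = 256   # tune this; keep it under the OS file-descriptor limit

  def initialize(dir)
    @dir = dir
    @open = {}   # flow key => File; Ruby hashes preserve insertion order,
                 # so re-inserting a key on each use yields LRU ordering
  end

  def write(key, line)
    checkout(key).puts(line)
  end

  def close_all
    @open.each_value(&:close)
    @open.clear
  end

  private

  def checkout(key)
    if (file = @open.delete(key))      # already open: move to most-recent
      return @open[key] = file
    end
    if @open.size >= MAX_OPEN
      lru_key, lru_file = @open.first  # first entry = least recently used
      lru_file.close
      @open.delete(lru_key)
    end
    # Reopen in append mode so earlier writes to the same flow are preserved.
    @open[key] = File.open(File.join(@dir, "#{key}.flow"), 'a')
  end
end

# Hypothetical usage, with flow_key and format_packet standing in for
# whatever parsing of the tshark output already exists:
#   writer = FlowWriter.new('flows')
#   packets.each { |pkt| writer.write(flow_key(pkt), format_packet(pkt)) }
#   writer.close_all
```

The open/close cost is then paid only on cache misses, so if a flow's packets tend to be clustered together in the trace it behaves close to approach 2 while keeping the number of simultaneously open files bounded.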
