Given a trace of packets, how do you group them into flows?


I've tried these approaches so far:

1) Make a hash with the source IP/port and destination IP/port as keys. Each position in the hash is a list of packets (see the sketch after this list). The hash is then saved in a file, with each flow separated by some special characters/line. Problem: Not enough memory for large traces.

2) Make a hash with the same key as above, but only keep in memory the file handles. Each packet is then put into the hash[key] that points to the right file. Problems: Too many flows/files (~200k) and it might run out of memory as well.

3) Hash the source IP/port and destination IP/port, then put the info inside a file. The difference between 2 and 3 is that here the files are opened and closed for each operation, so I don't have to worry about running out of memory because I opened too many at the same time. Problems: WAY too slow, same number of files as 2 so also impractical.

4) Make a hash of the source IP/port pairs and then iterate over the whole trace for each flow. Take the packets that are part of that flow and place them into the output file. Problem: Suppose I have a 60 MB trace that has 200k flows. This way, I would process, say, a 60 MB file 200k times. Maybe removing the packets as I iterate would make it not so painful, but so far I'm not sure this would be a good solution.

5) Split them by IP source/destination and then create a single file for each one, separating the flows by special characters. Still too many files (50k+).
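For reference, here's a minimal Ruby sketch of the keying idea from 1). The packet field names and sample data are made up for illustration, and normalizing the endpoint order so both directions of a connection share one key is an assumption the question doesn't spell out; this is the in-memory variant that runs out of RAM on large traces.

```ruby
# Hypothetical parsed packets; the field names are illustrative only.
packets = [
  { src_ip: "10.0.0.1", src_port: 5000, dst_ip: "10.0.0.2", dst_port: 80,   info: "SYN" },
  { src_ip: "10.0.0.2", src_port: 80,   dst_ip: "10.0.0.1", dst_port: 5000, info: "SYN-ACK" },
  { src_ip: "10.0.0.3", src_port: 6000, dst_ip: "10.0.0.2", dst_port: 80,   info: "SYN" }
]

# Direction-independent flow key: sort the two endpoints so that A->B and
# B->A packets land in the same bucket.
def flow_key(pkt)
  a = [pkt[:src_ip], pkt[:src_port]]
  b = [pkt[:dst_ip], pkt[:dst_port]]
  (a <=> b) <= 0 ? [a, b] : [b, a]
end

# Group every packet under its flow key (this is what eats memory on big traces).
flows = Hash.new { |h, k| h[k] = [] }
packets.each { |pkt| flows[flow_key(pkt)] << pkt }

flows.each { |key, pkts| puts "#{key.inspect}: #{pkts.size} packet(s)" }
```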

Right now I'm using Ruby to do it, which might've been a bad idea, I guess. Currently I've filtered the traces with tshark so that they only have relevant info, so I can't really make them any smaller.
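The exact tshark filtering isn't shown in the question; one plausible form, using standard tshark options, would be to dump only the fields a flow grouper needs (the field selection and file names here are assumptions):

```sh
tshark -r trace.pcap -T fields -E separator=, \
  -e frame.time_epoch -e ip.src -e tcp.srcport -e ip.dst -e tcp.dstport -e frame.len \
  > trace.csv
```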

I thought about loading everything into memory as described in 1) using C#/Java/C++, but I was wondering if there isn't a better approach here, especially since even with a more efficient language I might run out of memory later on if I have to use larger traces.

In summary, the problem I'm facing is that I either have too many files or that I run out of memory.

I've also tried searching for some tool to filter the info, but I don't think there is one. The ones I've found only return some statistics and don't scan every flow the way I need.


乖乖哒 2024-09-05 14:56:39


Given your scenario, I might write the traces to files, but use an LRU (least-recently-used) caching mechanism to keep a limited number of files open at one time. If you need to access a file that isn't currently open, close the file that hasn't seen any activity the longest, and open the current file.

You may need to tune the number of files in your LRU cache in order to get the best performance. This technique will work especially well if you have a large number of short-lived connections.
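A minimal sketch of that idea in Ruby (the question's language); the per-flow file naming, the cache size, and the `append` interface are illustrative assumptions rather than part of the answer:

```ruby
# Keeps at most `capacity` output files open at once; when room is needed,
# the handle that has gone longest without activity is closed. Files are
# opened in append mode, so a closed flow file can be reopened later.
class LruFileCache
  def initialize(capacity)
    @capacity = capacity
    @handles  = {}  # key => File; Ruby hashes keep insertion order, so the
                    # first entry is always the least recently used one.
  end

  def append(key, line)
    handle_for(key).puts(line)
  end

  def close_all
    @handles.each_value(&:close)
    @handles.clear
  end

  private

  def handle_for(key)
    if (f = @handles.delete(key))
      @handles[key] = f            # re-insert to mark as most recently used
    else
      evict_oldest if @handles.size >= @capacity
      # One file per flow; the naming scheme is just an example.
      @handles[key] = File.open("flow_#{key}.txt", "a")
    end
    @handles[key]
  end

  def evict_oldest
    oldest_key, oldest_file = @handles.first
    oldest_file.close
    @handles.delete(oldest_key)
  end
end

# Usage: route each packet line to the file for its flow key.
cache = LruFileCache.new(256)
cache.append("10.0.0.1:5000-10.0.0.2:80", "SYN len=60")
cache.append("10.0.0.3:6000-10.0.0.2:80", "SYN len=60")
cache.close_all
```

Deleting and re-inserting a key on each hit is enough to track recency because Ruby hashes preserve insertion order, so no separate linked list is needed.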
