Packet profile from netflow

Posted on 2024-12-27 12:56:07

I have netflow data from the previous month in files, one per 5 minutes, and I would like to build a packet profile of all this traffic. I need the percentage of 1-packet flows, 2-packet flows, etc. It could also be done in categories like 1-packet flows, 1-100 packet flows, 100 and more... That's not so important. My question is how to do it. How do I compute percentages over data that I can't add together? Something like computing the percentages for every file and then taking some kind of average of them?

Comments (2)

蒗幽 2025-01-03 12:56:07

What do you mean by "I can't add together"? You can actually do that with nfdump. According to the manual, the -R expr option accepts /dir/file1:file2, which reads all files from file1 to file2. For instance,

nfdump -R /yournetflowfolder/nfcapd.201204051609:nfcapd.201204051639

will read the NetFlow data collected from 16:09 to 16:39. You can then run whatever query you need on that data.

云仙小弟 2025-01-03 12:56:07

It sounds like you're describing a histogram: you create 'bins' with the boundaries you describe and fill them with raw counts. The sum of the counts over all bins is the total number of flows. To get the percentages of the total traffic, you just normalize by dividing each bin's count by the total flow count.

So, if you build a two-bin histogram where the first bin counts all flows with fewer than 100 packets and the second counts flows with 100 or more packets (note that there can't be gaps or overlaps between bins), and it works out to 30 flows in the former and 60 in the latter, then the total number of flows is 90, and 33% of the flows (30/90) have fewer than 100 packets.
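The arithmetic in that example can be checked with a few lines of Python. This is only a sketch: the function name `to_percentages` and the bin labels are made up for illustration.

```python
def to_percentages(counts):
    """Turn raw per-bin counts into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        # No flows at all: report 0% everywhere instead of dividing by zero.
        return {label: 0.0 for label in counts}
    return {label: 100.0 * n / total for label, n in counts.items()}


# The two-bin example from the answer: 30 flows under 100 packets,
# 60 flows with 100 packets or more.
counts = {"<100": 30, ">=100": 60}
percentages = to_percentages(counts)

for label, pct in percentages.items():
    print(f"{label}: {pct:.1f}%")
```

Because the bins have no gaps or overlaps, the percentages always sum to 100.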

When working with multiple files, the trick is to always use the same bin boundaries, to store and work with the raw counts for as long as possible, and to derive the percentages only as the very last step. You can add histograms together with no trouble as long as their bins mean the same thing; when you then normalize the result, you get, for each bin, the overall percentage across all files. If you expect to add more files later, just keep the raw counts so that you can re-normalize when new data arrives.
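Here is a rough sketch of that workflow in Python. The bin edges, the labels, and the two lists of packets-per-flow values are all invented for the example. In practice the values would come from whatever export of the netflow files you use. It also shows why averaging per-file percentages is not the same thing as pooling the raw counts.

```python
from bisect import bisect_right
from collections import Counter

# Fixed bin edges shared by every file (an assumption for this sketch).
# Values below 2 go to bin "1", 2..99 to "2-99", 100 and up to "100+".
# Every flow is assumed to have at least one packet.
EDGES = [2, 100]
LABELS = ["1", "2-99", "100+"]


def histogram(packet_counts):
    """Count flows per bin for one file, given packets-per-flow values."""
    hist = Counter({label: 0 for label in LABELS})
    for packets in packet_counts:
        hist[LABELS[bisect_right(EDGES, packets)]] += 1
    return hist


def percentages(hist):
    """Normalize one histogram; only meant to be used as the last step."""
    n = sum(hist.values())
    return {label: 100.0 * hist[label] / n for label in LABELS}


# Made-up packets-per-flow values standing in for two 5-minute files.
file_a = [1, 1, 1, 5, 250]              # 5 flows
file_b = [1] + [40] * 10 + [300] * 4    # 15 flows

# Add the raw counts of all files, then normalize once at the end.
total = Counter()
for flows in (file_a, file_b):
    total.update(histogram(flows))
pooled = percentages(total)

# For contrast: a plain average of per-file percentages weights the
# 5-flow file as heavily as the 15-flow file.
per_file = [percentages(histogram(f)) for f in (file_a, file_b)]
averaged = {
    label: sum(p[label] for p in per_file) / len(per_file)
    for label in LABELS
}

print("pooled:  ", pooled)
print("averaged:", averaged)
```

A per-file average only matches the pooled figure if each file is weighted by its flow count, which amounts to adding the raw counts anyway.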

You can do this pretty easily in a tool like MATLAB, but be careful, because many of these tools will very kindly auto-determine bin widths for you. The histogram for one file might then have bins {x < 100, 100 <= x < 200, x >= 200} and another file's {x < 90, 90 <= x < 180, x >= 180}, and you won't be able to add the results together.
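One way to guard against that mistake, whatever tool produces the counts, is to store the bin edges next to the counts and refuse to add histograms whose edges differ. The `Histogram` class below is a hypothetical helper written for this sketch, not part of any library, and the counts are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Histogram:
    edges: tuple   # lower bounds of every bin after the first, ascending
    counts: tuple  # one count per bin, so len(counts) == len(edges) + 1

    def __add__(self, other):
        if not isinstance(other, Histogram):
            return NotImplemented
        if self.edges != other.edges:
            # Bins that mean different things must never be summed.
            raise ValueError(
                f"bin edges differ: {self.edges} vs {other.edges}"
            )
        summed = tuple(a + b for a, b in zip(self.counts, other.counts))
        return Histogram(self.edges, summed)


# Same edges {x < 100, 100 <= x < 200, x >= 200}: adding is fine.
first = Histogram(edges=(100, 200), counts=(30, 50, 10))
second = Histogram(edges=(100, 200), counts=(5, 5, 0))
print(first + second)

# Edges {x < 90, 90 <= x < 180, x >= 180}: adding is refused.
mismatched = Histogram(edges=(90, 180), counts=(28, 50, 12))
try:
    first + mismatched
except ValueError as err:
    print("refused:", err)
```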
