Getting the volatility and peak-to-average ratio of internet traffic data using R
I have network traffic data for each hour of a ten-day period in an R dataset, as follows.
Day Hour Volume Category
0 00 100 P2P
0 00 50 email
0 00 200 gaming
0 00 200 video
0 00 150 web
0 00 120 P2P
0 00 180 web
0 00 80 email
....
0 01 150 P2P
0 01 200 P2P
0 01 50 Web
...
...
10 23 100 web
10 23 200 email
10 23 300 gaming
10 23 300 gaming
As seen, a Category can also repeat within a single hour. I need to calculate the volatility and the peak-hour-to-average-hour ratio of these different application categories.
Volatility: standard deviation of the hourly volumes divided by the average hourly volume.
Peak-hour-to-average-hour ratio: volume of the maximum hour divided by the volume of the average hour for that application.
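For concreteness, here is what the two definitions amount to for a single category, given a vector of its hourly totals (the numbers below are made up, purely for illustration):

# Hypothetical hourly totals for one category (illustrative values only)
hourly_volume <- c(550, 320, 410, 600, 480, 350)

volatility <- sd(hourly_volume) / mean(hourly_volume)   # std. dev. over the mean
pa_ratio   <- max(hourly_volume) / mean(hourly_volume)  # peak hour over the average hour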
So how do I aggregate and calculate these two statistics for each category? I am new to R and don't have much knowledge of how to aggregate and get the averages as mentioned.
So, the final result would look something like this, where the volume for each category is first aggregated into a single 24-hour period by summing, and the two statistics are then calculated:
Category Volatility Peak to Avg. Ratio
Web 0.55 1.5
P2P 0.30 2.1
email 0.6 1.7
gaming 0.4 2.9
Edit: plyr got me as far as this.
library(plyr)

stats <- ddply(
  .data      = my_data,
  .variables = .(Hour, Category),
  .fun       = function(x) {
    to_return <- data.frame(
      volatility = sd(x$Volume) / mean(x$Volume),    # std. dev. over the mean
      pa_ratio   = max(x$Volume) / mean(x$Volume)    # peak over the mean
    )
    return(to_return)
  }
)
But this is not what I was hoping for. I want the statistics per Category, where the hours across all days are first aggregated into a single 24-hour period by summing the volumes, and the volatility and PA ratio are then calculated. Any suggestions for improvement?
You'd need to do it in two stages (using the plyr package). First, as you pointed out, there can be multiple Day-Hour combos for the same Category, so we first aggregate, for each Category, its totals within each Hour, regardless of the day. Then you get your stats from those hourly totals.
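A minimal sketch of the two stages described above, assuming the data frame is named my_data as in the question (the names hourly_totals and stats are only illustrative):

library(plyr)

# Stage 1: sum the Volume for each Category within each Hour, ignoring the Day,
# collapsing the ten days into a single 24-hour profile per Category.
hourly_totals <- ddply(
  .data      = my_data,
  .variables = .(Category, Hour),
  .fun       = function(x) data.frame(hourly_volume = sum(x$Volume))
)

# Stage 2: compute the two statistics per Category over its hourly totals.
stats <- ddply(
  .data      = hourly_totals,
  .variables = .(Category),
  .fun       = function(x) data.frame(
    volatility = sd(x$hourly_volume)  / mean(x$hourly_volume),
    pa_ratio   = max(x$hourly_volume) / mean(x$hourly_volume)
  )
)

stats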