How do I create a histogram in R from CSV time data?

Posted on 2024-12-22 17:02:09

I have CSV data of a log for 24 hours that looks like this:

svr01,07:17:14,'[email protected]','8.3.1.35'
svr03,07:17:21,'[email protected]','82.15.1.35'
svr02,07:17:30,'[email protected]','2.15.1.35'
svr04,07:17:40,'[email protected]','2.1.1.35'

I read the data with tbl <- read.csv("logs.csv")

How can I plot this data in a histogram to see the number of hits per hour?
Ideally, I would get 4 bars per hour, showing the hits for each of svr01, svr02, svr03, and svr04.

Thank you for helping me here!

与之呼应 2024-12-29 17:02:09

I'm not sure I understood you correctly, so I'll split my answer into parts. The first part shows how to convert your times into a vector you can use for plotting.

a) Converting your data into hours:

  # df is the data frame; this assumes its columns are named server and timestamp
  df$timestamp <- as.POSIXct(strptime(df$timestamp, format = "%H:%M:%S"))
  df$hours <- as.numeric(format(df$timestamp, format = "%H"))
  hist(df$hours)
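
The conversion above assumes that df already has columns named timestamp and server. The log in the question has no header row and wraps some fields in single quotes, so a plain read.csv("logs.csv") would treat the first hit as column names. Here is a minimal sketch of one way to read it. The column names, the inline sample rows, and the e-mail values are placeholders chosen for illustration:

```r
# Sample rows in the question's format; the e-mail values are placeholders.
log_text <- "svr01,07:17:14,'a@example.com','8.3.1.35'
svr03,07:17:21,'b@example.com','82.15.1.35'
svr02,08:02:30,'c@example.com','2.15.1.35'
svr04,08:45:40,'d@example.com','2.1.1.35'"

# header = FALSE keeps the first log line as data; quote = "'" strips the
# single quotes. The column names are assumptions: the file has no header row.
df <- read.csv(text = log_text, header = FALSE, quote = "'",
               col.names = c("server", "timestamp", "email", "ip"),
               stringsAsFactors = FALSE)

# Parse the time of day and keep the hour as a number.
df$timestamp <- as.POSIXct(strptime(df$timestamp, format = "%H:%M:%S"))
df$hours <- as.numeric(format(df$timestamp, format = "%H"))
```

With a real file, text = log_text would be replaced by the file path, for example "logs.csv".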

This gives you a histogram of hits across all servers. If you want to split the histogram by server, here is one way, though there are of course many others:

b) Making a histogram with ggplot2

  # install.packages("ggplot2")
  library(ggplot2)
  # one panel per server; binwidth = 1 gives one bar per hour
  ggplot(data = df) + geom_histogram(aes(x = hours), binwidth = 1) + facet_wrap(~ server)
  # or use a color instead
  ggplot(data = df) + geom_histogram(aes(x = hours, fill = server), binwidth = 1)

c) You could also use another package:

  library(plotrix)
  l <- split(df$hours, f = df$server)
  multhist(l)
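
If neither ggplot2 nor plotrix is installed, base R can draw a similar grouped chart. The sketch below uses a small made-up data frame with the same server and hours columns as above. table() counts hits per server and hour, and barplot() with beside = TRUE puts the servers side by side within each hour:

```r
# Hypothetical sample data: one row per hit, with server name and hour of day.
df <- data.frame(
  server = c("svr01", "svr02", "svr01", "svr03", "svr04", "svr02", "svr01"),
  hours  = c(7, 7, 7, 8, 8, 8, 9)
)

# Cross-tabulate hits: one row per server, one column per hour.
hits <- table(df$server, df$hours)

# beside = TRUE draws the servers as adjacent bars within each hour.
barplot(hits, beside = TRUE, legend.text = rownames(hits),
        xlab = "Hour of day", ylab = "Hits")
```

This is probably the closest match to the "4 bars per hour" layout asked for in the question.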

The examples are given below. The third makes comparison easier, but I think ggplot2 simply looks better.

EDIT

Here is how these solutions look:

first solution: [image]

second solution: [image]

third solution: [image]

孤城病女 2024-12-29 17:02:09

An example dataset:

dat = data.frame(server = paste("svr", round(runif(1000, 1, 10)), sep = ""),
                 time = Sys.time() + sort(round(runif(1000, 1, 36000))))

The trick I use is to create a new variable which only specifies in which hour the hit was recorded:

dat$hr = strftime(dat$time, "%H")
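
Note that strftime() returns a zero-padded character string rather than a number, which is why the line plot further down wraps hr in as.numeric(). Here is a small illustration, with a fixed timestamp and an explicit UTC time zone chosen only so the result does not depend on the local clock:

```r
# A fixed example timestamp (hypothetical), pinned to UTC.
hit_time <- as.POSIXct("2024-01-01 07:17:14", tz = "UTC")

# strftime() formats the time as text; "%H" is the zero-padded hour (00-23).
hr <- strftime(hit_time, "%H", tz = "UTC")

# Convert to a number when a continuous axis is needed.
hr_num <- as.numeric(hr)
```

Because the strings are zero-padded, they also sort in chronological order when used as a discrete axis.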

Now we can use some plyr magic:

library(plyr)
hits_hour = count(dat, vars = c("server", "hr"))
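
plyr's count() is only one way to get these totals. If plyr is not available, a roughly equivalent table can be built in base R. This is a sketch on a tiny made-up data frame. Unlike count(), which as far as I know returns only the combinations that occur, it also lists server/hour combinations with zero hits:

```r
# Hypothetical sample data with the same column names as dat above.
dat <- data.frame(
  server = c("svr1", "svr1", "svr2", "svr1", "svr2"),
  hr     = c("07", "07", "07", "08", "08")
)

# table() counts every server/hour combination; as.data.frame() turns the
# result into long format, with the counts in a column named freq.
hits_hour <- as.data.frame(table(server = dat$server, hr = dat$hr),
                           responseName = "freq", stringsAsFactors = FALSE)
```

The resulting columns (server, hr, freq) match the names used in the plotting calls that follow.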

And create the plot:

library(ggplot2)
ggplot(data = hits_hour) + geom_bar(aes(x = hr, y = freq, fill = server), stat = "identity", position = "dodge")

Which looks like:

[image: https://i.sstatic.net/FaFcp.png]

I don't really like this plot; I'd be more in favor of:

ggplot(data = hits_hour) + geom_line(aes(x = as.numeric(hr), y = freq)) + facet_wrap(~ server, nrow = 1)

Which looks like:

[image: https://i.sstatic.net/1f1aV.png]

Putting all the facets in one row allows easy comparison of the number of hits between the servers. This will look even better when using real data instead of my random data.
