CSV file with multiple time series

Posted on 2024-07-13 02:14:16

I've imported a csv file with lots of columns and sections of data.

v <- read.csv2("200109.csv", header=TRUE, sep=",", skip=6, na.strings=c(""))

The layout of the file is something like this:

Dataset1
time, data, .....
0       0
0       <NA>
0       0

Dataset2
time, data, .....
00:00   0
0       <NA>
0       0

The headers of the different datasets are exactly the same.

Now, I can plot the first dataset with:

plot(as.numeric(as.character(v$Calls.served.by.agent[1:30])), type="l")

I am curious if there is a better way to:

  1. Get all the numbers read as numbers, without having to convert.

  2. Address the different datasets in the file in some meaningful way.

Any hints would be appreciated. Thank you.


Status update:

I haven't really found a good solution in R yet, but I've started writing a Lua script that separates each individual time series into its own file. I'm leaving this open for now, because I'm curious how well R will deal with all these files. I'll get 8 files per day.
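Once the time series are in separate files, R can read a whole directory of them in one call. What follows is a minimal sketch, not a tested workflow. It assumes the split files sit in one directory and end in `.csv`. It also assumes that empty strings and `<NA>` mark missing values. The function name `read_all_series` is hypothetical.

```r
# Sketch: read every per-series CSV in a directory into a named list.
# Assumptions (hypothetical): the split files live in one directory, end in
# ".csv", and use "" or "<NA>" for missing values.
read_all_series <- function(dir, pattern = "\\.csv$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  # na.strings turns "<NA>" into real NA, so numeric columns stay numeric
  series <- lapply(files, read.csv, na.strings = c("", "<NA>"))
  # name each data frame after its file, e.g. "200109_part1"
  names(series) <- tools::file_path_sans_ext(basename(files))
  series
}
```

If all files share the same columns, `do.call(rbind, series)` stacks them into one data frame. With 8 files per day, reading this way stays cheap.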


Comments (1)

萌辣 2024-07-20 02:14:16


What I personally would do is to make a script in some scripting language to separate the different data sets before the file is read into R, and possibly do some of the necessary data conversions, too.

If you want to do the splitting in R, look up readLines and scan; read.csv2 is too high-level and is meant for reading a single data frame. You could write the different data sets into different files, or, if you are ambitious, cook up file-like R objects that are usable with read.csv2 and read from the correct parts of the underlying big file.
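The readLines route can be sketched without writing any intermediate files. Read all lines, then find the section markers. Each section is passed to `read.csv` through its `text` argument, which accepts a character vector of lines. This is a sketch under stated assumptions. Each section is assumed to start with a marker line such as `Dataset1`. That marker is assumed to be followed by a comma-separated header row and comma-separated data rows. The function name `read_sections` and the `marker` pattern are hypothetical.

```r
# Sketch: split one multi-section CSV into a named list of data frames.
# Assumptions (hypothetical): each section starts with a marker line such as
# "Dataset1", followed by a comma-separated header row and data rows;
# blank lines between sections are ignored; "<NA>" marks missing values.
read_sections <- function(path, marker = "^Dataset") {
  lines <- readLines(path)
  starts <- grep(marker, lines)               # line numbers of section markers
  ends <- c(starts[-1] - 1L, length(lines))   # each section ends before the next marker
  sections <- lapply(seq_along(starts), function(i) {
    body <- lines[seq.int(starts[i] + 1L, ends[i])]
    body <- body[nzchar(trimws(body))]        # drop blank separator lines
    read.csv(text = body, na.strings = c("", "<NA>"))
  })
  names(sections) <- trimws(lines[starts])    # e.g. "Dataset1", "Dataset2"
  sections
}
```

Any preamble before the first marker is ignored, so `skip` is not needed. Usage might look like `sections <- read_sections("200109.csv")`. You could then call `plot(sections$Dataset1$Calls.served.by.agent, type = "l")`, assuming that column exists in each section.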

Once you have dealt with separating the data sets into different files, use read.csv2 on those (or whichever read.table variant fits best; if the fields are fixed-width rather than delimited, see read.fwf). If <NA> indicates "not available" in your file, be sure to specify it as part of na.strings. If you don't, R treats that field as non-numeric data; with the right na.strings, the field is automatically converted to numbers. It seems that one of your fields can include time stamps like 00:00, so you need to use colClasses and specify a class to which your time stamp format can be converted. If the built-in Date or POSIXct classes don't fit, just define your own timestamp class and register a conversion from "character" to it with setAs(). read.table applies that conversion through methods::as() for any class named in colClasses that it doesn't handle natively.
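Here is a minimal sketch of the custom-class approach, under stated assumptions. It assumes the time field holds clock times in `HH:MM` form, converted here to minutes since midnight. A bare `0`, as in some sample rows, would become NA under this parser. The class name `hhmm` is hypothetical.

```r
library(methods)

# Hypothetical class name for clock times such as "00:00".
setClass("hhmm")

# read.table()/read.csv() call methods::as() for colClasses entries they do not
# handle natively, so this coercion is what converts the column. "HH:MM" becomes
# minutes since midnight; anything not of that form (including NA) becomes NA.
setAs("character", "hhmm", function(from) {
  parts <- strsplit(from, ":", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 2L) return(NA_real_)
    as.numeric(p[1]) * 60 + as.numeric(p[2])
  }, numeric(1))
})

csv_text <- "time,data
00:00,0
00:15,<NA>
00:30,3"

# "<NA>" is listed in na.strings, so the data column is read as numeric directly
df <- read.csv(text = csv_text,
               colClasses = c("hhmm", "numeric"),
               na.strings = c("", "<NA>"))
```

The resulting `df$time` is a plain numeric vector, so it plots without any `as.numeric(as.character(...))` conversion.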
