CSV file with multiple time series
I've imported a csv file with lots of columns and sections of data.
v <- read.csv2("200109.csv", header=TRUE, sep=",", skip="6", na.strings=c(""))
The layout of the file is something like this:
Dataset1
time, data, .....
0 0
0 <NA>
0 0
Dataset2
time, data, .....
00:00 0
0 <NA>
0 0
The headers of the different datasets are exactly the same.
Now, I can plot the first dataset with:
plot(as.numeric(as.character(v$Calls.served.by.agent[1:30])), type="l")
I am curious if there is a better way to:
Get all the numbers read as numbers, without having to convert.
Address the different datasets in the file in some meaningful way.
Any hints would be appreciated. Thank you.
Status update:
I haven't really found a good solution in R yet, but I've started writing a script in Lua to separate each individual time series into its own file. I'm leaving this open for now, because I'm curious how well R will deal with all these files. I'll get 8 files per day.
Comments (1)
What I personally would do is write a script in some scripting language to separate the different data sets before the file is read into R, and possibly do some of the necessary data conversions, too.
If you want to do the splitting in R, look up readLines and scan; read.csv2 is too high-level and is meant for reading a single data frame. You could write the different data sets into different files, or if you are ambitious, cook up file-like R objects that are usable with read.csv2 and read from the correct parts of the underlying big file.

Once you have dealt with separating the data sets into different files, use read.csv2 on those (or whichever read.table variant is best; if those are not tabs but fixed-width fields, see read.fwf). If <NA> indicates "not available" in your file, be sure to specify it as part of na.strings. If you don't do that, R thinks you have non-numeric data in that field, but with the right na.strings, you automatically get the field converted into numbers. It seems that one of your fields can include time stamps like 00:00, so you need to use colClasses and specify a class to which your time stamp format can be converted. If the built-in Date class doesn't work, just define your own timestamp class and an as.timestamp function that does the conversion.
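For what it's worth, the readLines route can be sketched entirely in R without writing intermediate files. This is only a minimal sketch under assumptions not confirmed by the question: each section starts with a title line such as Dataset1, is followed by one header line, and the rows are comma-separated. The helper name read_sections, the title pattern, and the sample data are all made up for illustration.

```r
# Hypothetical helper (not part of base R): split the lines of a multi-section
# CSV into a named list of data frames, one per section.
# Assumption: a section is a title line matching `title_pattern`, then one
# header line, then comma-separated data rows.
read_sections <- function(lines, title_pattern = "^Dataset[0-9]+$",
                          na.strings = c("", "<NA>")) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]            # drop blank lines
  is_title <- grepl(title_pattern, lines)
  section_id <- cumsum(is_title)           # 0 = anything before the first title
  keep <- section_id > 0 & !is_title       # header and data rows only
  blocks <- split(lines[keep], section_id[keep])
  # Name each block after its title line
  names(blocks) <- lines[is_title][as.integer(names(blocks))]
  lapply(blocks, function(block) {
    # With the right na.strings, numeric columns are read as numbers directly
    read.csv(text = block, header = TRUE, na.strings = na.strings,
             strip.white = TRUE, stringsAsFactors = FALSE)
  })
}

# Made-up sample data in the assumed layout
sample_lines <- c(
  "Dataset1",
  "time,data",
  "00:00,0",
  "00:30,",
  "01:00,3",
  "Dataset2",
  "time,data",
  "00:00,5",
  "00:30,<NA>",
  "01:00,7"
)

sections <- read_sections(sample_lines)
str(sections$Dataset1)
```

With a real file, the input would come from something like read_sections(readLines("200109.csv")); any preamble before the first title line is dropped, so no skip argument is needed. Each section can then be addressed by name, for example plot(sections$Dataset1$data, type = "l"). The time column is left as character here; converting it is a separate step.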