CSV file with multiple time series

Posted on 2024-07-13 02:14:16

I've imported a csv file with lots of columns and sections of data.

v <- read.csv2("200109.csv", header=TRUE, sep=",", skip=6, na.strings=c(""))

The layout of the file is something like this:

Dataset1
time, data, .....
0       0
0       <NA>
0       0

Dataset2
time, data, .....
00:00   0
0       <NA>
0       0

The headers of the different datasets are exactly the same.

Now, I can plot the first dataset with:

plot(as.numeric(as.character(v$Calls.served.by.agent[1:30])), type="l")

I am curious if there is a better way to:

Get all the numbers read as numbers, without having to convert.
Address the different datasets in the file in some meaningful way.

Any hints would be appreciated. Thank you.

Status update:

I haven't really found a good solution in R yet, but I've started writing a script in Lua to separate each individual time series into its own file. I'm leaving this open for now, because I'm curious how well R will deal with all these files. I'll get 8 files per day.

萌辣 2024-07-20 02:14:16

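What I would personally do is write a script in some scripting language that separates the different datasets before the file is read into R, and possibly performs some of the necessary data conversions as well.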

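If you want to do the splitting in R, look up readLines and scan; read.csv2 is too high-level and is meant for reading a single data frame. You could write the different datasets to different files or, if you are ambitious, build file-like R objects that can be used with read.csv2 and that read from the correct parts of the underlying big file.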
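As a rough illustration of splitting in R with readLines, here is a minimal sketch. It assumes the fields are comma-separated, that every section starts with a title line of the form "Dataset<number>", and that any preamble lines have already been dropped. The sample data, the temporary file, and the names blocks and datasets are made up for the example and are not from the question's real file.

```r
# Hypothetical sample mimicking the layout shown in the question:
# a "DatasetN" title line, then the repeated header, then the rows.
sample_lines <- c(
  "Dataset1",
  "time,data",
  "0,0",
  "0,<NA>",
  "0,0",
  "",
  "Dataset2",
  "time,data",
  "00:00,0",
  "0,<NA>",
  "0,0"
)
path <- tempfile(fileext = ".csv")
writeLines(sample_lines, path)

lines <- readLines(path)
lines <- lines[nzchar(trimws(lines))]       # drop blank separator lines

# Assumption: a section starts at every line that looks like "Dataset<number>".
is_title <- grepl("^Dataset[0-9]+$", lines)
section  <- cumsum(is_title)                # section index for every line

# Group the non-title lines by section and name each group after its title.
blocks <- split(lines[!is_title], section[!is_title])
names(blocks) <- lines[is_title]

# Parse each block as its own data frame; "<NA>" is treated as missing.
datasets <- lapply(blocks, function(block) {
  read.csv(text = block, header = TRUE,
           na.strings = c("", "<NA>"), strip.white = TRUE)
})

str(datasets$Dataset1)
```

Each element of datasets can then be plotted directly or written out with write.csv if separate files are preferred.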

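Once you have separated the datasets into different files, use read.csv2 on those files (or whichever read.table variant fits best; if the fields are not tab-separated but fixed-width, see read.fwf). If <NA> indicates "not available" in your file, be sure to specify it as part of na.strings. If you don't, R thinks the field contains non-numeric data, but with the right na.strings the field is converted to numbers automatically. It seems that one of your fields can contain time stamps like 00:00, so you need to use colClasses and specify a class that your time stamp format can be converted to. If the built-in Date class doesn't work, just define your own timestamp class and an as.timestamp function that performs the conversion.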
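The na.strings and colClasses advice could look roughly like the sketch below. For a class name that read.table does not know, the conversion goes through methods::as, so the sketch registers the coercion with setAs rather than as a plain function. The class name "hhmm", the choice to store times as minutes since midnight, and the sample data are all assumptions made for illustration.

```r
library(methods)

# "hhmm" is a hypothetical class name, used only as a colClasses label.
setClass("hhmm")
setAs("character", "hhmm", function(from) {
  # Convert "HH:MM" to minutes since midnight; anything else becomes NA.
  ok    <- grepl("^[0-9]{1,2}:[0-9]{2}$", from)
  parts <- strsplit(from[ok], ":", fixed = TRUE)
  out   <- rep(NA_real_, length(from))
  out[ok] <- vapply(parts,
                    function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
                    numeric(1))
  out
})

# Made-up sample for a single, already separated dataset.
csv_text <- c(
  "time,data",
  "00:00,0",
  "00:30,<NA>",
  "01:00,5"
)

d <- read.csv(text = csv_text,
              na.strings = c("", "<NA>"),   # "<NA>" is read as missing
              colClasses = c(time = "hhmm", data = "numeric"))
str(d)
```

With "<NA>" listed in na.strings, the data column arrives as numeric without any as.numeric(as.character(...)) round trip.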