Reading 3-dimensional datasets into R

Posted on 2024-09-14 01:49:05

Gnuplot allows for three dimensional datasets, which are a set of tables separated by empty lines, for instance:

54.32,16.17,7.42,4.28,3.09,2.11,1.66,1.22,0.99,0.82,7.9

54.63,15.50,8.53,5.31,3.75,1.66,1.14,0.83,0.94,0.52,7.18
56.49,16.67,6.38,3.69,2.80,1.45,1.12,0.89,1.12,0.89,8.50
56.35,16.26,7.76,3.57,2.62,1.89,1.05,1.15,0.63,1.05,7.66

53.79,16.19,6.47,4.57,3.47,1.74,1.95,1.37,1.00,0.74,8.73
55.63,16.28,7.87,3.72,2.48,1.99,1.40,1.19,0.70,1.08,7.65
54.09,15.76,7.96,4.70,2.77,2.21,1.27,1.27,0.66,1.11,8.19
53.79,16.19,6.47,4.57,3.47,1.74,1.95,1.37,1.00,0.74,8.73

...

This is useful, for instance, to show a dataset evolving over time. In Gnuplot, you can then select which dataset to use for a given plot by its index (with the keyword index, IIRC).

I've been using R, and up to now I have been feeding it datasets manually, one at a time, using the scan/table functions. Instead of having one big file with all the datasets in it, I have one file per dataset, and I create the tables one at a time.

Is there a (built-in, or very simple) way to read the aggregate datasets all at once, in such a way that I would have

dataset <- neatInput("my-aggregate-data")
dataset[1]    # first data set
dataset[2]    # second data set
...

or something similar?


Comments (2)

蓝眼睛不忧郁 2024-09-21 01:49:05

I managed to consolidate the code into two lines, FWIW :)

check <- read.csv("data.csv", blank.lines.skip = FALSE, header = FALSE)

split(check, (cumsum(is.na(check[,1]))+1) * !is.na(check[,1]))
## $`0`
##   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
## 2 NA NA NA NA NA NA NA NA NA  NA  NA
## 6 NA NA NA NA NA NA NA NA NA  NA  NA
##
## $`1`
##      V1    V2   V3   V4   V5   V6   V7   V8   V9  V10 V11
## 1 54.32 16.17 7.42 4.28 3.09 2.11 1.66 1.22 0.99 0.82 7.9
##
## $`2`
##      V1    V2   V3   V4   V5   V6   V7   V8   V9  V10  V11
## 3 54.63 15.50 8.53 5.31 3.75 1.66 1.14 0.83 0.94 0.52 7.18
## 4 56.49 16.67 6.38 3.69 2.80 1.45 1.12 0.89 1.12 0.89 8.50
## 5 56.35 16.26 7.76 3.57 2.62 1.89 1.05 1.15 0.63 1.05 7.66
##
## $`3`
##       V1    V2   V3   V4   V5   V6   V7   V8   V9  V10  V11
## 7  53.79 16.19 6.47 4.57 3.47 1.74 1.95 1.37 1.00 0.74 8.73
## 8  55.63 16.28 7.87 3.72 2.48 1.99 1.40 1.19 0.70 1.08 7.65
## 9  54.09 15.76 7.96 4.70 2.77 2.21 1.27 1.27 0.66 1.11 8.19
## 10 53.79 16.19 6.47 4.57 3.47 1.74 1.95 1.37 1.00 0.74 8.73

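
In the output of the two-liner, the element named 0 collects the blank separator rows rather than real data. As a rough sketch only, the same idea can be wrapped in a small helper that drops those rows and returns a plain list, which is close to what the question asks for. The name neatInput is taken from the question and is hypothetical, not a built-in R function; the temporary file and the sample values are there only to make the example self-contained.

```r
# Hypothetical helper (name borrowed from the question); not part of base R.
neatInput <- function(path) {
  # Keep blank lines so that each one is read as an all-NA row.
  raw <- read.csv(path, header = FALSE, blank.lines.skip = FALSE)
  blank <- is.na(raw[, 1])
  # Every blank row starts a new block; number the blocks from 1.
  block <- cumsum(blank) + 1
  # Drop the blank rows themselves, then split the rest by block number.
  pieces <- split(raw[!blank, , drop = FALSE], block[!blank])
  unname(pieces)
}

# Demo: write the sample data from the question to a temporary file.
sample_lines <- c(
  "54.32,16.17,7.42,4.28,3.09,2.11,1.66,1.22,0.99,0.82,7.9",
  "",
  "54.63,15.50,8.53,5.31,3.75,1.66,1.14,0.83,0.94,0.52,7.18",
  "56.49,16.67,6.38,3.69,2.80,1.45,1.12,0.89,1.12,0.89,8.50",
  "56.35,16.26,7.76,3.57,2.62,1.89,1.05,1.15,0.63,1.05,7.66",
  "",
  "53.79,16.19,6.47,4.57,3.47,1.74,1.95,1.37,1.00,0.74,8.73",
  "55.63,16.28,7.87,3.72,2.48,1.99,1.40,1.19,0.70,1.08,7.65",
  "54.09,15.76,7.96,4.70,2.77,2.21,1.27,1.27,0.66,1.11,8.19",
  "53.79,16.19,6.47,4.57,3.47,1.74,1.95,1.37,1.00,0.74,8.73"
)
tmp <- tempfile(fileext = ".csv")
writeLines(sample_lines, tmp)

dataset <- neatInput(tmp)
length(dataset)   # number of data sets found in the file
dataset[[1]]      # first data set, as a data frame
dataset[[2]]      # second data set, as a data frame
```

Note that dataset[[1]] (double brackets) returns the data frame itself, whereas dataset[1] returns a one-element list containing it. This sketch assumes every column is numeric, so that a blank line really does show up as NA in the first column.
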
黯然 2024-09-21 01:49:05

If your third dimension is time, it's usually best to work with specialized time/date objects. The most commonly used general-purpose time series packages in R include functions to do what you want. For instance, to roll up some data originally in months to years:

> data(AirPassengers); AP = AirPassengers
> # import the package xts, which will 'auto-import' its sole dependency, 
> # the package 'zoo'
> library(xts)    

> # AP is an R time series whose data points are in months
> class(AP)
[1] "ts"
> start(AP)
[1] 1949    1
> end(AP)
[1] 1960   12
> frequency(AP)
[1] 12
> AP[1:3]
[1] 112 118 132

> # step 1: convert ts object to an xts object
> X = as.xts(AP)
> class(X)
[1] "xts" "zoo"
> # step 2: create index of endpoints to pass to the aggregator function
> np = endpoints(X, on="years")
> # step 3: call the aggregator function
> X2 = period.apply(X, INDEX=np, FUN=sum)
> X2[1:3]
         [,1]
Dec 1949 1520
Dec 1950 1676
Dec 1951 2042
> # 'X2' is in years (each value is about 12X higher than the first three values for
> # AP above)
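
For comparison, the same monthly-to-yearly roll-up can be sketched without xts, assuming the data is already a regular "ts" object like AirPassengers. This uses the aggregate() method for time series from base R and is offered as an alternative, not as what the answer above does.

```r
# AirPassengers ships with R (package 'datasets'): monthly totals, 1949 to 1960.
AP <- AirPassengers

# aggregate() has a method for 'ts' objects; nfrequency = 1 asks for
# one value per year, and FUN = sum adds up the 12 months of each year.
yearly <- aggregate(AP, nfrequency = 1, FUN = sum)

class(yearly)      # still a "ts" object
frequency(yearly)  # one observation per year
yearly[1:3]        # totals for the first three years
```

The xts route is more flexible for irregular timestamps; the base R version only works when the series has a fixed frequency that divides evenly into the new one.
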