Reading a dataset in R where commas are used both as the field separator and as the decimal point

Posted 2024-12-06 02:57:44

How can I read this dataset in R? The problem is that the numbers are floats written with a decimal comma, like 4,000000059604644E-16, and the fields are separated by commas as well:

4,000000059604644E-16 ,  7,999997138977056E-16,   9,000002145767216E-16
4,999999403953552E-16 ,  6,99999988079071E-16 ,   0,099999904632568E-16
9,999997615814208E-16 ,  4,30000066757202E-16 ,   3,630000114440918E-16
0,69999933242798E-16  ,  0,099999904632568E-16,  55,657576767799999E-16 
3,999999761581424E-16,   1,9900000095367432E-16,  0,199999809265136E-16

How would you load this kind of dataset in R so that it has 3 columns?

If I do

dataset <- read.csv("C:\\data.txt",header=T,row.names=NULL)

it returns 6 columns instead of 3.
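A quick way to see where the 6 columns come from is to count the fields on each line. The sketch below inlines two rows of the sample data (the names `txt`, `con`, `n_fields` and `df` are just for illustration) and uses base R's `count.fields()` and `read.csv(text = ...)`:

```r
# Two rows of the question's data, inlined so the sketch is self-contained
txt <- c("4,000000059604644E-16 ,  7,999997138977056E-16,   9,000002145767216E-16",
         "4,999999403953552E-16 ,  6,99999988079071E-16 ,   0,099999904632568E-16")

# With sep = ",", every comma (decimal or separator) starts a new field
con <- textConnection(txt)
n_fields <- count.fields(con, sep = ",")
close(con)
print(n_fields)  # 6 fields on each line

# read.csv therefore splits each number in two and returns 6 columns
df <- read.csv(text = txt, header = FALSE)
print(ncol(df))
```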

拿命拼未来 2024-12-13 02:57:44

It might be best to transform the input data so that the floating-point numbers use decimal points rather than commas. One way to do this is with sed (it looks like you are using Windows, so you would likely need to install sed first to use this approach):

sed 's/\([0-9]\),\([0-9]\)/\1.\2/g' data.txt  > data2.txt

The file data2.txt then looks like this:

4.000000059604644E-16 ,  7.999997138977056E-16,   9.000002145767216E-16
4.999999403953552E-16 ,  6.99999988079071E-16 ,   0.099999904632568E-16
9.999997615814208E-16 ,  4.30000066757202E-16 ,   3.630000114440918E-16
0.69999933242798E-16  ,  0.099999904632568E-16,  55.657576767799999E-16 
3.999999761581424E-16,   1.9900000095367432E-16,  0.199999809265136E-16

Then in R:

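# data2.txt has no header row; use header=TRUE if your file starts with column names
dataset <- read.csv("data2.txt", header=FALSE, row.names=NULL)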
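If installing sed on Windows is inconvenient, the same substitution can be sketched in base R with `gsub()` and the same regular expression. This is an illustrative variant, not part of the answer above: the inlined `lines` vector stands in for `readLines("data.txt")`, and, like the sed command, it assumes every separator comma is followed by a space (true of the sample data), since a separator sitting directly between two digits, as in `E-16,7`, would also be turned into a point. `strip.white = TRUE` is added only to trim the padding around each field.

```r
# Two rows of the original data; with a real file use lines <- readLines("data.txt")
lines <- c("4,000000059604644E-16 ,  7,999997138977056E-16,   9,000002145767216E-16",
           "0,69999933242798E-16  ,  0,099999904632568E-16,  55,657576767799999E-16 ")

# Same regular expression as the sed command: a comma between two digits becomes "."
fixed <- gsub("([0-9]),([0-9])", "\\1.\\2", lines)

# The remaining commas are the field separators; this sample has no header row
dataset <- read.csv(text = fixed, header = FALSE, strip.white = TRUE)
print(dataset)
```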

云之铃。 2024-12-13 02:57:44

Here is an all-R solution that uses three read calls. The first, a read.table call, reads each data row as 6 character fields; the second read.table call glues the fields back together with decimal points and parses them; and the third, a read.csv call that reads zero rows, grabs the column names from the header.

fn <- "data.txt"

# create a test file

Lines <- "A , B , C
4,000000059604644E-16 ,  7,999997138977056E-16,   9,000002145767216E-16
4,999999403953552E-16 ,  6,99999988079071E-16 ,   0,099999904632568E-16
9,999997615814208E-16 ,  4,30000066757202E-16 ,   3,630000114440918E-16
0,69999933242798E-16  ,  0,099999904632568E-16,  55,657576767799999E-16 
3,999999761581424E-16,   1,9900000095367432E-16,  0,199999809265136E-16"
cat(Lines, "\n", file = fn)

# now read it back in

DF0 <- read.table(fn, skip = 1, sep = ",", colClasses = "character")
DF <- read.table(
   file = textConnection(do.call("sprintf", c("%s.%s %s.%s %s.%s", DF0))), 
   col.names = names(read.csv(fn, nrows = 0))
)

which gives:

> DF
             A            B            C
1 4.000000e-16 7.999997e-16 9.000002e-16
2 4.999999e-16 7.000000e-16 9.999990e-18
3 9.999998e-16 4.300001e-16 3.630000e-16
4 6.999993e-17 9.999990e-18 5.565758e-15
5 4.000000e-16 1.990000e-16 1.999998e-17

Note: the read.csv call in the question implies that there is a header, but the sample data does not show one. I assumed that there is a header; if there is not, remove the skip= and col.names= arguments.
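For the header-less case mentioned in the note, a sketch of the same approach with `skip =` and `col.names =` dropped might look like this. It assumes a two-row sample file written to `tempfile()` (the name `fn2` is hypothetical) so that no existing `data.txt` is overwritten, and it passes the glued strings to `read.table(text = ...)` rather than `textConnection()`, which should behave the same way. Without `col.names =`, the columns get R's default names V1, V2, V3.

```r
# Hypothetical header-less file, written to a temporary path
fn2 <- tempfile(fileext = ".txt")
writeLines(c("4,000000059604644E-16 ,  7,999997138977056E-16,   9,000002145767216E-16",
             "4,999999403953552E-16 ,  6,99999988079071E-16 ,   0,099999904632568E-16"),
           fn2)

# Step 1: read each row as 6 character pieces (no skip = because there is no header)
DF0 <- read.table(fn2, sep = ",", colClasses = "character")

# Step 2: glue pieces 1+2, 3+4, 5+6 with "." and re-read as whitespace-separated numbers
DF <- read.table(text = do.call("sprintf", c("%s.%s %s.%s %s.%s", DF0)))
unlink(fn2)
print(DF)
```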

帥小哥 2024-12-13 02:57:44

It's not pretty, but it should work:

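# Read every comma-separated piece as text: 6 pieces per row (integer part, rest) x 3
x <- matrix(scan("c:/data.txt", what=character(), sep=","), byrow=TRUE, ncol=6)
# Glue pieces 1+2, 3+4, 5+6 back together with "." and convert them to numbers
y <- t(apply(x, 1, function(a) {
  left <- seq(1, length(a), by=2)
  as.numeric(paste(a[left], a[left+1], sep="."))
}))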
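The per-row `apply()` call can also be replaced by a single vectorized `paste()` over the odd and even columns of `x`. This is an alternative sketch, not the answerer's code; the small `x` below is a hand-built stand-in, using two rows of the sample data, for the 6-column matrix that `scan()` produces.

```r
# Stand-in for the 6-column character matrix built by scan()
x <- matrix(c("4", "000000059604644E-16 ", "  7", "999997138977056E-16", "   9", "000002145767216E-16",
              "0", "69999933242798E-16  ", "  0", "099999904632568E-16", "  55", "657576767799999E-16 "),
            byrow = TRUE, ncol = 6)

# Odd columns hold the integer parts, even columns the fractional parts and exponents.
# paste() pairs the two 3-column sub-matrices element by element in column-major order,
# and matrix(..., ncol = 3) restores that same column-major layout.
y <- matrix(as.numeric(paste(x[, c(1, 3, 5)], x[, c(2, 4, 6)], sep = ".")), ncol = 3)
print(y)
```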
