data.frame structure fubar: structure indicates mostly character columns, print indicates only some

Posted 2024-12-04 12:32:38

I'm using sqldf to subset an enormous file. The following command gives me a data.frame of 100 rows and 42 columns.

library(sqldf)  # provides read.csv.sql()
first <- read.csv.sql("first.txt", sep = " ", header = TRUE, row.names = FALSE,
        sql = "SELECT * FROM file WHERE n = '\"n63\"' AND ratio = 1 AND r_name = '\"r1\"' AND method = '\"nearest\"' AND variables = 10")

The structure of the object is

'data.frame':   100 obs. of  42 variables:
 $ test_before       : chr  "TRUE" "TRUE" "TRUE" "TRUE" ...
 $ test_after        : chr  "TRUE" "TRUE" "TRUE" "TRUE" ...
 $ meanPSmatchRATIO  : chr  "1.54845330373635" "1.16857102212364" "1.25330045961256" "1.8011651466717" ...
snipped intervening normally printed columns
 $ PSdiff_DIFF       : chr  "-0.0103938442562762" "-0.00935228868105753" "-0.00947571480267878" 
snipped intervening normally printed columns
 $ nUNMATCHt         : chr  "0" "0" "0" "0" ...
 $ caliper           : chr  "\"no\"" "\"no\"" "\"no\"" "\"no\"" ...
 $ method            : chr  "\"nearest\"" "\"nearest\"" "\"nearest\"" "\"nearest\"" ...
 $ r_name            : chr  "\"r1\"" "\"r1\"" "\"r1\"" "\"r1\"" ...
 $ ratio             : int  1 1 1 1 1 1 1 1 1 1 ...
 $ n                 : chr  "\"n63\"" "\"n63\"" "\"n63\"" "\"n63\"" ...
 $ variables         : int  10 10 10 10 10 10 10 10 10 10 ...

Now, based on this, you would expect that when I print the data.frame, all columns (except the int ones) would be character, i.e. enclosed in "". But you would be wrong!

  test_before test_after meanPSmatchRATIO del-  nUNMATCHt caliper    method r_name ratio     n variables
1        TRUE       TRUE 1.54845330373635 eted          0    "no" "nearest"   "r1"     1 "n63"        10
2        TRUE       TRUE 1.16857102212364 ...           0    "no" "nearest"   "r1"     1 "n63"        10
3        TRUE       TRUE 1.25330045961256 ...           0    "no" "nearest"   "r1"     1 "n63"        10
4        TRUE       TRUE  1.8011651466717 ...t          0    "no" "nearest"   "r1"     1 "n63"        10

Notice that only the last few columns are "character". I'm a bit lost at what's going on. Can someone explain?

傲性难收 2024-12-11 12:32:38

Looks fine to me. print.data.frame doesn't usually print quotes for character columns, but those last few columns have embedded quotes, so that's why they appear "quoted" by default.

Data <- data.frame(x=1:5,y=as.character(1:5),
  z=letters[1:5], q=paste("\"",letters[1:5],"\"",sep=""))
print(Data)  # default print
#   x y z   q
# 1 1 1 a "a"
# 2 2 2 b "b"
# 3 3 3 c "c"
# 4 4 4 d "d"
# 5 5 5 e "e"
print(Data, quote=TRUE)  # show embedded quotes
#     x   y   z       q
# 1 "1" "1" "a" "\"a\""
# 2 "2" "2" "b" "\"b\""
# 3 "3" "3" "c" "\"c\""
# 4 "4" "4" "d" "\"d\""
# 5 "5" "5" "e" "\"e\""
梦幻的味道 2024-12-11 12:32:38

You are seeing the default behaviour of the print method for data frame objects. See ?print.data.frame, which has:

   quote: logical, indicating whether or not entries should be printed
          with surrounding quotes.

so if you want the printed object to be quoted, use quote = TRUE. E.g.:

> dat <- data.frame(X = c("A","B"), Y = c("1","2"), stringsAsFactors = FALSE)
> dat
  X Y
1 A 1
2 B 2
> dat[,1] ## not using the data frame print method...
[1] "A" "B"
> print(dat, quote = TRUE)
    X   Y
1 "A" "1"
2 "B" "2"

Edit: regarding @Roman's comment, the columns that print with quotes contain embedded quotes in the data. For example, the first element of caliper is "\"no\"", so it is the embedded quotes that are being printed, which is fully consistent with the default behaviour of print.data.frame().
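If the embedded quotes are unwanted, one option is to strip them after reading. A minimal sketch, assuming each affected value is wrapped in exactly one pair of double quotes, as in the str() output above. The small first below is a hypothetical stand-in for the real 42-column frame, and strip_quotes is a hypothetical helper name:

```r
# Hypothetical stand-in for a few columns of `first` from the question
first <- data.frame(caliper = c("\"no\"", "\"no\""),
                    method  = c("\"nearest\"", "\"nearest\""),
                    ratio   = c(1L, 1L),
                    stringsAsFactors = FALSE)

# Remove one leading and one trailing double quote from character columns;
# non-character columns (e.g. the int column ratio) are returned unchanged
strip_quotes <- function(col) {
  if (is.character(col)) gsub('^"|"$', "", col) else col
}
first[] <- lapply(first, strip_quotes)  # first[] keeps the data.frame class

first$caliper  # "no" "no"
first$method   # "nearest" "nearest"
```

After this, print(first) shows these columns without quotes, and print(first, quote = TRUE) wraps each value in a single pair of quotes.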
