data.frame structure fubar: str() shows mostly character columns, print shows only some as character
I'm using sqldf to subset an enormous file. The following command gives me a data.frame of 100 rows and 42 columns.
first <- read.csv.sql("first.txt", sep = " ", header = TRUE, row.names = FALSE,
sql = "SELECT * FROM file WHERE n = '\"n63\"' AND ratio = 1 AND r_name = '\"r1\"' AND method = '\"nearest\"' AND variables = 10")
The structure of the object is
'data.frame': 100 obs. of 42 variables:
$ test_before : chr "TRUE" "TRUE" "TRUE" "TRUE" ...
$ test_after : chr "TRUE" "TRUE" "TRUE" "TRUE" ...
$ meanPSmatchRATIO : chr "1.54845330373635" "1.16857102212364" "1.25330045961256" "1.8011651466717" ...
snipped intervening normally printed columns
$ PSdiff_DIFF : chr "-0.0103938442562762" "-0.00935228868105753" "-0.00947571480267878"
snipped intervening normally printed columns
$ nUNMATCHt : chr "0" "0" "0" "0" ...
$ caliper : chr "\"no\"" "\"no\"" "\"no\"" "\"no\"" ...
$ method : chr "\"nearest\"" "\"nearest\"" "\"nearest\"" "\"nearest\"" ...
$ r_name : chr "\"r1\"" "\"r1\"" "\"r1\"" "\"r1\"" ...
$ ratio : int 1 1 1 1 1 1 1 1 1 1 ...
$ n : chr "\"n63\"" "\"n63\"" "\"n63\"" "\"n63\"" ...
$ variables : int 10 10 10 10 10 10 10 10 10 10 ...
Now, based on this you would expect that when I print the data.frame, all columns (except the int ones) will be printed as character (enclosed in ""). But you would be wrong!
test_before test_after meanPSmatchRATIO del- nUNMATCHt caliper method r_name ratio n variables
1 TRUE TRUE 1.54845330373635 eted 0 "no" "nearest" "r1" 1 "n63" 10
2 TRUE TRUE 1.16857102212364 ... 0 "no" "nearest" "r1" 1 "n63" 10
3 TRUE TRUE 1.25330045961256 ... 0 "no" "nearest" "r1" 1 "n63" 10
4 TRUE TRUE 1.8011651466717 ...t 0 "no" "nearest" "r1" 1 "n63" 10
Notice that only the last few columns are "character". I'm a bit lost at what's going on. Can someone explain?
Comments (2)

Looks fine to me. print.data.frame doesn't usually print quotes for character columns, but those last few columns have embedded quotes, so that's why they appear "quoted" by default.

You are seeing the default behaviour of the print method for data frame objects. See ?print.data.frame, which has a quote argument (FALSE by default), so if you want the printed object to be quoted, use quote = TRUE.

Edit: regarding @Roman's comment, the columns printing with quotes contain embedded quotes in the data. For example, the first element of caliper is "\"no\"", so it is the embedded quotes that are being printed, which is fully consistent with the default behaviour of print.data.frame().
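
To make the point about embedded quotes concrete, here is a minimal sketch on a toy data frame. The data and the name cleaned are invented for illustration and are not the asker's file; the column names are borrowed from the question. It compares the default print with quote = TRUE, uses nchar() to show the quote characters are part of the stored strings, and shows one possible way to strip them and re-guess the column types, assuming the quotes are unwanted.

```r
# Hypothetical toy data mimicking the question: every column except `ratio`
# is character, but only `caliper` contains literal double-quote characters.
df <- data.frame(
  test_before      = c("TRUE", "TRUE"),
  meanPSmatchRATIO = c("1.54845330373635", "1.16857102212364"),
  caliper          = c("\"no\"", "\"no\""),
  ratio            = c(1L, 1L),
  stringsAsFactors = FALSE
)

# Default printing (quote = FALSE): character entries are shown without
# surrounding quotes, so only the quotes stored in the data are visible.
print(df)

# quote = TRUE asks print.data.frame to put quotes around the printed entries.
print(df, quote = TRUE)

# The quotes are part of the strings: "\"no\"" is 4 characters, not 2.
nchar(df$caliper)

# One way (an assumption about what is wanted) to remove the embedded quotes
# and then let R re-guess each character column's type.
cleaned <- df
is_chr <- vapply(cleaned, is.character, logical(1))
cleaned[is_chr] <- lapply(cleaned[is_chr], function(x) gsub("\"", "", x, fixed = TRUE))
cleaned[is_chr] <- lapply(cleaned[is_chr], type.convert, as.is = TRUE)
str(cleaned)
```

After the clean-up step, str(cleaned) should report logical, numeric, and plain character columns, so str() and print() no longer appear to disagree.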