A comprehensive survey of the types of things in R; 'mode' and 'class' and 'typeof' are insufficient

Posted on 2024-12-26 15:07:14

The language R confuses me. Entities have modes and classes, but even this is insufficient to fully describe the entity.

This answer says

In R every 'object' has a mode and a class.

So I did these experiments:

> class(3)
[1] "numeric"
> mode(3)
[1] "numeric"
> typeof(3)
[1] "double"

Fair enough so far, but then I passed in a vector instead:

> mode(c(1,2))
[1] "numeric"
> class(c(1,2))
[1] "numeric"
> typeof(c(1,2))
[1] "double"

That doesn't make sense. Surely a vector of integers should have a different class, or different mode, than a single integer? My questions are:

  • Does everything in R have (exactly one) class ?
  • Does everything in R have (exactly one) mode ?
  • What, if anything, does 'typeof' tell us?
  • What other information is needed to fully describe an entity? (Where is the 'vectorness' stored, for example?)

Update: Apparently, a literal 3 is just a vector of length 1. There are no scalars. OK, but... I tried mode("string") and got "character", leading me to think that a string was a vector of characters. But if that were true, then this should be true, and it's not: c('h','i') == "hi"

银河中√捞星星 2025-01-02 15:07:14

I agree that the type system in R is rather weird. It is that way because it has evolved over a (long) time...

Note that you missed one more type-like function, storage.mode, and one more class-like function, oldClass.

So, mode and storage.mode are the old-style types (where storage.mode is more accurate), and typeof is the newer, even more accurate version.

mode(3L)                  # numeric
storage.mode(3L)          # integer
storage.mode(`identical`) # function
storage.mode(`if`)        # function
typeof(`identical`)       # closure
typeof(`if`)              # special

Then class is a whole different story. class is mostly just the class attribute of an object (that's exactly what oldClass returns). But when the class attribute is not set, the class function makes up a class from the object type and the dim attribute.

oldClass(3L) # NULL
class(3L) # integer
class(structure(3L, dim=1)) # array
class(structure(3L, dim=c(1,1))) # matrix
class(list()) # list
class(structure(list(1), dim=1)) # array
class(structure(list(1), dim=c(1,1))) # matrix
class(structure(list(1), dim=1, class='foo')) # foo

Finally, the class can return more than one string, but only if the class attribute is like that. The first string value is then kind of the main class, and the following ones are what it inherits from. The made-up classes are always of length 1.

# Here "A" inherits from "B", which inherits from "C"
class(structure(1, class=LETTERS[1:3])) # "A" "B" "C"

# an ordered factor:
class(ordered(3:1)) # "ordered" "factor"
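
To make the difference between the class attribute and the made-up (implicit) class concrete, here is a small sketch using only base R. The objects `x`, `y`, and `z` are invented for illustration; `inherits()` checks for a name anywhere in the class vector, and `unclass()` strips the attribute again.

```r
# Hypothetical object with an explicit three-element class attribute
x <- structure(1, class = c("A", "B", "C"))

oldClass(x)        # the attribute itself: "A" "B" "C"
inherits(x, "B")   # TRUE: "B" appears somewhere in the class vector
inherits(x, "Z")   # FALSE

# Without a class attribute, oldClass() is NULL and class() is made up
y <- 1:3
is.null(oldClass(y))   # TRUE
class(y)               # "integer"

# unclass() removes the attribute and leaves the underlying double
z <- unclass(x)
is.null(attributes(z)) # TRUE
```

For S3 dispatch it is usually `inherits()` rather than a comparison against `class(x)` that you want, since the class vector may have more than one element.
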
溺孤伤于心 2025-01-02 15:07:14

Here's some code to determine what the four type functions, class, mode, typeof, and storage.mode return for each of the kinds of R object.

library(methods)
library(tibble)
library(purrr)
library(xml2)
library(knitr)  # provides kable(), used at the end

setClass("dummy", representation(x="numeric", y="numeric"))

types <- list(
  "logical vector" = logical(),
  "integer vector" = integer(),
  "numeric vector" = numeric(),
  "complex vector" = complex(),
  "character vector" = character(),
  "raw vector" = raw(),
  factor = factor(),
  "logical matrix" = matrix(logical()),
  "numeric matrix" = matrix(numeric()),
  "logical array" = array(logical(8), c(2, 2, 2)),
  "numeric array" = array(numeric(8), c(2, 2, 2)),
  list = list(),
  pairlist = .Options,
  "data frame" = data.frame(),
  "closure function" = identity,
  "builtin function" = `+`,
  "special function" = `if`,
  environment = new.env(),
  null = NULL,
  formula = y ~ x,
  expression = expression(),
  call = call("identity"),
  name = as.name("x"),
  "paren in expression" = expression((1))[[1]],
  "brace in expression" = expression({1})[[1]],
  "S3 lm object" = lm(dist ~ speed, cars),
  "S4 dummy object" = new("dummy", x = 1:10, y = rnorm(10)),
  "external pointer" = read_xml("<foo><bar /></foo>")$node
)

type_info <- imap_dfr(
  types,
  function(x, nm)
  {
    tibble(
      "spoken type" = nm,
      class = class(x), 
      typeof = typeof(x),
      mode  = mode(x),
      storage.mode = storage.mode(x)
    )
  }
)

knitr::kable(type_info)

Here's the output:

|spoken type         |class       |typeof      |mode        |storage.mode |
|:-------------------|:-----------|:-----------|:-----------|:------------|
|logical vector      |logical     |logical     |logical     |logical      |
|integer vector      |integer     |integer     |numeric     |integer      |
|numeric vector      |numeric     |double      |numeric     |double       |
|complex vector      |complex     |complex     |complex     |complex      |
|character vector    |character   |character   |character   |character    |
|raw vector          |raw         |raw         |raw         |raw          |
|factor              |factor      |integer     |numeric     |integer      |
|logical matrix      |matrix      |logical     |logical     |logical      |
|logical matrix      |array       |logical     |logical     |logical      |
|numeric matrix      |matrix      |double      |numeric     |double       |
|numeric matrix      |array       |double      |numeric     |double       |
|logical array       |array       |logical     |logical     |logical      |
|numeric array       |array       |double      |numeric     |double       |
|list                |list        |list        |list        |list         |
|pairlist            |pairlist    |pairlist    |pairlist    |pairlist     |
|data frame          |data.frame  |list        |list        |list         |
|closure function    |function    |closure     |function    |function     |
|builtin function    |function    |builtin     |function    |function     |
|special function    |function    |special     |function    |function     |
|environment         |environment |environment |environment |environment  |
|null                |NULL        |NULL        |NULL        |NULL         |
|formula             |formula     |language    |call        |language     |
|expression          |expression  |expression  |expression  |expression   |
|call                |call        |language    |call        |language     |
|name                |name        |symbol      |name        |symbol       |
|paren in expression |(           |language    |(           |language     |
|brace in expression |{           |language    |call        |language     |
|S3 lm object        |lm          |list        |list        |list         |
|S4 dummy object     |dummy       |S4          |S4          |S4           |
|external pointer    |externalptr |externalptr |externalptr |externalptr  |
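
The matrix rows appear twice in the table above, most likely because `class()` returns both "matrix" and "array" for a matrix in R 4.0.0 and later, and `tibble()` recycles the other length-one columns to match. A quick check, assuming a recent version of R (the variable `m` is just an example):

```r
m <- matrix(1:4, nrow = 2)

class(m)    # "matrix" "array" in R >= 4.0.0; just "matrix" in older versions
typeof(m)   # "integer": the storage type is unaffected by the dim attribute

# One way to keep a single table row per object is to collapse the class vector
paste(class(m), collapse = "/")
```
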

The types of objects available in R are discussed in the R Language Definition manual. There are a few types not mentioned here: you can't test for objects of type "promise", "...", and "ANY", and "bytecode" and "weakref" are only available at the C-level.

The table of available types in the R source is here.

北笙凉宸 2025-01-02 15:07:14

Does everything in R have (exactly one) class ?

Exactly one is definitely not right:

> x <- 3
> class(x) <- c("hi","low")
> class(x)
[1] "hi"  "low"

Everything has (at least one) class.

Does everything in R have (exactly one) mode ?

Not certain but I suspect so.

What, if anything, does 'typeof' tell us?

typeof gives the internal type of an object. Possible values according to ?typeof are:

The vector types "logical", "integer", "double", "complex",
"character", "raw" and "list", "NULL", "closure" (function), "special"
and "builtin" (basic functions and operators), "environment", "S4"
(some S4 objects) and others that are unlikely to be seen at user
level ("symbol", "pairlist", "promise", "language", "char", "...",
"any", "expression", "externalptr", "bytecode" and "weakref").

mode relies on typeof. From ?mode:

Modes have the same set of names as types (see typeof) except that
types "integer" and "double" are returned as "numeric".
types "special" and "builtin" are returned as "function".
type "symbol" is called mode "name".
type "language" is returned as "(" or "call".
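
The mapping quoted from ?mode can be sketched as a small function. This is an illustration only, not how base R implements `mode()`; the name `mode_from_type` is made up, and it simplifies the "language" case by always returning "call", whereas `mode()` returns "(" for a parenthesised call.

```r
# Hypothetical helper: derive the mode name from typeof(), per the rules in ?mode
mode_from_type <- function(x) {
  tx <- typeof(x)
  switch(tx,
    integer  = ,
    double   = "numeric",
    closure  = ,
    special  = ,
    builtin  = "function",
    symbol   = "name",
    language = "call",   # simplification: mode() gives "(" for calls to `(`
    tx                   # every other type keeps its name
  )
}

mode_from_type(1L)           # "numeric"
mode_from_type(`if`)         # "function"
mode_from_type(quote(x))     # "name"
mode_from_type(quote(f(x)))  # "call"
mode_from_type(TRUE)         # "logical"
```
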

What other information is needed to fully describe an entity? (Where is the 'listness' stored, for example?)

A list has class list:

> y <- list(3)
> class(y)
[1] "list"

Do you mean vectorization? length should be sufficient for most purposes:

> z <- 3
> class(z)
[1] "numeric"
> length(z)
[1] 1

Think of 3 as a numeric vector of length 1, rather than as some primitive numeric type.
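
The same idea explains the puzzle in the question's update: "hi" is a character vector of length 1 whose single element is a two-character string, not a vector of two characters. A short sketch in base R (the variable names are only for illustration):

```r
x <- 3
length(x)           # 1
is.vector(x)        # TRUE
identical(x, c(3))  # TRUE: c() of one number is the same object

s <- "hi"
length(s)           # 1: one string
nchar(s)            # 2: two characters inside that string

# Splitting the string gives the length-2 character vector from the question
chars <- strsplit(s, "")[[1]]
identical(chars, c("h", "i"))       # TRUE
paste(chars, collapse = "") == s    # TRUE
```
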

Conclusion

You can get by just fine with class and length. By the time you need the other stuff, you likely won't have to ask what they're for :-)

沉溺在你眼里的海 2025-01-02 15:07:14

Adding to one of your sub-questions :

  • What other information is needed to fully describe an entity?

In addition to class, mode, typeof, attributes, str, and so on, is() is also worth noting.

is(1)
[1] "numeric" "vector"

While useful, it is also unsatisfactory. In this example, 1 is more than just that; it is also atomic, finite, and a double. The following function should show all that an object is according to all available is.(...) functions:

what.is <- function(x, show.all=FALSE) {

  # set the warn option to -1 to temporarily ignore warnings
  op <- options("warn")
  options(warn = -1)
  on.exit(options(op))

  list.fun <- grep(methods(is), pattern = "<-", invert = TRUE, value = TRUE)
  result <- data.frame(test=character(), value=character(), 
                       warning=character(), stringsAsFactors = FALSE)

  # loop over all "is.(...)" functions and store the results
  for(fun in list.fun) {
    res <- try(eval(call(fun,x)),silent=TRUE)
    if(inherits(res, "try-error")) {
      next # ignore tests that yield an error
    } else if (length(res)>1) {
      warn <- "*Applies only to the first element of the provided object"
      value <- paste(res[1],"*",sep="")
    } else {
      warn <- ""
      value <- res
    }
    result[nrow(result)+1,] <- list(fun, value, warn)
  }

  # sort the results
  result <- result[order(result$value,decreasing = TRUE),]
  rownames(result) <- NULL

  if(show.all)
    return(result)
  else
    return(result[which(result$value=="TRUE"),])
}

So now we get a more complete picture:

> what.is(1)
        test value warning
1  is.atomic  TRUE        
2  is.double  TRUE        
3  is.finite  TRUE        
4 is.numeric  TRUE        
5  is.vector  TRUE 

> what.is(CO2)
           test value warning
1 is.data.frame  TRUE        
2       is.list  TRUE        
3     is.object  TRUE        
4  is.recursive  TRUE 

You also get more information with the argument show.all=TRUE. I am not pasting any example here as the results are over 50 lines long.

Finally, this is meant as a complementary source of information, not as a replacement for any of the other functions mentioned earlier.

EDIT

To include even more "is" functions, as per @Erdogan's comment, you could add this bit to the function:

  # right after 
  # list.fun <- grep(methods(is), pattern = "<-", invert = TRUE, value = TRUE)
  list.fun.2 <- character()

  packs <- c('base', 'utils', 'methods') # include more packages if needed

  for (pkg in packs) {
    library(pkg, character.only = TRUE)
    objects <- grep("^is.+\\w$", ls(envir = as.environment(paste('package', pkg, sep = ':'))),
                    value = TRUE)
    objects <- grep("<-", objects, invert = TRUE, value = TRUE)
    if (length(objects) > 0) 
      list.fun.2 <- append(list.fun.2, objects[sapply(objects, function(x) class(eval(parse(text = x))) == "function")])
  }

  list.fun <- union(list.fun, list.fun.2)  # list.fun comes from the methods(is) line above

  # ...and continue with the rest
  result <- data.frame(test=character(), value=character(), 
                       warning=character(), stringsAsFactors = FALSE)
  # and so on...
总攻大人 2025-01-02 15:07:14

This question deals with the confusion around all kinds of "types" in R. After reading this and quite a few other sites, in my opinion one comment by @RichieCotton best solves the confusion. It took me quite a while to encounter this comment and I want to highlight it by adding it as an answer.

[...] at this point, mode and storage.mode are legacy features left over from S. You should only ever need to care about class() and typeof().

So while reading the other answers, keep this in mind and focus on typeof and class. Unfortunately, the other posts do not make it obvious that mode and storage.mode should not be used. Reading this comment first would have saved me a lot of time.
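
As a rough illustration of why these two functions are enough in practice: `class()` says how an object behaves, and `typeof()` says how it is stored. The objects below are arbitrary examples from base R.

```r
f <- factor(c("a", "b", "a"))
class(f)     # "factor"
typeof(f)    # "integer": integer codes plus a levels attribute

df <- data.frame(x = 1:2)
class(df)    # "data.frame"
typeof(df)   # "list": a data frame is a list of columns

class(sum)   # "function"
typeof(sum)  # "builtin"
```
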
