Removing variable labels attached by the foreign/Hmisc SPSS import functions

Posted on 2024-08-24 07:24:07

As usual, I received some SPSS files and imported them into R with the spss.get function from the Hmisc package. I'm bothered by the labelled class that Hmisc::spss.get adds to every variable in the data.frame, so I want to remove it.

The labelled class gives me headaches when I try to run ggplot, or even when I want to do some simple analysis! One solution would be to remove the labelled class from each variable in the data.frame. How can I do that? Is it possible at all? If not, what are my other options?

I really want to avoid re-creating the variables "from scratch" with as.data.frame(lapply(x, as.numeric)), and as.character where applicable... And I certainly don't want to run SPSS and remove the labels manually (I don't like SPSS, nor do I care to install it)!

Thanks!

Comments (5)

千纸鹤带着心事 2024-08-31 07:24:07

Here's how I get rid of the labels altogether. It is similar to Jyotirmoy's solution, but it works for a vector as well as for a data.frame. (Partial credit to Frank Harrell.)

clear.labels <- function(x) {
  if(is.list(x)) {
    for(i in 1 : length(x)) class(x[[i]]) <- setdiff(class(x[[i]]), 'labelled') 
    for(i in 1 : length(x)) attr(x[[i]],"label") <- NULL
  }
  else {
    class(x) <- setdiff(class(x), "labelled")
    attr(x, "label") <- NULL
  }
  return(x)
}

Use as follows:

my.unlabelled.df <- clear.labels(my.labelled.df)

EDIT

Here's a slightly cleaner version of the function, with the same results:

clear.labels <- function(x) {
  if(is.list(x)) {
    for(i in seq_along(x)) {
      class(x[[i]]) <- setdiff(class(x[[i]]), 'labelled') 
      attr(x[[i]],"label") <- NULL
    } 
  } else {
    class(x) <- setdiff(class(x), "labelled")
    attr(x, "label") <- NULL
  }
  return(x)
}
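
The same idea can also be written with lapply(). What follows is only a minimal sketch under some assumptions: strip_label is a hypothetical helper name (it is not part of Hmisc or foreign), and the data frame is built by hand to mimic what spss.get() returns, so that no SPSS file is needed.

```r
# strip_label() is a hypothetical helper, not part of Hmisc or foreign.
strip_label <- function(v) {
  attr(v, "label") <- NULL
  if (inherits(v, "labelled")) {
    remaining <- setdiff(class(v), "labelled")
    # Assigning NULL through oldClass<- drops the class attribute entirely,
    # which covers the case where "labelled" was the only class.
    oldClass(v) <- if (length(remaining)) remaining else NULL
  }
  v
}

# Hand-built stand-in for spss.get() output (an assumption for illustration).
df <- data.frame(id = 1:3)
grp <- factor(c("a", "b", "a"))
class(grp) <- c("labelled", "factor")
attr(grp, "label") <- "Group"
df$grp <- grp

# df[] <- ... keeps df a data.frame while replacing every column.
df[] <- lapply(df, strip_label)
class(df$grp)
```

After this, df$grp should be a plain factor with its levels intact and no label attribute.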

我很坚强 2024-08-31 07:24:07

A belated note/warning regarding class membership in R objects. The correct way to identify "labelled" is not to test with an is function or with equality (==), but with inherits. Methods that test a specific position in the class vector will miss cases where the order of the existing classes is not the one assumed.
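
A small illustration of the difference, as a sketch only: the vector below is built by hand, and its class order is deliberately reversed from the usual c("labelled", "factor") to show where a positional test goes wrong.

```r
# Hand-built factor with "labelled" in second position (hypothetical ordering).
v <- factor(c("lo", "hi"))
class(v) <- c("factor", "labelled")

positional <- class(v)[1] == "labelled"   # looks only at the first class
by_inherits <- inherits(v, "labelled")    # looks at the whole class vector

positional    # FALSE
by_inherits   # TRUE
```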

You can avoid creating "labelled" variables in spss.get with the argument use.value.labels=FALSE:

w <- spss.get('/tmp/my.sav', use.value.labels=FALSE, datevars=c('birthdate','deathdate'))

The code from Bhattacharya could fail if the class of the labelled vector were simply "labelled" rather than c("labelled", "factor"), in which case it should have been:

class(x[[i]]) <- NULL  # no error from assignment of empty vector

The error you report can be reproduced with this code:

> b <- 4:6
> label(b) <- 'B Label'
> str(b)
Class 'labelled'  atomic [1:3] 4 5 6
  ..- attr(*, "label")= chr "B Label"
> class(b) <- class(b)[-1]
Error in class(b) <- class(b)[-1] : 
  invalid replacement object to be a class string

赠意 2024-08-31 07:24:07

You can try the read.spss function from the foreign package.

A rough-and-ready way to get rid of the labelled class created by spss.get:

for (i in 1:ncol(x)) {
    z <- class(x[[i]])
    if (z[[1]] == 'labelled') {
        class(x[[i]]) <- z[-1]
        attr(x[[i]], 'label') <- NULL
    }
}

But can you please give an example where labelled causes problems?

If I have a variable MAED in a data frame x created by spss.get, I have:

> class(x$MAED)
[1] "labelled" "factor"  
> is.factor(x$MAED)
[1] TRUE

So well-written code that expects a factor (say) should not have any problems.

短暂陪伴 2024-08-31 07:24:07

Suppose:

library(Hmisc)
w <- spss.get('...')

You could remove the labels of a variable called "var1" by using:

attributes(w$var1)$label <- NULL

If you also want to remove the class "labelled", you could do:

class(w$var1) <- NULL 

or if the variable has more than one class:

class(w$var1) <- setdiff(class(w$var1), "labelled")  # unlike -which(...), safe when "labelled" is absent

Hope this helps!
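
Before stripping anything, it may help to see which columns of w actually carry the class. The following is only a sketch: w is built by hand here instead of coming from spss.get(), and has_label is a made-up name.

```r
# Hand-built stand-in for a data frame returned by spss.get() (assumption).
w <- data.frame(id = 1:3)
var1 <- c(2.5, 3.1, 4.0)
class(var1) <- "labelled"
attr(var1, "label") <- "Score"
w$var1 <- var1

# One TRUE/FALSE per column: does the column inherit from "labelled"?
has_label <- vapply(w, inherits, logical(1), what = "labelled")
names(has_label)[has_label]
```

The resulting names can then be used to apply the single-variable commands above only to the columns that need them.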

肥爪爪 2024-08-31 07:24:07

Well, I figured out that the unclass function can be used to remove classes (who would have guessed, eh?!):

library(Hmisc)
# let's presuppose that variable x is gathered through spss.get() function
# and that x is factor
> class(x)
[1] "labelled" "factor"
> foo <- unclass(x)
> class(foo)
[1] "integer"

It's not the most elegant solution; just imagine converting a bunch of vectors back... If anyone tops this, I'll accept it as the answer...
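
To make the drawback concrete, here is a small sketch with a hand-built labelled factor (not real spss.get() output): unclass() drops the factor class as well, leaving only the integer codes, whereas removing just "labelled" keeps the factor usable.

```r
# Hand-built labelled factor, for illustration only.
x <- factor(c("yes", "no", "yes"))
class(x) <- c("labelled", "factor")
attr(x, "label") <- "Answer"

# unclass() removes every class, so only the underlying codes remain.
foo <- unclass(x)
class(foo)        # "integer"
as.integer(foo)   # 2 1 2 (levels are sorted: "no", "yes")

# Dropping only "labelled" keeps the factor and its levels.
kept <- x
class(kept) <- setdiff(class(kept), "labelled")
attr(kept, "label") <- NULL
is.factor(kept)   # TRUE
```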
