Replicating "Custom Tables" comparisons in R

Posted 2024-09-12 16:44:35

I use SPSS every day but have really been trying to learn R. The major thing holding me back is my need to easily generate tables, banners, and cross-tabs for the market research that I do. I love the Custom Tables option in SPSS and am looking for advice on how to replicate it in R.

I believe R has a ton of advantages over SPSS, one of which is the ability to integrate with LaTeX for reproducible reports. SPSS is great for quick exploration (point and click), but leaves a lot to be desired when it comes to taking the results and packaging them into a deliverable that is acceptable to clients. That said, R is so powerful that I feel I could do everything I need in it, if only I could produce my banners/crosstabs the way I need them.

In short, what are my options to generate report-worthy tables similar to what I have below? I am copying the SPSS syntax command and the output for reference.

CTABLES 
  /VLABELS VARIABLES=age educ paeduc maeduc speduc prestg80 happy 
    DISPLAY=DEFAULT 
  /TABLE age [MEAN F40.3, VALIDN COMMA40.0] + educ [MEAN F40.3, VALIDN COMMA40.0] + paeduc [MEAN F40.3, VALIDN COMMA40.0] + maeduc [MEAN F40.3, VALIDN COMMA40.0] + speduc [MEAN F40.3, VALIDN COMMA40.0] + prestg80 [MEAN F40.3, VALIDN COMMA40.0] BY happy 
  /SLABELS POSITION=ROW 
  /CATEGORIES VARIABLES=happy ORDER=A KEY=VALUE EMPTY=INCLUDE TOTAL=YES POSITION=AFTER MISSING=EXCLUDE 
  /SIGTEST TYPE=CHISQUARE ALPHA=0.05 INCLUDEMRSETS=YES CATEGORIES=ALLVISIBLE 
  /COMPARETEST TYPE=MEAN ALPHA=0.05 ADJUST=BONFERRONI ORIGIN=COLUMN INCLUDEMRSETS=YES CATEGORIES=ALLVISIBLE MEANSVARIANCE=ALLCATS MERGE=NO 
  /COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=BONFERRONI ORIGIN=COLUMN INCLUDEMRSETS=YES CATEGORIES=ALLVISIBLE MERGE=NO.
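For the layout this syntax produces (Mean and Valid N of several continuous variables in the rows, split by the categories of `happy` in the columns, plus a Total column), a minimal base-R sketch might look like the one below. Assumptions: `ctable_means` is a hypothetical helper name, `gss` is simulated stand-in data rather than the real GSS extract, rounding to 3 digits only approximates the F40.3 format, and the /SIGTEST and /COMPARETEST parts are not covered.

```r
# Hypothetical stand-in for the GSS extract used in the question:
# two continuous variables and the banner variable `happy`.
set.seed(1)
gss <- data.frame(
  age   = sample(c(18:89, NA), 200, replace = TRUE),
  educ  = sample(c(0:20, NA), 200, replace = TRUE),
  happy = factor(sample(c("Very happy", "Pretty happy", "Not too happy"), 200, replace = TRUE),
                 levels = c("Very happy", "Pretty happy", "Not too happy"))
)

# Rows: one Mean and one Valid N line per variable (similar to /SLABELS POSITION=ROW).
# Columns: every level of `by`, empty levels included, plus a Total column.
# Cases with a missing `by` value are dropped (similar to MISSING=EXCLUDE).
ctable_means <- function(data, vars, by, digits = 3) {
  keep <- !is.na(data[[by]])
  blocks <- lapply(vars, function(v) {
    x <- data[[v]][keep]
    g <- data[[by]][keep]
    # split() keeps empty factor levels; the whole column becomes "Total"
    pieces <- c(split(x, g), list(Total = x))
    means  <- vapply(pieces, function(p) mean(p, na.rm = TRUE), numeric(1))
    validn <- vapply(pieces, function(p) sum(!is.na(p)), integer(1))
    block  <- rbind(round(means, digits), validn)
    rownames(block) <- paste(v, c("Mean", "Valid N"))
    block
  })
  do.call(rbind, blocks)
}

ctable_means(gss, c("age", "educ"), "happy")
```

Adding a variable to the rows is just another entry in `vars`; nesting in the columns would need an interaction of two factors passed as `by`.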

I attached a picture of what the output looks like. I am particularly interested in the ability to have multiple variables in the rows/columns, and I like the flexibility to nest them if I need to. In the image, I have a few continuous variables cut by a categorical variable in the columns, with the summary statistics placed in the rows. As an aside, I also really like the quick column-mean comparisons, but I figure I can access those quickly in R when generating conditional crosstabs.
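For the column-mean comparisons, base R's `pairwise.t.test()` (stats package) is a rough counterpart of `/COMPARETEST TYPE=MEAN ADJUST=BONFERRONI MEANSVARIANCE=ALLCATS`: it tests every pair of categories using one pooled SD across all categories and adjusts the p-values. This is a sketch on simulated data; the output is a matrix of adjusted p-values rather than SPSS's letter notation, and the numbers are not guaranteed to match CTABLES exactly.

```r
# Hypothetical data: one continuous variable and the banner variable `happy`
set.seed(2)
gss <- data.frame(
  age   = sample(c(18:89, NA), 300, replace = TRUE),
  happy = factor(sample(c("Very happy", "Pretty happy", "Not too happy"), 300, replace = TRUE))
)

# All pairwise comparisons of column means, pooled SD over every category,
# Bonferroni-adjusted p-values (missing ages are ignored)
cmp <- pairwise.t.test(gss$age, gss$happy,
                       pool.sd = TRUE, p.adjust.method = "bonferroni")

# Lower-triangular matrix of adjusted p-values
cmp$p.value

# Row/column positions of pairs that differ at alpha = 0.05 (NA cells are skipped)
sig <- which(cmp$p.value < 0.05, arr.ind = TRUE)
```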


笑梦风尘 2024-09-19 16:44:36

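For some tables, see the xtable package, which exports to LaTeX and HTML. There may be other packages as well, though. This also looks promising. Have you heard of Sweave?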
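A short sketch of the xtable route, assuming the xtable package is installed. The table is a crosstab of the built-in `mtcars` data, used only as a placeholder for a real banner table.

```r
library(xtable)

# Placeholder crosstab: cylinders (rows) by number of gears (columns)
tab <- as.data.frame.matrix(table(mtcars$cyl, mtcars$gear))

xt <- xtable(tab, caption = "Cylinders by number of gears (mtcars)", digits = 0)

# Same table rendered for a LaTeX report and for a web page
print(xt, type = "latex")
print(xt, type = "html")
```

Inside a Sweave chunk with `results=tex`, the LaTeX output lands directly in the compiled report, which is what makes the reports reproducible.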


吾性傲以野 2024-09-19 16:44:36

I have also had trouble many times with R's un-user-friendly output... The only solution I found was writing my own function, and I'm happy to share it with you here:

For every factor variable in a data.frame, the following function returns the frequency (or, with calc="perc", the column percentage) of each of its levels within each level of the factor variable named by "variable".

The most important thing may be that the output is a simple, user-friendly data.frame, so it is no problem to export the results and work with them in any way you want.

I realize there is much potential for further improvement, e.g. adding an option to choose row vs. column percentages. It's a work in progress, but it gets the job done.

contitable <- function( survey_data, variable, calc="freq" ){

  # Check which variables are not given as factor
  # and exclude them from the given data.frame
  survey_data_factor_test <- sapply( survey_data, FUN=is.factor )
  survey_data <- survey_data[ , survey_data_factor_test, drop=FALSE ]

  # Inform the user about deleted variables
  # (message() is the usual way to report this from inside a function)
  message( sum( !survey_data_factor_test ), " non-factor variable(s) were excluded" )

  variable_levels <- levels( survey_data[ , variable ] )
  result_column_names <- paste( variable, variable_levels, sep="." )

  # One block of rows per factor variable, combined at the end
  result_blocks <- vector( "list", ncol(survey_data) )

  for( column in seq_along( survey_data ) ){

    column_levels <- levels( survey_data[ , column ] )
    result_block <- data.frame( Variable=rep( names(survey_data)[column], length(column_levels) ),
                                Levels=column_levels, stringsAsFactors=FALSE )

    # Cross-tabulate this variable (rows) against `variable` (columns)
    results <- table( survey_data[ , column ], survey_data[ , variable ] )

    if( calc=="perc" ){
      # Column percentages: each level of `variable` sums to 100
      results <- round( prop.table( results, margin=2 )*100, 1 )
    }

    results <- as.data.frame.matrix( results )
    names( results ) <- result_column_names
    rownames( results ) <- NULL

    result_blocks[[column]] <- cbind( result_block, results )
  }

  result <- do.call( rbind, result_blocks )
  return( result )
}
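For the row vs. column percentage option mentioned above, base R's `prop.table()` already takes a `margin` argument, `addmargins()` appends Sum totals (similar in spirit to `TOTAL=YES` in CTABLES), and `chisq.test()` gives a Pearson chi-square test of independence, roughly the counterpart of `/SIGTEST TYPE=CHISQUARE`. The data below is simulated and the variable names are hypothetical.

```r
# Hypothetical data: a banner variable (region) and one survey item (answer)
set.seed(42)
survey <- data.frame(
  region = factor(sample(c("North", "South"), 100, replace = TRUE)),
  answer = factor(sample(c("Yes", "No"), 100, replace = TRUE), levels = c("Yes", "No"))
)

tab <- table(answer = survey$answer, region = survey$region)

col_pct <- round(100 * prop.table(tab, margin = 2), 1)  # each column sums to 100
row_pct <- round(100 * prop.table(tab, margin = 1), 1)  # each row sums to 100

# Counts with a "Sum" row and column appended
with_totals <- addmargins(tab)

# Pearson chi-square without Yates' continuity correction
chi <- chisq.test(tab, correct = FALSE)
```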
