Variable overview with xtable in R

Posted on 2024-11-05 07:13:38

I'm wondering whether it is possible to create an xtable from the output of str(x), to get an overview of the variables in a dataset. This would be a nice way to introduce someone to a dataset, but it is tedious to build such a table by hand. So what I tried was this:

str(cars)
require(xtable)
xtable(str(cars))

The cars dataset ships with R. Unfortunately, xtable doesn't produce LaTeX code for str(). Is it possible to outsmart R here? These are the classes xtable knows how to handle:

methods(xtable)

Any ideas?

Comments (4)

╄→承喏 2024-11-12 07:13:38

Another package to look at is reporttools. Here is a short piece of code illustrating its usage on the tips dataset from the reshape package. Both summary statements produce LaTeX code that can be copied and pasted into a document, or used for weaving.

library(reporttools)
data(tips, package = 'reshape')

# summarize numeric variables
tableContinuous(tips[,sapply(tips, is.numeric)])

# summarize non-numeric variables
tableNominal(tips[,!sapply(tips, is.numeric)])

EDIT: If you really must use str, here is one workaround. Since str prints its result and returns NULL, capture the printed lines first:

library(xtable)
str_cars = capture.output(str(cars))
xtable(data.frame(str_cars))

Output from reporttools: [two images, not reproduced here]
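The capture.output() workaround yields a single column of raw text. As an alternative, a str()-like overview can be built as a real data frame with one row per variable, which xtable handles natively. The following is only a minimal sketch in base R; `variable_overview` and its `n_examples` argument are hypothetical names introduced for illustration and are not part of xtable or reporttools.

```r
# Hypothetical helper (not part of any package): one row per variable,
# with its class, number of missing values, and a few example values.
variable_overview <- function(df, n_examples = 4) {
  data.frame(
    variable = names(df),
    class    = vapply(df, function(x) class(x)[1], character(1)),
    missing  = vapply(df, function(x) sum(is.na(x)), integer(1)),
    examples = vapply(
      df,
      function(x) paste(as.character(head(x, n_examples)), collapse = ", "),
      character(1)
    ),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

overview <- variable_overview(cars)
print(overview)

# Pass the data frame to xtable only if the package is installed.
if (requireNamespace("xtable", quietly = TRUE)) {
  print(xtable::xtable(overview), include.rownames = FALSE)
}
```

Because the result is an ordinary data frame, columns can be added or dropped before the table is exported.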

り繁华旳梦境 2024-11-12 07:13:38

If you're willing to spend some time learning how the Hmisc package works, you will soon discover that it has many utilities that facilitate such tasks. In particular, the contents() method helps describe a data.frame by reporting

names, labels (if any), units (if any), number of factor levels (if any), factor levels, class, storage mode, and number of NAs

Labels and units can be bound (internally, as attributes) to each variable.
There are associated print, html and latex methods for viewing and exporting.
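To illustrate the idea of binding labels and units to variables as attributes, here is a rough base R sketch that collects the same kind of information into a data frame. It does not use Hmisc itself: the attribute names "label" and "units", the helper `get_attr`, and the object `contents_like` are assumptions made for this example only.

```r
# Attach a label and a unit to each variable as plain attributes.
# Assumption: the attribute names "label" and "units" are chosen for illustration.
df <- cars
attr(df$speed, "label") <- "Speed"
attr(df$speed, "units") <- "mph"
attr(df$dist,  "label") <- "Stopping distance"
attr(df$dist,  "units") <- "ft"

# Hypothetical helper: return an attribute as a string, or NA if it is absent.
get_attr <- function(x, name) {
  value <- attr(x, name, exact = TRUE)
  if (is.null(value)) NA_character_ else as.character(value)
}

# One row per variable, similar in spirit to the fields listed above.
contents_like <- data.frame(
  name    = names(df),
  label   = vapply(df, get_attr, character(1), name = "label"),
  units   = vapply(df, get_attr, character(1), name = "units"),
  class   = vapply(df, function(x) class(x)[1], character(1)),
  storage = vapply(df, storage.mode, character(1)),
  n_na    = vapply(df, function(x) sum(is.na(x)), integer(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(contents_like)
```

The resulting data frame can be passed to xtable() like any other.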

Another nice feature is the describe() function, as seen below:

> describe(cars)
cars 

 2  Variables      50  Observations
--------------------------------------------------------------------------------
speed 
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90 
     50       0      19    15.4     7.0     8.9    12.0    15.0    19.0    23.1 
    .95 
   24.0 

          4 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 23 24 25
Frequency 2 2 1 1  3  2  4  4  4  3  2  3  4  3  5  1  1  4  1
%         4 4 2 2  6  4  8  8  8  6  4  6  8  6 10  2  2  8  2
--------------------------------------------------------------------------------
dist 
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90 
     50       0      35   42.98   10.00   15.80   26.00   36.00   56.00   80.40 
    .95 
  88.85 

lowest :   2   4  10  14  16, highest:  84  85  92  93 120 
--------------------------------------------------------------------------------

手心的温暖 2024-11-12 07:13:38

Since xtable gives the best results with data.frame and matrix objects, I'd recommend something like this:

library(xtable)
library(plyr)
dtf <- sapply(mtcars, each(min, max, mean, sd, var, median, IQR))
xtable(dtf)
% latex table generated in R 2.12.2 by xtable 1.5-6 package                                                                  
% Thu May  5 19:40:08 2011                                                                                                   
\begin{table}[ht]                                                                                                            
\begin{center}                                                                                                               
\begin{tabular}{rrrrrrrrrrrr}                                                                                                
  \hline                                                                                                                     
 & mpg & cyl & disp & hp & drat & wt & qsec & vs & am & gear & carb \\                                                       
  \hline                                                                                                                     
min & 10.40 & 4.00 & 71.10 & 52.00 & 2.76 & 1.51 & 14.50 & 0.00 & 0.00 & 3.00 & 1.00 \\                                      
  max & 33.90 & 8.00 & 472.00 & 335.00 & 4.93 & 5.42 & 22.90 & 1.00 & 1.00 & 5.00 & 8.00 \\                                  
  mean & 20.09 & 6.19 & 230.72 & 146.69 & 3.60 & 3.22 & 17.85 & 0.44 & 0.41 & 3.69 & 2.81 \\                                 
  sd & 6.03 & 1.79 & 123.94 & 68.56 & 0.53 & 0.98 & 1.79 & 0.50 & 0.50 & 0.74 & 1.62 \\                                      
  var & 36.32 & 3.19 & 15360.80 & 4700.87 & 0.29 & 0.96 & 3.19 & 0.25 & 0.25 & 0.54 & 2.61 \\                                
  median & 19.20 & 6.00 & 196.30 & 123.00 & 3.70 & 3.33 & 17.71 & 0.00 & 0.00 & 4.00 & 2.00 \\                               
  IQR & 7.38 & 4.00 & 205.18 & 83.50 & 0.84 & 1.03 & 2.01 & 1.00 & 1.00 & 1.00 & 2.00 \\                                     
   \hline                                                                                                                    
\end{tabular}                                                                                                                
\end{center}                                                                                                                 
\end{table} 

Sorry for the lengthy output. You can grab the PDF here. each is a very versatile function, since it lets you define a custom summary quite easily. Besides, str only prints its output to stdout (it returns NULL), so you can't retrieve the summary for specific variables from it. In this case, sapply simplifies the result, yielding a matrix instead of a data.frame. But that's not much of a problem, right?
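If plyr is not available, the same table can be approximated in base R. This is just a sketch: the list name `summaries` and the objects `stats` and `stats_df` are introduced here for illustration, and the final step shows one way to get a data.frame (one row per variable) when the matrix returned by sapply is not wanted.

```r
# Named list of summary functions; add or remove entries to customise.
summaries <- list(min = min, max = max, mean = mean, sd = sd,
                  median = median, IQR = IQR)

# Apply every summary function to every column of mtcars.
# sapply simplifies the result to a matrix: statistics in rows, variables in columns.
stats <- sapply(mtcars, function(x) {
  vapply(summaries, function(f) f(x), numeric(1))
})

# Transpose and convert if a data.frame with one row per variable is preferred.
stats_df <- as.data.frame(t(stats))
print(round(stats_df, 2))

# Export with xtable only if the package is installed.
if (requireNamespace("xtable", quietly = TRUE)) {
  print(xtable::xtable(stats_df))
}
```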

娜些时光,永不杰束 2024-11-12 07:13:38

You may also want to take a look at the qwraps2 package:

    library(magrittr)
    library(qwraps2)
    library(xtable)  # needed for the xtable() call at the end

    mtcars2 <-
      dplyr::mutate(mtcars,
                    cyl_factor = factor(cyl,
                                        levels = c(6, 4, 8),
                                        labels = paste(c(6, 4, 8), "cylinders")),
                    cyl_character = paste(cyl, "cylinders"))

    our_summary1 <-
      list("Miles Per Gallon" =
             list("min" = ~ min(.data$mpg),
                  "max" = ~ max(.data$mpg),
                  "mean (sd)" = ~ qwraps2::mean_sd(.data$mpg)),
           "Displacement" =
             list("min" = ~ min(.data$disp),
                  "median" = ~ median(.data$disp),
                  "max" = ~ max(.data$disp),
                  "mean (sd)" = ~ qwraps2::mean_sd(.data$disp)),
           "Weight (1000 lbs)" =
             list("min" = ~ min(.data$wt),
                  "max" = ~ max(.data$wt),
                  "mean (sd)" = ~ qwraps2::mean_sd(.data$wt)),
           "Forward Gears" =
             list("Three" = ~ qwraps2::n_perc0(.data$gear == 3),
                  "Four"  = ~ qwraps2::n_perc0(.data$gear == 4),
                  "Five"  = ~ qwraps2::n_perc0(.data$gear == 5))
      )

    by_cyl <- summary_table(dplyr::group_by(mtcars2, cyl_factor), our_summary1)
    xtable(by_cyl)