Loop through a .csv file in R and calculate relative frequencies?

Posted 2024-09-12 12:02:30 · 153 characters · 4 views · 0 comments

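I am new to R, and I am trying to create a .R script that opens my .csv file and calculates some frequencies. The file has headers, and the values under them are 1, 0, NA, or -4. What I want to do is go through each column and calculate the frequency of each value. I am sure this is a simple script, but I am not yet sure how R's syntax works. Can someone help me get started?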


凤舞天涯 2024-09-19 12:02:30

The exact script is going to vary based on your input and what kind of output you'd like (just printed to the interactive console? Written to .csv?), but here's my attempt:

# Read the data from the .csv file; read.csv() assumes a header row by default
dat <- read.csv(file = "yourfile.csv")

# For now, use this fake data instead (it overwrites dat from the line above)
dat <- data.frame(x = c(-4, 0, 1, 1, -4, NA, NA, 0), y = c(1, 1, 1, 0, -4, NA, 0, NA))

# Get the frequency of values for each column, assuming every column consists of data
apply(X = dat, MARGIN = 2, FUN = function(x) {summary(factor(x))})

The apply function applies the function you give it (FUN) over a margin of your data (1 = rows, 2 = columns). You can pass any function you like. Passing FUN = summary would give you the mean, min, max, etc. of each column, because the columns are numeric. The summary() method for factors, however, returns frequencies, which is what you need. So instead of passing summary directly, trick R into treating your numbers as a factor: define an anonymous function, function(x), where x is each column taken one at a time. The function first converts x to a factor (factor(x)) and then summarizes that factor. With the fake data this returns a matrix with the frequencies for each column; if the columns do not all contain the same set of distinct values, apply returns a list instead.
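Since the question asks for relative frequencies, here is a minimal sketch of one way to turn the counts into proportions. It is not part of the answer's code: it swaps in table() and sapply() for summary(factor(x)) and apply(), and it assumes the only possible values are -4, 0, 1, and NA, as the question states. The names expected_values and rel_freq are made up for illustration.

```r
# Same fake data as in the answer
dat <- data.frame(x = c(-4, 0, 1, 1, -4, NA, NA, 0),
                  y = c(1, 1, 1, 0, -4, NA, 0, NA))

# Assumption: these are the only non-missing values that can occur.
# Fixing the levels keeps every column's result the same length,
# even when a value never appears in that column.
expected_values <- c(-4, 0, 1)

# For each column: count every expected value plus NA,
# then divide by the number of rows to get proportions
rel_freq <- sapply(dat, function(column) {
  counts <- table(factor(column, levels = expected_values), useNA = "always")
  counts / length(column)
})

print(rel_freq)
```

Each column of rel_freq sums to 1 because NA is counted as its own category. To report proportions among non-missing values only, one option would be to use useNA = "no" and divide by sum(counts) instead of length(column).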

Not the most elegant code ever, but I think it'll get you what you need.
