Loop through .csv files in R and compute relative frequencies?
I'm new to R and I'm trying to write a .R script that opens a .csv file of mine and computes some frequencies. The file has headers, and the values under each header are 1, 0, NA, or -4. What I want to do is go through each column and compute the frequency of each value. I'm sure this is an easy script, but I'm not sure how R's syntax works yet. Can anyone get me started on this, please?
Comments (1)
The exact script will vary with your input and the kind of output you'd like (just printed to the interactive console? written to a .csv?), but here's my attempt, which comes down to calling apply over the columns with an anonymous function.

The apply function applies the function you give it (FUN) over a margin (1 = rows, 2 = columns) of the data you give it. You can give it any function you like. Passing FUN = summary will give you the mean, min, max, etc. of each column (because they're numeric). But the default summary() method for factors gives frequencies, which is what you need. So instead of passing summary directly, trick R into seeing your numbers as a factor: define an anonymous function, function(x) (apply takes x to be each column in turn). Have this function first convert x to a factor (factor(x)) and then summarize that factor. This returns a matrix with the frequencies for each column, as long as every column contains the same set of values; if they differ, apply returns a list instead. Not the most elegant code ever, but I think it'll get you what you need.
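A minimal sketch of the approach described above, reconstructed from the description rather than taken from the original answer. The sample data, the names dat, counts, vals, and rel_freq, and the relative-frequency step at the end are all assumptions made for illustration; in practice the data would come from read.csv() on your own file.

```r
# Hypothetical sample data; in practice use something like read.csv("your_file.csv").
dat <- read.csv(text = "a,b,c
1,0,-4
0,1,1
NA,1,0
1,-4,NA")

# The approach described above: treat each column as a factor and summarize it.
# apply() returns a matrix only when every column yields the same number of
# categories; with this sample data the columns differ, so it returns a list.
counts <- apply(dat, 2, function(x) summary(factor(x)))
print(counts)

# Variation (an assumption about the desired output): fix the set of levels so
# that every column yields the same categories, count NA separately, and
# convert the counts to proportions.
vals <- c(-4, 0, 1)
rel_freq <- sapply(dat, function(x) {
  prop.table(table(factor(x, levels = vals), useNA = "always"))
})
print(round(rel_freq, 2))
```

Fixing the levels up front is what makes the second result a regular matrix (one row per value, one column per header), which is also easy to write out with write.csv() if a file is the desired output.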