Conditionally separate strings into columns
It is common in surveys to ask a question and then tell participants to "select all that apply". For example, "Which foods do you enjoy eating (Please select all that apply)?" a) Sushi, b) Pasta, c) Hamburger.
Assuming four (N=4) participants answered this question, the data could look like this.
food.df <- data.frame(id = c(1,2,3,4), food.choice = c("1,2", "", "1,2,3", "3"))
What I am trying to do is conditionally separate these into unique columns using a method that is flexible on the number of individuals and the number of food choice attributes (i.e. Sushi, Pasta, Hamburger, ....). The final data would look something like this.
food.final <- data.frame(id= c(1,2,3,4), sushi = c(1,0,1,0), pasta = c(1,0,1,0), hamburger = c(0,0,1,1))
The more advanced version of this would allow for conditional groupings. You can think of this as grouping by food groups, location, etc. Assuming we were grouping by "selected foods that have protein" this could be coded to reflect total choices. This could look something like this.
food.group <- data.frame(id = c(1,2,3,4), protein = c(1,0,2,1), non.protein = c(1,0,1,0))
I have tried to use tidyr::separate, strsplit, and other column splitting functions but cannot seem to get the desired outcome. Appreciate the help on this and hopefully, the answer helps other users of R who do survey work.
Comments (2)
We may use `fastDummies`. If the renaming should be automatic, create a named vector and use `str_replace`.
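The code from this answer did not survive on this page, so what follows is only a minimal base R sketch of the same result, not the `fastDummies` call itself. It assumes the option codes 1, 2, 3 map to sushi, pasta, hamburger as in the question; the names `food.names`, `choices`, and `dummies` are introduced here for illustration.

```r
# Example data from the question.
food.df <- data.frame(id = c(1, 2, 3, 4),
                      food.choice = c("1,2", "", "1,2,3", "3"))

# Named vector mapping option codes to column names (assumed mapping).
food.names <- c("1" = "sushi", "2" = "pasta", "3" = "hamburger")

# Split each response on commas; an empty response selects nothing.
choices <- strsplit(food.df$food.choice, ",", fixed = TRUE)

# One 0/1 column per option code: 1 if the respondent selected it.
dummies <- sapply(names(food.names), function(code) {
  as.numeric(vapply(choices, function(x) code %in% x, logical(1)))
})

# Rename the columns automatically from the named vector.
colnames(dummies) <- unname(food.names)

food.final <- cbind(food.df["id"], dummies)
print(food.final)
```

Because the columns are generated from `food.names`, adding a fourth option only requires extending that vector.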
For the second case, we may use `str_count`.
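As a hedged illustration of the grouped counts, here is a base R sketch that counts, per respondent, how many selected codes fall in each group. It is not the `str_count` code from the answer, which is missing here. The grouping (codes 1 and 3 as protein, code 2 as non-protein) is inferred from the expected `food.group` in the question, and `groups` is a hypothetical name.

```r
# Example data from the question.
food.df <- data.frame(id = c(1, 2, 3, 4),
                      food.choice = c("1,2", "", "1,2,3", "3"))

# Assumed grouping of option codes, inferred from the expected output.
groups <- list(protein = c("1", "3"), non.protein = c("2"))

choices <- strsplit(food.df$food.choice, ",", fixed = TRUE)

# For each group, count how many of a respondent's choices belong to it.
counts <- sapply(groups, function(codes) {
  vapply(choices, function(x) as.numeric(sum(x %in% codes)), numeric(1))
})

food.group <- cbind(food.df["id"], counts)
print(food.group)
```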
You could create, or probably already have, a matrix such as `foody` that allocates the needed information. Then you could easily `strsplit` on the commas and `match` the IDs with `foody`. `tabulate` creates a binary matching vector of length `nrow(foody)`, and in an `sapply` we get a matrix `mt`. Finally, all we need is to create a `table` of a `factor` of each row with the feature we desire as levels. For convenience we wrap it into a function `f`. Note that the number strings should be sorted in ascending order, which they probably are anyway.
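The original code for `foody` and `f` is not shown on this page, so the following is a reconstruction under stated assumptions rather than the answerer's implementation. The layout of `foody` (columns `id`, `name`, `group`) and the signature of `f` are hypothetical; only the sequence `strsplit`, `match`, `tabulate`, `sapply`, `table`, `factor` follows the description above.

```r
# Example data from the question.
food.df <- data.frame(id = c(1, 2, 3, 4),
                      food.choice = c("1,2", "", "1,2,3", "3"))

# Hypothetical lookup table: one row per option code, one column per feature.
foody <- data.frame(id = 1:3,
                    name = c("sushi", "pasta", "hamburger"),
                    group = c("protein", "non.protein", "protein"),
                    stringsAsFactors = FALSE)

f <- function(choice, feature, lookup = foody) {
  n <- nrow(lookup)
  lev <- unique(lookup[[feature]])
  # Split on commas and match the codes against the lookup ids.
  idx <- lapply(strsplit(choice, ",", fixed = TRUE), match, table = lookup$id)
  # tabulate() gives a 0/1 vector of length n per respondent;
  # mt has one column per respondent.
  mt <- matrix(sapply(idx, tabulate, nbins = n), nrow = n)
  # Per respondent, tabulate the selected rows by the requested feature.
  rows <- lapply(seq_len(ncol(mt)), function(j) {
    as.numeric(table(factor(lookup[[feature]][mt[, j] > 0], levels = lev)))
  })
  matrix(unlist(rows), ncol = length(lev), byrow = TRUE,
         dimnames = list(NULL, lev))
}

food.final <- cbind(food.df["id"], f(food.df$food.choice, "name"))
food.group <- cbind(food.df["id"], f(food.df$food.choice, "group"))
print(food.final)
print(food.group)
```

Passing `"name"` yields one indicator column per food, while `"group"` yields per-group totals, so both cases from the question come out of the same function.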
Data: