R - Reshape a tab-delimited string to long format

Posted on 2025-01-24 18:11:23

How do you reshape a string delimited by a tab or space into long format? The strings (in a column called label here) can contain different numbers of values.

I have this

    var       label
1  work     100 101
2 sleep 500 409 200

and I want this

    var code
1  work  100
2  work  101
3 sleep  500
4 sleep  409
5 sleep  200


# data 
df = data.frame(var = c("work", 'sleep'), label = c('100 101', '500 409 200'))


Comments (3)

衣神在巴黎 2025-01-31 18:11:23
library(tidyr)
df %>% 
    separate_rows(label)
# A tibble: 5 x 2
  var   label
  <chr> <chr>
1 work  100  
2 work  101  
3 sleep 500  
4 sleep 409  
5 sleep 200 
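
The output above keeps the column name label and leaves the values as character strings, whereas the question asks for a column named code. A minimal sketch of one way to close that gap, assuming the tidyr package is installed; the explicit sep pattern and the rename step are illustrative additions, not part of the original answer:

```r
library(tidyr)

# Example data from the question
df <- data.frame(var = c("work", "sleep"), label = c("100 101", "500 409 200"))

# Split on runs of spaces or tabs; convert = TRUE turns "100" into 100
long <- separate_rows(df, label, sep = "[ \t]+", convert = TRUE)

# Rename the split column to match the desired output
names(long)[names(long) == "label"] <- "code"

print(long)
```

Passing sep explicitly restricts the split to whitespace, so values that contain other punctuation are not broken apart by the default pattern.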
赠佳期 2025-01-31 18:11:23

Using strsplit in Map

Map(cbind, df$var, strsplit(df$label, ' ')) |> do.call(what=rbind.data.frame)
#            V1  V2
# work.1   work 100
# work.2   work 101
# sleep.1 sleep 500
# sleep.2 sleep 409
# sleep.3 sleep 200

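or in by, grouping on a factor whose levels follow the row order so that the row names match their groups.

by(df, factor(df$var, levels=unique(df$var)), \(x) with(x, cbind(var, code=el(strsplit(label, split=' '))))) |>
  do.call(what=rbind.data.frame) 
#           var code
# work.1   work  100
# work.2   work  101
# sleep.1 sleep  500
# sleep.2 sleep  409
# sleep.3 sleep  200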
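
The same base R idea can also be written without row-binding matrices, which gives named columns and numeric codes directly. This is a sketch of an alternative, not the answerer's code; the names parts and long are illustrative:

```r
# Example data from the question
df <- data.frame(var = c("work", "sleep"), label = c("100 101", "500 409 200"))

# Split each label on runs of spaces or tabs; the result is a list of character vectors
parts <- strsplit(df$label, "[ \t]+")

# Repeat each var once per piece, then flatten the pieces into one column
long <- data.frame(
  var  = rep(df$var, lengths(parts)),
  code = as.integer(unlist(parts))
)

print(long)
```

Because lengths(parts) gives the number of pieces per row, this works for labels of any length and needs no packages.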
小伙你站住 2025-01-31 18:11:23

A great answer was already posted. But let's say you had a strange delimiter, like this:

df = data.frame(var = c("work", 'sleep'), label = c('100-gh-101', '500-gh-409-gh-200'))

In that case, you could split on the delimiter with strsplit() (which treats its pattern as a regular expression) and then unnest():

library(dplyr)
library(tidyr)

df %>% 
  mutate(label2 = strsplit(label, "-gh-")) %>% 
  unnest(label2)

  var   label             label2
  <chr> <chr>             <chr> 
1 work  100-gh-101        100   
2 work  100-gh-101        101   
3 sleep 500-gh-409-gh-200 500   
4 sleep 500-gh-409-gh-200 409   
5 sleep 500-gh-409-gh-200 200 
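
If the original label column does not need to be kept, separate_rows() from the first answer should handle this delimiter as well, since its sep argument accepts a regular expression. A minimal sketch under that assumption (tidyr installed; the name long is illustrative):

```r
library(tidyr)

# Example data with the unusual "-gh-" delimiter
df <- data.frame(var = c("work", "sleep"),
                 label = c("100-gh-101", "500-gh-409-gh-200"))

# sep is a regular expression; "-gh-" contains no special characters
long <- separate_rows(df, label, sep = "-gh-")

print(long)
```

Unlike the mutate() and unnest() version, this overwrites label with the split values rather than adding a second column.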