Convert a data frame to tidy format in R

Posted on 2025-02-05 02:59:23

I have a data frame with the following structure:

> ls.str(df)

attachments : 'data.frame': 1103947 obs. of  2 variables:
 $ media_keys:List of 1103947
 $ poll_ids  :List of 1103947
author_id :  chr [1:1103947] "21572351" "21572351" "21572351" "21572351" "21572351" "21572351" "21572351" "21572351" "21572351" "21572351" ...
conversation_id :  chr [1:1103947] "1006266341341519872" "1006265425791987715" "1006251577747869696" "1006246236171722753" "1006246168991600642" ...
created_at :  chr [1:1103947] "2018-06-11T20:06:05.000Z" "2018-06-11T20:02:27.000Z" "2018-06-11T19:07:26.000Z" "2018-06-11T18:46:12.000Z" ...
entities : 'data.frame':    1103947 obs. of  5 variables:
 $ mentions   :List of 1103947
 $ annotations:List of 1103947
 $ hashtags   :List of 1103947
 $ urls       :List of 1103947
 $ cashtags   :List of 1103947
geo : 'data.frame': 1103947 obs. of  2 variables:
 $ place_id   : chr  NA NA NA NA ...
 $ coordinates:'data.frame':    1103947 obs. of  2 variables:
id :  chr [1:1103947] "1006266341341519872" "1006265425791987715" "1006251577747869696" "1006246236171722753" "1006246168991600642" ...
in_reply_to_user_id :  chr [1:1103947] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA ...

I want to convert it to a tidy format. Is there a neat little function for this that I don't know of? Google hasn't been much help. Thanks in advance!

By tidy format, I mean something like this:

#> # A tibble: 25 × 31
#>    tweet_id       user_username text  conversation_id author_id lang  created_at
#>    <chr>          <chr>         <chr> <chr>           <chr>     <chr> <chr>     
#>  1 1406007405180… Phardiga      "RT … 14060074051803… 58755490  de    2021-06-1…
#>  2 1405617386405… dorothee_goe… "RT … 14056173864058… 97759337… de    2021-06-1…
#>  3 1405616047990… dejools       "RT … 14056160479909… 13065071… de    2021-06-1…
#>  4 1405615055555… LenaOetzel    "RT … 14056150555557… 97897581… de    2021-06-1…
#>  5 1405613064968… jenniferhenk… "RT … 14056130649684… 114774406 de    2021-06-1…
#>  6 1405610724026… Tobias_Schul… "Ihr… 14056107240266… 47919307  de    2021-06-1…
#>  7 1405393033558… HTMIBerlin    "…    14053930335589… 94052353… und   2021-06-1…
#>  8 1404808751857… Tobias_Schul… ".@j… 14048087518576… 47919307  de    2021-06-1…
#>  9 1404440929881… ASattelmacher "Oka… 14044409298812… 11508518… de    2021-06-1…
#> 10 1404393457427… dr_john_aus_b "#Ic… 14043934574273… 30635588… und   2021-06-1…

Comments (1)

寻找一个思念的角度 2025-02-12 02:59:23

With tidyr, we can first use unpack() to flatten the data.frame columns into regular columns, and then use unnest() to convert the list columns into regular columns.

library(dplyr)
library(tidyr)
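df %>%
  # flatten the data.frame columns (attachments, entities) into regular columns
  unpack(where(is.data.frame)) %>%
  # expand the remaining list columns into regular columns
  unnest(where(is.list))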

-output

# A tibble: 3 × 6
  media_keys poll_ids author_id conversation_id mentions annotations
       <int>    <int>     <int>           <int>    <int>       <int>
1          1        4         1               1        1           4
2          2        5         2               2        2           5
3          3        6         3               3        3           6
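
In the toy data every list element holds exactly one value, so the result keeps one row per input row. In real tweet data, list columns such as hashtags usually hold a varying number of values per row, and some rows hold none. Here is a minimal sketch of that case, assuming a made-up `tweets` tibble (not the asker's data). By default unnest() drops rows whose list element is NULL or empty, while keep_empty = TRUE keeps them with NA. Also note that unnesting several list columns in one call requires their per-row sizes to be compatible, so on real data it may be safer to unnest one column at a time.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for tweet data: list elements of length 2, 0 and 1
tweets <- tibble(
  id = c("1", "2", "3"),
  hashtags = list(c("rstats", "tidyr"), NULL, "dplyr")
)

# Default: the row whose list element is NULL is dropped
dropped <- unnest(tweets, hashtags)

# keep_empty = TRUE: that row is kept, with NA in the unnested column
kept <- unnest(tweets, hashtags, keep_empty = TRUE)

nrow(dropped)
nrow(kept)
```

Unnesting this way yields one row per hashtag rather than one row per tweet, so whether it counts as "tidy" depends on the unit of analysis you want.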

data

df <- structure(list(attachments = structure(list(media_keys = structure(list(
    1L, 2L, 3L), class = "AsIs"), poll_ids = structure(list(4L, 
    5L, 6L), class = "AsIs")), class = "data.frame", row.names = c(NA, 
-3L)), author_id = 1:3, conversation_id = 1:3, entities = structure(list(
    mentions = structure(list(1L, 2L, 3L), class = "AsIs"), 
annotations = structure(list(
        4L, 5L, 6L), class = "AsIs")), 
class = "data.frame", row.names = c(NA, 
-3L))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-3L))
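
The structure in the question also has a data.frame nested inside another data.frame column (geo contains coordinates), which a single unpack() call does not flatten completely. Below is a minimal sketch of one way to handle that, assuming a hypothetical helper `unpack_all()` and made-up example data. It repeats unpack() until no data.frame columns remain, and uses names_sep so that inner names are prefixed with the outer column name and cannot clash.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not part of tidyr): keep unpacking until no
# data.frame columns are left; names become outer_inner via names_sep
unpack_all <- function(data, sep = "_") {
  while (any(vapply(data, is.data.frame, logical(1)))) {
    data <- unpack(data, where(is.data.frame), names_sep = sep)
  }
  data
}

# Made-up data mimicking the geo column: a data.frame column that
# itself contains a data.frame column
nested <- tibble(
  id = c("a", "b"),
  geo = tibble(
    place_id = c(NA, "p1"),
    coordinates = tibble(type = c(NA, "Point"))
  )
)

flat <- unpack_all(nested)
names(flat)
```

After this step only atomic and list columns are left, so the unnest() step from the answer can be applied to whichever list columns are needed.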