在某些列中旋转宽至长时间

发布于 2025-02-08 07:10:10 字数 4053 浏览 0 评论 0原文

我正在与作者合作的数据集合作。这是带有文章ID的五篇文章的子集,通讯作者(RP),对应作者的国家(国家),作者的全名(AF_1:AF_3)和作者的缩写(AU_1:AU_3):

articles <- structure(list(
ArtID = 1:5,
RP = c("DE GARRIDO, L","CURSEU, PL","HENIKE, T","DI VINCENZO, F","OMIGIE, D"),
Country = c("spain", "romania", "germany", "italy", NA),
AF_1 = c("DE GARRIDO, LUIS","CURSEU, PETRU L.","STIELER, MAXIMILIAN","DI VINCENZO, FAUSTO","OMIGIE, DIANA"),
AF_2 = c(NA,"SCHRUIJER, SANDRA G. L.","HENIKE, TASSILO","IACOPINO, VALENTINA","RICCI, JESSICA"),
AF_3 = c(NA, "FODOR, OANA C.", NA, NA, NA),
AU_1 = c("DE GARRIDO L", "CURSEU PL", "STIELER M","DI VINCENZO F", "OMIGIE D"),
AU_2 = c(NA, "SCHRUIJER SGL", "HENIKE T","IACOPINO V", "RICCI J"),
AU_3 = c(NA, "FODOR OC", NA, NA, NA)),
row.names = c(NA,-5L),class = c("data.frame"))

> articles
  ArtID             RP Country                AF_1                    AF_2           AF_3          AU_1          AU_2     AU_3
1     1  DE GARRIDO, L   spain    DE GARRIDO, LUIS                    <NA>           <NA>  DE GARRIDO L          <NA>     <NA>
2     2     CURSEU, PL romania    CURSEU, PETRU L. SCHRUIJER, SANDRA G. L. FODOR, OANA C.     CURSEU PL SCHRUIJER SGL FODOR OC
3     3      HENIKE, T germany STIELER, MAXIMILIAN         HENIKE, TASSILO           <NA>     STIELER M      HENIKE T     <NA>
4     4 DI VINCENZO, F   italy DI VINCENZO, FAUSTO     IACOPINO, VALENTINA           <NA> DI VINCENZO F    IACOPINO V     <NA>
5     5      OMIGIE, D    <NA>       OMIGIE, DIANA          RICCI, JESSICA           <NA>      OMIGIE D       RICCI J     <NA>

当调整数据时(文章) )对于长(作者),它会自动为所有作者(即,国家)复制其他列值。 PIVOT_LONGER代码和数据:

authors <- articles %>% pivot_longer(cols=starts_with(c("AF","AU")),names_to=c(".value","ArtAthID"),names_sep="_",values_drop_na=T)

> authors
# A tibble: 10 × 6
   ArtID RP             Country ArtAthID AF                      AU           
   <int> <chr>          <chr>   <chr>    <chr>                   <chr>        
 1     1 DE GARRIDO, L  spain   1        DE GARRIDO, LUIS        DE GARRIDO L 
 2     2 CURSEU, PL     romania 1        CURSEU, PETRU L.        CURSEU PL    
 3     2 CURSEU, PL     romania 2        SCHRUIJER, SANDRA G. L. SCHRUIJER SGL
 4     2 CURSEU, PL     romania 3        FODOR, OANA C.          FODOR OC     
 5     3 HENIKE, T      germany 1        STIELER, MAXIMILIAN     STIELER M    
 6     3 HENIKE, T      germany 2        HENIKE, TASSILO         HENIKE T     
 7     4 DI VINCENZO, F italy   1        DI VINCENZO, FAUSTO     DI VINCENZO F
 8     4 DI VINCENZO, F italy   2        IACOPINO, VALENTINA     IACOPINO V   
 9     5 OMIGIE, D      NA      1        OMIGIE, DIANA           OMIGIE D     
10     5 OMIGIE, D      NA      2        RICCI, JESSICA          RICCI J  

但是,当枢纽时,我希望一些列(Country,Uni,电子邮件)与相应的作者保持联系,并成为其他作者的NA。例如,国家应该看起来像这样:

> authors
# A tibble: 10 × 6
   ArtID RP             Country ArtAthID AF                      AU           
   <int> <chr>          <chr>   <chr>    <chr>                   <chr>        
 1     1 DE GARRIDO, L  spain   1        DE GARRIDO, LUIS        DE GARRIDO L 
 2     2 CURSEU, PL     romania 1        CURSEU, PETRU L.        CURSEU PL    
 3     2 CURSEU, PL     NA      2        SCHRUIJER, SANDRA G. L. SCHRUIJER SGL
 4     2 CURSEU, PL     NA      3        FODOR, OANA C.          FODOR OC     
 5     3 HENIKE, T      NA      1        STIELER, MAXIMILIAN     STIELER M    
 6     3 HENIKE, T      germany 2        HENIKE, TASSILO         HENIKE T     
 7     4 DI VINCENZO, F italy   1        DI VINCENZO, FAUSTO     DI VINCENZO F
 8     4 DI VINCENZO, F NA      2        IACOPINO, VALENTINA     IACOPINO V   
 9     5 OMIGIE, D      NA      1        OMIGIE, DIANA           OMIGIE D     
10     5 OMIGIE, D      NA      2        RICCI, JESSICA          RICCI J  

我尝试了一些事情(尝试使用RP和AU之间的匹配),但正在撞墙。关于如何执行此操作的任何建议吗?

感谢您的帮助!

I'm working with a dataset of author collaborations. Here's a subset of five articles with article ID, corresponding author (RP), corresponding author's country (Country), author full names (AF_1:AF_3), and author short names (AU_1:AU_3):

articles <- structure(list(
ArtID = 1:5,
RP = c("DE GARRIDO, L","CURSEU, PL","HENIKE, T","DI VINCENZO, F","OMIGIE, D"),
Country = c("spain", "romania", "germany", "italy", NA),
AF_1 = c("DE GARRIDO, LUIS","CURSEU, PETRU L.","STIELER, MAXIMILIAN","DI VINCENZO, FAUSTO","OMIGIE, DIANA"),
AF_2 = c(NA,"SCHRUIJER, SANDRA G. L.","HENIKE, TASSILO","IACOPINO, VALENTINA","RICCI, JESSICA"),
AF_3 = c(NA, "FODOR, OANA C.", NA, NA, NA),
AU_1 = c("DE GARRIDO L", "CURSEU PL", "STIELER M","DI VINCENZO F", "OMIGIE D"),
AU_2 = c(NA, "SCHRUIJER SGL", "HENIKE T","IACOPINO V", "RICCI J"),
AU_3 = c(NA, "FODOR OC", NA, NA, NA)),
row.names = c(NA,-5L),class = c("data.frame"))

> articles
  ArtID             RP Country                AF_1                    AF_2           AF_3          AU_1          AU_2     AU_3
1     1  DE GARRIDO, L   spain    DE GARRIDO, LUIS                    <NA>           <NA>  DE GARRIDO L          <NA>     <NA>
2     2     CURSEU, PL romania    CURSEU, PETRU L. SCHRUIJER, SANDRA G. L. FODOR, OANA C.     CURSEU PL SCHRUIJER SGL FODOR OC
3     3      HENIKE, T germany STIELER, MAXIMILIAN         HENIKE, TASSILO           <NA>     STIELER M      HENIKE T     <NA>
4     4 DI VINCENZO, F   italy DI VINCENZO, FAUSTO     IACOPINO, VALENTINA           <NA> DI VINCENZO F    IACOPINO V     <NA>
5     5      OMIGIE, D    <NA>       OMIGIE, DIANA          RICCI, JESSICA           <NA>      OMIGIE D       RICCI J     <NA>

When pivoting the data from wide (articles) to long (authors), it automatically copies other column values for all authors (i.e., country). pivot_longer code and data:

authors <- articles %>% pivot_longer(cols=starts_with(c("AF","AU")),names_to=c(".value","ArtAthID"),names_sep="_",values_drop_na=T)

> authors
# A tibble: 10 × 6
   ArtID RP             Country ArtAthID AF                      AU           
   <int> <chr>          <chr>   <chr>    <chr>                   <chr>        
 1     1 DE GARRIDO, L  spain   1        DE GARRIDO, LUIS        DE GARRIDO L 
 2     2 CURSEU, PL     romania 1        CURSEU, PETRU L.        CURSEU PL    
 3     2 CURSEU, PL     romania 2        SCHRUIJER, SANDRA G. L. SCHRUIJER SGL
 4     2 CURSEU, PL     romania 3        FODOR, OANA C.          FODOR OC     
 5     3 HENIKE, T      germany 1        STIELER, MAXIMILIAN     STIELER M    
 6     3 HENIKE, T      germany 2        HENIKE, TASSILO         HENIKE T     
 7     4 DI VINCENZO, F italy   1        DI VINCENZO, FAUSTO     DI VINCENZO F
 8     4 DI VINCENZO, F italy   2        IACOPINO, VALENTINA     IACOPINO V   
 9     5 OMIGIE, D      NA      1        OMIGIE, DIANA           OMIGIE D     
10     5 OMIGIE, D      NA      2        RICCI, JESSICA          RICCI J  

However, when pivoting, I want some columns (Country, Uni, Email) to stay with the corresponding author and be NA for the other authors. For example, Country should look like this:

> authors
# A tibble: 10 × 6
   ArtID RP             Country ArtAthID AF                      AU           
   <int> <chr>          <chr>   <chr>    <chr>                   <chr>        
 1     1 DE GARRIDO, L  spain   1        DE GARRIDO, LUIS        DE GARRIDO L 
 2     2 CURSEU, PL     romania 1        CURSEU, PETRU L.        CURSEU PL    
 3     2 CURSEU, PL     NA      2        SCHRUIJER, SANDRA G. L. SCHRUIJER SGL
 4     2 CURSEU, PL     NA      3        FODOR, OANA C.          FODOR OC     
 5     3 HENIKE, T      NA      1        STIELER, MAXIMILIAN     STIELER M    
 6     3 HENIKE, T      germany 2        HENIKE, TASSILO         HENIKE T     
 7     4 DI VINCENZO, F italy   1        DI VINCENZO, FAUSTO     DI VINCENZO F
 8     4 DI VINCENZO, F NA      2        IACOPINO, VALENTINA     IACOPINO V   
 9     5 OMIGIE, D      NA      1        OMIGIE, DIANA           OMIGIE D     
10     5 OMIGIE, D      NA      2        RICCI, JESSICA          RICCI J  

I've tried a few things (trying to use matching between RP and AU), but am hitting a wall. Any suggestions for how to do this?

Thanks for any help!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

夏花。依旧 2025-02-15 07:10:10

我认为该枢轴正在按设计工作,因为它使数据与所有作者保持关联。您要做的是在特定的artid中删除code> country(也许还有其他)。

尝试以下操作:

authors %>%
  group_by(ArtID) %>%
  mutate(across(c(Country), ~ replace(., duplicated(.), .[NA][1]))) %>%
  ungroup()
# # A tibble: 10 x 6
#    ArtID RP             Country ArtAthID AF                      AU           
#    <int> <chr>          <chr>   <chr>    <chr>                   <chr>        
#  1     1 DE GARRIDO, L  spain   1        DE GARRIDO, LUIS        DE GARRIDO L 
#  2     2 CURSEU, PL     romania 1        CURSEU, PETRU L.        CURSEU PL    
#  3     2 CURSEU, PL     NA      2        SCHRUIJER, SANDRA G. L. SCHRUIJER SGL
#  4     2 CURSEU, PL     NA      3        FODOR, OANA C.          FODOR OC     
#  5     3 HENIKE, T      germany 1        STIELER, MAXIMILIAN     STIELER M    
#  6     3 HENIKE, T      NA      2        HENIKE, TASSILO         HENIKE T     
#  7     4 DI VINCENZO, F italy   1        DI VINCENZO, FAUSTO     DI VINCENZO F
#  8     4 DI VINCENZO, F NA      2        IACOPINO, VALENTINA     IACOPINO V   
#  9     5 OMIGIE, D      NA      1        OMIGIE, DIANA           OMIGIE D     
# 10     5 OMIGIE, D      NA      2        RICCI, JESSICA          RICCI J      

注意:

  • 您提到的是uni电子邮件的执行操作,但是这些列不在原始数据中(很好)。因此,我选择使用突变(跨(c(country),..))而不是更常规的突变(country =替换(..)) :在该c(。)向量中包含其他列,也许只是utate(contrate(c(country,uni,email),〜...))
  • 替换应该足够清楚,但是。[na] [1]是确保(na的替换)是na的相同作为原始列。至少有六个不同类别的na和一些R工具 - 尤其是dplyrtidyr - 在尝试组合时投诉an integer NA (NA_integer_) with a logical (NA) or string (NA_character_) or real/floating-point (NA_real_< /code>),仅举几例。使用。[Na]将始终给出正确的类。添加[1]是为了解决替换(。)的事实,要求第三个参数与要替换的值数量相同,不一定是与输入参数x相同的长度;当回收利用时,相同的长度会放松,因此我将其截断为始终长1。

您所需的artid = 3“德国”之前显示了na ,这似乎只是在样本数据。如果您希望它与RP和其他字段之间的匹配(grepl),那么也许这更好(尽管有一些示例的文章清除了所有国家/地区):

authors %>%
  mutate(tmp = mapply(function(x, ...) any(grepl(x, unlist(list(...)))), RP, AF, AU), across(c(Country), ~ if_else(tmp, ., .[NA]))) %>%
  select(-tmp)
# # A tibble: 10 x 6
#    ArtID RP             Country ArtAthID AF                      AU           
#    <int> <chr>          <chr>   <chr>    <chr>                   <chr>        
#  1     1 DE GARRIDO, L  spain   1        DE GARRIDO, LUIS        DE GARRIDO L 
#  2     2 CURSEU, PL     NA      1        CURSEU, PETRU L.        CURSEU PL    
#  3     2 CURSEU, PL     NA      2        SCHRUIJER, SANDRA G. L. SCHRUIJER SGL
#  4     2 CURSEU, PL     NA      3        FODOR, OANA C.          FODOR OC     
#  5     3 HENIKE, T      NA      1        STIELER, MAXIMILIAN     STIELER M    
#  6     3 HENIKE, T      germany 2        HENIKE, TASSILO         HENIKE T     
#  7     4 DI VINCENZO, F italy   1        DI VINCENZO, FAUSTO     DI VINCENZO F
#  8     4 DI VINCENZO, F NA      2        IACOPINO, VALENTINA     IACOPINO V   
#  9     5 OMIGIE, D      NA      1        OMIGIE, DIANA           OMIGIE D     
# 10     5 OMIGIE, D      NA      2        RICCI, JESSICA          RICCI J      

此第二种方法由于作者名称格式的不匹配而存在缺陷(例如,“ henike,t”!=“ henike t”)。如果您可以提出一种更好的方法将rp与其他列匹配,那么这可能更喜欢您。

I think the pivot is working as designed, as it is keeping the data associated with all authors. What you're asking to do is de-duplicate Country (and perhaps others) within a particular ArtID.

Try this:

authors %>%
  group_by(ArtID) %>%
  mutate(across(c(Country), ~ replace(., duplicated(.), .[NA][1]))) %>%
  ungroup()
# # A tibble: 10 x 6
#    ArtID RP             Country ArtAthID AF                      AU           
#    <int> <chr>          <chr>   <chr>    <chr>                   <chr>        
#  1     1 DE GARRIDO, L  spain   1        DE GARRIDO, LUIS        DE GARRIDO L 
#  2     2 CURSEU, PL     romania 1        CURSEU, PETRU L.        CURSEU PL    
#  3     2 CURSEU, PL     NA      2        SCHRUIJER, SANDRA G. L. SCHRUIJER SGL
#  4     2 CURSEU, PL     NA      3        FODOR, OANA C.          FODOR OC     
#  5     3 HENIKE, T      germany 1        STIELER, MAXIMILIAN     STIELER M    
#  6     3 HENIKE, T      NA      2        HENIKE, TASSILO         HENIKE T     
#  7     4 DI VINCENZO, F italy   1        DI VINCENZO, FAUSTO     DI VINCENZO F
#  8     4 DI VINCENZO, F NA      2        IACOPINO, VALENTINA     IACOPINO V   
#  9     5 OMIGIE, D      NA      1        OMIGIE, DIANA           OMIGIE D     
# 10     5 OMIGIE, D      NA      2        RICCI, JESSICA          RICCI J      

Notes:

  • You mentioned doing the same for Uni and Email, but those columns were not in the original data (which is fine). For this reason, I chose to use mutate(across(c(Country), ..)) instead of the more conventional mutate(Country = replace(..)): include your other columns in that c(.) vector, perhaps just mutate(across(c(Country, Uni, Email), ~ ...)).
  • The replace should be clear enough, but the .[NA][1] is to make sure that the replacement (of NA) is the same class of NA as the original column. There are at least six different classes of NA, and some R tools -- notably many within dplyr and tidyr -- complain when trying to combine an integer NA (NA_integer_) with a logical (NA) or string (NA_character_) or real/floating-point (NA_real_), to name a few. The use of .[NA] will always give the correct class. The addition of [1] is to get around the fact that replace(.) requires the third argument to be the same length as the number of values to replace, not necessarily the same length as the input argument x; this same-length is relaxed when recycling, so I truncate this to be always length-1.

Your desired output for ArtID=3 showed the NA before "germany", which seemed like just a typo in the sample data. If you want it matched (grepl) between RP and the other fields, then perhaps this works better (though some articles with this example clear out all countries):

authors %>%
  mutate(tmp = mapply(function(x, ...) any(grepl(x, unlist(list(...)))), RP, AF, AU), across(c(Country), ~ if_else(tmp, ., .[NA]))) %>%
  select(-tmp)
# # A tibble: 10 x 6
#    ArtID RP             Country ArtAthID AF                      AU           
#    <int> <chr>          <chr>   <chr>    <chr>                   <chr>        
#  1     1 DE GARRIDO, L  spain   1        DE GARRIDO, LUIS        DE GARRIDO L 
#  2     2 CURSEU, PL     NA      1        CURSEU, PETRU L.        CURSEU PL    
#  3     2 CURSEU, PL     NA      2        SCHRUIJER, SANDRA G. L. SCHRUIJER SGL
#  4     2 CURSEU, PL     NA      3        FODOR, OANA C.          FODOR OC     
#  5     3 HENIKE, T      NA      1        STIELER, MAXIMILIAN     STIELER M    
#  6     3 HENIKE, T      germany 2        HENIKE, TASSILO         HENIKE T     
#  7     4 DI VINCENZO, F italy   1        DI VINCENZO, FAUSTO     DI VINCENZO F
#  8     4 DI VINCENZO, F NA      2        IACOPINO, VALENTINA     IACOPINO V   
#  9     5 OMIGIE, D      NA      1        OMIGIE, DIANA           OMIGIE D     
# 10     5 OMIGIE, D      NA      2        RICCI, JESSICA          RICCI J      

This second method is flawed because of the mismatch in author name formats ("HENIKE, T" != "HENIKE T", for instance). If you can come up with a better way to match RP with the other columns, then this may be more to your liking.

甩你一脸翔 2025-02-15 07:10:10

我认为您无法在pivot_longer中做您想做的事,但是我认为这为您提供了一个“附加”变量所需的东西(这就是您在测试数据中给我们的全部。 ..)

articles %>% 
  group_by(Country) %>% 
  pivot_longer(
    cols=starts_with(c("AF","AU")),
    names_to=c(".value","ArtAthID"),
    names_sep="_",
    values_drop_na=T
  ) %>% 
  mutate(Country=ifelse(row_number() == 1, Country, NA)) %>% 
  ungroup()
# A tibble: 10 × 6
# Groups:   Country [5]
   ArtID RP             Country ArtAthID AF                      AU           
   <int> <chr>          <chr>   <chr>    <chr>                   <chr>        
 1     1 DE GARRIDO, L  spain   1        DE GARRIDO, LUIS        DE GARRIDO L 
 2     2 CURSEU, PL     romania 1        CURSEU, PETRU L.        CURSEU PL    
 3     2 CURSEU, PL     NA      2        SCHRUIJER, SANDRA G. L. SCHRUIJER SGL
 4     2 CURSEU, PL     NA      3        FODOR, OANA C.          FODOR OC     
 5     3 HENIKE, T      germany 1        STIELER, MAXIMILIAN     STIELER M    
 6     3 HENIKE, T      NA      2        HENIKE, TASSILO         HENIKE T     
 7     4 DI VINCENZO, F italy   1        DI VINCENZO, FAUSTO     DI VINCENZO F
 8     4 DI VINCENZO, F NA      2        IACOPINO, VALENTINA     IACOPINO V   
 9     5 OMIGIE, D      NA      1        OMIGIE, DIANA           OMIGIE D     
10     5 OMIGIE, D      NA      2        RICCI, JESSICA          RICCI J   

如果您需要多个附加变量,请将每个变量添加到group_by()调用中,然后在mutate()。类似(未经测试的代码):

...
%>% mutate(
      across(
        c(Country, Uni, Email), 
        function(x) ifelse(row_number() == 1, x, NA)
      )
    )

I don't think you can do what you want within pivot_longer, but I think this gives you what you need for one "additional" variable (Which is all you've given us in your test data...)

articles %>% 
  group_by(Country) %>% 
  pivot_longer(
    cols=starts_with(c("AF","AU")),
    names_to=c(".value","ArtAthID"),
    names_sep="_",
    values_drop_na=T
  ) %>% 
  mutate(Country=ifelse(row_number() == 1, Country, NA)) %>% 
  ungroup()
# A tibble: 10 × 6
# Groups:   Country [5]
   ArtID RP             Country ArtAthID AF                      AU           
   <int> <chr>          <chr>   <chr>    <chr>                   <chr>        
 1     1 DE GARRIDO, L  spain   1        DE GARRIDO, LUIS        DE GARRIDO L 
 2     2 CURSEU, PL     romania 1        CURSEU, PETRU L.        CURSEU PL    
 3     2 CURSEU, PL     NA      2        SCHRUIJER, SANDRA G. L. SCHRUIJER SGL
 4     2 CURSEU, PL     NA      3        FODOR, OANA C.          FODOR OC     
 5     3 HENIKE, T      germany 1        STIELER, MAXIMILIAN     STIELER M    
 6     3 HENIKE, T      NA      2        HENIKE, TASSILO         HENIKE T     
 7     4 DI VINCENZO, F italy   1        DI VINCENZO, FAUSTO     DI VINCENZO F
 8     4 DI VINCENZO, F NA      2        IACOPINO, VALENTINA     IACOPINO V   
 9     5 OMIGIE, D      NA      1        OMIGIE, DIANA           OMIGIE D     
10     5 OMIGIE, D      NA      2        RICCI, JESSICA          RICCI J   

if you need more than one additional variable, add each to the group_by() call and then use across() in the mutate(). Something like (untested code):

...
%>% mutate(
      across(
        c(Country, Uni, Email), 
        function(x) ifelse(row_number() == 1, x, NA)
      )
    )
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文