R:在3列中提取值和插入
我有一个大型数据集,例如以下内容,并且我正在尝试根据列国家 /地区将值添加到3列。
Country<-c("Asia","Africa - Benin (Cotonou)",
"Europe - France (Paris)","Asia - China(Shanghai)", "Europe - United Kingdom (London)", "Europe - France (Orléans)"
, "Afrique - Togo (Lomé)", "Afrique - Sénégal (Dakar)", "Asia - Pakistan (Rahim Yar Khan)")
ID<-c(1,2,3,4,5,6,7,8,9)
mydata<-data.frame(ID,Country)
> mydata
> ID Country col1 col2 col3
> 1 1 Asia
> 2 2 Africa - Benin (Cotonou)
> 3 3 Europe - France (Paris)
> 4 4 Asia - China(Shanghai)
> 5 5 Europe - United Kingdom (London)
> 6 6 Europe - France (Orléans)
> 7 7 Afrique - Togo (Lomé)
> 8 8 Afrique - Sénégal (Dakar)
> 9 9 Asia - Pakistan (Rahim Yar Khan)
我尝试了以下操作,但是我对正则表达式有问题,
library(tidyr)
mydata <- mydata %>% separate(col = "Country", into = c("Col1", "Col2", "Col3"), remove = FALSE, fill = "right")
我得到的结果如下:
ID Country Col1 Col2 Col3
1 Asia Asia <NA> <NA>
2 Africa - Benin (Cotonou) Africa Benin Cotonou
3 Europe - France (Paris) Europe France Paris
4 Asia - China(Shanghai) Asia China Shanghai
5 Europe - United Kingdom (London) Europe United Kingdom
6 Europe - France (Orléans) Europe France Orl
7 Afrique - Togo (Lomé) Afrique Togo L
8 Afrique - Sénégal (Dakar) Afrique S n
9 Asia - Pakistan (Rahim Yar Khan) Asia Pakistan Rahim
第3列,第5,6,7,8和9列中缺少某些零件。
想要的结果是:
ID Country Col1 Col2 Col3
1 Asia Asia <NA> <NA>
2 Africa - Benin (Cotonou) Africa Benin Cotonou
3 Europe - France (Paris) Europe France Paris
4 Asia - China(Shanghai) Asia China Shanghai
5 Europe - United Kingdom (London) Europe United Kingdom London
6 Europe - France (Orléans) Europe France Orléans
7 Afrique - Togo (Lomé) Afrique Togo Lomé
8 Afrique - Sénégal (Dakar) Afrique Sénégal Dakar
9 Asia - Pakistan (Rahim Yar Khan) Asia Pakistan Rahim Yar Khan
我 关于如何这样做的建议?
I have a large dataset like the following and I'm trying to add value to 3 columns based on column Country.
Country<-c("Asia","Africa - Benin (Cotonou)",
"Europe - France (Paris)","Asia - China(Shanghai)", "Europe - United Kingdom (London)", "Europe - France (Orléans)"
, "Afrique - Togo (Lomé)", "Afrique - Sénégal (Dakar)", "Asia - Pakistan (Rahim Yar Khan)")
ID<-c(1,2,3,4,5,6,7,8,9)
mydata<-data.frame(ID,Country)
> mydata
> ID Country col1 col2 col3
> 1 1 Asia
> 2 2 Africa - Benin (Cotonou)
> 3 3 Europe - France (Paris)
> 4 4 Asia - China(Shanghai)
> 5 5 Europe - United Kingdom (London)
> 6 6 Europe - France (Orléans)
> 7 7 Afrique - Togo (Lomé)
> 8 8 Afrique - Sénégal (Dakar)
> 9 9 Asia - Pakistan (Rahim Yar Khan)
I tried the following but im having issue with the regular expression
library(tidyr)
mydata <- mydata %>% separate(col = "Country", into = c("Col1", "Col2", "Col3"), remove = FALSE, fill = "right")
The result that I get is the following:
ID Country Col1 Col2 Col3
1 Asia Asia <NA> <NA>
2 Africa - Benin (Cotonou) Africa Benin Cotonou
3 Europe - France (Paris) Europe France Paris
4 Asia - China(Shanghai) Asia China Shanghai
5 Europe - United Kingdom (London) Europe United Kingdom
6 Europe - France (Orléans) Europe France Orl
7 Afrique - Togo (Lomé) Afrique Togo L
8 Afrique - Sénégal (Dakar) Afrique S n
9 Asia - Pakistan (Rahim Yar Khan) Asia Pakistan Rahim
Some part are missing in column 3, row 5,6,7,8 and 9.
the result that i want is the following:
ID Country Col1 Col2 Col3
1 Asia Asia <NA> <NA>
2 Africa - Benin (Cotonou) Africa Benin Cotonou
3 Europe - France (Paris) Europe France Paris
4 Asia - China(Shanghai) Asia China Shanghai
5 Europe - United Kingdom (London) Europe United Kingdom London
6 Europe - France (Orléans) Europe France Orléans
7 Afrique - Togo (Lomé) Afrique Togo Lomé
8 Afrique - Sénégal (Dakar) Afrique Sénégal Dakar
9 Asia - Pakistan (Rahim Yar Khan) Asia Pakistan Rahim Yar Khan
Any suggestion on how to do this?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
这是我的第一个贡献,所以如果我错了,请原谅我。
我这样做了,可能不是最简单的方法,但我想它有效:
this is my first contribution so please forgive me if I am wrong.
I did it this way, may not be the easiest way but I guess it worked:
更新:要删除额外的空格,我们可以在代码末尾添加此行:
突变(跨越(start_with(“ col”),str_squish))
我们可以替换第一个分隔符
-
(
然后)我们有一个隔离器。
后词做
单独
,最后删除剩余的)
Update: to remove the extra spaces we could add this line at the end of the code:
mutate(across(starts_with("col"), str_squish))
We could replace the first separator
-
by(
thenwe get one separtor .
Afterwords do
separate
and finally remove the remaining)
Tidyr ::独立
将基于定界符(默认情况下是任何非alpha-numeric)将文本分开为列,因此默认情况下它在空格上分开。您可以使用额外的
参数将所有剩余的文本合并到第三列中:但是,最后我们会得到不必要的)。您可以通过突变删除它,也可以代替
单独
使用tidyr :: dricter
允许基于REGEX提取:tidyr::separate
will separate text into columns based on delimiter (which is by default any non alpha-numeric) so it separates on spaces by default. You can use theextra
argument to merge all the remaining text into the 3rd column like so:However, with this we get an unnecessary ) at the end. You can either remove this via a mutate or instead of
separate
usetidyr::extract
that allows extracting based on regex:data.table
解决方案:A
data.table
solution: