Merging two files together based on a common value, but with different numbers of variables
I am trying to merge two files together, but I keep getting the following error:
Error: memory exhausted (limit reached?)
Error during wrapup: memory exhausted (limit reached?)
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
I am using the following code:
FinalTweets <- merge(tweets2U, tweets2, by="author_id")
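A likely explanation, assuming author_id repeats in both tables (the full data isn't shown), is that merge() returns every pairwise combination of matching rows. An author with n rows in tweets2U and m rows in tweets2 then contributes n * m rows to the result, which can quickly exceed available memory. The toy data below is hypothetical:

```r
# Hypothetical toy data: author "a" appears 3 times in x and 4 times in y,
# author "b" appears once in each.
x <- data.frame(author_id  = c("a", "a", "a", "b"),
                created_at = c("2015-02-18", "2015-02-18", "2015-02-18", "2016-05-23"))
y <- data.frame(author_id = c("a", "a", "a", "a", "b"),
                text      = c("t1", "t2", "t3", "t4", "t5"))

res <- merge(x, y, by = "author_id")

# Each key contributes (rows in x) * (rows in y): 3 * 4 + 1 * 1 = 13
nrow(res)
#> [1] 13
```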
I also realised that my files have different numbers of observations and variables.
For tweets2U:
'data.frame': 325256 obs. of 2 variables:
$ created_at: chr "2015-02-18T02:56:55.000Z" "2016-05-23T02:14:36.000Z" "2013-04-22T02:52:16.000Z" "2015-03-06T02:40:55.000Z" ...
$ author_id : chr "3024607164" "734568179457007617" "1371107096" "3063885536" ...
and for tweets2:
'data.frame': 338037 obs. of 4 variables:
$ author_id : chr "3024607164" "734568179457007617" "1371107096" "3063885536" ...
$ created_at : chr "2021-01-01T02:24:18.000Z" "2021-01-01T02:22:48.000Z" "2021-01-01T02:22:14.000Z" "2021-01-01T02:21:01.000Z" ...
$ text : chr "Super Game Talk Video Alpha! The #1 Indie Video Game Review Show hosted by puppets! No"| __truncated__ "I was testing my game, and caught a fish that was a gold star (top 3 percentile in size) and I though \"Oh I be"| __truncated__ "Hey hey everyone just got home from work. Time to finish artwork before 12 am to show.super excited. I can fina"| __truncated__ "Congratulation to Pretumos who won our Dec 2020 $100 Zeegift! \nMore members giveaways "| __truncated__ ...
$ public_metrics.retweet_count: int 3 3 6 6 16 1 10 5 2 3 ...
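Both data frames contain a created_at column. When merging by author_id only, merge() keeps both copies and, with its default suffixes = c(".x", ".y"), renames them created_at.x and created_at.y. A minimal illustration using one row of each table (values copied from the dput() output below):

```r
# One row from each table; both share the non-key column name created_at
users  <- data.frame(created_at = "2015-02-18T02:56:55.000Z",
                     author_id  = "3024607164")
tweets <- data.frame(author_id  = "3024607164",
                     created_at = "2021-01-01T02:24:18.000Z",
                     public_metrics.retweet_count = 3L)

merged <- merge(users, tweets, by = "author_id")

# Key first, then x's remaining columns, then y's, with clashing names suffixed:
# author_id, created_at.x, created_at.y, public_metrics.retweet_count
names(merged)
```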
Any recommendations on how to fix this?
Maybe a different function would work?
I also understand that the left_join function could be useful.
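For reference, the standard join types map onto base merge() arguments as sketched below; dplyr's inner_join(), left_join(), right_join() and full_join() are the tidyverse equivalents. None of them removes duplicate keys, so switching to left_join() alone would not be expected to prevent a many-to-many blow-up. The toy data is hypothetical:

```r
# Hypothetical toy data: "a" is in both tables, "b" only in x, "c" only in y
x <- data.frame(author_id = c("a", "b"), created_at = c("2015", "2016"))
y <- data.frame(author_id = c("a", "c"), text = c("t1", "t2"))

inner <- merge(x, y, by = "author_id")                # keys in both tables
left  <- merge(x, y, by = "author_id", all.x = TRUE)  # every row of x
right <- merge(x, y, by = "author_id", all.y = TRUE)  # every row of y
full  <- merge(x, y, by = "author_id", all = TRUE)    # every row of both

# inner = 1, left = 2, right = 2, full = 3
sapply(list(inner = inner, left = left, right = right, full = full), nrow)
```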
Edit: I have updated my code, but I still run into the same issue:
jointdataset <- merge(tweets2U, tweets2, by = 'author_id', all.x= TRUE)
View(jointdataset)
FinalTweets <- merge(tweets2U, tweets2, by=c("author_id","created_at"))
View(FinalTweets)
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
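Judging only from the sample rows, neither change in the edit would be expected to help. all.x = TRUE turns the merge into a left join, which can only add unmatched rows of x, never remove matches. Adding created_at to the key requires identical timestamps in both tables, but in the samples tweets2U's created_at values (2013 to 2016) look like account-creation dates while tweets2's (2021) look like tweet times, so nothing matches. Using the first four rows of each dput() sample further down (text column omitted):

```r
# First four rows of each sample; these rows share the same author_id values
u4 <- data.frame(
  created_at = c("2015-02-18T02:56:55.000Z", "2016-05-23T02:14:36.000Z",
                 "2013-04-22T02:52:16.000Z", "2015-03-06T02:40:55.000Z"),
  author_id  = c("3024607164", "734568179457007617", "1371107096", "3063885536"))
t4 <- data.frame(
  author_id  = c("3024607164", "734568179457007617", "1371107096", "3063885536"),
  created_at = c("2021-01-01T02:24:18.000Z", "2021-01-01T02:22:48.000Z",
                 "2021-01-01T02:22:14.000Z", "2021-01-01T02:21:01.000Z"))

# A left join keeps all matches, so it is never smaller than the inner join
n_inner <- nrow(merge(u4, t4, by = "author_id"))                # 4
n_left  <- nrow(merge(u4, t4, by = "author_id", all.x = TRUE))  # 4

# The two-column key needs equal timestamps, which never occur here
n_both  <- nrow(merge(u4, t4, by = c("author_id", "created_at")))  # 0
```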
I will try again in a few minutes with no other programs running on the computer.
I have 16 GB of RAM, which confuses me as to why there is not enough memory.
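Memory use depends on the size of the result rather than the inputs. Before running the merge, the number of output rows of an inner join can be predicted from the key counts, since each author contributes (rows in tweets2U) times (rows in tweets2). A sketch; the helper name expected_merge_rows is hypothetical:

```r
# Hypothetical helper: predicts the row count of an inner merge() on `key`
# without building it. Each shared key value k contributes
# count_x(k) * count_y(k) rows.
expected_merge_rows <- function(x, y, key) {
  cx <- table(x[[key]])
  cy <- table(y[[key]])
  shared <- intersect(names(cx), names(cy))
  # as.numeric() avoids integer overflow (NA) for very large counts
  sum(as.numeric(cx[shared]) * as.numeric(cy[shared]))
}

# Hypothetical toy data: "a" occurs 3 times in x and 4 times in y
x <- data.frame(author_id = c("a", "a", "a", "b"))
y <- data.frame(author_id = c("a", "a", "a", "a", "b", "c"))
expected_merge_rows(x, y, "author_id")
#> [1] 13
```

On the real data, expected_merge_rows(tweets2U, tweets2, "author_id") would show whether the result is far larger than either input; anyDuplicated(tweets2U$author_id) returning a non-zero value is a quicker sign that keys repeat.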
Here is the data for a minimal reproducible example:
> dput(head(tweets2U))
structure(list(created_at = c("2015-02-18T02:56:55.000Z", "2016-05-23T02:14:36.000Z",
"2013-04-22T02:52:16.000Z", "2015-03-06T02:40:55.000Z", "2016-03-31T10:53:21.000Z",
"2016-10-04T03:38:25.000Z"), author_id = c("3024607164", "734568179457007617",
"1371107096", "3063885536", "715492170392846336", "783149246178553856"
)), row.names = c(NA, 6L), class = "data.frame")
> dput(head(tweets2))
structure(list(author_id = c("3024607164", "734568179457007617",
"1371107096", "3063885536", "1274386153035173891", "750763458719653888"
), created_at = c("2021-01-01T02:24:18.000Z", "2021-01-01T02:22:48.000Z",
"2021-01-01T02:22:14.000Z", "2021-01-01T02:21:01.000Z", "2021-01-01T02:20:03.000Z",
"2021-01-01T02:19:46.000Z"), text = c("Super Game Talk Video Alpha! ! The #1 Indie Video Game Review Show hosted by puppets! Now on Roku: Smiley Crew TV. #indiegame #gamedev #indiedev #indievideogames #marketing #indiegames #videogames #apple #ios #steam #roku ",
"I was testing my game, and caught a fish that was a gold star (top 3 percentile in size) and I though \"Oh I better save the game\" Then I realized it's not done, and saving means nothing for me - but I really hope to evoke this feeling in others <U+0001F605>\n\n#gamedev #indiegame #pixelart ",
"Hey hey everyone just got home from work. Time to finish artwork before 12 am to show.super excited. I can finally make more music with this midi keyboard for my game. #indiegame #rpg #artsoon #solodev #indiedev #madewithunity #indiegamedev ",
"Congratulation to Pretumos who won our Dec 2020 $100 Zeegift! \nMore members giveaways here: <U+0001F381>\nGuest Giveaways here: \n@BlazedRTs @GamerGalsRT @SGH_RTs #gaming #gamingcommunity #indiegame @GamingRTweeters ",
"The lighting on my new level is really starting to come together<U+0001F60D>\n\nWhat do you think?\n\n#scifi #indiegame #art #gaming #game #indiegamedev #unity3d #games #twitchtv #shader #madewithunity #3d #3dart #twitch #stream #cyberpunk #artistsontwitter #indie #3dart ",
"Super Game Talk Video Alpha! The #1 Indie Video Game Review Show hosted by puppets! #indiegame #gamedev #indiedev #indievideogames #marketing #indiegames #videogames #apple #ios #steam "
), public_metrics.retweet_count = c(3L, 3L, 6L, 6L, 16L, 1L)), row.names = c(NA,
6L), class = "data.frame")
The expected output is a single file with the columns created_at, author_id, text, and public_metrics.retweet_count, matched on author_id.
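One possible approach, sketched under two assumptions: tweets2U's created_at is a per-author value (it looks like an account-creation date), so one row per author is enough, and the output should have one row per tweet in tweets2. tweets2U is reduced to one row per author_id, its created_at is renamed to the hypothetical user_created_at so it does not clash with the tweet time, and the result is left-joined onto tweets2. Shown on the six-row samples (text column omitted); note that if created_at is meant to be the tweet time, tweets2 alone already contains all four requested columns.

```r
# Six-row samples from the dput() output above (text column omitted)
tweets2U <- data.frame(
  created_at = c("2015-02-18T02:56:55.000Z", "2016-05-23T02:14:36.000Z",
                 "2013-04-22T02:52:16.000Z", "2015-03-06T02:40:55.000Z",
                 "2016-03-31T10:53:21.000Z", "2016-10-04T03:38:25.000Z"),
  author_id  = c("3024607164", "734568179457007617", "1371107096",
                 "3063885536", "715492170392846336", "783149246178553856"))
tweets2 <- data.frame(
  author_id  = c("3024607164", "734568179457007617", "1371107096",
                 "3063885536", "1274386153035173891", "750763458719653888"),
  created_at = c("2021-01-01T02:24:18.000Z", "2021-01-01T02:22:48.000Z",
                 "2021-01-01T02:22:14.000Z", "2021-01-01T02:21:01.000Z",
                 "2021-01-01T02:20:03.000Z", "2021-01-01T02:19:46.000Z"),
  public_metrics.retweet_count = c(3L, 3L, 6L, 6L, 16L, 1L))

# 1. Keep one row per author so the join key is unique on this side
users <- tweets2U[!duplicated(tweets2U$author_id), ]

# 2. Rename created_at (hypothetical new name) to avoid created_at.x / .y
names(users)[names(users) == "created_at"] <- "user_created_at"

# 3. Left join: every tweet is kept; authors missing from users get NA
FinalTweets <- merge(tweets2, users, by = "author_id", all.x = TRUE)

nrow(FinalTweets)                        # 6, one row per tweet
sum(is.na(FinalTweets$user_created_at))  # 2 authors absent from the sample tweets2U
```

With dplyr, the same steps would be distinct(tweets2U, author_id, .keep_all = TRUE), rename(user_created_at = created_at), and left_join(tweets2, users, by = "author_id").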
Comment:
It is not clear what kind of output you want. Do you want only the rows that match in both data.frames (base::merge, dplyr::inner_join)? Or to preserve all rows in x (dplyr::left_join)? Or all rows in y (dplyr::right_join)? Furthermore, your error is not reproducible. As suggested by @Limey, you should post a fully reproducible example, or your data directly via dput(tweets2U) and dput(tweets2), or at least part of your data (dput(head(your_data))). However, it is very strange that this could be a memory problem given the size of the data. On my PC this code works well.
An example with dplyr::left_join:
The output of summary: