Sorting CSV data by the occurrence of common words
I have a large amount of data from a CSV file that looks something like this:
url1, comment1
url2, comment2
I need to find the words the comments have in common and sort the rows by how often those common words occur in each row.
At the moment I can get the common words, but I'm lost as to how to sort the rows per common word without exhausting memory.
Below is my very inefficient code:
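$fh = fopen('data.csv', 'r'); // file name assumed; open the CSV for reading
$data = array();
while (($row = fgetcsv($fh, 1024, ',')) !== false) {
    $data[] = $row[1]; // keep only the comment column (every comment stays in memory)
}
fclose($fh);
// Join the comments with a space (joining with '' fuses the last word of one comment
// to the first word of the next), turn punctuation into spaces, collapse whitespace.
$str = preg_replace('/\s\s+/', ' ', trim(str_replace(array('!', '?', '.', ','), ' ', implode(' ', $data))));
$words = explode(' ', $str);
var_dump(array_count_values($words));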
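One way to avoid holding every comment in memory is to read the file twice: the first pass only counts word frequencies, the second pass records, for each common word, which rows contain it and how often. This is a minimal sketch under assumptions not stated in the question: the CSV has two columns (url, comment), "common words" means the $topN most frequent words across all comments, and the function names `tokenize` and `rankRowsByCommonWords` are hypothetical.

```php
<?php
// Hypothetical sketch: assumes a two-column CSV (url, comment) and treats the
// $topN most frequent words across all comments as the "common words".

/** Split a comment into words: runs of Unicode letters/digits, ASCII-lowercased. */
function tokenize(string $text): array
{
    preg_match_all('/[\p{L}\p{N}]+/u', strtolower($text), $m);
    return $m[0] ?? []; // no matches array if preg fails on invalid UTF-8
}

/**
 * Returns [commonWord => [url => occurrences in that row, ...], ...],
 * each inner list sorted by occurrences, highest first.
 */
function rankRowsByCommonWords(string $path, int $topN = 10): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Pass 1: global word frequencies, one row at a time (no comment text is kept).
    $freq = [];
    while (($row = fgetcsv($fh, 0, ',', '"', '')) !== false) {
        if (!isset($row[1])) {
            continue; // blank or malformed line
        }
        foreach (tokenize($row[1]) as $w) {
            $freq[$w] = ($freq[$w] ?? 0) + 1;
        }
    }
    arsort($freq); // most frequent first; ties keep first-seen order (PHP 8 sorts are stable)
    $common = array_slice(array_keys($freq), 0, $topN);
    $commonSet = array_flip($common);

    // Pass 2: for each row, store only url => count for the common words it contains.
    $byWord = array_fill_keys($common, []);
    rewind($fh);
    while (($row = fgetcsv($fh, 0, ',', '"', '')) !== false) {
        if (!isset($row[1])) {
            continue;
        }
        $counts = array_count_values(tokenize($row[1]));
        foreach (array_intersect_key($counts, $commonSet) as $w => $n) {
            $byWord[$w][trim($row[0])] = $n;
        }
    }
    fclose($fh);

    foreach ($byWord as &$rows) {
        arsort($rows); // rows with the most occurrences of this word first
    }
    unset($rows);
    return $byWord;
}
```

Usage would look like `$ranked = rankRowsByCommonWords('data.csv', 20);`. Memory then grows with the vocabulary size plus the URLs attached to the common words, rather than with the full comment text; rows that share a URL overwrite each other in this sketch.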
Comments (1)
Loading the exploded data/words into a database sounds like a good idea.
Alternatively, you can try this:
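As a rough illustration of the database suggestion only (not the answerer's own code), the words can be stored one per table row and ranked with a GROUP BY query. This sketch assumes the pdo_sqlite extension is available; the table `words`, the index name, the function `rankRowsWithSqlite`, and the "top N most frequent words" reading of "common words" are all hypothetical.

```php
<?php
// Hypothetical sketch of the "load the words into a database" idea.
// Assumes the pdo_sqlite extension; the table and function names are made up.

function rankRowsWithSqlite(string $csvPath, int $topN = 10): array
{
    // 'sqlite::memory:' keeps the demo self-contained; a file DSN such as
    // 'sqlite:words.db' would keep the word table on disk instead of in RAM.
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE words (url TEXT NOT NULL, word TEXT NOT NULL)');

    $insert = $db->prepare('INSERT INTO words (url, word) VALUES (?, ?)');
    $fh = fopen($csvPath, 'r');
    $db->beginTransaction(); // one transaction makes bulk inserts much faster
    while (($row = fgetcsv($fh, 0, ',', '"', '')) !== false) {
        if (!isset($row[1])) {
            continue; // blank or malformed line
        }
        // One table row per word occurrence (ASCII-lowercased letters/digits).
        preg_match_all('/[\p{L}\p{N}]+/u', strtolower($row[1]), $m);
        foreach ($m[0] ?? [] as $w) {
            $insert->execute([trim($row[0]), $w]);
        }
    }
    $db->commit();
    fclose($fh);
    $db->exec('CREATE INDEX idx_words_word ON words (word)');

    // Top-N words overall, then per-url occurrence counts for each of them.
    $stmt = $db->prepare(
        'SELECT w.word, w.url, COUNT(*) AS n
           FROM words w
           JOIN (SELECT word, COUNT(*) AS total FROM words
                  GROUP BY word ORDER BY total DESC, word LIMIT :top) t
             ON t.word = w.word
          GROUP BY w.word, w.url, t.total
          ORDER BY t.total DESC, w.word, n DESC, w.url'
    );
    $stmt->bindValue(':top', $topN, PDO::PARAM_INT);
    $stmt->execute();

    $result = [];
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $r) {
        $result[$r['word']][$r['url']] = (int) $r['n'];
    }
    return $result;
}
```

The database does the grouping and sorting, so PHP only holds one CSV row at a time plus the final result.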