Sort CSV data based on the occurrence of common words

Posted on 2024-10-07 18:29:16

I have a large amount of data coming from a CSV file, which looks something like this:

url1, comment1
url2, comment2

I need to find the words that are common across the comments and then sort the rows by how often those common words occur in each row.

At the moment I can get the common words, but I'm lost as to how to sort the rows by common word without exhausting memory.

Below is my (very inefficient) code.

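// $fh is assumed to be an open handle to the CSV file
$data = array();
while (($row = fgetcsv($fh, 1024, ',')) !== false) {
  $data[] = $row[1]; // keep only the comment column
}

// Join with a space so the last word of one comment does not fuse with the
// first word of the next, then strip punctuation and collapse whitespace
$str = preg_replace('/\s\s+/', ' ', trim(str_replace(array('!', '?', '.', ','), ' ', implode(' ', $data))));

$words = explode(" ", $str);
var_dump(array_count_values($words)); // word => number of occurrences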


Comments (1)

人生百味 2024-10-14 18:29:16

Loading the exploded words into a database sounds like a good idea.

Alternatively, you can try this:

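$summary = array();
// Count words row by row instead of buffering every comment in memory
while (($row = fgetcsv($fh, 1024, ',')) !== false)
{
  if (!isset($row[1])) continue; // skip blank or malformed rows
  $str   = preg_replace('/\s\s+/', ' ', trim(str_replace(array('!', '?', '.', ','), ' ', $row[1])));
  $words = explode(" ", $str);
  foreach ($words as $word)
  {
    if ($word === '') continue; // an empty comment yields an empty "word"
    $word = strtolower($word); // lowercase to reduce variations
    // initialise the counter on first sight to avoid an undefined-key warning
    $summary[$word] = isset($summary[$word]) ? $summary[$word] + 1 : 1;
  }
}
/* $summary now holds the count for every word */
/* keep an eye on the size of $summary; it can grow quite large */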
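
The snippet above only counts words; it does not sort the rows. One possible way to do the sorting without holding every comment in memory is a two-pass read, sketched below. This is only a sketch under some assumptions that are not in the original post: "common words" is taken to mean the `$top` most frequent words overall, a row's score is the number of those words it contains, and the helper names `comment_words` and `sort_rows_by_common_words` are made up for illustration. Only one integer score per row (keyed by the row's byte offset) is kept in memory, and the rows are re-read from disk in sorted order. Blank or one-column rows are skipped.

```php
<?php
// Hypothetical helper: split a comment into lowercase words, stripping the
// same punctuation characters as the snippet above.
function comment_words($comment)
{
    $clean = str_replace(array('!', '?', '.', ','), ' ', strtolower($comment));
    return preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical helper: yield CSV rows ordered by how many occurrences of the
// $top most frequent words each row's comment contains (highest first).
function sort_rows_by_common_words($path, $top = 20)
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Pass 1: count every word across all comments.
    // A length of 0 means "no line length limit" (an assumption; the
    // original code uses 1024).
    $summary = array();
    while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        if (!isset($row[1])) continue; // blank or one-column row
        foreach (comment_words($row[1]) as $word) {
            $summary[$word] = isset($summary[$word]) ? $summary[$word] + 1 : 1;
        }
    }
    arsort($summary);                               // most frequent first
    $common = array_slice($summary, 0, $top, true); // word => count
    unset($summary);                                // free the full table

    // Pass 2: score each row, remembering only its byte offset and score.
    rewind($fh);
    $scores = array(); // byte offset => score
    while (true) {
        $offset = ftell($fh);
        $row = fgetcsv($fh, 0, ',', '"', '\\');
        if ($row === false) break;
        if (!isset($row[1])) continue;
        $score = 0;
        foreach (comment_words($row[1]) as $word) {
            if (isset($common[$word])) $score++;
        }
        $scores[$offset] = $score;
    }
    arsort($scores); // highest score first, offsets preserved as keys

    // Re-read the rows from disk in sorted order, one at a time.
    foreach ($scores as $offset => $score) {
        fseek($fh, $offset);
        yield fgetcsv($fh, 0, ',', '"', '\\');
    }
    fclose($fh);
}
```

Because the function is a generator, it can be consumed with `foreach (sort_rows_by_common_words('comments.csv', 10) as $row) { ... }` (the file name is a placeholder), so the sorted rows never need to sit in memory all at once. Rows with equal scores come out in no guaranteed order on PHP versions before 8.0, where `arsort` is not stable.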