Loading a CSV file into a MySQL table when some fields are not enclosed in double quotes and contain newlines

Posted on 2024-10-22 13:04:55

I have a CSV file that I want to load into a MySQL table using the following command:

LOAD DATA LOCAL INFILE '/path/to/file.csv'
INTO TABLE items
CHARACTER SET utf8
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(field1, field2, field3, field4, field5);

The problem is that the CSV file is not properly formatted: some fields are not enclosed in double quotes (") and also contain newlines, e.g. the third record below:

"field1","field2","field3","field4","field5"
"aaaaa","bbbbb","ccccc","ddddd","eeeeee"
aaaa
aaaa,bbbbbbbb
bbbbb,"ccccc","dddddd","eeeee"

When I import the CSV file into MySQL, the newlines inside those fields are interpreted as line terminators.

So... how can I sort it out? Regex? Some CSV editor (I tried CSVed with no luck)? Thank you.

执妄 2024-10-29 13:04:55

Quick and dirty fix attempt:

// $csv is assumed to hold the raw file contents (e.g. from file_get_contents()).
// Normalise line endings so that only "\n" remains.
$csv = str_replace("\r", "", $csv);

$data = array(array());

// strlen() rather than empty(): empty("0") is true and would drop a final "0" field
while (strlen($csv) > 0) {

  // quoted field: find the closing quote (one not preceded by a backslash)
  if (substr($csv, 0, 1) == '"') {
    $found = preg_match('~[^\\\\]"~', $csv, $matches, PREG_OFFSET_CAPTURE, 1);

    if (!$found)
      die("No closing quote found");

    $data[count($data)-1][] = substr($csv, 1, $matches[0][1]);
    $csv = substr($csv, $matches[0][1] + 2);

  // unquoted field: runs up to the next comma, so embedded newlines are kept
  } else {
    $pos = strpos($csv, ',');

    if ($pos === FALSE) {
      $data[count($data)-1][] = $csv;
      $csv = "";

    } else {
      $data[count($data)-1][] = substr($csv, 0, $pos);
      $csv = substr($csv, $pos);
    }
  }

  // comma => not the end of the row
  if (substr($csv, 0, 1) == ',') {
    $csv = substr($csv, 1);

  // newline => end of the row
  } else if (substr($csv, 0, 1) == "\n") {
    $csv = ltrim($csv);

    $data[] = array(); // new row

  } else if (strlen($csv) > 0) {
    die("unexpected error in csv");
  }
}

print_r($data);

Applied to your data snippet, it outputs:

Array
(
    [0] => Array
        (
            [0] => field1
            [1] => field2
            [2] => field3
            [3] => field4
            [4] => field5
        )

    [1] => Array
        (
            [0] => aaaaa
            [1] => bbbbb
            [2] => ccccc
            [3] => ddddd
            [4] => eeeeee
        )

    [2] => Array
        (
            [0] => aaaa
aaaa
            [1] => bbbbbbbb
bbbbb
            [2] => ccccc
            [3] => dddddd
            [4] => eeeee
        )

)
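
To get from the parsed array back to something LOAD DATA can read, one option is to write the rows out again with every field enclosed in double quotes. This is only a sketch: quoteRow() is a hypothetical helper name, the $data literal below just mimics the shape of the array printed above, and the output path is a placeholder. It also assumes the fields contain no backslashes, since LOAD DATA treats a backslash as an escape character by default.

```php
<?php
// Hypothetical helper (not part of the parser above): render one parsed row
// as a CSV line with every field enclosed in double quotes.
function quoteRow(array $row): string
{
    $quoted = array();
    foreach ($row as $field) {
        // Double any embedded quote so it survives ENCLOSED BY '"'.
        $quoted[] = '"' . str_replace('"', '""', $field) . '"';
    }
    return implode(',', $quoted) . "\n";
}

// Sample rows shaped like the $data array produced by the parser.
$data = array(
    array('field1', 'field2', 'field3', 'field4', 'field5'),
    array("aaaa\naaaa", "bbbbbbbb\nbbbbb", 'ccccc', 'dddddd', 'eeeee'),
);

$fixed = '';
foreach ($data as $row) {
    if (count($row) === 0) {
        continue; // skip the empty row left behind by a trailing newline
    }
    $fixed .= quoteRow($row);
}

// '/path/to/fixed.csv' is a placeholder; uncomment to write the file.
// file_put_contents('/path/to/fixed.csv', $fixed);
echo $fixed;
```

With every field quoted, the original LOAD DATA statement should treat the embedded newlines as field content instead of row breaks. Add IGNORE 1 LINES to it if the header row should not be imported.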