Building a flat table from CSV files

Posted on 2024-10-19 21:40:35

I have 500 CSV files in this format:

IndicatorA_Name.csv

        1900    1901    1902 ...
Norway  3      2       
Sweden  1      3       3
Denmark 5      2       3    
... 

IndicatorB_Name.csv

        1900    1901    1902 ...
Norway  1      3       4
Sweden  1      2       
Iceland 1      6       3    
... 
  • Years in columns, countries in rows.
  • Notice that countries, years and values may differ between files.

I'd like to run through all these files and make a flat table (CSV file) with this structure:

country, year, IndicatorA_Name, IndicatorB_Name, ...
Sweden, 1900, 1, 1
Sweden, 1901, 3, 2
Norway, 1900, 3, 1
...

Preferably in PHP or JavaScript but I'm willing to learn something new.

Comments (3)

南城旧梦 2024-10-26 21:40:35

You could write something like the following:

    $file = file_get_contents('file.csv');
    $lines = explode("\n", rtrim($file)); // one entry per line, without a trailing empty line
    $years = explode(';', $lines[0]); // first line holds the years; $years[0] is the empty corner cell
    for ($i = 1, $c = count($lines); $i < $c; ++$i) { // iterate over the country lines (skipping the header)
        $lineData = explode(';', $lines[$i]); // fields of this line
        $country = $lineData[0]; // first field is the country
        for ($j = 1, $n = count($lineData); $j < $n; ++$j) { // remaining fields are the values, one per year
            // query() stands for whatever prepared-statement helper you use
            query('INSERT INTO data(country, year, IndicatorA_Name) VALUES(?,?,?)', $country, $years[$j], $lineData[$j]);
        }
    }
不一样的天空 2024-10-26 21:40:35

I'd suggest using fgetcsv (see the link for a usage example) or str_getcsv (with "\t" as the delimiter, as Czechnology suggests).

That way you automatically support edge cases like embedded delimiters (e.g., a comma inside a field in a comma-separated file). It's generally best not to reinvent the wheel.
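
As a rough illustration of the fgetcsv approach, here is a minimal sketch that reads one indicator file into a country/year lookup. It assumes the files are tab-separated with the years in the first row; the function name readIndicatorFile is made up for this example and is not part of PHP.

```php
<?php
// Hypothetical helper: read one tab-separated indicator file into
// an array shaped like [country => [year => value]].
function readIndicatorFile(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Header row: an empty corner cell followed by the years.
    $header = fgetcsv($handle, 0, "\t", '"', '\\');
    $table = [];

    while (($fields = fgetcsv($handle, 0, "\t", '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // fgetcsv returns [null] for a blank line
        }
        $country = trim($fields[0]);
        for ($i = 1; $i < count($fields); $i++) {
            $value = trim($fields[$i]);
            // Skip empty cells and cells that have no matching year in the header.
            if ($value !== '' && isset($header[$i])) {
                $table[$country][trim($header[$i])] = $value;
            }
        }
    }
    fclose($handle);
    return $table;
}
```

Calling readIndicatorFile('IndicatorA_Name.csv') on the sample from the question should give an entry for Norway in 1900 and 1901 but none for 1902, because the empty cell is skipped rather than stored.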

情魔剑神 2024-10-26 21:40:35

Use

$lines = explode(PHP_EOL, $csv);
$data = array();
foreach ($lines as $line)
  $data[] = explode("\t", $line);

(if it's tab-separated, as it appears to be in your example) and run through it with two loops.

EDIT

Here's tested code:

$csv1 = <<<TXT
        1900    1901    1902
Norway  3   2   
Sweden  1   3   3
Denmark 5   2   3
TXT;
$csv2 = <<<TXT
        1900    1901    1902
Norway  1   3   4
Sweden  1   2   
Iceland 1   6   3    
TXT;

$csvs = array(
  'IndicatorA_Name' => $csv1,
  'IndicatorB_Name' => $csv2);
/* of course, if you're pulling this from csv files, 
   you need to modify it accordingly, e.g.

$files = array('IndicatorA_Name', 'IndicatorB_Name', ...);
$csvs = array();
foreach ($files as $f)
  $csvs[$f] = file_get_contents($path . '/' . $f . '.csv'); // key by indicator name, which the loop below relies on

   or use file(), then you don't need the first `explode` line later */


$data = array();
foreach ($csvs as $indicator => $csv) {
  $lines = explode(PHP_EOL, $csv);

  $header = explode("\t", array_shift($lines));
  foreach ($lines as $line) {
    $fields = explode("\t", $line);

    for ($i = 1; $i < count($fields); $i++) {
      $data[$fields[0]][$header[$i]][$indicator] = $fields[$i];
    }
  }
}

$rows = array();
foreach ($data as $country => $years) {
  foreach ($years as $year => $values) {
    // mysql_real_escape_string() was removed in PHP 7; $db is assumed to be an open mysqli connection
    $str = sprintf(PHP_EOL."('%s', '%d'", $db->real_escape_string($country), intval($year));

    foreach (array_keys($csvs) as $indicator) {
      if (isset($values[$indicator]))
        $str .= sprintf(", '%s'", $db->real_escape_string(trim($values[$indicator])));
      else
        $str .= ", ''";
    }
    $rows[] = $str . ")";
  }
}

// country and year must be listed too, since every row starts with them
$columns = array_merge(array('country', 'year'), array_keys($csvs));
$sql = "INSERT INTO table_name (".implode(',', $columns).") VALUES ".
       implode(',', $rows);

$sql is now

INSERT INTO table_name (country,year,IndicatorA_Name,IndicatorB_Name) VALUES 
('Norway', '1900', '3', '1'),
('Norway', '1901', '2', '3'),
('Norway', '1902', '', '4'),
('Sweden', '1900', '1', '1'),
('Sweden', '1901', '3', '2'),
('Sweden', '1902', '3', ''),
('Denmark', '1900', '5', ''),
('Denmark', '1901', '2', ''),
('Denmark', '1902', '3', ''),
('Iceland', '1900', '', '1'),
('Iceland', '1901', '', '6'),
('Iceland', '1902', '', '3')
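
Since the question asks for a flat CSV file rather than SQL, the same nested array can also be written out with fputcsv. This is only a sketch: it assumes $data has the [country][year][indicator] shape built in the loop above, and writeFlatCsv is a hypothetical helper name, not an existing function.

```php
<?php
// Hypothetical helper: write $data[country][year][indicator] as a flat CSV.
// $indicators fixes the column order; missing values become empty cells.
function writeFlatCsv(array $data, array $indicators, string $path): void
{
    $out = fopen($path, 'w');
    if ($out === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    // Header row: country, year, then one column per indicator.
    fputcsv($out, array_merge(['country', 'year'], $indicators), ',', '"', '\\');

    foreach ($data as $country => $years) {
        ksort($years); // years in ascending order within each country
        foreach ($years as $year => $values) {
            $row = [$country, $year];
            foreach ($indicators as $indicator) {
                $row[] = isset($values[$indicator]) ? trim($values[$indicator]) : '';
            }
            fputcsv($out, $row, ',', '"', '\\');
        }
    }
    fclose($out);
}
```

With the variables from the code above, the call would look like writeFlatCsv($data, array_keys($csvs), 'flat.csv'). Letting fputcsv do the quoting means a country name containing a comma would still produce a valid row.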