PostgreSQL: COPY FROM a CSV file with occasionally missing columns

Posted 2024-10-09 22:34:45

I have several billion rows of data in CSV files. Each row can have anything from 10 to 20 columns. I want to use COPY FROM to load the data into a table containing 20 columns. If a specific CSV row only contains 10 columns of data, then I expect COPY FROM to set the rest of the columns (for which the values are missing) to NULL. I specify DEFAULT NULL on every column in the CREATE TABLE statement.

MY QUESTION:
Can this be done using COPY FROM?

EDIT: Greenplum (a database based upon PostgreSQL) has a switch named FILL MISSING FIELDS, which does what I describe (see their documentation here). What workarounds would you recommend for PostgreSQL?

Comments (4)

長街聽風 2024-10-16 22:34:45

Write a pre-processing script that either adds extra commas to the lines that don't have enough columns, or transforms the CSV into TSV (tab-separated) and puts "\N" in the missing columns.
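For what it's worth, here is a minimal Python sketch of the first option. The file names and the 20-column target width are assumptions taken from the question, not from this answer; padding with empty fields works because COPY in CSV format treats an unquoted empty field as NULL by default.

import csv

NUM_COLUMNS = 20  # width of the target table (assumption from the question)

# Pad every short row of data.csv out to NUM_COLUMNS fields; with
# COPY ... (FORMAT csv), unquoted empty fields are loaded as NULL.
with open("data.csv", newline="") as src, \
     open("data_padded.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        writer.writerow(row + [""] * (NUM_COLUMNS - len(row)))

The padded file should then load with a plain COPY target_table FROM ... WITH (FORMAT csv).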

风情万种。 2024-10-16 22:34:45

I don't think you can make COPY FROM deal with a varying number of columns within the same file.

If it's always the same 10 columns that are missing, a workaround could be to first load everything into a staging table that has a single text column.

After that, you can use SQL to split the line and extract the columns, something like this:

INSERT INTO target_table (col1, col2, col3, col4, col5, ...)
SELECT columns[1], columns[2], ...
FROM (
  -- split each raw line on commas into a text array
  SELECT string_to_array(big_column, ',') AS columns
  FROM staging_table
) t
WHERE array_length(columns, 1) = 10;  -- keep only the 10-field rows

and then do a similar thing with array_length(columns, 1) = 20.
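For the loading step into that staging table, here is a minimal psycopg2 sketch. The connection string, file name, and the E'\x01' delimiter are assumptions: the delimiter just has to be a byte that never occurs in the data, so that COPY puts each whole line into the single text column. Note that text-format COPY also interprets backslash escapes, so this assumes the raw lines contain no backslashes.

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection settings
with conn, conn.cursor() as cur, open("data.csv") as f:
    # Text-format COPY with an improbable delimiter byte loads each raw
    # CSV line unchanged into staging_table.big_column.
    cur.copy_expert(
        "COPY staging_table (big_column) FROM STDIN (DELIMITER E'\\x01')",
        f,
    )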

初雪 2024-10-16 22:34:45

In the context of ETL and data warehousing, my suggestion would be to avoid the "shortcut" you are looking for.

ETL is a process, frequently implemented as ECCD (Extract, Clean, Conform, Deliver). You could treat those files as "Extracted", so simply implement data cleaning and conforming as separate steps; you will need some extra disk space for that. All conformed files should have the "final" (all-columns) structure. Then deliver (COPY FROM) those conformed files.

This way you will also be able to document the ETL process and what happens to the missing fields in each step.

It is common practice to archive (to disk or DVD) the original customer files and the conformed versions for audit and debugging purposes.

悲凉≈ 2024-10-16 22:34:45

From the PostgreSQL manual:

COPY FROM will raise an error if any line of the input file contains more or fewer columns than are expected.

Read the first line of your CSV file to see how many columns you have to name in the COPY statement.
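If each individual file is internally consistent (for example, one file always has exactly 10 columns), this advice can be automated. Here is a sketch with psycopg2, where the column names col1 through col20, the file name, and the connection settings are all hypothetical; naming only the columns the file actually contains lets the remaining table columns fall back to their DEFAULT NULL:

import csv
import psycopg2

table_columns = [f"col{i}" for i in range(1, 21)]  # hypothetical names

# Count the fields on the first line, as suggested above.
with open("data.csv", newline="") as f:
    width = len(next(csv.reader(f)))

cols = ", ".join(table_columns[:width])
conn = psycopg2.connect("dbname=mydb")  # hypothetical connection settings
with conn, conn.cursor() as cur, open("data.csv") as f:
    # COPY only into the first `width` columns; columns not named in
    # the COPY get their default (NULL, per the question's CREATE TABLE).
    cur.copy_expert(f"COPY target_table ({cols}) FROM STDIN (FORMAT csv)", f)

This still fails if the column count varies within a single file, which is where the pre-processing or staging-table approaches above come in.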
