Bulk inserts from one big table into other related tables

Posted on 2024-12-23 07:57:35

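Intro:

I have scraped all of my data into a single PostgreSQL table, 'Bigtable' (about 1.2 million rows). Now I need to split the design into separate tables, all of which depend on Bigtable. Some of the tables may have child tables. The model looks very much like a snowflake.

Question:

What is the best option for inserting the data into the tables? I was thinking of doing the inserts with functions written in SQL or PL/pgSQL. The sticking point is still the auto-generated IDs.

Also, if you know of any tools that would make this problem easier to solve, please post!

// Edit: I added an example. It is not the real case, it is only for illustration.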

拧巴小姐 2024-12-30 07:57:35

1.2 M rows is not too many. The best tool is an SQL script executed from the console with psql. If you have a reasonably recent version of PostgreSQL, you can use inline functions (the DO statement) where necessary. But probably the most useful command is the INSERT INTO ... SELECT statement.

-- file conversion.sql
DROP TABLE IF EXISTS f1 CASCADE;
CREATE TABLE f1(a int, b int);
INSERT INTO f1
   SELECT x1, y1 
      FROM data
     WHERE x1 = 10;

    ...

-- end file

psql mydb -f conversion.sql
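
As a rough illustration of the DO statement mentioned above, a block like the following could be appended to conversion.sql to check that the copy into f1 worked. It reuses the f1 and data names from the script; the check itself and the variable names are assumptions added for illustration, and DO needs PostgreSQL 9.0 or later.

```sql
-- Hypothetical sanity check to run after the INSERT INTO f1 ... SELECT above.
DO $$
DECLARE
    expected bigint;  -- rows in the source that match the filter
    copied   bigint;  -- rows that actually landed in f1
BEGIN
    SELECT count(*) INTO expected FROM data WHERE x1 = 10;
    SELECT count(*) INTO copied   FROM f1;

    IF copied <> expected THEN
        -- Raising an exception aborts the surrounding transaction, if any.
        RAISE EXCEPTION 'f1 has % rows, expected %', copied, expected;
    END IF;

    RAISE NOTICE 'f1 populated with % rows', copied;
END
$$;
```

Because the block is anonymous, nothing is left behind in the database after the script finishes.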

回忆凄美了谁 2024-12-30 07:57:35

If I understand your question, you can use a PL/pgSQL function like this:

CREATE OR REPLACE FUNCTION migration() RETURNS integer AS
$BODY$
DECLARE
   currentProductId   INTEGER;
   currentUserId      INTEGER;
   currentReg         RECORD;
BEGIN
   FOR currentReg IN
     SELECT * FROM bigtable
   LOOP
     -- Product: look up the id by name, insert the product if it is missing
     SELECT productid INTO currentProductId
     FROM product
     WHERE name = currentReg.product_name;

     IF currentProductId IS NULL THEN
        -- Static SQL instead of EXECUTE with string concatenation:
        -- values are passed as parameters, so quotes in a name cannot break the statement
        INSERT INTO product (name) VALUES (currentReg.product_name)
        RETURNING productid INTO currentProductId;
     END IF;

     -- User ("user" is a reserved word in PostgreSQL, so it must be double-quoted)
     SELECT userid INTO currentUserId
     FROM "user"
     WHERE first_name = currentReg.first_name AND last_name = currentReg.last_name;

     IF currentUserId IS NULL THEN
        INSERT INTO "user" (first_name, last_name)
        VALUES (currentReg.first_name, currentReg.last_name)
        RETURNING userid INTO currentUserId;

        -- Insert into userAdded too with: currentUserId and currentProductId
        -- [...]

     END IF;

     -- Rest of tables
     -- [...]

    END LOOP;
  RETURN 1;
END;
$BODY$
LANGUAGE plpgsql;

SELECT * FROM migration();

This assumes that each table has its own primary key sequence; I have also reduced the number of columns in the tables to keep the example simple.
I hope this helps.
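
The part of the function that deals with the auto-generated IDs is the RETURNING clause, which hands back the key the sequence just produced. Below is a minimal, self-contained sketch of the same idea outside a function, using a data-modifying CTE (PostgreSQL 9.1 or later). The demo_* tables and the sample values are hypothetical and only loosely mirror the tables above.

```sql
BEGIN;

-- Hypothetical miniature tables, each with its own serial key.
CREATE TEMP TABLE demo_product (
    productid serial PRIMARY KEY,
    name      text NOT NULL UNIQUE
);
CREATE TEMP TABLE demo_user (
    userid     serial PRIMARY KEY,
    first_name text NOT NULL,
    last_name  text NOT NULL
);
CREATE TEMP TABLE demo_user_added (
    userid    int NOT NULL REFERENCES demo_user (userid),
    productid int NOT NULL REFERENCES demo_product (productid)
);

-- Each INSERT returns its generated key; the final INSERT links the two.
WITH new_product AS (
    INSERT INTO demo_product (name) VALUES ('lamp')
    RETURNING productid
), new_user AS (
    INSERT INTO demo_user (first_name, last_name) VALUES ('Ann', 'O''Brien')
    RETURNING userid
)
INSERT INTO demo_user_added (userid, productid)
SELECT u.userid, p.productid
FROM new_user u CROSS JOIN new_product p;

SELECT * FROM demo_user_added;

ROLLBACK;  -- discard the demo objects
```

The sample name contains an apostrophe on purpose: with static SQL it is an ordinary value, whereas a statement assembled by string concatenation would need explicit quoting to survive it.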

_失温 2024-12-30 07:57:35

No need to use a function for this (unless I misunderstood your problem).

If your id columns are all defined as serial columns (i.e. they generate their values automatically), this can be done with simple INSERT statements. This assumes that the target tables are all empty.

INSERT INTO users (firstname, lastname)
SELECT DISTINCT firstname, lastname
FROM bigtable;

INSERT INTO category (name)
SELECT DISTINCT category_name
FROM bigtable;

-- the following assumes a column categoryid in the product table
-- which is not visible from your screenshot
INSERT INTO product (product_name, description, categoryid)
SELECT DISTINCT b.product_name, b.description, c.categoryid
FROM bigtable b
  -- join on category.name, the column filled by the INSERT above
  JOIN category c ON c.name = b.category_name;

INSERT INTO product_added (product_productid, user_userid)
SELECT p.productid, u.userid
FROM bigtable b
  JOIN product p ON p.product_name = b.product_name
  JOIN users u ON u.firstname = b.firstname AND u.lastname = b.lastname;
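
To try the approach on a small scale before running it against 1.2 M rows, here is a self-contained sketch with three sample rows. All *_demo table names and the sample data are hypothetical. Everything runs inside a transaction that is rolled back, so it leaves nothing behind.

```sql
BEGIN;

-- Hypothetical miniature of the flat source table.
CREATE TEMP TABLE bigtable_demo (
    firstname    text,
    lastname     text,
    product_name text
);
INSERT INTO bigtable_demo VALUES
    ('Ann', 'Lee',  'lamp'),
    ('Ann', 'Lee',  'desk'),
    ('Bob', 'Cruz', 'lamp');

-- Target tables with serial keys, as the answer assumes.
CREATE TEMP TABLE users_demo (
    userid    serial PRIMARY KEY,
    firstname text NOT NULL,
    lastname  text NOT NULL,
    UNIQUE (firstname, lastname)
);
CREATE TEMP TABLE product_demo (
    productid    serial PRIMARY KEY,
    product_name text NOT NULL UNIQUE
);
CREATE TEMP TABLE product_added_demo (
    product_productid int NOT NULL REFERENCES product_demo (productid),
    user_userid       int NOT NULL REFERENCES users_demo (userid)
);

-- Parents first: the ids are generated by the serial columns.
INSERT INTO users_demo (firstname, lastname)
SELECT DISTINCT firstname, lastname FROM bigtable_demo;

INSERT INTO product_demo (product_name)
SELECT DISTINCT product_name FROM bigtable_demo;

-- Then the link table, resolving the generated ids by joining on the natural keys.
INSERT INTO product_added_demo (product_productid, user_userid)
SELECT p.productid, u.userid
FROM bigtable_demo b
  JOIN product_demo p ON p.product_name = b.product_name
  JOIN users_demo u ON u.firstname = b.firstname AND u.lastname = b.lastname;

-- Sanity check: one link row per source row, so both counts should be 3 here.
SELECT (SELECT count(*) FROM bigtable_demo)      AS source_rows,
       (SELECT count(*) FROM product_added_demo) AS linked_rows;

ROLLBACK;
```

One caveat with joining on natural keys: rows in the source whose name columns are NULL will not match any parent and drop out of the link table, so comparing the two counts on the real data is a cheap way to spot that.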
