从一张大表到其他相关表的大量插入

发布于 2024-12-23 07:57:35 字数 387 浏览 1 评论 0原文

进入

目前我已将所有数据抓取到一个 PostgreSQL 'Bigtable' 表中(大约有 120 万行)。现在我需要将设计拆分为单独的表,这些表都依赖于 Bigtable。一些表可能有子表。该模型看起来非常像雪花。

问题

将数据插入表的最佳选择是什么?我想用“SQL”或 PLgSQL 编写的函数进行插入。但问题仍然出在自动生成的 ID 上。

另外,如果您知道哪些工具可以使这个问题的解决变得更容易,那么就发帖吧!

//编辑我添加了示例,这不是仅用于说明的真实情况.

Into:

Currently i have scraped all the data into one PostgreSQL 'Bigtable' table(there are about 1.2M rows). Now i need to split the design into separate tables which all have dependency on the Bigtable. Some of the tables might have subtables. The model looks pretty much like snowflake.

Problem:

What would be best option to inserting data into tables? I thought to make the insertion with functions written in 'SQL' or PLgSQL. But the problem is still with auto-generated ID-s.

Also if you know what tools might make this problem solving easier then post!

//Edit i have added example, this not the real case just for illustration.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

拧巴小姐 2024-12-30 07:57:35

1.2 M 行并不算太多。最好的工具是从控制台“psql”执行的sql脚本。如果您有较新版本的 Pg,那么您可以在必要时使用内联函数(DO 语句)。但最有用的命令可能是 INSERT INTO SELECT 语句。

-- file conversion.sql
DROP TABLE IF EXISTS f1 CASCADE;
CREATE TABLE f1(a int, b int);
INSERT INTO f1
   SELECT x1, y1 
      FROM data
     WHERE x1 = 10;

    ...

-- end file

psql mydb -f conversion.sql

1.2 M rows is not too much. The best tool is sql script executed from console "psql". If you have a some newer version of Pg, then you can use inline functions (DO statement) when it is necessary. But probably the most useful command is INSERT INTO SELECT statement.

-- file conversion.sql
DROP TABLE IF EXISTS f1 CASCADE;
CREATE TABLE f1(a int, b int);
INSERT INTO f1
   SELECT x1, y1 
      FROM data
     WHERE x1 = 10;

    ...

-- end file

psql mydb -f conversion.sql
回忆凄美了谁 2024-12-30 07:57:35

如果我理解你的问题,你可以使用这样的 psql 函数:

CREATE OR REPLACE FUNCTION migration() RETURNS integer AS
$BODY$
DECLARE
   currentProductId   INTEGER;
   currentUserId      INTEGER;
   currentReg         RECORD;
BEGIN
   FOR currentReg IN
     SELECT * FROM bigtable
   LOOP
     -- Product
     SELECT productid INTO currentProductId 
     FROM product 
     WHERE name = currentReg.product_name;

     IF currentProductId IS NULL THEN
        EXECUTE 'INSERT INTO product (name) VALUES (''' || currentReg.product_name || ''') RETURNING productid' 
        INTO currentProductId;
     END IF;

     -- User
     SELECT userid INTO currentUserId 
     FROM user
     WHERE first_name = currentReg.first_name and last_name = currentReg.last_name;

     IF currentUserId IS NULL THEN
        EXECUTE 'INSERT INTO user (first_name, last_name) VALUES (''' || currentReg.first_name || ''', ''' || currentReg.last_name || ''') RETURNING userid' 
        INTO currentUserId;

        -- Insert into userAdded too with: currentUserId and currentProductId
        [...]

     END IF;

     -- Rest of tables
     [...]    

    END LOOP;
  RETURN 1;
END;
$BODY$
LANGUAGE plpgsql;

select * from migration();

在这种情况下,假设每个表都运行自己的主键序列,并且我减少了表中的字段数量以简化。
我希望你有所帮助。

If I understand your question, you can use a psql function like this:

CREATE OR REPLACE FUNCTION migration() RETURNS integer AS
$BODY$
DECLARE
   currentProductId   INTEGER;
   currentUserId      INTEGER;
   currentReg         RECORD;
BEGIN
   FOR currentReg IN
     SELECT * FROM bigtable
   LOOP
     -- Product
     SELECT productid INTO currentProductId 
     FROM product 
     WHERE name = currentReg.product_name;

     IF currentProductId IS NULL THEN
        EXECUTE 'INSERT INTO product (name) VALUES (''' || currentReg.product_name || ''') RETURNING productid' 
        INTO currentProductId;
     END IF;

     -- User
     SELECT userid INTO currentUserId 
     FROM user
     WHERE first_name = currentReg.first_name and last_name = currentReg.last_name;

     IF currentUserId IS NULL THEN
        EXECUTE 'INSERT INTO user (first_name, last_name) VALUES (''' || currentReg.first_name || ''', ''' || currentReg.last_name || ''') RETURNING userid' 
        INTO currentUserId;

        -- Insert into userAdded too with: currentUserId and currentProductId
        [...]

     END IF;

     -- Rest of tables
     [...]    

    END LOOP;
  RETURN 1;
END;
$BODY$
LANGUAGE plpgsql;

select * from migration();

In this case it's assumed that each table runs its own primary key sequence and I have reduced the number of fields in the tables to simplify.
I hope you have been helpful.

_失温 2024-12-30 07:57:35

不需要为此使用函数(除非我误解了你的问题)

如果你的 id 列全部定义为 serial 列(即它们自动生成值),那么这可以通过简单的 INSERT 语句来完成。这假设目标表都是空的。

INSERT INTO users (firstname, lastname)
SELECT DISTINCT firstname, lastname
FROM bigtable;

INSERT INTO category (name)
SELECT DISTINCT category_name 
FROM bigtable;

-- the following assumes a column categoryid in the product table 
-- which is not visible from your screenshot
INSERT INTO product (product_name, description, categoryid)
SELECT DISTINCT b.product_name, b.description, c.categoryid 
FROM bigtable b
  JOIN category c ON c.category_name = b.category_name;

INSERT INTO product_added (product_productid, user_userid)
SELECT p.productid, u.userid
FROM bigtable b
  JOIN product p ON p.product_name = b.product_name
  JOIN users u ON u.firstname = b.firstname AND u.lastname = b.lastname

No need to use a function for this (unless I misunderstood your problem)

If your id columns are all defined as serial column (i.e. they automatically generate the values), then this can be done with simple INSERT statements. This assumes that the target tables are all empty.

INSERT INTO users (firstname, lastname)
SELECT DISTINCT firstname, lastname
FROM bigtable;

INSERT INTO category (name)
SELECT DISTINCT category_name 
FROM bigtable;

-- the following assumes a column categoryid in the product table 
-- which is not visible from your screenshot
INSERT INTO product (product_name, description, categoryid)
SELECT DISTINCT b.product_name, b.description, c.categoryid 
FROM bigtable b
  JOIN category c ON c.category_name = b.category_name;

INSERT INTO product_added (product_productid, user_userid)
SELECT p.productid, u.userid
FROM bigtable b
  JOIN product p ON p.product_name = b.product_name
  JOIN users u ON u.firstname = b.firstname AND u.lastname = b.lastname
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文