从一个表中提取列值并修改后插入到另一个表中

发布于 2025-01-12 23:51:40 字数 3090 浏览 3 评论 0原文

我创建了一个 PL/pgSQL 函数，它接受两个列名、一个“关系”和两个表名。它在一个表中查找不同的行并将它们插入到临时表中，删除任何具有空值的行，并将一列的所有值设置为关系。我有使用此功能的过程的第一部分。

create or replace function alt_edger(s text, v text, relation text, tbl text, tbl_src text)
    returns void
    language plpgsql as
$func$
begin
    raise notice 's: %, v: %, tbl: %, tbl_src: %', s,v,tbl,tbl_src;
    execute ('insert into '||tbl||' ("source", "target") select distinct "'||s||'","'||v||'" from '||tbl_src||'');
    execute ('DELETE FROM '||tbl||' WHERE "source" IS null or "target" is null'); 
end
$func$;

它的执行方式如下：

-- create a temporary table and execute the function twice
drop table if exists temp_stack;
create temporary table temp_stack("label" text, "source" text, "target" text, "attr" text, "graph" text);
 
select alt_edger('x_x', 'y_y', ':associated_with', 'temp_stack','pg_check_table' );
select alt_edger('Document Number', 'x_x', ':documents', 'temp_stack','pg_check_table' );

select * from temp_stack;

请注意，我还没有使用 relation。 INSERT 还应分配 relation，但我不知道如何实现这一点，例如：

label	source	target
:linked_with	638000	ARAS
:linked_with	202000	JASE
：关联_与	638010	JASE
：关联_与	638000	JASE
：关联_与	202100	JASE
:documents	A	638010
:documents	A	202000
:	A	202100
documents	:documents B	638000
:	documents A	638000
:documents	B	124004
:documents	B	202100

我的挑战是：

如何在 INSERT< 中集成 relation /代码>?当我尝试使用 VALUES 和逗号分隔时，我收到“选择附近错误”。
如何允许relation中以“:”开头的字符串？我在这里预计，包含冒号在过去给我带来了挑战。

我该怎么做？或者有更好的方法吗？

玩具数据模型：

drop table if exists pg_check_table;
create temporary table pg_check_table("Document Number" text, x_x int, y_y text);
insert into pg_check_table values ('A',202000,'JASE'),
('A',202100,'JASE'),
('A',638010,'JASE'),
('A',Null,'JASE'),
('A',Null,'JASE'),
('A',202100,'JASE'),
('A',638000,'JASE'),
('A',202100,'JASE'),
('B',638000,'JASE'),
('B',202100,null),
('B',638000,'JASE'),
('B',null,'ARAS'),
('B',638000,'ARAS'),
('B',null,'ARAS'),
('B',638000,null),
('B',124004,null);
alter table pg_check_table add row_num serial;
select * from pg_check_table;

原文

I have created a PL/pgSQL function that accepts two column names, a "relation", and two table names. It finds distinct rows in one table and inserts them in to a temporary table, deletes any row with a null value, and sets all values of one column to relation. I have the first part of the process using this function.

create or replace function alt_edger(s text, v text, relation text, tbl text, tbl_src text)
    returns void
    language plpgsql as
$func$
begin
    raise notice 's: %, v: %, tbl: %, tbl_src: %', s,v,tbl,tbl_src;
    execute ('insert into '||tbl||' ("source", "target") select distinct "'||s||'","'||v||'" from '||tbl_src||'');
    execute ('DELETE FROM '||tbl||' WHERE "source" IS null or "target" is null'); 
end
$func$;

It is executed as follows:

-- create a temporary table and execute the function twice
drop table if exists temp_stack;
create temporary table temp_stack("label" text, "source" text, "target" text, "attr" text, "graph" text);
 
select alt_edger('x_x', 'y_y', ':associated_with', 'temp_stack','pg_check_table' );
select alt_edger('Document Number', 'x_x', ':documents', 'temp_stack','pg_check_table' );

select * from temp_stack;

Note that I didn't use relation, yet. The INSERT shall also assign relation, but I can't figure out how to make that happen to get something like:

label	source	target
:associated_with	638000	ARAS
:associated_with	202000	JASE
:associated_with	638010	JASE
:associated_with	638000	JASE
:associated_with	202100	JASE
:documents	A	638010
:documents	A	202000
:documents	A	202100
:documents	B	638000
:documents	A	638000
:documents	B	124004
:documents	B	202100

My challenges are:

How to integrate relation in the INSERT? When I try to use VALUES and comma separation I get an "error near select".
How to allow strings starting with ":" in relation? I'm anticipating here, the inclusion of the colon has given me challenges in the past.

How can I do this? Or is there a better approach?

Toy data model:

drop table if exists pg_check_table;
create temporary table pg_check_table("Document Number" text, x_x int, y_y text);
insert into pg_check_table values ('A',202000,'JASE'),
('A',202100,'JASE'),
('A',638010,'JASE'),
('A',Null,'JASE'),
('A',Null,'JASE'),
('A',202100,'JASE'),
('A',638000,'JASE'),
('A',202100,'JASE'),
('B',638000,'JASE'),
('B',202100,null),
('B',638000,'JASE'),
('B',null,'ARAS'),
('B',638000,'ARAS'),
('B',null,'ARAS'),
('B',638000,null),
('B',124004,null);
alter table pg_check_table add row_num serial;
select * from pg_check_table;

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

那小子欠揍 2025-01-19 23:51:40

-- DROP FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text)
CREATE OR REPLACE FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text, OUT row_count int)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _sql text := format(
       'INSERT INTO pg_temp.%3$I (label, source, target)
        SELECT DISTINCT $1, %1$I, %2$I FROM pg_temp.%4$I
        WHERE (%1$I, %2$I) IS NOT NULL'
      , _s, _v, _tbl, _tbl_src);
BEGIN
   -- RAISE NOTICE '%', _sql;  -- debug
   EXECUTE _sql USING _relation;
   GET DIAGNOSTICS row_count = ROW_COUNT;  -- return number of inserted rows
END
$func$;

db<>fiddle 此处

最重要的是，使用<一href="https://www.postgresql.org/docs/current/functions-string.html#FUNCTIONS-STRING-FORMAT" rel="nofollow noreferrer">format()安全地连接动态 SQL 命令。并使用格式说明符 %I 作为标识符。这样，SQL 注入就不可能实现，并且标识符会被正确地用双引号括起来 - 保留非标准名称，例如 Document Number。这就是你原来的失败的地方。

我们也可以将 _relation 连接为要插入到 label 中的字符串。但将值传递给EXECUTE的更好方法是使用USING子句。传递给 EXECUTE 的 SQL 字符串内的 $1 是第一个 USING 参数的占位符。不要与在函数体外部EXECUTE上下文中引用函数参数的$1相混淆！（您可以传递任何字符串，前导冒号（：）并不重要，正确完成后不会解释该字符串。）请参阅：

我替换了 DELETE 在您的原始文件中，在 INSERT 的 SELECT 中添加 WHERE 子句。不要首先插入行，而是稍后再次删除它们。

(%1$I, %2$I) IS NOT NULL 仅当两个值均为 NOT NULL 时才符合条件。
关于这一点：

检查 Postgres 复合字段是否为null/empty

不要使用前缀“pg_”作为表名。这就是 Postgres 用于系统表的内容。别搞乱那些。

我使用 pg_temp 对已知临时表进行架构限定。这通常是可选的，因为默认情况下临时架构位于 search_path 的第一位。但这可以被更改（恶意），然后表名称将解析为 search_path 中任何现有的同名常规表。所以安全总比后悔好。请参阅：

如何search_path 影响标识符解析和“当前架构”

我使函数返回插入的行数。这是完全可选的！
由于我使用 OUT 参数执行此操作，因此我可以跳过 RETURNS 子句。请参阅：

我可以让 plpgsql 函数在不使用变量的情况下返回整数吗？

-- DROP FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text)
CREATE OR REPLACE FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text, OUT row_count int)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _sql text := format(
       'INSERT INTO pg_temp.%3$I (label, source, target)
        SELECT DISTINCT $1, %1$I, %2$I FROM pg_temp.%4$I
        WHERE (%1$I, %2$I) IS NOT NULL'
      , _s, _v, _tbl, _tbl_src);
BEGIN
   -- RAISE NOTICE '%', _sql;  -- debug
   EXECUTE _sql USING _relation;
   GET DIAGNOSTICS row_count = ROW_COUNT;  -- return number of inserted rows
END
$func$;

db<>fiddle here

Most importantly, use format() to concatenate your dynamic SQL commands safely. And use the format specifier %I for identifiers. This way, SQL injection is not possible and identifiers are double-quoted properly - preserving non-standard names like Document Number. That's where your original failed.

We could concatenate _relation as string to be inserted into label, too. But the preferable way to pass values to EXECUTE is with the USING clause. $1 inside the SQL string passed to EXECUTE is a placeholder for the first USING argument. Not to be confused with $1 referencing function parameters in the context of the function body outside EXECUTE! (You can pass any string, leading colon (:) does not matter, the string is not interpreted when done right.)
See:

I replaced the DELETE in your original with a WHERE clause to the SELECT of the INSERT. Don't insert rows in the first place, instead of deleting them again later.

(%1$I, %2$I) IS NOT NULL only qualifies when both values are NOT NULL.
About that:

Check if a Postgres composite field is null/empty

Don't use the prefix "pg_" for your table names. That's what Postgres uses for system tables. Don't mess with those.

I schema-qualify known temporary tables with pg_temp. That's typically optional as the temporary schema comes first in the search_path by default. But that can be changed (maliciously), and then the table name would resolve to any existing regular table of the same name in the search_path. So better safe than sorry. See:

How does the search_path influence identifier resolution and the "current schema"

I made the function return the number of inserted rows. That's totally optional!
Since I do that with an OUT parameter, I am allowed to skip the RETURNS clause. See: