从一个表中提取列值并修改后插入到另一个表中

发布于 2025-01-12 23:51:40 字数 3090 浏览 3 评论 0原文

我创建了一个 PL/pgSQL 函数,它接受两个列名、一个“关系”和两个表名。它在一个表中查找不同的行并将它们插入到临时表中,删除任何具有空值的行,并将一列的所有值设置为关系。我有使用此功能的过程的第一部分。

create or replace function alt_edger(s text, v text, relation text, tbl text, tbl_src text)
    returns void
    language plpgsql as
$func$
begin
    raise notice 's: %, v: %, tbl: %, tbl_src: %', s,v,tbl,tbl_src;
    execute ('insert into '||tbl||' ("source", "target") select distinct "'||s||'","'||v||'" from '||tbl_src||'');
    execute ('DELETE FROM '||tbl||' WHERE "source" IS null or "target" is null'); 
end
$func$;

它的执行方式如下:

-- create a temporary table and execute the function twice
drop table if exists temp_stack;
create temporary table temp_stack("label" text, "source" text, "target" text, "attr" text, "graph" text);
 
select alt_edger('x_x', 'y_y', ':associated_with', 'temp_stack','pg_check_table' );
select alt_edger('Document Number', 'x_x', ':documents', 'temp_stack','pg_check_table' );

select * from temp_stack;

请注意,我还没有使用 relationINSERT 还应分配 relation,但我不知道如何实现这一点,例如:

labelsourcetargetattrgraph
:linked_with638000ARAS
:linked_with202000JASE
:关联_与638010JASE
:关联_与638000JASE
:关联_与202100JASE
:documentsA638010
:documentsA202000
:A202100
documents:documents B638000
:documents A638000
:documentsB124004
:documentsB202100

我的挑战是:

  • 如何在 INSERT< 中集成 relation /代码>?当我尝试使用 VALUES 和逗号分隔时,我收到“选择附近错误”。
  • 如何允许relation中以“:”开头的字符串?我在这里预计,包含冒号在过去给我带来了挑战。

我该怎么做?或者有更好的方法吗?

玩具数据模型

drop table if exists pg_check_table;
create temporary table pg_check_table("Document Number" text, x_x int, y_y text);
insert into pg_check_table values ('A',202000,'JASE'),
('A',202100,'JASE'),
('A',638010,'JASE'),
('A',Null,'JASE'),
('A',Null,'JASE'),
('A',202100,'JASE'),
('A',638000,'JASE'),
('A',202100,'JASE'),
('B',638000,'JASE'),
('B',202100,null),
('B',638000,'JASE'),
('B',null,'ARAS'),
('B',638000,'ARAS'),
('B',null,'ARAS'),
('B',638000,null),
('B',124004,null);
alter table pg_check_table add row_num serial;
select * from pg_check_table;

I have created a PL/pgSQL function that accepts two column names, a "relation", and two table names. It finds distinct rows in one table and inserts them in to a temporary table, deletes any row with a null value, and sets all values of one column to relation. I have the first part of the process using this function.

create or replace function alt_edger(s text, v text, relation text, tbl text, tbl_src text)
    returns void
    language plpgsql as
$func$
begin
    raise notice 's: %, v: %, tbl: %, tbl_src: %', s,v,tbl,tbl_src;
    execute ('insert into '||tbl||' ("source", "target") select distinct "'||s||'","'||v||'" from '||tbl_src||'');
    execute ('DELETE FROM '||tbl||' WHERE "source" IS null or "target" is null'); 
end
$func$;

It is executed as follows:

-- create a temporary table and execute the function twice
drop table if exists temp_stack;
create temporary table temp_stack("label" text, "source" text, "target" text, "attr" text, "graph" text);
 
select alt_edger('x_x', 'y_y', ':associated_with', 'temp_stack','pg_check_table' );
select alt_edger('Document Number', 'x_x', ':documents', 'temp_stack','pg_check_table' );

select * from temp_stack;

Note that I didn't use relation, yet. The INSERT shall also assign relation, but I can't figure out how to make that happen to get something like:

labelsourcetargetattrgraph
:associated_with638000ARAS
:associated_with202000JASE
:associated_with638010JASE
:associated_with638000JASE
:associated_with202100JASE
:documentsA638010
:documentsA202000
:documentsA202100
:documentsB638000
:documentsA638000
:documentsB124004
:documentsB202100

My challenges are:

  • How to integrate relation in the INSERT? When I try to use VALUES and comma separation I get an "error near select".
  • How to allow strings starting with ":" in relation? I'm anticipating here, the inclusion of the colon has given me challenges in the past.

How can I do this? Or is there a better approach?

Toy data model:

drop table if exists pg_check_table;
create temporary table pg_check_table("Document Number" text, x_x int, y_y text);
insert into pg_check_table values ('A',202000,'JASE'),
('A',202100,'JASE'),
('A',638010,'JASE'),
('A',Null,'JASE'),
('A',Null,'JASE'),
('A',202100,'JASE'),
('A',638000,'JASE'),
('A',202100,'JASE'),
('B',638000,'JASE'),
('B',202100,null),
('B',638000,'JASE'),
('B',null,'ARAS'),
('B',638000,'ARAS'),
('B',null,'ARAS'),
('B',638000,null),
('B',124004,null);
alter table pg_check_table add row_num serial;
select * from pg_check_table;

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

那小子欠揍 2025-01-19 23:51:40
-- DROP FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text)
CREATE OR REPLACE FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text, OUT row_count int)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _sql text := format(
       'INSERT INTO pg_temp.%3$I (label, source, target)
        SELECT DISTINCT $1, %1$I, %2$I FROM pg_temp.%4$I
        WHERE (%1$I, %2$I) IS NOT NULL'
      , _s, _v, _tbl, _tbl_src);
BEGIN
   -- RAISE NOTICE '%', _sql;  -- debug
   EXECUTE _sql USING _relation;
   GET DIAGNOSTICS row_count = ROW_COUNT;  -- return number of inserted rows
END
$func$;

db<>fiddle 此处

最重要的是,使用<一href="https://www.postgresql.org/docs/current/functions-string.html#FUNCTIONS-STRING-FORMAT" rel="nofollow noreferrer">format()安全地连接动态 SQL 命令。并使用格式说明符 %I 作为标识符。这样,SQL 注入就不可能实现,并且标识符会被正确地用双引号括起来 - 保留非标准名称,例如 Document Number。这就是你原来的失败的地方。

我们也可以将 _relation 连接为要插入到 label 中的字符串。但将传递给EXECUTE的更好方法是使用USING子句。传递给 EXECUTE 的 SQL 字符串内的 $1 是第一个 USING 参数的占位符。不要与在函数体外部EXECUTE上下文中引用函数参数的$1相混淆! (您可以传递任何字符串,前导冒号()并不重要,正确完成后不会解释该字符串。)
请参阅:

我替换了 DELETE 在您的原始文件中,在 INSERTSELECT 中添加 WHERE 子句。不要首先插入行,而是稍后再次删除它们。

(%1$I, %2$I) IS NOT NULL 仅当两个值均为 NOT NULL 时才符合条件。
关于这一点:

不要使用前缀“pg_”作为表名。这就是 Postgres 用于系统表的内容。别搞乱那些。

我使用 pg_temp 对已知临时表进行架构限定。这通常是可选的,因为默认情况下临时架构位于 search_path 的第一位。但这可以被更改(恶意),然后表名称将解析为 search_path 中任何现有的同名常规表。所以安全总比后悔好。请参阅:

我使函数返回插入的行数。这是完全可选的!
由于我使用 OUT 参数执行此操作,因此我可以跳过 RETURNS 子句。请参阅:

-- DROP FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text)
CREATE OR REPLACE FUNCTION alt_edger(_s text, _v text, _relation text, _tbl text, _tbl_src text, OUT row_count int)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _sql text := format(
       'INSERT INTO pg_temp.%3$I (label, source, target)
        SELECT DISTINCT $1, %1$I, %2$I FROM pg_temp.%4$I
        WHERE (%1$I, %2$I) IS NOT NULL'
      , _s, _v, _tbl, _tbl_src);
BEGIN
   -- RAISE NOTICE '%', _sql;  -- debug
   EXECUTE _sql USING _relation;
   GET DIAGNOSTICS row_count = ROW_COUNT;  -- return number of inserted rows
END
$func$;

db<>fiddle here

Most importantly, use format() to concatenate your dynamic SQL commands safely. And use the format specifier %I for identifiers. This way, SQL injection is not possible and identifiers are double-quoted properly - preserving non-standard names like Document Number. That's where your original failed.

We could concatenate _relation as string to be inserted into label, too. But the preferable way to pass values to EXECUTE is with the USING clause. $1 inside the SQL string passed to EXECUTE is a placeholder for the first USING argument. Not to be confused with $1 referencing function parameters in the context of the function body outside EXECUTE! (You can pass any string, leading colon (:) does not matter, the string is not interpreted when done right.)
See:

I replaced the DELETE in your original with a WHERE clause to the SELECT of the INSERT. Don't insert rows in the first place, instead of deleting them again later.

(%1$I, %2$I) IS NOT NULL only qualifies when both values are NOT NULL.
About that:

Don't use the prefix "pg_" for your table names. That's what Postgres uses for system tables. Don't mess with those.

I schema-qualify known temporary tables with pg_temp. That's typically optional as the temporary schema comes first in the search_path by default. But that can be changed (maliciously), and then the table name would resolve to any existing regular table of the same name in the search_path. So better safe than sorry. See:

I made the function return the number of inserted rows. That's totally optional!
Since I do that with an OUT parameter, I am allowed to skip the RETURNS clause. See:

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文